Re[2]: Disk space used by optimize

2005-01-30 Thread Yura Smolsky
Hello, Otis.

There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used
the compound file format, optimize took three times more space, because
the *.cfs file needs to be unpacked.

Now I use the non-compound file format. It needs about twice as much
disk space.
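
For reference, here is a minimal sketch of switching between the two formats
before optimizing, assuming the Lucene 1.4-era IndexWriter API; the index path
and analyzer below are only placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class OptimizeSketch {
    public static void main(String[] args) throws Exception {
        // Open an existing index (the path is a placeholder).
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        // false = multifile (non-compound) format; true = compound (.cfs) format.
        writer.setUseCompoundFile(false);
        // optimize() merges all segments into one, so the old and new segments
        // coexist on disk until the merge finishes.
        writer.optimize();
        writer.close();
    }
}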

OG Have you tried using the multifile index format?  Now I wonder if there
OG is actually a difference in disk space consumed by optimize() when you
OG use the multifile and compound index formats...

OG Otis

OG --- Kauler, Leto S [EMAIL PROTECTED] wrote:

 Our copy of LIA is in the mail ;)
 
 Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes),
 and segments (29 bytes).
 
 --Leto
 
 
 
  -Original Message-
  From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
  
  Hello,
  
   Yes, that is how optimize works: it copies all existing index 
   segments into one unified index segment, thus optimizing it.
  
  see hit #1:
 http://www.lucenebook.com/search?query=optimize+disk+space
  
   However, three times the space sounds like a bit too much, or maybe 
   I made a mistake in the book. :)
  
  You said you end up with 3 files - .cfs is one of them, right?
  
  Otis
  
  
  --- Kauler, Leto S [EMAIL PROTECTED] wrote:
  
   
    Just a quick question: after writing an index and then calling
    optimize(), is it normal for the index to expand to about three times
    the size before finally compressing?
   
    In our case the optimise grinds the disk, expanding the index into
    many files of about 145MB total, before compressing down to three
    files of about 47MB total.  That must be a lot of disk activity for
    the people with multi-gigabyte indexes!
   
   Regards,
   Leto
 


Yura Smolsky,







Re: Searching with words that contain %, / and the like

2005-01-30 Thread Robinson Raju
Hi,
  Yes, the Analyzer was the culprit behind eating away some of the letters
in the search string. StandardAnalyzer has 'a' and 's' as stop words
(amongst others).
Since I want to search on these (specifically, I want to search on
words like a/s, e/p, 15%, 15', etc.), I commented out the following
lines in StandardAnalyzer (the filtering of standard tokens and stop
words):

  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    // result = new StandardFilter(result);      // standard-token filtering disabled
    result = new LowerCaseFilter(result);
    // result = new StopFilter(result, stopSet); // stop-word filtering disabled
    return result;
  }

Now the stop words are not getting filtered, but the '/' still gets
dropped, so a/s is read as a s.
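
The split most likely happens inside the StandardTokenizer grammar itself
(as Chris suggests below), so commenting out the filters cannot bring the
'/' back. A minimal sketch of an alternative, assuming the Lucene 1.4-era
analysis API (the class name PreserveSymbolsAnalyzer is just illustrative):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class PreserveSymbolsAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // Split on whitespace only, so '/' and '%' stay inside the token.
    TokenStream result = new WhitespaceTokenizer(reader);
    // Keep lower-casing so matching stays case-insensitive.
    result = new LowerCaseFilter(result);
    return result;
  }
}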

Regards
Robin

On Thu, 27 Jan 2005 02:50:13 -0600, Chris Lamprecht
[EMAIL PROTECTED] wrote:
 Without looking at the source, my guess is that StandardAnalyzer (and
 StandardTokenizer) is the culprit.  The StandardAnalyzer grammar (in
 StandardTokenizer.jj) is probably defined so "x/y" parses into two
 tokens, "x" and "y".  "s" is a default stopword (see
 StopAnalyzer.ENGLISH_STOP_WORDS), so it gets filtered out, while "p"
 does not.
 
 To get what you want, you can use a WhitespaceAnalyzer, write your own
 custom Analyzer or Tokenizer, or modify the StandardTokenizer.jj
 grammar to suit your needs.  WhitespaceAnalyzer is much simpler than
 StandardAnalyzer, so you may see some other things being tokenized
 differently.
 
 -Chris
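
A quick way to see the difference Chris describes is to dump the tokens each
analyzer produces; this is only a sketch, assuming the Lucene 1.4-era
TokenStream.next()/Token.termText() API:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class TokenDump {
    // Print each token the analyzer emits for the given text.
    static void dump(Analyzer a, String text) throws Exception {
        TokenStream ts = a.tokenStream("field", new StringReader(text));
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.print("[" + t.termText() + "] ");
        }
        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        dump(new StandardAnalyzer(), "a/s 15%");   // likely prints just [15]
        dump(new WhitespaceAnalyzer(), "a/s 15%"); // prints [a/s] [15%]
    }
}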
 
 On Thu, 27 Jan 2005 12:12:16 +0530, Robinson Raju
 [EMAIL PROTECTED] wrote:
  Hi,
 
   Is there a way to search for words that contain '/' or '%'?
   If my query is "test/s", it is just taken as "test".
   If my query is "test/p", it is just taken as "test p".
   Has anyone done this / faced such an issue?
 
  Regards
  Robin
  
 
 
 
 
 


-- 
Regards,
Robin
9886394650
The merit of an action lies in finishing it to the end
