Re[2]: Disk space used by optimize
Hello, Otis.

There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound file format, optimize took 3 times more space, because the *.cfs file needs to be unpacked. I now use the non-compound file format, which needs about twice as much disk space.

OG Have you tried using the multifile index format? Now I wonder if there
OG is actually a difference in disk space consumed by optimize() when you
OG use multifile and compound index format...

OG Otis

OG --- Kauler, Leto S [EMAIL PROTECTED] wrote:

> Our copy of LIA is in the mail ;)
>
> Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).
>
> --Leto
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
>
> > Hello,
> >
> > Yes, that is how optimize works - it copies all existing index segments into one unified index segment, thus optimizing it.
> >
> > See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> >
> > However, three times the space sounds a bit too much, or I made a mistake in the book. :)
> >
> > You said you end up with 3 files - .cfs is one of them, right?
> >
> > Otis
> >
> > --- Kauler, Leto S [EMAIL PROTECTED] wrote:
> >
> > > Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing?
> > >
> > > In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down to three files of about 47MB total. That must be a lot of disk activity for the people with multi-gigabyte indexes!
> > >
> > > Regards,
> > > Leto

Yura Smolsky

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
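The figures reported in this thread (about 3x peak usage with the compound format, about 2x with the multifile format) can be turned into a rough pre-flight check before calling optimize(). The following is only a sketch in plain Java, with no Lucene calls: the class name, the method names, and the two multipliers are assumptions taken from the numbers quoted above, not limits documented by Lucene, and it assumes the multipliers describe peak size relative to the current index size.

```java
import java.io.File;

class OptimizeSpaceEstimate {
    // Peak-size multipliers as reported in this thread (assumptions, not Lucene guarantees).
    static final long COMPOUND_FACTOR = 3;
    static final long MULTIFILE_FACTOR = 2;

    /** Sums the sizes of the regular files directly inside dir (no recursion). */
    static long directorySize(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable
        }
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** Estimated peak disk usage during optimize, given the current index size. */
    static long estimatedPeakBytes(long indexBytes, boolean compoundFormat) {
        return indexBytes * (compoundFormat ? COMPOUND_FACTOR : MULTIFILE_FACTOR);
    }

    /** True if the volume holding indexDir has room for the estimated extra space. */
    static boolean hasRoomForOptimize(File indexDir, boolean compoundFormat) {
        long current = directorySize(indexDir);
        long extraNeeded = estimatedPeakBytes(current, compoundFormat) - current;
        return indexDir.getUsableSpace() >= extraNeeded;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // The 45 GB index mentioned above:
        System.out.println(estimatedPeakBytes(45 * gb, true) / gb);  // prints 135
        System.out.println(estimatedPeakBytes(45 * gb, false) / gb); // prints 90
    }
}
```

In practice one would call hasRoomForOptimize(new File("/path/to/index"), true) before optimizing (the path here is a placeholder) and treat a false result as a warning rather than a hard rule, since the real ratio depends on the index.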
Re: Searching with words that contain % , / and the like
Hi,

Yes. The Analyzer was the culprit behind eating away some of the letters in the search string. StandardAnalyzer has 'a' and 's' as stop words (amongst others). Since I want to search on these (specifically, I want to search on words like a/s, e/p, 15%, 15', etc.), I commented out the following lines in StandardAnalyzer (the filtering of standard tokens and stop words):

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(reader);
        // result = new StandardFilter(result);       // disabled: standard-token filtering
        result = new LowerCaseFilter(result);
        // result = new StopFilter(result, stopSet);  // disabled: stop-word removal
        return result;
    }

Now stop words are not getting filtered, but / is still dropped, so a/s is read as "a s".

Regards
Robin

On Thu, 27 Jan 2005 02:50:13 -0600, Chris Lamprecht [EMAIL PROTECTED] wrote:
> Without looking at the source, my guess is that StandardAnalyzer (and StandardTokenizer) is the culprit. The StandardAnalyzer grammar (in StandardTokenizer.jj) is probably defined so x/y parses into two tokens, x and y. "s" is a default stopword (see StopAnalyzer.ENGLISH_STOP_WORDS), so it gets filtered out, while "p" does not.
>
> To get what you want, you can use a WhitespaceAnalyzer, write your own custom Analyzer or Tokenizer, or modify the StandardTokenizer.jj grammar to suit your needs. WhitespaceAnalyzer is much simpler than StandardAnalyzer, so you may see some other things being tokenized differently.
>
> -Chris
>
> On Thu, 27 Jan 2005 12:12:16 +0530, Robinson Raju [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > Is there a way to search for words that contain / or %?
> > If my query is test/s, it is just taken as test.
> > If my query is test/p, it is just taken as test p.
> > Has anyone done this or faced such an issue?
> >
> > Regards
> > Robin

--
Regards,
Robin
The merit of an action lies in finishing it to the end
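To illustrate why a whitespace-only approach keeps terms such as a/s and 15% intact, here is a minimal sketch in plain Java. It does not use Lucene at all: the class and method names are hypothetical, and it only mimics the idea of splitting on whitespace and lower-casing, with no stop-word removal and no punctuation splitting. The real WhitespaceAnalyzer may differ in details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WhitespaceLowercaseSketch {
    /**
     * Splits text on runs of whitespace and lower-cases each piece.
     * Characters such as '/', '%' and '\'' are left inside the tokens.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) { // an empty input yields one empty piece; skip it
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [test, a/s, e/p, 15%, 15']
        System.out.println(tokenize("Test a/s e/p 15% 15'"));
    }
}
```

The same tokenization has to be applied to both the indexed text and the query string; otherwise a query for a/s would not match what was stored in the index.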