Re: Latest StopAnalyzer.java
On Jul 6, 2004, at 1:08 AM, Karthik N S wrote:

> Can somebody tell me where I can find the latest copy of StopAnalyzer.java
> that can be used with Lucene 1.4-final? On Lucene-Sandbox I am not able to
> find it. [My company prohibits me from using CVS.]

http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/java/org/apache/lucene/analysis/StopAnalyzer.java?rev=1.6&view=auto

All of Jakarta's CVS can be browsed this way.

Erik
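For what it's worth, StopAnalyzer also ships compiled inside the Lucene 1.4 core jar, so the source file is only needed for reference; it can be used (with the default or a custom stop list) without any CVS checkout. A minimal sketch; the custom stop words here are purely illustrative:

    import org.apache.lucene.analysis.StopAnalyzer;

    public class StopAnalyzerDemo {
        public static void main(String[] args) {
            // Stock analyzer with the built-in English stop set:
            StopAnalyzer standard = new StopAnalyzer();

            // Custom stop list (illustrative words), no source checkout needed:
            StopAnalyzer custom = new StopAnalyzer(new String[] { "the", "a", "an" });
        }
    }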
Re: Running OutOfMemory while optimizing and searching
Note that "force" is really just "suggest". Regardless, I have seen apps running under a 1.3.1 JVM where this worked.

Otis

--- David Spencer [EMAIL PROTECTED] wrote:

This in theory should not help, but anyway, just in case: the idea is to call gc() periodically to force GC. This is the code I use, which tries to force it:

    public static long gc() {
        long bef = mem();
        System.gc();
        sleep(100);
        System.runFinalization();
        sleep(100);
        System.gc();
        long aft = mem();
        return aft - bef; // approximate change in used memory, in bytes
    }

    // Helpers, reconstructed from how gc() uses them:

    private static long mem() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory(); // bytes currently in use
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }

Mark Florence wrote:

Thanks, Jim. I'm pretty sure I'm throwing OOM for real, and not because I've run out of file handles. I can easily recreate the latter condition, and it is always reported accurately. I've also monitored the OOM as it occurs using top, and I can see memory usage climbing until it is exhausted -- if you will excuse the pun!

I'm not familiar with the new compound file format. Where can I look to find more information?

-- Mark

-----Original Message-----
From: James Dunn [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 02, 2004 01:29 pm
To: Lucene Users List
Subject: Re: Running OutOfMemory while optimizing and searching

Ah yes, I don't think I made that clear enough. From Mark's original post, I believe he mentioned that he used separate readers for each simultaneous query. His other issue was that he was getting an OOM during an optimize, even when he set the JVM heap to 2GB. He said his index was about 10.5GB spread over ~7000 files on Linux. My guess is that the OOM might actually be a "too many open files" error. I have seen that type of error reported by the JVM as an OutOfMemory error on Linux before. I had the same problem, but once I switched to the new Lucene compound file format, I haven't had it since. Mark, have you tried switching to the compound file format?

Jim

--- Doug Cutting [EMAIL PROTECTED] wrote:

> What do your queries look like? The memory required for a query can be
> computed by the following equation:
>
>     1 byte * number of fields in your query * number of docs in your index
>
> So if your query searches on all 50 fields of your 3.5 million document
> index, then each search would take about 175MB. If your 3-4 searches run
> concurrently, then that's about 525MB to 700MB chewed up at once.

That's not quite right. If you use the same IndexSearcher (or IndexReader) for all of the searches, then only 175MB are used. The arrays in question (the norms) are read-only and can be shared by all searches. In general, the amount of memory required is:

    1 byte * number of searchable fields in your index * number of docs in your index
    plus 1KB * number of terms in the query
    plus 1KB * number of phrase terms in the query

The latter two are for I/O buffers. There are a few other things, but these are the major ones.

Doug
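Doug's point is that the norms are loaded per IndexSearcher, so sharing one searcher across all concurrent queries pays the per-field, per-document byte cost only once. A minimal sketch of that pattern against the Lucene 1.4 API; the index path and the "contents" field name are hypothetical stand-ins:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class SharedSearcher {
        // One searcher for the whole application: the norms (1 byte per
        // document per indexed field) are loaded once and shared by all queries.
        private static IndexSearcher searcher;

        public static synchronized IndexSearcher getSearcher() throws java.io.IOException {
            if (searcher == null) {
                searcher = new IndexSearcher("/path/to/index"); // hypothetical path
            }
            return searcher;
        }

        public static int countHits(String queryText) throws Exception {
            // QueryParser.parse was a static method in the 1.4-era API
            Query query = QueryParser.parse(queryText, "contents", new StandardAnalyzer());
            Hits hits = getSearcher().search(query); // searching is thread-safe
            return hits.length();
        }
    }

Searching through a single IndexSearcher from multiple threads is safe; a new searcher is only needed after the index changes.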
Search Hit Score
Hi dev guys,

Apologies. I have 3 questions for you.

1) I have a situation here where I am supposed to group unique indexed documents depending upon the number of hits per document. To briefly explain: all documents with n hits for a search word would be grouped under Category A, and all documents with n+1 hits for the same search word should be grouped under Category B. Can Lucene provide some means internally to handle this situation?

2) What is this weight/boost factor available for the hits, and how is it used effectively?

3) Is there anything in the Lucene core which reveals the version number of the currently used jar files, something like "java -version" on the command prompt displaying the version?

with regards
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 06, 2004 4:22 PM
To: Lucene Users List
Subject: Re: Latest StopAnalyzer.java

On Jul 6, 2004, at 2:53 AM, Morus Walter wrote:

> Karthik N S writes:
>> Can somebody tell me where I can find the latest copy of
>> StopAnalyzer.java that can be used with Lucene 1.4-final? On
>> Lucene-Sandbox I am not able to find it. [My company prohibits me
>> from using CVS.]
>
> There is no Lucene 1.4 final, but org.apache.lucene.analysis.StopAnalyzer
> is part of the Lucene core.

Actually, Doug did create Lucene 1.4 final: http://jakarta.apache.org/lucene/docs/index.html

I'll try to squeeze in some time today to make it more official by ensuring the binaries are mirrored and such.

Erik
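On question 1 above: Hits exposes normalized relevance scores, not raw per-document term counts, but IndexReader.termDocs() does expose the within-document frequency of a term, which is enough to bucket documents by hit count. A minimal sketch against the 1.4 API; the index path, field name, and search word are hypothetical:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    public class GroupByFrequency {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index");       // hypothetical path
            TermDocs td = reader.termDocs(new Term("contents", "lucene")); // hypothetical field/word

            Map groups = new TreeMap(); // frequency -> list of document numbers
            while (td.next()) {
                Integer freq = new Integer(td.freq());
                List docs = (List) groups.get(freq);
                if (docs == null) {
                    docs = new ArrayList();
                    groups.put(freq, docs);
                }
                docs.add(new Integer(td.doc()));
            }
            td.close();

            System.out.println(groups); // e.g. {1=[3, 7], 2=[0], 5=[12]}
            reader.close();
        }
    }

On question 2: a boost (set at index time, or at query time via Query.setBoost) multiplies into the score of matching documents, so boosted terms or documents rank higher; it does not change which documents match. On question 3: I don't know of a "java -version"-style switch in the 1.4 jar; the usual place to look is the jar's META-INF/MANIFEST.MF.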
upgrade from Lucene 1.3 final to 1.4rc3 problem
Hi!

I'm using Lucene 1.3 final currently, and all things were working fine. But after I upgraded from Lucene 1.3 final to 1.4rc3 (I simply replaced the old jar with lucene-1.4-rc3.jar and recompiled), everything recompiles successfully, but when we try to index a document it gives the error below:

    java.lang.NullPointerException
        at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
        at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)

What is wrong? Please help. Thanks.

Regards,
Alex
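One guess, not a confirmed diagnosis: FSDirectory.create() in this era listed the directory's contents in order to clear out old index files, and java.io.File.list() returns null when the path is not a readable directory (for example, it exists as a plain file, or permissions block it), which would surface exactly as an NPE inside create(). A sketch of a defensive check before opening the writer; the index path is hypothetical:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class SafeIndexOpen {
        public static void main(String[] args) throws Exception {
            File indexDir = new File("/path/to/index"); // hypothetical location

            // Fail with a clear message instead of an NPE deep inside FSDirectory.
            if (indexDir.exists() && !indexDir.isDirectory()) {
                throw new IllegalStateException(indexDir + " exists but is not a directory");
            }
            if (!indexDir.exists() && !indexDir.mkdirs()) {
                throw new IllegalStateException("cannot create " + indexDir);
            }

            // create=true wipes any existing index in the directory and starts fresh
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            writer.close();
        }
    }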
Upgrade from Lucene 1.3 final to 1.4 problem
Hey, apologies. Same with me too...

The number of hits on a set of documents indexed using 1.3-final is not the same on the 1.4-final version. [The only modification done to the source is that I upgraded my CustomAnalyzer on the basis of the StopAnalyzer available in 1.4.] Does doing this affect the performance? Somebody please explain.

with regards
Karthik

-----Original Message-----
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem

Hi!

I'm using Lucene 1.3 final currently, and all things were working fine. But after I upgraded from Lucene 1.3 final to 1.4rc3 (I simply replaced the old jar with lucene-1.4-rc3.jar and recompiled), everything recompiles successfully, but when we try to index a document it gives the error below:

    java.lang.NullPointerException
        at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
        at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)

What is wrong? Please help. Thanks.

Regards,
Alex
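A changed analyzer is the most likely cause of differing hit counts: if the stop set or tokenization differs between the two versions of the CustomAnalyzer, the same query matches a different set of documents. One way to check is to run a sample string through both analyzers and compare the tokens. A sketch against the 1.4 analysis API, using the stock StopAnalyzer and StandardAnalyzer as stand-ins for the two CustomAnalyzer versions (the field name is arbitrary):

    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class CompareAnalyzers {
        static void dump(String label, Analyzer a, String text) throws Exception {
            TokenStream ts = a.tokenStream("contents", new StringReader(text)); // arbitrary field
            System.out.print(label + ":");
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.print(" [" + t.termText() + "]");
            }
            System.out.println();
            ts.close();
        }

        public static void main(String[] args) throws Exception {
            String sample = "The Quick Brown Fox";
            dump("old analyzer", new StopAnalyzer(), sample);     // stand-in for the 1.3-era version
            dump("new analyzer", new StandardAnalyzer(), sample); // stand-in for the 1.4-era version
        }
    }

If the token lists differ, documents indexed with the old analyzer and searched with the new one (or vice versa) will produce different hit counts; reindexing with the new analyzer makes the two consistent.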
Most efficient way to index 14M documents (out of memory/file handles)
I'm trying to burn an index of 14M documents. I have two problems.

1. I have to run optimize() every 50k documents or I run out of file handles. This takes TIME, and of course the time grows linearly with the size of the index, so it just gets slower the further I get. It starts to crawl at about 3M documents.

2. I eventually run out of memory in this configuration.

I KNOW this has been covered before, but for the life of me I can't find it in the archives, the FAQ, or the wiki. I'm using an IndexWriter with a mergeFactor of 5k and then optimizing every 50k documents. Does it make sense to just create a new IndexWriter for every 50k docs and then do one big optimize() at the end?

Kevin

--
Please reply using PGP: http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
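For what it's worth, a mergeFactor of 5000 keeps thousands of segments, and their files, live at once, which is a recipe for running out of file handles; the compound file format also cuts the file count per segment dramatically. A sketch of a more conventional bulk-indexing setup, assuming the 1.4-era IndexWriter with its public mergeFactor/minMergeDocs fields and setUseCompoundFile(); the numbers are illustrative starting points, not tuned values, and the index path is hypothetical:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/path/to/index", // hypothetical path
                    new StandardAnalyzer(), true);

            writer.setUseCompoundFile(true); // one .cfs per segment instead of ~10 separate files
            writer.mergeFactor = 50;         // modest fan-out; 5000 leaves thousands of segments open
            writer.minMergeDocs = 1000;      // buffer more documents in RAM before flushing a segment

            // for (each of the 14M documents) { writer.addDocument(doc); }

            writer.optimize(); // one big merge at the very end, not every 50k docs
            writer.close();
        }
    }

With a low mergeFactor the writer merges segments incrementally as it goes, so the periodic optimize() calls, the slowest part of the current setup, should no longer be needed to stay under the file-handle limit.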