Re[2]: Is IndexSearcher thread safe?
Hello, Volodymyr.

VB Additional question: if I'm sharing one instance of IndexSearcher
VB between different threads, is it fine to just drop this instance to
VB the GC? Because I don't know if some thread is still using this
VB searcher or is done with it.

It is safe to share one instance between many threads, and it should be safe to drop the old object to the GC. But I have discovered one strange fact: when you have an IndexSearcher on a big index, so the IndexSearcher object takes a lot of memory (900 MB), and you create a new IndexSearcher after deleting all references to the old one, the memory consumed by the old IndexSearcher is never freed. What can the community say about this strange fact?

Yura Smolsky.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
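The "drop the old searcher to the GC" part can be illustrated without Lucene at all: once the shared reference is swapped and no thread still holds the old object, it becomes collectible. A minimal Python sketch of that pattern (the Searcher class here is a hypothetical stand-in, not PyLucene's):

```python
import gc
import weakref

class Searcher:
    """Hypothetical stand-in for a heavyweight searcher object."""
    def __init__(self, name):
        self.name = name

# The application holds the "current" searcher in one shared slot.
current = Searcher("v1")
probe = weakref.ref(current)   # observe the old object without keeping it alive

# Swap in a new searcher; no thread holds the old one any more.
current = Searcher("v2")
gc.collect()                   # force a collection so the effect is visible

print(probe() is None)         # True: the old searcher was reclaimed
```

If the old object is *not* reclaimed in a real JVM setup, the usual suspects are a lingering reference somewhere (a thread-local, a cache) rather than the collector itself.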
IndexSearch and IndexWriter on 2 CPU's
Hello.

I have a dual-CPU box with RH Linux. I run two processes on this box:
1. An IndexWriter which adds new documents to the index constantly, 24/7/365 :)
2. An IndexSearcher which performs searches on this index.

Sometimes the writer begins to merge the index (this is caused by mergeFactor and the structure of a Lucene index) inside the addDocument method. When a merge begins, my writer process takes both CPUs (180-200% in total); actually, most of the time goes to IO operations. While the merge runs, all searches performed by the IndexSearcher on this computer become very, very slow, b/c all CPU time goes to the first process.

How can I give the second process more CPU time, or how can I reduce the IO time of the first process? Maybe I can tweak something in the index configuration. I have set:
writer.mergeFactor = 2
writer.minMergeDocs = 2500

Yura Smolsky.
Re[2]: sorted search
Hello, Erik.

If I need to store hour and minute, then I need to put the date into the following integer format: YYYYMMDDHHII? Will it be faster than the current solution? And will I still be able to do range queries (from date A to date B)?

EH Sorting by String uses up lots more RAM than a numeric sort. If you
EH use a numeric (yet lexicographically orderable) date format (e.g.
EH YYYYMMDD) you'll see better performance most likely.
EH
EH Erik
EH
EH On Feb 24, 2005, at 1:01 PM, Yura Smolsky wrote:

Hello, lucene-user.

I have an index with many documents, more than 40 million. Each document has a DateField (it is the time stamp of the document). I need the most recent results only. I use a single instance of IndexSearcher. When I perform a sorted search on this index:

Sort sort = new Sort();
sort.setSort(new SortField[] { new SortField("modified", SortField.STRING, true) });
Hits hits = searcher.search(QueryParser.parse("good", "content", new StandardAnalyzer()), sort);

then search speed is not good. Today I tried the search without sorting by "modified", but with sorting by relevance. The speed was much better! I think that sorting by DateField is very slow. Maybe I am doing something wrong with this kind of sorted search? Can you give me advice about this? Thanks.

Yura Smolsky.
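The "numeric yet lexicographically orderable" format Erik suggests works because zero-padded, most-significant-unit-first digits make plain string order coincide with chronological order, which is also why range queries keep working. A quick plain-Python illustration (no Lucene required):

```python
from datetime import datetime

def sort_key(dt):
    # Zero-padded, most-significant-unit-first: string order == time order.
    return dt.strftime("%Y%m%d%H%M")

a = datetime(2005, 2, 24, 13, 1)
b = datetime(2005, 2, 24, 9, 30)
c = datetime(2004, 12, 31, 23, 59)

keys = sorted([sort_key(a), sort_key(b), sort_key(c)])
print(keys)
# Plain string sort and chronological sort agree:
print(keys == [sort_key(c), sort_key(b), sort_key(a)])  # True
```

A range query "from date A to date B" then reduces to a simple string-range comparison on the same keys.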
Re[2]: sorted search
Hello, Erik.

About memory usage... DateField takes a string of 9 bytes, e.g. '000ic64p7'. How much memory will be taken by this string? How much memory will be taken by an integer?

EH Sorting by String uses up lots more RAM than a numeric sort. If you
EH use a numeric (yet lexicographically orderable) date format (e.g.
EH YYYYMMDD) you'll see better performance most likely.
EH
EH Erik
EH
EH On Feb 24, 2005, at 1:01 PM, Yura Smolsky wrote:

Hello, lucene-user.

I have an index with many documents, more than 40 million. Each document has a DateField (it is the time stamp of the document). I need the most recent results only. I use a single instance of IndexSearcher. When I perform a sorted search on this index:

Sort sort = new Sort();
sort.setSort(new SortField[] { new SortField("modified", SortField.STRING, true) });
Hits hits = searcher.search(QueryParser.parse("good", "content", new StandardAnalyzer()), sort);

then search speed is not good. Today I tried the search without sorting by "modified", but with sorting by relevance. The speed was much better! I think that sorting by DateField is very slow. Maybe I am doing something wrong with this kind of sorted search? Can you give me advice about this? Thanks.

Yura Smolsky.
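A rough back-of-envelope comparison for the 40-million-document case (the per-String overhead below is an assumed JVM figure for illustration, not a measured one):

```python
# Back-of-envelope only; the per-String overhead is an assumption
# (object header + char[] header + fields), not a measured JVM figure.
docs = 40_000_000

int_bytes = 4                  # one int per document
string_overhead = 40           # assumed JVM object + char[] overhead per String
string_chars = 9 * 2           # 9 chars at 2 bytes each (UTF-16)

int_cache_mb = docs * int_bytes / 2**20
string_cache_mb = docs * (string_overhead + string_chars) / 2**20

print(round(int_cache_mb))     # ~153 MB
print(round(string_cache_mb))  # ~2213 MB
```

Even with generous assumptions, a per-document String costs roughly an order of magnitude more than a per-document int, which matches Erik's advice to prefer a numeric sort.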
Re[2]: Search Performance
Hello, Michael.

Btw, you can recreate the IndexSearcher every 5|10|30|60|X minutes.

MC My index is changing in real time constantly... in this case I guess this
MC will not work for me. Any suggestions...
MC
MC Michael
MC
MC -----Original Message-----
MC From: David Townsend [mailto:[EMAIL PROTECTED]]
MC Sent: Friday, February 18, 2005 11:50 AM
MC To: Lucene Users List
MC Subject: RE: Search Performance
MC
MC IndexSearchers are thread safe, so you can use the same object on multiple
MC requests. If the index is static and not constantly updating, just keep one
MC IndexSearcher for the life of the app. If the index changes and you need
MC that instantly reflected in the results, you need to check if the index has
MC changed; if it has, create a new cached IndexSearcher. To check for changes
MC you'll need to monitor the version number of the index obtained via
MC IndexReader.getCurrentVersion(indexName).
MC
MC David
MC
MC -----Original Message-----
MC From: Stefan Groschupf [mailto:[EMAIL PROTECTED]]
MC Sent: 18 February 2005 16:15
MC To: Lucene Users List
MC Subject: Re: Search Performance
MC
MC Try a singleton pattern or a static field.
MC
MC Stefan
MC
MC Michael Celona wrote:

I am creating new IndexSearchers... how do I cache my IndexSearcher...

Michael

-----Original Message-----
From: David Townsend [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 18, 2005 11:00 AM
To: Lucene Users List
Subject: RE: Search Performance

Are you creating new IndexSearchers or IndexReaders on each search? Caching your IndexSearchers has a dramatic effect on speed.

David Townsend

-----Original Message-----
From: Michael Celona [mailto:[EMAIL PROTECTED]]
Sent: 18 February 2005 15:55
To: Lucene Users List
Subject: Search Performance

What is single-handedly the best way to improve search performance? I have an index in the 2 GB range stored on the local file system of the searcher. Under a load test of 5 simultaneous users my average search time is ~4700 ms. Under a load test of 10 simultaneous users my average search time is ~1 ms. I have given the JVM 2 GB of memory and am using dual 3 GHz Xeons. Any ideas?

Michael

Yura Smolsky.
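The caching pattern David describes (keep one searcher, reopen only when the index version changes) can be sketched in a few lines. This is a hedged, Lucene-free illustration: `open_searcher` and `get_version` are injected stand-ins for constructing an IndexSearcher and calling IndexReader.getCurrentVersion():

```python
import threading

class SearcherCache:
    """Cache one searcher; reopen only when the index version changes.

    `open_searcher` and `get_version` are hypothetical stand-ins for
    IndexSearcher construction and IndexReader.getCurrentVersion().
    """
    def __init__(self, open_searcher, get_version):
        self._open = open_searcher
        self._version_of = get_version
        self._lock = threading.Lock()
        self._searcher = None
        self._version = None

    def get(self):
        with self._lock:
            v = self._version_of()
            if self._searcher is None or v != self._version:
                self._searcher = self._open()   # index changed: reopen
                self._version = v
            return self._searcher

# Toy stand-ins to show the behaviour:
version = {"n": 1}
opened = []
def open_searcher():
    opened.append(object())
    return opened[-1]

cache = SearcherCache(open_searcher, lambda: version["n"])
s1 = cache.get()
s2 = cache.get()           # same version -> same cached searcher
version["n"] = 2           # index changed
s3 = cache.get()           # new searcher opened
print(s1 is s2, s1 is s3)  # True False
```

For a constantly changing index, the same structure supports Yura's time-based variant: replace the version check with "reopen if more than X minutes have passed".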
big index and multi threaded IndexSearcher
Hello.

I use PyLucene, the Python port of Lucene. I have a problem using a big index (50 GB) with IndexSearcher from many threads. I use IndexSearcher from PyLucene's PythonThread; it's really a wrapper around a Java/libgcj thread that Python is tricked into thinking is one of its own.

The core of the problem: when I have many threads (more than 5) I receive this exception:

File "/usr/lib/python2.4/site-packages/PyLucene.py", line 2241, in search
  def search(*args): return _PyLucene.Searcher_search(*args)
ValueError: java.lang.OutOfMemoryError
No stacktrace available

When I decrease the number of threads to 3, or even 1, then search works. How can many threads cause this exception?.. I have 2 GB of memory, and with one thread the process takes about 1200-1300 MB. Andi Vajda suggested that there may be overhead involved in having multiple threads against a given index.

Does anyone here have experience in handling big indexes with many threads? Any ideas are appreciated.

Yura Smolsky.
Re[2]: big index and multi threaded IndexSearcher
Hello, PA.

Does anyone here have experience in handling big indexes with many threads?

P What about turning the problem around and splitting your index into
P several chunks? Then you could search those (smaller) indices in
P parallel and consolidate the final result, no?

Well, I don't have 6 CPUs in one box :)

Yura Smolsky.
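For what it's worth, the "consolidate the final result" step PA mentions is cheap even without a CPU per chunk: if each chunk returns its hits already sorted by score, merging the top results is a k-way merge. A small sketch (shard names and scores are made up for illustration):

```python
import heapq

# Each shard returns its own hits sorted by descending score:
# (score, doc_id) pairs. Shard contents here are made up.
shard_a = [(0.92, "a1"), (0.40, "a2")]
shard_b = [(0.88, "b1"), (0.75, "b2"), (0.10, "b3")]
shard_c = [(0.95, "c1")]

def top_k(shards, k):
    # heapq.merge consumes the pre-sorted shard lists lazily.
    merged = heapq.merge(*shards, key=lambda hit: -hit[0])
    return [doc for _, doc in list(merged)[:k]]

print(top_k([shard_a, shard_b, shard_c], 3))  # ['c1', 'a1', 'b1']
```

The expensive part remains running the query against each chunk; the merge itself is O(n log s) for s shards.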
Re[2]: big index and multi threaded IndexSearcher
Hello, Erik.

EH Are you using multiple IndexSearcher instances? Or only one and
EH sharing it across multiple threads?
EH
EH If using a single shared IndexSearcher instance doesn't help, it may be
EH beneficial to port your code to Java and try it there.

I have a single instance of IndexSearcher and I pass a reference to it to each thread. I will port the code to Java if no other ideas come to mind...

EH On Feb 16, 2005, at 3:04 PM, Yura Smolsky wrote:

Hello.

I use PyLucene, the Python port of Lucene. I have a problem using a big index (50 GB) with IndexSearcher from many threads. I use IndexSearcher from PyLucene's PythonThread; it's really a wrapper around a Java/libgcj thread that Python is tricked into thinking is one of its own.

The core of the problem: when I have many threads (more than 5) I receive this exception:

File "/usr/lib/python2.4/site-packages/PyLucene.py", line 2241, in search
  def search(*args): return _PyLucene.Searcher_search(*args)
ValueError: java.lang.OutOfMemoryError
No stacktrace available

When I decrease the number of threads to 3, or even 1, then search works. How can many threads cause this exception?.. I have 2 GB of memory, and with one thread the process takes about 1200-1300 MB. Andi Vajda suggested that there may be overhead involved in having multiple threads against a given index.

Does anyone here have experience in handling big indexes with many threads?

Yura Smolsky.
Highlighter: how to specify text from external source?
Hello, lucene-user.

If I do not store text fields in the index, is there a way to specify the values for the Highlighter from an external source, and how?

Thanks in advance.

Yura Smolsky
ParallelMultiSearcher and many RemoteSearchers
Hello, lucene-user.

Does anyone have an idea whether a ParallelMultiSearcher with many RemoteSearchers would be a way to get fast search over an index distributed across many servers? For example, I have 5 servers with an index of 50 GB on each server. The indexes are updated interactively. I want to run a ParallelMultiSearcher on a 6th server, connected to the other 5 servers through RemoteSearcher. Is it okay to go with the RMI-based RemoteSearcher class in this case?.. I am concerned about the response time and speed of the system...

Yura Smolsky
Re[2]: Disk space used by optimize
Hello, Doug.

There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times more space, b/c the *.cfs files need to be unpacked. Now I use the non-compound file format; it needs about twice as much disk space.

DC Perhaps we should add something to the javadocs noting this?

Sure. I was a bit confused about optimizing the compound file format b/c I had no info about space usage during optimization. More info in the javadocs will save somebody's time :)

Yura Smolsky
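As a rule of thumb from the numbers in this thread (roughly 3x peak space for the compound format, 2x for multi-file; these factors are observations from this discussion, not official guarantees), a free-space check before optimizing could look like:

```python
def space_needed_gb(index_gb, compound):
    # Factors observed in this thread: optimizing a compound-format
    # index peaked at ~3x its size, multi-file at ~2x. Treat these
    # as rules of thumb, not guarantees.
    return index_gb * (3 if compound else 2)

print(space_needed_gb(45, compound=True))   # 135
print(space_needed_gb(45, compound=False))  # 90
```

So a 45 GB compound index would want roughly 135 GB free during optimize, versus roughly 90 GB for the multi-file format.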
Re[2]: Disk space used by optimize
Hello, Otis.

There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times more space, b/c the *.cfs files need to be unpacked. Now I use the non-compound file format; it needs about twice as much disk space.

OG Have you tried using the multifile index format? Now I wonder if there
OG is actually a difference in the disk space consumed by optimize() when you
OG use the multifile and compound index formats...
OG
OG Otis
OG
OG --- "Kauler, Leto S" [EMAIL PROTECTED] wrote:

Our copy of LIA is in the mail ;)

Yes, the final three files are: the .cfs (46.8 MB), deletable (4 bytes), and segments (29 bytes).

--Leto

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]

Hello,

Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it. See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a mistake in the book. :) You said you end up with 3 files, and .cfs is one of them, right?

Otis

--- "Kauler, Leto S" [EMAIL PROTECTED] wrote:

Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times its size before finally compressing? In our case the optimise grinds the disk, expanding the index into many files of about 145 MB total, before compressing down to three files of about 47 MB total. That must be a lot of disk activity for the people with multi-gigabyte indexes!

Regards, Leto

Yura Smolsky,
IndexWriter.addIndexes()
Hello, lucene-user.

Is there a way to do an index merge without optimization?..

Yura Smolsky,
Re[2]: RemoteSearcher
Hello, Otis.

Interesting. So Nutch doesn't use RemoteSearchable b/c RemoteSearchable is not very useful? I mean, is it suitable for distributing the search process in parallel across many servers or not? Will it give us good performance? We have RemoteSearchable in the sources, but no one uses it. :)

I ask this question b/c I use PyLucene (a very good port to Python), and I would need to reimplement a lot of the RemoteSearchable machinery in omniORBpy (CORBA). I have a big index (3,000,000 docs) and many fields. I have noticed that search is becoming slower, and I want to distribute the index across many servers. Is RemoteSearchable worth it? BTW, is there a working demo of Nutch with a big index?

OG Nutch (nutch.org) has a pretty sophisticated infrastructure for
OG distributed searching, but it doesn't use RemoteSearcher.

Does anyone know of an application based on RemoteSearcher that distributes an index across many servers?

Yura Smolsky,
RemoteSearcher
Hello.

Does anyone know of an application based on RemoteSearcher that distributes an index across many servers?

Yura Smolsky,
IndexWriter.optimize()
Hello, lucene-user.

I used FSDirectory as the storage for my index, and I used the optimize() method of IndexWriter to optimize the index for faster access. Now I use DbDirectory (Berkeley DB) as the storage. Does it make sense to use the optimize method on an index stored this way?.. What does optimize actually do?

Yura Smolsky
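As Otis notes elsewhere in this digest, optimize conceptually copies all existing index segments into one unified segment, merging each term's posting lists. A toy Python sketch of that idea (a deliberately simplified model, not Lucene's actual data structures):

```python
import heapq

# Toy model: each segment maps a term to a sorted list of doc ids.
# optimize() conceptually merges all segments into a single one.
segments = [
    {"lucene": [1, 7], "index": [2]},
    {"lucene": [3], "search": [4, 5]},
]

def merge_segments(segs):
    merged = {}
    for term in set().union(*segs):
        # Each per-segment posting list is already sorted, so a
        # k-way merge yields the combined sorted posting list.
        postings = [s[term] for s in segs if term in s]
        merged[term] = list(heapq.merge(*postings))
    return merged

one = merge_segments(segments)
print(one["lucene"])  # [1, 3, 7]
```

Fewer segments mean fewer files to open and fewer posting lists to consult per term at search time, which is where the "faster access" comes from regardless of the Directory implementation underneath.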