RE: Lucene Book in UK
You could buy the ebook from Manning. It only costs $22.50, which comes to about 16 euro :) I bought it there; 2 minutes later I was reading it. -Original Message- From: David Townsend [mailto:[EMAIL PROTECTED] Sent: Thursday, January 6, 2005 19:24 To: Lucene Users List (E-mail) Subject: Lucene Book in UK Sorry if this is the wrong forum, but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it, but amazon.co.uk reports it as a 'hard to find' item and is now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com is quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? cheers David - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
questions
Hi, I am a newbie and I just installed Tomcat on my machine. When I placed the Luceneweb folder in the webapps folder of Tomcat, why couldn't I conduct the search operation when I tested the website? Did I miss out on anything? It prompts me that there is no c:\opt\index\segment folder... I created it, but I still couldn't get Lucene to work... At http://jakarta.apache.org/lucene/docs/demo.html, under the indexing file instructions, where should I run the following: "java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src"? Is it a must to install Ant? Please kindly help!!! Thanks very much in advance regards, jac
Re: RemoteSearcher
Nutch (nutch.org) has a pretty sophisticated infrastructure for distributed searching, but it doesn't use RemoteSearcher. Otis --- Yura Smolsky <[EMAIL PROTECTED]> wrote: > Hello. > > Does anyone know of an application based on RemoteSearcher to > distribute an index across many servers? > > Yura Smolsky
Re: Lucene Book in UK
The book is $44.95 USD - it's printed on the back cover. Amazon had the correct price (minus their discount) until recently. They are just very slow with their site/book info updates, but I'm sure they'll fix it eventually. Otis --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > On Jan 6, 2005, at 3:49 PM, Chris Hostetter wrote: > > B&N agrees that the list price is $60.95 ... which may be what Manning > > is citing to resellers. > > This is incorrect information that has somehow gotten out. Amazon and > B&N are slow to update their information, but Manning assures me that > they have provided the correct information to Amazon to update. The > actual price you're paying is certainly not indicative of a $60.95 list > price - Amazon doesn't discount 50%, I'm sure. > > Erik
Re: reading fields selectively
Hi John, There is no API for this, but I recall somebody talking about adding support for this a few months back. I even think that somebody might have contributed a patch for this. I am not certain about this, but check the patch queue (link on the Lucene site). If there is a patch there, even if it no longer applies cleanly, you'll be able to borrow the code for your own patch. Also note that the CVS version has support for field compression, which should help with performance if you are working with large fields. Otis --- John Wang <[EMAIL PROTECTED]> wrote: > Hi: > > Is there some way to read only 1 field value from an index given a > docID? > From the current API, in order to get a field given a docID, I > would call: > > IndexSearcher.document(docID) > > which in turn reads in all fields from the disk. > > Here is my problem: after the search, I have a set of docIDs. For each > document, I have a unique string identifier. At this point I only need > these identifiers, but with the above API, I am forced to read the > entire row of fields for each document in the search result, which in > my case can be very large. > > Is there an alternative? I am thinking more along the lines of a call: > > Field[] getFields(int docID, String fieldName); > > Thanks > > -John
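Until such a patch exists, one workaround is to read the identifier field for every document once, up front, and answer per-hit lookups from memory. A minimal plain-Java sketch of that idea follows - it is not a Lucene API; docStore, uidCache, and the field name "uid" are hypothetical stand-ins for illustration:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the workaround idea: cache the one field you need per docID
 *  once, so search hits never trigger a full stored-document read.
 *  docStore stands in for IndexSearcher.document(docID), which in
 *  Lucene 1.x always loads every stored field of the document. */
public class FieldCacheSketch {
    // docID -> (fieldName -> value): a stand-in for the index's stored fields.
    static Map<Integer, Map<String, String>> docStore = new HashMap<>();

    // docID -> identifier, built once at startup or after an index reopen.
    static Map<Integer, String> uidCache = new HashMap<>();

    static void buildCache(String fieldName) {
        for (Map.Entry<Integer, Map<String, String>> e : docStore.entrySet()) {
            // The expensive full-document reads happen here, exactly once.
            uidCache.put(e.getKey(), e.getValue().get(fieldName));
        }
    }

    public static void main(String[] args) {
        Map<String, String> d0 = new HashMap<>();
        d0.put("uid", "doc-a");
        d0.put("body", "a very large stored field ...");
        docStore.put(0, d0);

        Map<String, String> d1 = new HashMap<>();
        d1.put("uid", "doc-b");
        d1.put("body", "another very large stored field ...");
        docStore.put(1, d1);

        buildCache("uid");
        // Per-hit lookup touches only the cached field, never "body".
        System.out.println(uidCache.get(0)); // doc-a
        System.out.println(uidCache.get(1)); // doc-b
    }
}
```

The trade-off is memory: you hold one string per document, which is usually far cheaper than repeatedly deserializing every stored field of every hit.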
reading fields selectively
Hi: Is there some way to read only 1 field value from an index given a docID? From the current API, in order to get a field given a docID, I would call: IndexSearcher.document(docID) which in turn reads in all fields from the disk. Here is my problem: after the search, I have a set of docIDs. For each document, I have a unique string identifier. At this point I only need these identifiers, but with the above API, I am forced to read the entire row of fields for each document in the search result, which in my case can be very large. Is there an alternative? I am thinking more along the lines of a call: Field[] getFields(int docID, String fieldName); Thanks -John
Re: Problems...
On Jan 6, 2005, at 6:23 PM, Ross Rankin wrote: Could you explain this piece further, Erik "BooleanQuery and AND in TermQuery for resellerId" Your code did a textual concatenation (and I'm paraphrasing as I don't have your previous e-mail handy) of "id:" + resellerId, and then it parsed the expression. This is not necessarily a problem, though I red-flag it because of what QueryParser and Analyzers can do with that resellerId. Regardless of how you indexed the reseller id field, an analyzer will process it when using QueryParser on it. If that id is completely numeric, some analyzers will toss it; others may leave it alone. If it has alpha characters in it, they may be lowercased. In other words, there are lots of variables. This can be avoided by doing this:

TermQuery tq = new TermQuery(new Term("id", resellerId));
Query query = QueryParser.parse(/* the main expression */);
BooleanQuery bq = new BooleanQuery();
bq.add(tq, true, false);
bq.add(query, true, false);

Now use bq as the query passed to search(). Make sense? I would love to improve the code of this piece and understand the engine more. Like for example, if something is indexed, it will be found in the search but what about something that is just in the document and not indexed? If the field is not indexed (but just stored), you cannot search on it. I don't know the difference in Stored, Tokenized, Indexed, and Vector and where I would do what... Is there info on that piece on the web somewhere? Stored = as-is value stored in the Lucene index. Tokenized = field is analyzed using the specified Analyzer - the tokens emitted are indexed. Indexed = the text (either as-is with keyword fields, or the tokens from tokenized fields) is made searchable (aka inverted). Vectored = term frequency is stored in the index in an easily retrievable fashion. Like I have a large (6000 chars) text field I would like to add to the document, it's HTML. I am guessing first it would need to be parsed then added? But added and indexed?
The field contains product specs and product compatibility (most in table form). You definitely want to parse the HTML file (using NekoHTML, perhaps) and extract the text into fields. Maybe the specs and the compatibility should be separated, for example. And yes, you would want these fields indexed since you want to search on them, I presume. Stored, but not indexed, fields are for metadata you want carried along with search results (like the primary key to a database row, or a filename) that you'd use to display the results but is not needed for searching. Sorry for the newbie questions but I am not finding Google very chock full of Lucene info... Have I got a book to sell you! :) http://www.lucenebook.com Erik
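The stored-versus-indexed distinction Erik describes can be sketched in plain Java without Lucene. This is only a toy model for intuition - the class, field names, and whitespace "analyzer" are all made up for illustration, not Lucene internals:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "stored" vs "indexed" fields: stored values are kept
 *  as-is per document for display; indexed (inverted) fields map each
 *  analyzed token to the set of documents containing it. */
public class StoredVsIndexed {
    static Map<Integer, String> stored = new HashMap<>();        // docID -> raw value
    static Map<String, Set<Integer>> inverted = new HashMap<>(); // token -> docIDs

    static void add(int docID, String text, boolean index, boolean store) {
        if (store) stored.put(docID, text); // retrievable with hits, not searchable by itself
        if (index) {
            // Crude stand-in for an Analyzer: lowercase, split on whitespace.
            for (String tok : text.toLowerCase().split("\\s+")) {
                inverted.computeIfAbsent(tok, k -> new TreeSet<>()).add(docID);
            }
        }
    }

    public static void main(String[] args) {
        add(1, "Product specs table", true, true); // indexed + stored
        add(2, "file-2.html", false, true);        // stored only: display metadata
        System.out.println(inverted.get("specs"));      // [1] -> searchable
        System.out.println(inverted.get("file-2.html")); // null -> unindexed, can't search
        System.out.println(stored.get(2));               // file-2.html -> comes back with hits
    }
}
```

The "stored only" case is exactly Erik's filename/primary-key example: you can display it with results, but a query for it finds nothing because it was never inverted.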
Re: Indexing flat files without .txt extension
On Jan 6, 2005, at 6:49 PM, Hetan Shah wrote: Hi Erik, I got the source downloaded and unpacked. I am having difficulty in building any of the modules. Maybe something's wrong with my Ant installation. LuceneInAction% ant test Buildfile: build.xml BUILD FAILED file:/home/hs152827/LuceneInAction/build.xml:12: Unexpected element "available" The good ol' README says this: R E Q U I R E M E N T S --- * JDK 1.4+ * Ant 1.6+ (to run the automated examples) * JUnit 3.8.1+ - junit.jar should be in ANT_HOME/lib You are not running Ant 1.6, I'm sure. Upgrade your version of Ant, and of course follow the rest of the README, and all should be well. Erik
Re: Indexing flat files without .txt extension
Hi Erik, I got the source downloaded and unpacked. I am having difficulty in building any of the modules. Maybe something's wrong with my Ant installation. LuceneInAction% ant test Buildfile: build.xml BUILD FAILED file:/home/hs152827/LuceneInAction/build.xml:12: Unexpected element "available" Total time: 5 seconds LuceneInAction% ant Indexer Buildfile: build.xml BUILD FAILED file:/home/hs152827/LuceneInAction/build.xml:12: Unexpected element "available" Total time: 5 seconds ** Can you point me to the proper module for creating my own indexer? I tried looking into the indexing module but was not sure. TIA, -H Erik Hatcher wrote: On Jan 5, 2005, at 6:31 PM, Hetan Shah wrote: How can one index simple text files without the .txt extension? I am trying to use IndexFiles and IndexHTML, but not to my satisfaction. With IndexFiles I do not get any control over the content of the file, and in the case of IndexHTML, files without any extension do not get indexed at all. Any pointers are really appreciated. Try out the Indexer code from Lucene in Action. You can download it from the link here: http://www.lucenebook.com/blog/announcements/sourcecode.html It'll be cleaner to follow and borrow from. The code that ships with Lucene is for demonstration purposes. It surprises me how often folks use that code to build real indexes. It's quite straightforward to create your own Java code to do the indexing in whatever manner you like, borrowing from examples. When you get the download unpacked, simply run "ant Indexer" to see it in action. And then "ant Searcher" to search the index just built. Erik
Multi-threading problem: couldn't delete segments
We are having a problem with Lucene in a high concurrency create/delete/search situation. I thought I had fixed all these problems, but I guess not. Here's what's happening. We are conducting load testing on our application. On a Windows 2000 server using lucene-1.3-final with the compound file format enabled, a worker thread is creating new Documents as it ingests content. Meanwhile, a test script is running that hits the search part of our application (I think the script also updates and deletes Documents, but I am not sure. My colleague who wrote it has left for the day so I can't ask him.). The scripted test passes with 1, 5, and 10 users hitting the application. At 20 users, we get this exception:

[Task Worker1] ERROR com.ancept.ams.search.lucene.LuceneIndexer - Caught exception closing IndexReader in finally block
java.io.IOException: couldn't delete segments
    at org.apache.lucene.store.FSDirectory.renameFile(FSDirectory.java:236)
    at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java(Compiled Code))
    at org.apache.lucene.index.SegmentReader$1.doBody(SegmentReader.java:179)
    at org.apache.lucene.store.Lock$With.run(Lock.java:148)
    at org.apache.lucene.index.SegmentReader.doClose(SegmentReader.java(Compiled Code))
    at org.apache.lucene.index.IndexReader.close(IndexReader.java(Inlined Compiled Code))
    at org.apache.lucene.index.SegmentsReader.doClose(SegmentsReader.java(Compiled Code))
    at org.apache.lucene.index.IndexReader.close(IndexReader.java(Compiled Code))
    at com.ancept.ams.search.lucene.LuceneIndexer.delete(LuceneIndexer.java:266)

All write access to the index is controlled in that LuceneIndexer class by synchronizing on a static lock object. Searching is handled in another part of the code, which creates new IndexSearchers as necessary when the index changes. I do not rely on finalization to clean up these searchers because we found it to be unreliable.
I keep track of threads using each searcher and then close it when that number drops to 0 if the searcher is outdated. My problem seems similar to what Robert Leftwich asked about on this mailing list in January 2001. Google Cache: http://64.233.179.104/search?q=cache:1D4h1vSh5AQJ:www.geocrawler.com/mail/msg.php3%3Fmsg_id%3D5020057++lucene+multithreading+problems+site:geocrawler.com&hl=en Doug Cutting replied to him saying that he should synchronize calls to IndexReader.open() and IndexReader.close(): Google Cache: http://64.233.179.104/search?q=cache:arztiytQ42QJ:www.geocrawler.com/archives/3/2624/2001/1/0/5020870/++lucene+multithreading+problems+site:geocrawler.com&hl=en Robert Leftwich then found a problem with his code and eliminated a second IndexReader that was messing stuff up: Google Cache: http://64.233.179.104/search?q=cache:jSIsi6t9KH8J:www.geocrawler.com/mail/msg.php3%3Fmsg_id%3D5037517++lucene+multithreading+problems+site:geocrawler.com&hl=en However, there are differences between Leftwich's design and mine, and besides, that thread is four years old. (Are there even existing archives for lucene-user throughout 2001 anywhere?) So any advice would be appreciated. Do I need to synchronize _all_ IndexReader.open() and IndexReader.close() calls? Or is it more likely that I'm missing something in my class that modifies the index? The code is attached.
Thank you, Luke Francl

// $Id: LuceneIndexer.java 20473 2004-10-19 17:20:10Z lfrancl $
package com.ancept.ams.search.lucene;

import com.ancept.ams.asset.AssetUtils;
import com.ancept.ams.asset.AttributeValue;
import com.ancept.ams.asset.IAsset;
import com.ancept.ams.asset.IAssetIdentifier;
import com.ancept.ams.asset.IAssetList;
import com.ancept.ams.asset.ITimeMetadataAsset;
import com.ancept.ams.asset.IVideoAssetView;
import com.ancept.ams.controller.RelayFactory;
import com.ancept.ams.enums.AttributeNamespace;
import com.ancept.ams.enums.AttributeType;
import com.ancept.ams.enums.TimeMetadataType;
import com.ancept.ams.relay.IAssetRelay;
import com.ancept.ams.search.Indexer;
import com.ancept.ams.search.Fields;
import com.ancept.ams.util.SystemConfig;
import com.ancept.ams.util.PerformanceMonitor;
import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import java.io.File;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

/**
 * Controls access to the Lucene index.
 *
 * @author Luke Francl
 **/
public final class LuceneIndexer implements Indexer {
    private static final Logger l4j = Logger.getLogger( LuceneIndexer
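The advice Doug gave in the quoted 2001 thread - funnel every reader open/close (and all writes) through one shared lock - can be sketched self-contained, without Lucene. SynchronizedAccess and its openCount field are made-up stand-ins for an IndexReader and the index's shared on-disk state; the point is only the locking pattern:

```java
/** Sketch of the "synchronize all open/close calls" pattern: every thread
 *  takes the same static lock before touching the shared state, so a close
 *  can never race a concurrent segments-file update. */
public class SynchronizedAccess {
    private static final Object LOCK = new Object();
    static int openCount = 0; // stand-in for shared index state

    static void open()  { synchronized (LOCK) { openCount++; } }
    static void close() { synchronized (LOCK) { openCount--; } }

    /** Hammer open()/close() from many threads to show no updates are lost. */
    static void runDemo(int threads, int iters) {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) { open(); close(); }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) {
        runDemo(20, 1000);
        // Balanced open/close pairs under the lock always net to zero;
        // without the synchronized blocks, increments could be lost.
        System.out.println(openCount);
    }
}
```

If the unsynchronized variant were used instead (drop the synchronized blocks), the final count would intermittently drift from zero under load - the same kind of intermittent, load-dependent failure described above.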
RE: Problems...
: Hoss, could you tell me what exceptions I'm missing? Thanks! Anytime you have a "catch" block, you should be doing something with that exception. If possible, you can recover from an exception, but no matter what, you should log the exception in some way so that you know it happened. Your code has two places where it was catching an exception and doing absolutely nothing at all -- allowing processing to continue without even a warning. There was also an area of your code where, if you encountered a parse exception from the user input, you invented your own query instead -- again without any sort of logging to let you know what was happening in the code. Building your own query when the user's query is gibberish isn't necessarily bad, but logging is your friend. It wasn't clear from the description of your problem what you were trying to query for, so it was very possible that there was a problem parsing your query, and it was doing the "default" search in that catch block and giving you back zero results ... hence my question about the System.out.println calls that *were* in your code. Logging is (again) your friend. -Hoss
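Hoss's point - recover if you can, but always record that the exception happened - looks something like this in a minimal sketch. The parseOrDefault method, the "empty query" check, and the "*:*" fallback string are all hypothetical stand-ins for the real parsing code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of "never swallow an exception": the catch block recovers with
 *  a default, but logs what happened and what was substituted, so zero-hit
 *  results can later be traced back to a failed parse. */
public class LoggingCatch {
    static final Logger LOG = Logger.getLogger("search");

    static String parseOrDefault(String userQuery) {
        try {
            // Stand-in for QueryParser.parse(...) throwing ParseException.
            if (userQuery == null || userQuery.isEmpty())
                throw new IllegalArgumentException("empty query");
            return userQuery;
        } catch (IllegalArgumentException e) {
            // Recoverable - but record it instead of continuing silently.
            LOG.log(Level.WARNING,
                    "could not parse '" + userQuery + "', using default query", e);
            return "*:*"; // hypothetical fallback query
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("foo")); // foo
        System.out.println(parseOrDefault(""));    // *:* (plus a WARNING in the log)
    }
}
```

The important part is the WARNING line: with it, "why did this user get zero results?" is answerable from the logs instead of being a mystery.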
RE: Problems...
Thanks for the responses... It took a bit of time, but I'm learning more and more every day on this. To answer Hoss's first question, here are the properties for the engine: lucene.path.to.index=/home/httpd/htdocs/index lucene.time.interval=15000 lucene.paramOffset = 0 Hoss, could you tell me what exceptions I'm missing? Thanks! I figured out my issue, with a lot of help from Luke. (Thanks to the other Luke.) The document I was creating for Lucene to index was missing data due to a size issue with the database records. So Lucene was doing its job; the data just wasn't there in the index. Took a while to figure out why the document was missing the data; it didn't dawn on me that the size and number of the database records would be the issue, but it really was the only thing that changed. Could you explain this piece further, Erik: "BooleanQuery and AND in TermQuery for resellerId"? I would love to improve the code of this piece and understand the engine more. Like for example, if something is indexed, it will be found in the search, but what about something that is just in the document and not indexed? I don't know the difference in Stored, Tokenized, Indexed, and Vector and where I would do what... Is there info on that piece on the web somewhere? Like, I have a large (6000 chars) text field I would like to add to the document; it's HTML. I am guessing first it would need to be parsed, then added? But added and indexed? The field contains product specs and product compatibility (most in table form). Sorry for the newbie questions, but I am not finding Google very chock full of Lucene info... Ross -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostetter Sent: Tuesday, January 04, 2005 6:48 PM To: Lucene Users List Subject: Re: Problems... To start with, there has to be more to the "search" side of things than what you included.
This search function is not static, which means it's getting called on an object, which obviously has some internal state (paramOffset, hits, and pathToIndex are a few that jump out at me). What are the values of those variables when this method gets called? Second, there are at least two places in your code where potential exceptions get thrown away and execution continues. As a matter of good practice, you should add logging to these spots to make sure you aren't ignoring errors... Third, you said "I'm not getting anything in the log that I can point to that says what is not working," but what about what is/isn't in the log? There are several System.out.println calls in this code ... I'm assuming you're logging STDOUT; what do those messages (with variables) say? What is the value of currentOffset on the initial search? What does the query.toString look like? How many total hits are being found when the search is executed? (Or is that line not getting logged because the search is getting skipped because of some initial state in paramOffset?) -Hoss
Re: Lucene Book in UK
On Jan 6, 2005, at 3:49 PM, Chris Hostetter wrote: B&N agrees that the list price is $60.95 ... which may be what Manning is citing to resellers. This is incorrect information that has somehow gotten out. Amazon and B&N are slow to update their information, but Manning assures me that they have provided the correct information to Amazon to update. The actual price you're paying is certainly not indicative of a $60.95 list price - Amazon doesn't discount 50%, I'm sure. Erik
Re: multi-threaded thru-put in lucene
John Wang wrote: Is the operation IndexSearcher.search I/O or CPU bound if I am doing 100's of searches on the same query? CPU bound. Doug
RE: Lucene Book in UK
: I ordered mine from Amazon a while back and was notified yesterday that it : shipped. Here was my price: really??? .. those bastards. I ordered two copies for my work on December 10th and they still haven't shipped them. : 1 Lucene In Action (In Action) $27.17 1 $27.17 Hmm, they only charged me $26.37 each ... but Amazon has been known to experiment with price points. (On my browser, they're currently showing a discounted price of 38.40.) I can tell you that on December 10th, Amazon's "List" price was roughly the same as Manning's, hence I was about to order from Manning and get the free ebook, when I realized I was looking at the "List price" and not the "Amazon Price". With Amazon's free shipping it was cheaper to buy the two paper copies from Amazon *and* give Manning the $22 for the ebook. : Does anyone know why Amazon.com lists the list price for Lucene in : Action as $60.95? Bookpool.com has the list price as $44.95, which is : the price that Manning is charging. After discounting, bookpool.com has : it on sale for $27.50. B&N agrees that the list price is $60.95 ... which may be what Manning is citing to resellers. -Hoss
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
> > Is this workable for you, Bill? > > No, it doesn't appear to work for me. Whoops! I was testing the wrong jar file. Yes, it *does* appear to work for me. I'll put this in my production code. Thanks again, Erik. Bill
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
Erik, > Is this workable for you, Bill? No, it doesn't appear to work for me. I modified my class to add the extra method, as you suggested. I just forwarded the method to the existing one, as seen below:

protected Query getFieldQuery(String field, Analyzer a, String queryText, int slop)
    throws ParseException {
  return getFieldQuery(field, a, queryText);
}

protected Query getFieldQuery(String field, Analyzer a, String queryText)
    throws ParseException {
  ...
}

It's still not getting called. My query string is of the form: name:"Bill Janssen" which is a little different from the one you were testing with. It does work OK (with both versions of Lucene) on simple queries like the one you tested with. My guess is that somewhere between 1.4.1 and 1.4.3, someone decided that FieldQueries and PhraseQueries should be handled differently. Bill
RE: Lucene Book in UK
I ordered mine from Amazon a while back and was notified yesterday that it shipped. Here was my price: The following items were included in this shipment:

Qty  Item                          Price   Shipped  Subtotal
1    Lucene In Action (In Action)  $27.17  1        $27.17

Item Subtotal: $27.17
Shipping & Handling: $3.99
Super Saver Discount: -$3.99
Total: $27.17

-Original Message- From: Peter Kim [mailto:[EMAIL PROTECTED] Sent: Thursday, January 06, 2005 2:16 PM To: Lucene Users List Subject: RE: Lucene Book in UK Does anyone know why Amazon.com lists the list price for Lucene in Action as $60.95? Bookpool.com has the list price as $44.95, which is the price that Manning is charging. After discounting, bookpool.com has it on sale for $27.50. Looking forward to getting my copy. Peter -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, January 06, 2005 1:54 PM To: Lucene Users List Subject: Re: Lucene Book in UK On Jan 6, 2005, at 1:23 PM, David Townsend wrote: > Sorry if this is the wrong forum but I wondered what's happened to > 'Lucene In Action' in the UK. Looking forward to reading it but > amazon.co.uk report it as a 'hard to find' item and are now quoting a > 4-6 week delivery time and tacking on a rare book charge. Amazon.com > are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? It's news to me that Amazon is shipping it in the U.S. even, but I just checked and you're right! They *just* got it in stock though, so I'm sure it takes a bit more time for the U.K. to get it. It's been shipping from Manning's site for a couple of weeks now though, and as noted, it includes the e-book along with it. Erik
RE: Lucene Book in UK
Does anyone know why Amazon.com lists the list price for Lucene in Action as $60.95? Bookpool.com has the list price as $44.95, which is the price that Manning is charging. After discounting, bookpool.com has it on sale for $27.50. Looking forward to getting my copy. Peter -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, January 06, 2005 1:54 PM To: Lucene Users List Subject: Re: Lucene Book in UK On Jan 6, 2005, at 1:23 PM, David Townsend wrote: > Sorry if this is the wrong forum but I wondered what's happened to > 'Lucene In Action' in the UK. Looking forward to reading it but > amazon.co.uk report it as a 'hard to find' item and are now quoting a > 4-6 week delivery time and tacking on a rare book charge. Amazon.com > are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? It's news to me that Amazon is shipping it in the U.S. even but I just checked and you're right! They *just* got it in stock though, so I'm sure it takes a bit more time for the U.K. to get it. It's been shipping from Manning's site for a couple of weeks now though, and as noted, it includes the e-book along with it. Erik
Re: multi-threaded thru-put in lucene
Thanks Doug! You are right: adding a Thread.sleep() helped greatly. Mysteries of Java... Another Java threading question. With 1 thread and iterations of 100 searches, it took about 850 ms. After adding a Thread.sleep(10) in the loop, it takes about 2200 ms. The 100 sleeps of 10 ms each account for 1000 ms, so there are 2200 - 1850 = 350 ms unaccounted for. Is that due to thread scheduling/context switching? Thanks -John On Thu, 6 Jan 2005 10:36:12 -0800, John Wang <[EMAIL PROTECTED]> wrote: > Is the operation IndexSearcher.search I/O or CPU bound if I am doing > 100's of searches on the same query? > > Thanks > > -John > > On Thu, 06 Jan 2005 10:31:49 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote: > > John Wang wrote: > > > 1 thread: 445 ms. > > > 2 threads: 870 ms. > > > 5 threads: 2200 ms. > > > > > > Pretty much the same numbers you'd get if you are running them > > > sequentially. > > > > > > Any ideas? Am I doing something wrong? > > > > If you're performing compute-bound work on a single-processor machine > > then threading should give you no better performance than sequential, > > perhaps a bit worse. If you're performing io-bound work on a > > single-disk machine then threading should again provide no improvement. > > If the task is evenly compute and i/o bound then you could achieve at > > best a 2x speedup on a single CPU system with a single disk. > > > > If you're compute-bound on an N-CPU system then threading should > > optimally be able to provide a factor of N speedup. > > > > Java's scheduling of compute-bound threads when no threads call > > Thread.sleep() can also be very unfair. > > > > Doug
Re: Lucene Book in UK
On Jan 6, 2005, at 1:23 PM, David Townsend wrote: Sorry if this is the wrong forum but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it but amazon.co.uk report it as a 'hard to find' item and are now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? It's news to me that Amazon is shipping it in the U.S. even but I just checked and you're right! They *just* got it in stock though, so I'm sure it takes a bit more time for the U.K. to get it. It's been shipping from Manning's site for a couple of weeks now though, and as noted, it includes the e-book along with it. Erik
Re: multi-threaded thru-put in lucene
Is the operation IndexSearcher.search I/O or CPU bound if I am doing 100's of searches on the same query? Thanks -John On Thu, 06 Jan 2005 10:31:49 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang wrote: > > 1 thread: 445 ms. > > 2 threads: 870 ms. > > 5 threads: 2200 ms. > > > > Pretty much the same numbers you'd get if you are running them sequentially. > > > > Any ideas? Am I doing something wrong? > > If you're performing compute-bound work on a single-processor machine > then threading should give you no better performance than sequential, > perhaps a bit worse. If you're performing io-bound work on a > single-disk machine then threading should again provide no improvement. > If the task is evenly compute and i/o bound then you could achieve at > best a 2x speedup on a single CPU system with a single disk. > > If you're compute-bound on an N-CPU system then threading should > optimally be able to provide a factor of N speedup. > > Java's scheduling of compute-bound threads when no threads call > Thread.sleep() can also be very unfair. > > Doug
Re: multi-threaded thru-put in lucene
John Wang wrote: 1 thread: 445 ms. 2 threads: 870 ms. 5 threads: 2200 ms. Pretty much the same numbers you'd get if you are running them sequentially. Any ideas? Am I doing something wrong? If you're performing compute-bound work on a single-processor machine then threading should give you no better performance than sequential, perhaps a bit worse. If you're performing io-bound work on a single-disk machine then threading should again provide no improvement. If the task is evenly compute and i/o bound then you could achieve at best a 2x speedup on a single CPU system with a single disk. If you're compute-bound on an N-CPU system then threading should optimally be able to provide a factor of N speedup. Java's scheduling of compute-bound threads when no threads call Thread.sleep() can also be very unfair. Doug
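Doug's point - compute-bound work speeds up by at most a factor of N on N CPUs, and only when it is actually partitioned across threads - can be sketched self-contained. The work() loop below is a made-up stand-in for a CPU-bound search; only the correctness of the partitioned result is checked, since wall-clock speedup depends on the machine:

```java
/** Sketch of partitioning compute-bound work across threads. On an N-CPU
 *  box the parallel version can run up to N times faster; on one CPU it
 *  runs no faster than the sequential loop, exactly as described above. */
public class ParallelSearchSketch {

    // Stand-in for a CPU-bound search: pure computation, no I/O.
    static long work(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) sum += (long) i * i;
        return sum;
    }

    // Partition [0, n) across the given number of threads and combine results.
    static long parallelSum(final int n, int threads) {
        final long[] partial = new long[threads];
        Thread[] ts = new Thread[threads];
        int chunk = n / threads;
        for (int t = 0; t < threads; t++) {
            final int id = t, lo = t * chunk, hi = (t == threads - 1) ? n : lo + chunk;
            ts[t] = new Thread(() -> partial[id] = work(lo, hi));
            ts[t].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        long total = 0;
        for (long p : partial) total += p;
        return total;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long seq = work(0, 1_000_000);
        long par = parallelSum(1_000_000, cpus);
        System.out.println(seq == par); // true: same answer either way
    }
}
```

Note that running the *same* unpartitioned work on every thread (as in the test described earlier in the thread) gives N times the total work, which is why the elapsed times simply scale with the thread count.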
Re: Lucene Book in UK
Have you checked Manning's site (http://www.manning.com), where you can order the book directly from them (the publisher)? They will also provide you with a copy of the eBook in the meantime, until your paperback arrives in the mail. -pedja P.S. two cubes of sugar with that tea, please :) David Townsend said the following on 1/6/2005 1:23 PM: Sorry if this is the wrong forum but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it but amazon.co.uk report it as a 'hard to find' item and are now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? cheers David
Lucene Book in UK
Sorry if this is the wrong forum but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it but amazon.co.uk report it as a 'hard to find' item and are now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? cheers David - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: multi-threaded thru-put in lucene
I actually ran a few tests, but I am seeing similar behaviors. After removing all the possible variations, this is what I used: 1 index, doc count is 15,000. Using FSDirectory, e.g. new IndexSearcher(String path); by default I think it uses FSDirectory. Each thread is doing 100 iterations of search, e.g. for (int i=0;i<100;++i){ idxSearcher.search(q); } For each thread and each iteration, I am using the same query. I am timing them the following way: long start=System.currentTimeMillis(); for (int i=0;i<100;++i){ idxSearcher.search(q); } ... wrote: > > : This is what we found: > : > : 1 thread, search takes 20 ms. > : > : 2 threads, search takes 40 ms. > : > : 5 threads, search takes 100 ms. > > how big is your index? What are the term frequencies like in your index? > how many different queries did you try? what was the structure of your > query objects like? were you using a RAMDirectory or an FSDirectory? what > hardware were you running on? > > Is your test application small enough that you can post it to the list? > > I haven't done a lot of PMA testing of Lucene, but from what limited > testing I have done I'm a little surprised at those numbers, you'd get > results just as good if you ran the queries sequentially. > > -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
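The timing setup John describes can be sketched as a small harness. This uses the modern java.util.concurrent API (not available on the JDK 1.4 of the era), and doSearch() is a compute-bound stand-in for the hypothetical idxSearcher.search(q) call; only the timing structure is the point. Also note the method name is System.currentTimeMillis(), not currenTimeInMillis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// N threads, each doing 100 iterations of the same task, timed around the
// whole batch. doSearch() is a placeholder for the real search call.
public class ThroughputTest {
    static void doSearch() {              // compute-bound stand-in
        double x = 0;
        for (int i = 0; i < 10_000; i++) x += Math.sqrt(i);
    }

    static long timeThreads(int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        long start = System.currentTimeMillis();
        List<Future<?>> futures = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            futures.add(pool.submit(() -> {
                for (int i = 0; i < 100; i++) doSearch();
            }));
        }
        for (Future<?> f : futures) f.get();  // wait for every thread
        pool.shutdown();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread:  " + timeThreads(1) + " ms");
        System.out.println("5 threads: " + timeThreads(5) + " ms");
    }
}
```

On a single CPU the 5-thread run should take roughly 5x the 1-thread run, which is exactly the pattern reported in this thread.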
Re: Lucene appreciation
Hello, Nice work, this mail is just to say that you already have a French competitor ;) www.keljob.com For now, it's a SQL Server search engine, but we are planning to implement Lucene in two or three months. Of course, we don't handle the same job volume (165,000 jobs vs 1,756,000) so it's reasonably "fast". Concerning the crawler, we use a proprietary robot written in C/C++; we plan to move to a Java & open-source solution this year. We already have good experience with Lucene (with Jakarta James), implemented for a recruiter tool (emailed job application management). Also planning to implement it (maybe in association with Carrot²) in a resume search engine. Lots of work to be done this year, then :) -- Sven Duzont [EMAIL PROTECTED] 38, rue du Sentier / 75002 Paris Tel.: 00 33 (1) 40 13 63 30 Fax: 00 33 (1) 40 13 01 84 In October 2004 the Keljob Group is: * the #1 e-recruitment player, * 477,000 email alert subscriptions, * 125,000 CVs less than 6 months old, * 6,300,000 job ads read, * 2,488,599 visits. Thursday, December 16, 2004, 17:26:22, you wrote: RK> Hello fellow Lucene users, RK> I'd like to introduce myself and say thanks. We've recently launched RK> http://www.indeed.com, a search engine for jobs based on Lucene. I'm RK> consistently impressed with the quality, professionalism and support of the RK> Lucene project and the Lucene community. This mailing list has been a great RK> help. I'd also like to give mention to some of the consultants who had a big RK> hand in making our project a reality ... Thank you Otis, Aviran, Sergiu & RK> Dawid. RK> As for our project, we're in beta and would love to get your feedback. The RK> index size is currently ~1.8m jobs. My personal email address is rony a_t RK> indeed.com. If you are interested in Lucene work you can set up an rss feed RK> or email alert from here: RK> http://www.indeed.com/search?q=lucene&sort=date RK> Is it possible to be added to the Wiki Powered By page? 
RK> Thanks Everyone, RK> Rony RK> Indeed.com - one search. all Jobs. RK> http://www.indeed.com RK> - RK> To unsubscribe, e-mail: [EMAIL PROTECTED] RK> For additional commands, e-mail: RK> [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[ANNOUNCE] dotLucene 1.4.3 RC2 (port of Jakarta Lucene to C#)
Hi Folks, I am pleased to announce the availability of "dotLucene 1.4.3 RC2 build-001" This is the second "Release Candidate" release of version 1.4.3 of Jakarta Lucene ported to C# and is intended to be "Final". Please visit http://www.sourceforge.net/projects/dotlucene/ to learn more about dotLucene and to download the source code. Best regards, -- George Aroush - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[ANNOUNCE] Highlighter.Net 1.4.0 RC1 (port of lucene Java highlighter to C#)
Hi Folks, I am pleased to announce the availability of "Highlighter.Net 1.4.0 RC1 build 001" This is the first "Release Candidate" release of version 1.4.0 of Lucene's Java Highlighter ported to C# and is intended to be "Final". Please visit http://www.sourceforge.net/projects/dotlucene/ to learn more about Highlighter.Net as well as dotLucene and to download the source code. Best regards, -- George Aroush - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
On Thu, 6 Jan 2005, Erik Hatcher wrote: > > On Jan 6, 2005, at 10:41 AM, Joseph Ottinger wrote: > > SHouldn't Lucene warn the user if they do something like this? > > When a user indexes a null? Or attempts to write to the index from two > different IndexWriter instances? > > I believe you should get an NPE if you try index a null field value? > No? Well, I'd agree - the lack of an exception was rather disturbing, considering how badly it destroyed Lucene for the application (requiring not only restart but cleanup as well.) I don't know Lucene well enough to say "according to the code..." but NOT adding the null managed to correct the problem entirely. --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
On Jan 6, 2005, at 10:41 AM, Joseph Ottinger wrote: SHouldn't Lucene warn the user if they do something like this? When a user indexes a null? Or attempts to write to the index from two different IndexWriter instances? I believe you should get an NPE if you try index a null field value? No? Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: multi-threaded thru-put in lucene
Hi, I have a question. How big (in size and documents) is your index ? How many indexes do you search ? Thanks, Mariella At 10:54 AM 1/5/2005 -0800, you wrote: Hi folks: We are trying to measure thru-put lucene in a multi-threaded environment. This is what we found: 1 thread, search takes 20 ms. 2 threads, search takes 40 ms. 5 threads, search takes 100 ms. Seems like under a multi-threaded scenario, thru-put isn't good, performance is not any better than that of 1 thread. I tried to share an IndexSearcher amongst all threads as well as having an IndexSearcher per thread. Both yield same numbers. Is this consistent with what you'd expect? Thanks -John - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
Well, I think I isolated the problem: stupid error on my part, I think. I was adding an indexed field that had, um, a value of null. Correcting that made the process go much more properly - although note that I haven't scaled up to have multiple elements to index. Good milestone, though. Shouldn't Lucene warn the user if they do something like this? On Thu, 6 Jan 2005, Erik Hatcher wrote: > Do you have two threads simultaneously either writing or deleting from > the index? > > Erik > > On Jan 6, 2005, at 9:27 AM, Joseph Ottinger wrote: > > > Sorry to reply to my own post, but I now have a greater understanding > > of > > PART of my problem - my SQLDirectory is not *quite* right, I think. So > > I'm > > rolling back to FSDirectory. > > > > Now, I have a servlet that writes to the filesystem to simplify things > > (as > > I'm not confident enough to debug the RDBMS-based directory yet. That's > > a > > task for later, I think). The servlet says it successfully creates the > > index like so: > > > > try { > >open the index with create=false > > } catch (file not found) { > >open the index with create=true > > } > > index.optimize(); > > index.close(); > > > > Now, when I fire off any messages to the MDB, it yields the following: > > > > java.io.IOException: Lock obtain timed out: > > Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock > > at org.apache.lucene.store.Lock.obtain(Lock.java:58) > > > > Now, this is on only two messages to the MDB, not just a flood of > > messages. Two handlers, so I expect a lock in one's case, but not the > > first MDB call - it should be the one causing the lock for the second > > one, > > if a lock exists at all. > > > > I've verified that when the servlet that initializes the index runs, a > > lock file is NOT present, but again, it looks like every message fired > > through looks for a lock and finds one, when I would think it wouldn't > > be > > there. > > > > What am I not understanding? 
> > > > On Thu, 6 Jan 2005, Joseph Ottinger wrote: > > > >> If this is a stupid question, I deeply apologize. I'm stumped. > >> > >> I have a message-driven EJB using Lucene. In *every* case where the > >> MDB is > >> trying to create an index, I'm getting "Lock obtain timed out." > >> > >> It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the > >> user > >> list has referred to before - but I don't see how the suggestions > >> there > >> apply to what I'm trying to do. (It's creating a lock file in > >> /var/tmp/ > >> properly, from what I can see, so it's not write permissions, I > >> imagine.) > >> > >> I set the infoStream in my index writer to System.out, but I don't > >> see any > >> extra information. > >> > >> I'm using a SQL-based Directory object, but I get the same problem if > >> I > >> refer to a file directly. > >> > >> Is there a way to override the Lock portably so that I can have the > >> lock > >> itself managed in an RDMS? (It's a J2EE project, so relying on file > >> access > >> is problematic; if the beans using lucene to write to the index are on > >> multiple servers, multiple locks could exist anyway.) > >> > >> -- > >> - > >> Joseph B. Ottinger > >> http://enigmastation.com > >> IT Consultant > >> [EMAIL PROTECTED] > >> > >> > >> - > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > > > > --- > > Joseph B. Ottinger http://enigmastation.com > > IT Consultant[EMAIL PROTECTED] > > > > > > - > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
Do you have two threads simultaneously either writing or deleting from the index? Erik On Jan 6, 2005, at 9:27 AM, Joseph Ottinger wrote: Sorry to reply to my own post, but I now have a greater understanding of PART of my problem - my SQLDirectory is not *quite* right, I think. So I'm rolling back to FSDirectory. Now, I have a servlet that writes to the filesystem to simplify things (as I'm not confident enough to debug the RDMS-based directory yet. That's a task for later, I think). The servlet says it successfully creates the index like so: try { open the index with create=false } catch (file not found) { open the index with create=true } index.optimize(); index.close(); Now, when I fire off any messages to the MDB, it yields the following: java.io.IOException: Lock obtain timed out: Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:58) Now, this is on only two messages to the MDB, not just a flood of messages. Two handlers, so I expect a lock in one's case, but not the first MDB call - it should be the one causing the lock for the second one, if a lock exists at all. I've verified that when the servlet that initializes the index runs, a lock file is NOT present, but again, it looks like every message fired through looks for a lock and finds one, when I would think it wouldn't be there. What am I not understanding? On Thu, 6 Jan 2005, Joseph Ottinger wrote: If this is a stupid question, I deeply apologize. I'm stumped. I have a message-driven EJB using Lucene. In *every* case where the MDB is trying to create an index, I'm getting "Lock obtain timed out." It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user list has referred to before - but I don't see how the suggestions there apply to what I'm trying to do. (It's creating a lock file in /var/tmp/ properly, from what I can see, so it's not write permissions, I imagine.) 
I set the infoStream in my index writer to System.out, but I don't see any extra information. I'm using a SQL-based Directory object, but I get the same problem if I refer to a file directly. Is there a way to override the Lock portably so that I can have the lock itself managed in an RDMS? (It's a J2EE project, so relying on file access is problematic; if the beans using lucene to write to the index are on multiple servers, multiple locks could exist anyway.) -- - Joseph B. Ottinger http://enigmastation.com IT Consultant [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
Sorry to reply to my own post, but I now have a greater understanding of PART of my problem - my SQLDirectory is not *quite* right, I think. So I'm rolling back to FSDirectory. Now, I have a servlet that writes to the filesystem to simplify things (as I'm not confident enough to debug the RDMS-based directory yet. That's a task for later, I think). The servlet says it successfully creates the index like so: try { open the index with create=false } catch (file not found) { open the index with create=true } index.optimize(); index.close(); Now, when I fire off any messages to the MDB, it yields the following: java.io.IOException: Lock obtain timed out: Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:58) Now, this is on only two messages to the MDB, not just a flood of messages. Two handlers, so I expect a lock in one's case, but not the first MDB call - it should be the one causing the lock for the second one, if a lock exists at all. I've verified that when the servlet that initializes the index runs, a lock file is NOT present, but again, it looks like every message fired through looks for a lock and finds one, when I would think it wouldn't be there. What am I not understanding? On Thu, 6 Jan 2005, Joseph Ottinger wrote: > If this is a stupid question, I deeply apologize. I'm stumped. > > I have a message-driven EJB using Lucene. In *every* case where the MDB is > trying to create an index, I'm getting "Lock obtain timed out." > > It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user > list has referred to before - but I don't see how the suggestions there > apply to what I'm trying to do. (It's creating a lock file in /var/tmp/ > properly, from what I can see, so it's not write permissions, I imagine.) > > I set the infoStream in my index writer to System.out, but I don't see any > extra information. 
> > I'm using a SQL-based Directory object, but I get the same problem if I > refer to a file directly. > > Is there a way to override the Lock portably so that I can have the lock > itself managed in an RDMS? (It's a J2EE project, so relying on file access > is problematic; if the beans using lucene to write to the index are on > multiple servers, multiple locks could exist anyway.) > > --- > Joseph B. Ottinger http://enigmastation.com > IT Consultant[EMAIL PROTECTED] > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
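The lock file named in the stack trace above is the whole mechanism: Lucene 1.4 obtains its write lock by creating a marker file (e.g. /var/tmp/lucene-<hash>-write.lock), and a stale file left behind by a crashed or non-closed writer blocks every later obtain() until someone deletes it. The class below is a minimal stand-in for that behavior using plain java.io, not Lucene's actual Lock class.

```java
import java.io.File;
import java.io.IOException;

// File.createNewFile() is atomic: it returns true only if the file did
// not already exist. That is why a leftover write.lock makes every
// subsequent obtain() time out until the file is removed by hand.
public class FileLockDemo {
    static boolean obtain(File lockFile) throws IOException {
        return lockFile.createNewFile();   // true only for the first caller
    }

    static void release(File lockFile) {
        lockFile.delete();
    }

    public static void main(String[] args) throws IOException {
        File lock = new File(System.getProperty("java.io.tmpdir"),
                             "demo-write.lock");
        lock.delete();                      // start clean
        System.out.println(obtain(lock));   // first writer wins: true
        System.out.println(obtain(lock));   // second "writer" blocked: false
        release(lock);
        System.out.println(obtain(lock));   // after release it works again
        release(lock);
    }
}
```

This also explains the multi-server concern in the question: a lock that lives in one machine's tmp directory cannot coordinate writers on other machines.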
Re: Span Query Performance
Sorry for the duplicate on lucene-dev, it should have gone to lucene-user directly: A bit more: On Thursday 06 January 2005 10:22, Paul Elschot wrote: > On Thursday 06 January 2005 02:17, Andrew Cunningham wrote: > > Hi all, > > > > I'm currently doing a query similar to the following: > > > > for w in wordset: > > query = w near (word1 V word2 V word3 ... V word1422); > > perform query > > > > and I am doing this through SpanQuery.getSpans(), iterating through the > > spans and counting > > the matches, which can result in 4782282 matches (essentially I am only > > after the match count). > > The query works but the performance can be somewhat slow; so I am wondering: > > ... > > c) Is there a faster method to what I am doing I should consider? > > Preindexing all word combinations that you're interested in. > In case you know all the words in advance, you could also index a helper word at the same position as each of those words. This requires a custom analyzer that inserts the helper word in the token stream with a zero position increment. The query then simplifies to: query = w near helperword which would probably speed things up significantly. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
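Paul's helper-word idea can be sketched without Lucene's TokenStream API: whenever a token belongs to the word set, emit an extra helper token at the SAME position, which is what a real analyzer filter would do by setting the injected token's position increment to zero. Positions are modeled here as term-to-position-list maps; the class, method, and "_ANY_" token names are all illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simulates indexing a helper token at the same position as each word in
// the word set (zero position increment), so a single span query
// "w near helper" can replace a 1422-clause disjunction.
public class HelperTokenSketch {
    static Map<String, List<Integer>> analyze(String[] tokens,
                                              Set<String> wordset,
                                              String helper) {
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(tokens[pos],
                                      k -> new ArrayList<>()).add(pos);
            if (wordset.contains(tokens[pos])) {
                // injected helper token shares the position: increment 0
                positions.computeIfAbsent(helper,
                                          k -> new ArrayList<>()).add(pos);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        Set<String> wordset = new HashSet<>(Arrays.asList("word1", "word2"));
        Map<String, List<Integer>> idx = analyze(
            new String[]{"foo", "word1", "bar", "word2"}, wordset, "_ANY_");
        System.out.println(idx.get("_ANY_"));  // [1, 3]
    }
}
```

After analysis, "_ANY_" sits at the positions of word1 and word2, so proximity to "_ANY_" is proximity to any word in the set.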
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
On Jan 5, 2005, at 5:04 AM, Erik Hatcher wrote: On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote: Let me be a bit more explicit. My method (essentially an after-method, for those Lisp'rs out there) begins thusly: protected Query getFieldQuery (String field, Analyzer a, String queryText) throws ParseException { Query x = super.getFieldQuery(field, a, queryText); ... } If I remove the "Analyzer a" from both the signature and the super call, the super call won't compile because that method isn't in the QueryParser in 1.4.1. But my getFieldQuery() method won't even be called in 1.4.1, because it doesn't exist in that version of the QueryParser. Will it work if you override this method also? protected Query getFieldQuery(String field, Analyzer analyzer, String queryText, int slop) My head is spinning looking at all the various signatures of this method we have and trying to backtrack where things went awry. I tried out my suggestion (code pasted below) against lucene-1.4-final.jar and lucene-1.4-3.jar (I don't have the 1.4.1 JAR handy) and was successful. If you override both signatures of getFieldQuery it should work fine for you across all 1.4.x versions. Not ideal, but at least a workaround. Is this workable for you, Bill? 
Erik public class CustomQueryParser extends QueryParser { public CustomQueryParser(String field, Analyzer analyzer) { super(field, analyzer); } protected Query getFieldQuery(String field, Analyzer analyzer, String queryText, int slop) throws ParseException { System.out.println("(slop) queryText = " + queryText); return null; } protected Query getFieldQuery (String field, Analyzer a, String queryText) throws ParseException { System.out.println("(no-slop) queryText = " + queryText); return null; } public static void main(String[] args) throws Exception { CustomQueryParser qp = new CustomQueryParser("f", new WhitespaceAnalyzer()); qp.parse("foo bar"); qp.parse("\"foo bar\""); } } The output was identical with both versions of Lucene: (no-slop) queryText = foo (no-slop) queryText = bar (slop) queryText = foo bar - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Lock obtain timed out from an MDB
If this is a stupid question, I deeply apologize. I'm stumped. I have a message-driven EJB using Lucene. In *every* case where the MDB is trying to create an index, I'm getting "Lock obtain timed out." It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user list has referred to before - but I don't see how the suggestions there apply to what I'm trying to do. (It's creating a lock file in /var/tmp/ properly, from what I can see, so it's not write permissions, I imagine.) I set the infoStream in my index writer to System.out, but I don't see any extra information. I'm using a SQL-based Directory object, but I get the same problem if I refer to a file directly. Is there a way to override the Lock portably so that I can have the lock itself managed in an RDBMS? (It's a J2EE project, so relying on file access is problematic; if the beans using lucene to write to the index are on multiple servers, multiple locks could exist anyway.) --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Null Pointer Exception
D'oh, please disregard the message: I forgot to assign the return value when I create the IndexReader. Rupinder Singh Mazara wrote: Hi all, while executing a query on Lucene I get the following exception; if I check whether the IndexSearcher object == null or use an assert, I do not get any errors. Please help me out on this. java.lang.NullPointerException at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69) at org.apache.lucene.search.Similarity.idf(Similarity.java:255) at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47) at org.apache.lucene.search.BooleanQuery$BooleanWeight.sumOfSquaredWeights(BooleanQuery.java:110) at org.apache.lucene.search.Query.weight(Query.java:86) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85) To access the Searchable object I use the following lines of code, at various places in my web application; all was fine till this morning, and running command-line test scripts does not show an error public static IndexReader fetchCitationReader(ServletContext context) throws IOException { IndexReader rval = (IndexReader) context.getAttribute("luceneIndexReader"); if (rval == null) { String var = (String) context.getAttribute("luceneRootName"); System.out.println("var = " + var); IndexReader indexReader = IndexReader.open(new File(var)); context.setAttribute("luceneIndexReader", indexReader); } return rval; } public static Searcher fetchCitationSearcher(ServletContext context) throws IOException { Searcher rval = (Searcher) context.getAttribute("luceneSearchable"); if (rval == null) { rval = new IndexSearcher(fetchCitationReader(context)); context.setAttribute("luceneSearchable", rval); } return rval; } - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
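The bug the poster found in himself: fetchCitationReader assigns the freshly opened reader to a local variable (indexReader) instead of rval, so the method caches the reader but still returns null, and the IndexSearcher built on that null later throws the NPE. The corrected caching pattern looks like this, with ServletContext stood in for by a plain Map and the reader by a plain Object so the sketch stays self-contained:

```java
import java.util.HashMap;
import java.util.Map;

// Lazy-initialize-and-cache pattern, with the fix: the opened value is
// assigned to rval itself, so the first call returns a non-null reader.
public class ReaderCache {
    static Object fetchReader(Map<String, Object> context) {
        Object rval = context.get("luceneIndexReader");
        if (rval == null) {
            rval = openReader();                   // assign to rval, not a local
            context.put("luceneIndexReader", rval);
        }
        return rval;
    }

    static Object openReader() {   // stand-in for IndexReader.open(new File(...))
        return new Object();
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        Object first = fetchReader(ctx);
        System.out.println(first != null);             // true: no more NPE
        System.out.println(first == fetchReader(ctx)); // true: cached instance
    }
}
```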
Null Pointer Exception
Hi all, while executing a query on Lucene I get the following exception; if I check whether the IndexSearcher object == null or use an assert, I do not get any errors. Please help me out on this. java.lang.NullPointerException at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69) at org.apache.lucene.search.Similarity.idf(Similarity.java:255) at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47) at org.apache.lucene.search.BooleanQuery$BooleanWeight.sumOfSquaredWeights(BooleanQuery.java:110) at org.apache.lucene.search.Query.weight(Query.java:86) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85) To access the Searchable object I use the following lines of code, at various places in my web application; all was fine till this morning, and running command-line test scripts does not show an error public static IndexReader fetchCitationReader(ServletContext context) throws IOException { IndexReader rval = (IndexReader) context.getAttribute("luceneIndexReader"); if (rval == null) { String var = (String) context.getAttribute("luceneRootName"); System.out.println("var = " + var); IndexReader indexReader = IndexReader.open(new File(var)); context.setAttribute("luceneIndexReader", indexReader); } return rval; } public static Searcher fetchCitationSearcher(ServletContext context) throws IOException { Searcher rval = (Searcher) context.getAttribute("luceneSearchable"); if (rval == null) { rval = new IndexSearcher(fetchCitationReader(context)); context.setAttribute("luceneSearchable", rval); } return rval; } - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Span Query Performance
On Thursday 06 January 2005 02:17, Andrew Cunningham wrote: > Hi all, > > I'm currently doing a query similar to the following: > > for w in wordset: > query = w near (word1 V word2 V word3 ... V word1422); > perform query > > and I am doing this through SpanQuery.getSpans(), iterating through the > spans and counting > the matches, which can result in 4782282 matches (essentially I am only > after the match count). > The query works but the performance can be somewhat slow; so I am wondering: > > a) Would the query potentially run faster if I used > Searcher.search(query) with a custom similarity, > or do both methods essentially use the same mechanics It would be somewhat slower, because it loops over the getSpans() and computes document scores and constructs a Hits from the scores. > b) Does using a RAMDirectory improve query performance any significant > amount. That depends on your operating system, the size of the index, the amount of RAM you can use, the file buffering efficiency, other loads on the computer ... > c) Is there a faster method to what I am doing I should consider? Preindexing all word combinations that you're interested in. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: multi-threaded thru-put in lucene
: This is what we found: : : 1 thread, search takes 20 ms. : : 2 threads, search takes 40 ms. : : 5 threads, search takes 100 ms. how big is your index? What are the term frequencies like in your index? how many different queries did you try? what was the structure of your query objects like? were you using a RAMDirectory or an FSDirectory? what hardware were you running on? Is your test application small enough that you can post it to the list? I haven't done a lot of PMA testing of Lucene, but from what limited testing I have done I'm a little surprised at those numbers, you'd get results just as good if you ran the queries sequentially. -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Question about Analyzer and words spelled in different languages
: Is there any already written analyzer that would take that name : (Schäffer or any other name that has entities) so that : the Lucene index could be searched (once the field has been indexed) for the real : version of the name, which is : : Schäffer : : and the English-spelled version of the name, which is : : Schaffer I don't know about the un-xml-escaping part of things (there are lots of xml escaping libraries out there, I'm sure one of them has an unescape) but there was a recent discussion about unicode characters that look similar and writing an analyzer that could know about them. The last message in the thread was from me, pointing out that it should be easy to build the mapping table once, and then write a quick and dirty Analyzer filter to use it ... but no one seemed to have any code handy that already did that... http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&by=thread&from=962022 -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
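The "quick and dirty mapping table" Hoss mentions can be sketched in plain Java: fold accented characters to their ASCII look-alikes both at index time and at query time, so "Schäffer" and "Schaffer" meet at the same term. The table below is a tiny illustrative sample, not a complete one, and a real filter would wrap a Lucene TokenStream rather than a bare String.

```java
// Folds a handful of accented characters to ASCII equivalents; apply the
// same fold() to indexed tokens and to query terms so both spellings match.
public class AccentFolder {
    static String fold(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            switch (c) {
                case 'ä': case 'à': case 'á': case 'â': out.append('a'); break;
                case 'ö': case 'ò': case 'ó': case 'ô': out.append('o'); break;
                case 'ü': case 'ù': case 'ú': case 'û': out.append('u'); break;
                case 'é': case 'è': case 'ê': case 'ë': out.append('e'); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Schäffer"));                        // Schaffer
        System.out.println(fold("Schäffer").equals(fold("Schaffer"))); // true
    }
}
```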