RE: Extracting Lucene onto Tomcat
Hi,

Just copy the luceneweb.war file into Tomcat's webapps directory and then start Tomcat. In the browser, go to http://localhost:8080/luceneweb and it will serve you the pages.

But first you have to index your directory so that the web module can serve you searchable hits. I think there should be some information in the Lucene package itself on doing this.

With regards,
Karthik

-----Original Message-----
From: Zilverline info [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 7:56 PM
To: Lucene Users List
Subject: Re: Extracting Lucene onto Tomcat

Hi Ian,

Depending on what you want to do, you could also follow the installation instructions on http://www.zilverline.org. They describe how to install zilverline, but the same goes for the lucene war.

Hope this helps,
Michael Franken

Ian McDonnell wrote:
> Also another silly question, do I need to set up a war on the server?
>
> --- Ian McDonnell <[EMAIL PROTECTED]> wrote:
> Well, when I extracted it, it created the org/apache/lucene directories in the public_html directory. When I try to compile any of the source it just throws numerous errors. I've got the classpath set to web-inf/classes.
>
> Have I extracted it to the wrong directory?
>
> --- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Jul 21, 2004, at 8:10 AM, Ian McDonnell wrote:
>> Is the package information and import paths ready to deploy on Tomcat
>> server. I tried extracting lucene on the server, but when I compile
>> files, it just throws numerous no class definition errors and errors
>> relating to the package.
>
> Huh? Lucene certainly deploys just fine in Tomcat web applications (in
> a WAR under WEB-INF/lib). Could you elaborate on what you mean here?
>
> Erik
RE: Use of Convertes or Parser
OK, thanks.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 9:33 PM
To: Lucene Users List
Subject: Re: Use of Convertes or Parser

Lucene cannot parse those document formats that you mentioned. You need 3rd party parsers to do that. For example, POI will parse Excel and MS Word docs, PDFBox will parse PDF.

Otis

--- "Natarajan.T" <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> I have a small query, i.e. Lucene 1.4 APIs directly indexing all the
> documents (PPT, PDF, WORD, etc.) then why we go for Converters or
> Parsers.
>
> Thanks,
> Natarajan.
Re: Syntax of Query
Guys/Gals,

Does anyone have any pointers for this kind of query? Thanks.

Need some help with creating a query. Here is the scenario:

Field 1:
Field 2:
Field 3:
MultiSelect 1:
MultiSelect 2:

What would the query look like if the condition is that at any time there will be one entry from Field 1, 2, or 3, a few entries from MultiSelect 1, and a few entries from MultiSelect 2?

Would it look something like

+field1 +(val11 OR val12 OR val14) +(val21 OR val23 OR val24)

Thanks for all your support.
-H
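For readers following this thread: the guess above is close to Lucene's query parser syntax, where `+` marks a required clause and parentheses group OR alternatives. A minimal sketch of assembling such a string is below. It is plain string handling with no Lucene dependency; the class name `QuerySketch`, the field names `field1`, `multiSelect1`, `multiSelect2` and the value `abc` are all made up for illustration, and values are assumed to contain no characters that the query parser treats specially.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper that assembles a query string; it does not call Lucene. */
class QuerySketch {

    /** Returns "+field:value", a single required term. */
    static String required(String field, String value) {
        return "+" + field + ":" + value;
    }

    /** Returns "+(field:v1 OR field:v2 ...)", a required group where any one value may match. */
    static String requiredAnyOf(String field, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("need at least one value for " + field);
        }
        StringJoiner alternatives = new StringJoiner(" OR ", "+(", ")");
        for (String value : values) {
            // Values are used as-is; escaping of special characters is left out of this sketch.
            alternatives.add(field + ":" + value);
        }
        return alternatives.toString();
    }

    public static void main(String[] args) {
        String query = String.join(" ",
                required("field1", "abc"),
                requiredAnyOf("multiSelect1", Arrays.asList("val11", "val12", "val14")),
                requiredAnyOf("multiSelect2", Arrays.asList("val21", "val23", "val24")));
        System.out.println(query);
    }
}
```

The resulting string would then be handed to the query parser with whatever analyzer the index was built with; that step is not shown here.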
Re: Can I retrieve token offsets from Hits?
> I need these values for highlighting. I've already looked at the
> Highlighter in the sandbox but it actually re-analyzes the original
> document's field.

Technically not true, as of a few months ago. The good news is that the highlighter has been redesigned specifically to use TokenStreams, not Analyzers. This would enable you to pass the token position information in from a pre-computed store of token positions.

The bad news is that such a token-position storage feature has not been added to core Lucene yet. If it ever is added, the highlighter is already set up to make good use of it.

Cheers
Mark
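To make the idea of highlighting from pre-computed positions concrete, here is a minimal sketch that marks up text from a list of stored character offsets instead of re-analyzing it. This is not the sandbox highlighter's API; `OffsetHighlightSketch`, `Span` and the `<b>` markup are assumptions chosen for illustration, and where the offsets come from is left open.

```java
import java.util.List;

/** Hypothetical sketch: highlight from stored offsets, without re-running an analyzer. */
class OffsetHighlightSketch {

    /** Start (inclusive) and end (exclusive) character offsets of one matching token. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Wraps each span in <b>...</b>. Spans must be sorted by start and must not overlap. */
    static String highlight(String text, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span span : spans) {
            out.append(text, cursor, span.start);          // untouched text before the match
            out.append("<b>").append(text, span.start, span.end).append("</b>");
            cursor = span.end;
        }
        out.append(text, cursor, text.length());           // remainder after the last match
        return out.toString();
    }
}
```

The trade-off is the one Mark describes: the offsets have to be stored somewhere at index time, since core Lucene does not keep them at this point.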
Re: Weighting database fields
Ernesto De Santis wrote:
> If some field has a boost value set at index time, and at search time
> the query has another boost value for this field, what happens?
> Which value is used for the boost?

The two boosts are both multiplied into the score.

Doug
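As a worked example of Doug's answer, the sketch below only shows the arithmetic: the index-time and query-time boosts multiply, so neither one overrides the other. It is not Lucene's scoring code; `BoostSketch`, `combinedBoost` and the base score of 0.5 are invented for illustration, and the other scoring factors are ignored.

```java
/** Hypothetical illustration of how two boosts combine; not Lucene's actual scoring code. */
class BoostSketch {

    /** Both boosts are multiplied into the score, so their combined effect is the product. */
    static float combinedBoost(float indexTimeBoost, float queryTimeBoost) {
        return indexTimeBoost * queryTimeBoost;
    }

    public static void main(String[] args) {
        float baseScore = 0.5f; // made-up score before any boost is applied
        // Index-time boost of 2.0 and query-time boost of 3.0 scale the score by 6.0.
        System.out.println(baseScore * combinedBoost(2.0f, 3.0f)); // prints 3.0
    }
}
```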
Re: Weighting database fields
Hi Erik

> On Jul 21, 2004, at 11:40 AM, Anson Lau wrote:
> > Is there any benefit to set the boost during indexing rather than
> > set it during query?
>
> It allows setting each document differently. For example,
> TheServerSide is using field-level boosts at index time to control
> ordering by date, such that newer articles come up first. This could
> not be done at query time since each document gets a different field
> boost.

If some field has a boost value set at index time, and at search time the query has another boost value for this field, what happens? Which value is used for the boost?

Bye,
Ernesto.
RE: Slightly off topic, I need to have luke use my Analyzer
Sorry, typo in the version date in my previous mail -- I meant Luke v 0.5 (2004-06-25)

-----Original Message-----
From: Chellappa, Kannan
Sent: Wednesday, July 21, 2004 12:16 PM
To: Lucene Users List
Subject: RE: Slightly off topic, I need to have luke use my Analyzer

Worked for me. I added my jar to the classpath and my analyzer appeared in the analyzers list in the search tab as well as in the analyzers list in the plugins tab.

I am using Luke v 0.5 (2004-05-25)

Kannan

-----Original Message-----
From: Rob Jose [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 11:37 AM
To: Lucene Users List
Subject: Slightly off topic, I need to have luke use my Analyzer

Sorry for the slightly off topic post, but I have a need to use luke with my Analyzer. Has anyone done this? I have added a jar file to my classpath, but that didn't help.

Thanks in advance
Rob
RE: Slightly off topic, I need to have luke use my Analyzer
Worked for me. I added my jar to the classpath and my analyzer appeared in the analyzers list in the search tab as well as in the analyzers list in the plugins tab.

I am using Luke v 0.5 (2004-05-25)

Kannan

-----Original Message-----
From: Rob Jose [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 11:37 AM
To: Lucene Users List
Subject: Slightly off topic, I need to have luke use my Analyzer

Sorry for the slightly off topic post, but I have a need to use luke with my Analyzer. Has anyone done this? I have added a jar file to my classpath, but that didn't help.

Thanks in advance
Rob
Slightly off topic, I need to have luke use my Analyzer
Sorry for the slightly off topic post, but I have a need to use luke with my Analyzer. Has anyone done this? I have added a jar file to my classpath, but that didn't help.

Thanks in advance
Rob
RE: Sort: 1.4-rc3 vs. 1.4-final
I will post a patch soon.

Aviran

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 13:56 PM
To: Lucene Users List
Subject: Re: Sort: 1.4-rc3 vs. 1.4-final

The key in the WeakHashMap should be the IndexReader, not the Entry. I think this should become a two-level cache, a WeakHashMap of HashMaps, the WeakHashMap keyed by IndexReader, the HashMap keyed by Entry. I think the Entry class can also be changed to not include an IndexReader field.

Does this make sense? Would someone like to construct a patch and submit it to the developer list?

Doug
Re: Sort: 1.4-rc3 vs. 1.4-final
The key in the WeakHashMap should be the IndexReader, not the Entry. I think this should become a two-level cache, a WeakHashMap of HashMaps, the WeakHashMap keyed by IndexReader, the HashMap keyed by Entry. I think the Entry class can also be changed to not include an IndexReader field.

Does this make sense? Would someone like to construct a patch and submit it to the developer list?

Doug

Aviran wrote:
> I think I found the problem.
> FieldCacheImpl uses WeakHashMap to store the cached objects, but since
> there is no other reference to this cache it is getting released.
> Switching to HashMap solves it. The only problem is that I don't see
> anywhere where the cached object will get released if you open a new
> IndexReader.
>
> Aviran
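The two-level cache Doug describes can be sketched with the standard collections alone. This is not the patch that was eventually submitted; `TwoLevelCacheSketch` is a made-up name, the reader is typed as `Object` so the sketch runs without Lucene, and a `String` stands in for the `Entry` key (field plus sort type). The outer `WeakHashMap` lets a reader's whole inner map disappear once the reader itself is no longer referenced, while lookups for a live reader keep hitting the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical sketch of a WeakHashMap of HashMaps, keyed first by reader, then by entry. */
class TwoLevelCacheSketch {

    // Outer map: weakly keyed by the reader. Inner map: strongly keyed by the entry.
    private final Map<Object, Map<String, Object>> cache = new WeakHashMap<>();

    /** Returns the cached value for this reader and entry key, or null if nothing is cached. */
    synchronized Object lookup(Object reader, String entryKey) {
        Map<String, Object> perReader = cache.get(reader);
        return perReader == null ? null : perReader.get(entryKey);
    }

    /** Stores a value under this reader and entry key, creating the inner map on first use. */
    synchronized void store(Object reader, String entryKey, Object value) {
        cache.computeIfAbsent(reader, r -> new HashMap<>()).put(entryKey, value);
    }
}
```

One caveat with this design: the cached values and entry keys must not hold a strong reference back to the reader, otherwise the weak key can never be collected. That matches Doug's suggestion to drop the IndexReader field from the Entry class.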
RE: Sort: 1.4-rc3 vs. 1.4-final
I just saw this post, I guess we both came to the same conclusion. The only problem is that the cached object never gets released, and a new one will get created every time you open a new IndexReader.

Aviran

-----Original Message-----
From: Greg Gershman [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 13:30 PM
To: Lucene Users List
Subject: RE: Sort: 1.4-rc3 vs. 1.4-final

I switched the Comparators and FieldCache classes to use java.util.HashMap instead of java.util.WeakHashMap, and got the performance boost I was looking for (test index of 100K documents; initial search took 991 ms, all subsequent searches took < 90 ms. Before, I was seeing an initial query of ~1 sec, subsequent queries between 500 and 700 ms, with comparator and field lookup table computed each time).

I guess the question is why use a WeakHashMap here as opposed to a HashMap?

Greg
RE: Sort: 1.4-rc3 vs. 1.4-final
I think I found the problem.

FieldCacheImpl uses WeakHashMap to store the cached objects, but since there is no other reference to this cache it is getting released. Switching to HashMap solves it. The only problem is that I don't see anywhere where the cached object will get released if you open a new IndexReader.

Aviran

-----Original Message-----
From: Greg Gershman [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 13:13 PM
To: Lucene Users List
Subject: RE: Sort: 1.4-rc3 vs. 1.4-final

I've done a bit more snooping around; it seems that in FieldSortedHitQueue.getCachedComparator (line 153), calls to look up a stored comparator in the cache always return null. This occurs even for the built-in sort types (I tested it on integers and my code for longs). The comparators don't even appear to be stored in the HashMap to begin with.

Any ideas?

Greg
RE: Sort: 1.4-rc3 vs. 1.4-final
I switched the Comparators and FieldCache classes to use java.util.HashMap instead of java.util.WeakHashMap, and got the performance boost I was looking for (test index of 100K documents; initial search took 991 ms, all subsequent searches took < 90 ms. Before, I was seeing an initial query of ~1 sec, subsequent queries between 500 and 700 ms, with comparator and field lookup table computed each time).

I guess the question is why use a WeakHashMap here as opposed to a HashMap?

Greg

--- Greg Gershman <[EMAIL PROTECTED]> wrote:
> I've done a bit more snooping around; it seems that in
> FieldSortedHitQueue.getCachedComparator (line 153), calls to look up a
> stored comparator in the cache always return null. This occurs even
> for the built-in sort types (I tested it on integers and my code for
> longs). The comparators don't even appear to be stored in the
> HashMap to begin with.
>
> Any ideas?
>
> Greg
RE: Sort: 1.4-rc3 vs. 1.4-final
I've done a bit more snooping around; it seems that in FieldSortedHitQueue.getCachedComparator (line 153), calls to look up a stored comparator in the cache always return null. This occurs even for the built-in sort types (I tested it on integers and my code for longs). The comparators don't even appear to be stored in the HashMap to begin with.

Any ideas?

Greg

--- Aviran <[EMAIL PROTECTED]> wrote:
> Since I had to implement sorting in Lucene 1.2, I had to write my own
> sorting using something similar to a Lucene contribution called
> SortField. Yesterday I did some tests, trying to use Lucene 1.4 Sort
> objects, and I realized that my old implementation works 40% faster
> than Lucene's implementation. My guess is that you are right and
> there is a problem with the cache, although I couldn't find what that
> is yet.
>
> Aviran
>
> -----Original Message-----
> From: Greg Gershman [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 21, 2004 9:22 AM
> To: [EMAIL PROTECTED]
> Subject: Sort: 1.4-rc3 vs. 1.4-final
>
> When rc3 came out, I modified the classes used for sorting to, in
> addition to Integer, Float and String-based sort keys, use Long
> values. All I did was add extra statements in 2 classes (SortField and
> FieldSortedHitQueue) that made a special case for longs, and created a
> LongSortedHitQueue identical to the IntegerSortedHitQueue, only using
> longs.
>
> This worked as expected; Long values converted to strings and stored
> in Field.Keyword type fields would be sorted according to Long order.
> The initial query would take a while, to build the sorted array, but
> subsequent queries would take little to no time at all.
>
> I went back to look at 1.4 final, and noticed the Sort implementation
> has changed quite a bit. I tried the same type of modifications to the
> existing source files, but was unable to achieve similar results.
> Each subsequent query seems to take a significant amount of time, as
> if the sorted array is being rebuilt each time. Also, I tried sorting
> on an Integer field and got similar results, which leads me to
> believe there might be a caching problem somewhere.
>
> Has anyone else seen this in 1.4-final? Also, I would like it if Long
> sorted fields could become a part of the API; it makes sorting by date
> a breeze.
>
> Thanks!
>
> Greg Gershman
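The original post in this thread describes Long values stored as strings and sorted in Long order. The sketch below shows why that needs a dedicated comparison: plain string ordering compares character by character, so it disagrees with numeric ordering whenever the values have different lengths. It uses only the standard library; `LongSortSketch` and the sample values are made up, and this is not the LongSortedHitQueue code from the post.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: order string-encoded longs by numeric value, not lexicographically. */
class LongSortSketch {

    /** Compares two strings by the long value each one encodes. */
    static final Comparator<String> BY_LONG_VALUE = Comparator.comparingLong(Long::parseLong);

    /** Returns a sorted copy of the keys, ordered by their numeric value. */
    static String[] sortedByLongValue(String[] keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy, BY_LONG_VALUE);
        return copy;
    }

    public static void main(String[] args) {
        String[] keys = {"1090411200000", "999", "20040721"};

        String[] lexicographic = keys.clone();
        Arrays.sort(lexicographic); // natural String order, character by character
        System.out.println("string order: " + Arrays.toString(lexicographic));
        System.out.println("long order:   " + Arrays.toString(sortedByLongValue(keys)));
    }
}
```

Parsing every key on every comparison is slow for large indexes, which is presumably why the approach in the post builds the sorted array once on the first query and reuses it afterwards.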
Re: Use of Convertes or Parser
Lucene cannot parse those document formats that you mentioned. You need 3rd party parsers to do that. For example, POI will parse Excel and MS Word docs, PDFBox will parse PDF.

Otis

--- "Natarajan.T" <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> I have a small query, i.e. Lucene 1.4 APIs directly indexing all the
> documents (PPT, PDF, WORD, etc.) then why we go for Converters or
> Parsers.
>
> Thanks,
> Natarajan.
Re: Weighting database fields
On Jul 21, 2004, at 11:40 AM, Anson Lau wrote:
> Is there any benefit to set the boost during indexing rather than set
> it during query?

It allows setting each document differently. For example, TheServerSide is using field-level boosts at index time to control ordering by date, such that newer articles come up first. This could not be done at query time since each document gets a different field boost.

Erik
Re: Extracting Lucene onto Tomcat
On Jul 21, 2004, at 11:19 AM, Ian McDonnell wrote:
> No, sorry, I didn't mean that I was trying to extract the jars at all.
> I meant the extraction of the original Lucene source bundle. I have
> been developing in Java for going on 5 years now, but am relatively
> new to web apps. I have some experience in Tomcat from days as an
> undergrad and do understand that perhaps the questions I'm asking
> weren't exactly tech questions relating to Lucene, but rather Tomcat
> related enquiries. I think the reason I was struggling was that I
> haven't been able to locate the Lucene war files as they don't seem to
> have come as part of the latest source drops.

The webapp is built from the source code, not included directly as a WAR. If you download the 1.4 binary distribution, luceneweb.war is pre-built at the top level. If you grab the source release for 1.4, use Ant with the war-demo target:

    % ant war-demo
    Buildfile: build.xml
    . . .
    compile-demo:
        [mkdir] Created dir: /Users/erik/Desktop/Downloads/lucene-1.4-final/build/classes/demo
        [javac] Compiling 17 source files to /Users/erik/Desktop/Downloads/lucene-1.4-final/build/classes/demo

    jar-demo:
          [jar] Building jar: /Users/erik/Desktop/Downloads/lucene-1.4-final/build/lucene-demos-1.5-rc1-dev.jar

    war-demo:
          [war] Building war: /Users/erik/Desktop/Downloads/lucene-1.4-final/build/luceneweb.war

    BUILD SUCCESSFUL
RE: Weighting database fields
Erik,

Is there any benefit to setting the boost during indexing rather than at query time? I usually set it at query time because you can change the boost values easily without having to re-index.

Thanks,
Anson

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 22, 2004 12:52 AM
To: Lucene Users List
Subject: Re: Weighting database fields

On Jul 21, 2004, at 10:09 AM, Anson Lau wrote:
> Apply boost factor to fields when you do a lucene search.

Or... set the boost on the Field during indexing.

Erik
Re: Weighting database fields
Thanks, that was what I was after!

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, July 21, 2004 9:52 PM
Subject: Re: Weighting database fields

> On Jul 21, 2004, at 10:09 AM, Anson Lau wrote:
> > Apply boost factor to fields when you do a lucene search.
>
> Or... set the boost on the Field during indexing.
>
> Erik
Re: Extracting Lucene onto Tomcat
No, sorry, I didn't mean that I was trying to extract the jars at all. I meant the extraction of the original Lucene source bundle. I have been developing in Java for going on 5 years now, but am relatively new to web apps. I have some experience in Tomcat from days as an undergrad and do understand that perhaps the questions I'm asking weren't exactly tech questions relating to Lucene, but rather Tomcat related enquiries.

I think the reason I was struggling was that I haven't been able to locate the Lucene war files as they don't seem to have come as part of the latest source drops.

Thanks for the advice, and hopefully you can help me out when I'm further into the development process.

Ian

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> There is no need to extract Lucene's JAR file. Your questions indicate
> that you have some Tomcat and Java web application learning to do and
> this forum is not the most appropriate place to ask.
>
> Lucene includes a web application demo that you could try deploying by
> following the steps here:
>
> http://jakarta.apache.org/lucene/docs/demo3.html
>
> Just drop luceneweb.war into CATALINA_HOME/webapps, restart Tomcat and
> try hitting http://localhost:8080/luceneweb and pressing the search
> button - you will get an error unless you've followed all the steps,
> but you should not get a class cast exception and Lucene will be
> working properly (now follow the steps to build an index and configure
> the pointer to it).
>
> Erik
Re: Extracting Lucene onto Tomcat
Hi Ian,

You don't extract war files, or jar files. To deploy a web application that comes as a war file, you just have to drop it into the webserver/servlet engine. So just: copy lucene.war /webapps. That's it.

I advise you to read some of the documentation on the Tomcat website on deploying web applications, or if you're really serious buy this book: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471446629.html

regards,

Michael

Ian McDonnell wrote:
> I was looking at your instructions there, but couldn't really figure
> out what you mean. Can I manually add the extracted directories onto
> the Tomcat server, and if so, what should my root directory be?
>
> Say, for example, the extracted directories are org/apache/lucene/.
>
> Should I have that as public_html/WEB-INF/org/apache/lucene?
>
> Ian
Re: Extracting Lucene onto Tomcat
There is no need to extract Lucene's JAR file. Your questions indicate that you have some Tomcat and Java web application learning to do and this forum is not the most appropriate place to ask.

Lucene includes a web application demo that you could try deploying by following the steps here:

http://jakarta.apache.org/lucene/docs/demo3.html

Just drop luceneweb.war into CATALINA_HOME/webapps, restart Tomcat and try hitting http://localhost:8080/luceneweb and pressing the search button - you will get an error unless you've followed all the steps, but you should not get a class cast exception and Lucene will be working properly (now follow the steps to build an index and configure the pointer to it).

Erik

On Jul 21, 2004, at 9:43 AM, Ian McDonnell wrote:
> Well, when I extracted it, it created the org/apache/lucene
> directories in the public_html directory. When I try to compile any of
> the source it just throws numerous errors. I've got the classpath set
> to web-inf/classes.
>
> Have I extracted it to the wrong directory?
Re: Weighting database fields
On Jul 21, 2004, at 10:09 AM, Anson Lau wrote:
> Apply boost factor to fields when you do a lucene search.

Or... set the boost on the Field during indexing.

Erik
Re: Extracting Lucene onto Tomcat
I was looking at your instructions there, but couldn't really figure out what you mean. Can I manually add the extracted directories onto the Tomcat server, and if so, what should my root directory be?

Say, for example, the extracted directories are org/apache/lucene/.

Should I have that as public_html/WEB-INF/org/apache/lucene?

Ian

--- Zilverline info <[EMAIL PROTECTED]> wrote:
> Hi Ian,
>
> Depending on what you want to do, you could also follow the
> installation instructions on http://www.zilverline.org. They describe
> how to install Zilverline, but the same goes for the Lucene war.
>
> Hope this helps,
>
> Michael Franken
Use of Converters or Parser
Hi Guys,

I have a small query: if the Lucene 1.4 APIs directly index all the documents (PPT, PDF, WORD, etc.), then why do we go for Converters or Parsers?

Thanks,
Natarajan.
Re: Extracting Lucene onto Tomcat
Hi Ian,

Depending on what you want to do, you could also follow the installation instructions on http://www.zilverline.org. They describe how to install Zilverline, but the same goes for the Lucene war.

Hope this helps,

Michael Franken

Ian McDonnell wrote:
> Also, another silly question: do I need to set up a war on the server?
Re: Extracting Lucene onto Tomcat
Also, another silly question: do I need to set up a war on the server?

--- Ian McDonnell <[EMAIL PROTECTED]> wrote:
> Well, when I extracted it, it created the org/apache/lucene
> directories in the public_html directory. When I try to compile any of
> the source it just throws numerous errors. I've got the classpath set
> to web-inf/classes.
>
> Have I extracted it to the wrong directory?
RE: Weighting database fields
Apply a boost factor to fields when you do a Lucene search.

Anson

-----Original Message-----
From: John Patterson [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 22, 2004 12:07 AM
To: [EMAIL PROTECTED]
Subject: Weighting database fields

Hi,

What is the best way to get Lucene to assign weightings to certain fields from a database? For example, the 'name' field should be weighted higher than the 'description' field.

Thanks,

John.
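As a rough sketch of the query-time approach: the Lucene query parser syntax lets a clause carry a boost with `^`, as in `name:lucene^4.0`. The helper below only assembles such a query string; the class name `BoostedQueryBuilder`, the method `build`, and the boost factors 4.0 and 1.0 are made up for illustration, and the sketch assumes a single search word with no characters that would need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): builds a query string that
// applies a per-field boost using the query parser's field:term^boost syntax.
class BoostedQueryBuilder {

    // fieldBoosts maps a field name to its boost factor.
    // Assumption: term is a single word without query-syntax characters.
    static String build(String term, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term)
              .append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("name", 4.0f);        // illustrative factor
        boosts.put("description", 1.0f); // illustrative factor
        System.out.println(build("lucene", boosts));
    }
}
```

The resulting string would then be handed to the query parser. Because the factors live in the query rather than in the index, they can be tuned without re-indexing.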
Weighting database fields
Hi,

What is the best way to get Lucene to assign weightings to certain fields from a database? For example, the 'name' field should be weighted higher than the 'description' field.

Thanks,

John.
Re: Extracting Lucene onto Tomcat
Well, when I extracted it, it created the org/apache/lucene directories in the public_html directory. When I try to compile any of the source it just throws numerous errors. I've got the classpath set to web-inf/classes.

Have I extracted it to the wrong directory?

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Huh? Lucene certainly deploys just fine in Tomcat web applications (in
> a WAR under WEB-INF/lib). Could you elaborate on what you mean here?
>
> Erik
RE: Sort: 1.4-rc3 vs. 1.4-final
Since I had to implement sorting in Lucene 1.2, I had to write my own sorting using something similar to a Lucene contribution called SortField. Yesterday I did some tests, trying to use Lucene 1.4 Sort objects, and I realized that my old implementation works 40% faster than Lucene's implementation. My guess is that you are right and there is a problem with the cache, although I couldn't find what that is yet.

Aviran

-----Original Message-----
From: Greg Gershman [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 9:22 AM
To: [EMAIL PROTECTED]
Subject: Sort: 1.4-rc3 vs. 1.4-final

When rc3 came out, I modified the classes used for sorting so that, in addition to Integer, Float and String-based sort keys, they could use Long values. All I did was add extra statements in two classes (SortField and FieldSortedHitQueue) that made a special case for longs, and create a LongSortedHitQueue identical to the IntegerSortedHitQueue, only using longs.

This worked as expected; Long values converted to strings and stored in Field.Keyword type fields would be sorted according to Long order. The initial query would take a while to build the sorted array, but subsequent queries would take little to no time at all.

I went back to look at 1.4 final, and noticed the Sort implementation has changed quite a bit. I tried the same type of modifications to the existing source files, but was unable to achieve similar results. Each subsequent query seems to take a significant amount of time, as if the sorted array is being rebuilt each time. Also, I tried sorting on an Integer field and got similar results, which leads me to believe there might be a caching problem somewhere.

Has anyone else seen this in 1.4-final? Also, I would like it if Long sorted fields could become a part of the API; it makes sorting by date a breeze.

Thanks!

Greg Gershman
Sort: 1.4-rc3 vs. 1.4-final
When rc3 came out, I modified the classes used for sorting so that, in addition to Integer, Float and String-based sort keys, they could use Long values. All I did was add extra statements in two classes (SortField and FieldSortedHitQueue) that made a special case for longs, and create a LongSortedHitQueue identical to the IntegerSortedHitQueue, only using longs.

This worked as expected; Long values converted to strings and stored in Field.Keyword type fields would be sorted according to Long order. The initial query would take a while to build the sorted array, but subsequent queries would take little to no time at all.

I went back to look at 1.4 final, and noticed the Sort implementation has changed quite a bit. I tried the same type of modifications to the existing source files, but was unable to achieve similar results. Each subsequent query seems to take a significant amount of time, as if the sorted array is being rebuilt each time. Also, I tried sorting on an Integer field and got similar results, which leads me to believe there might be a caching problem somewhere.

Has anyone else seen this in 1.4-final? Also, I would like it if Long sorted fields could become a part of the API; it makes sorting by date a breeze.

Thanks!

Greg Gershman
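For readers who want long-ordered sorting without patching the sort classes, one possible workaround (an assumption on my part, not the LongSortedHitQueue change described above) is to store each long as a fixed-width, zero-padded string, so that the existing String-based sort already yields numeric order. The class and method names below are hypothetical, and the sketch only handles non-negative values such as millisecond timestamps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: encodes non-negative longs as fixed-width strings so
// that plain String ordering matches numeric ordering.
class SortableLong {

    // Long.MAX_VALUE has 19 decimal digits, so 19 is wide enough
    // for any non-negative long.
    private static final int WIDTH = 19;

    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not supported: " + value);
        }
        String digits = Long.toString(value);
        StringBuilder sb = new StringBuilder(WIDTH);
        // Left-pad with zeros up to the fixed width.
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    static long decode(String encoded) {
        // Long.parseLong accepts leading zeros.
        return Long.parseLong(encoded);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(encode(1090416000000L)); // an illustrative timestamp in milliseconds
        keys.add(encode(42L));
        keys.add(encode(1000L));
        Collections.sort(keys); // lexicographic order equals numeric order here
        for (String k : keys) {
            System.out.println(k + " -> " + decode(k));
        }
    }
}
```

The encoded value would go into the same kind of untokenized field the message mentions (Field.Keyword). Whether this is faster or slower than a dedicated long comparator is not something this sketch measures.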
RE: Sorting on tokenized fields
You can create a new field which contains the full untokenized string and use it as a sort field.

-----Original Message-----
From: Florian Sauvin [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 20, 2004 20:13 PM
To: Lucene Users List
Subject: Sorting on tokenized fields

I see in the Javadoc that it is only possible to sort on fields that are not tokenized. I have two questions about that:

1) What happens if the field is tokenized? Is sorting done anyway, using the first term only?

2) Is there a way to do some sorting anyway, by concatenating all the tokens into one string?

--
Florian
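A minimal sketch of that suggestion, under stated assumptions: the text stays in its tokenized field for searching, and a normalized copy of the whole string is stored in a second, untokenized field used only for sorting. The class `SortKey`, the method `of`, and the field name `title_sort` in the comment are hypothetical; lowercasing and whitespace collapsing are one possible normalization, not something the thread prescribes.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: derives the value for an extra, untokenized sort field
// from the same text that is indexed (tokenized) for searching.
class SortKey {

    // Trims, collapses runs of whitespace and lowercases, so that sorting
    // on the key is case-insensitive and ignores stray spacing.
    static String of(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] titles = { "  Zebra  Crossing", "apache Lucene", "Tomcat in Action" };
        String[] keys = new String[titles.length];
        for (int i = 0; i < titles.length; i++) {
            // In an indexer, this key would be added as a separate untokenized
            // field (for example a keyword field named "title_sort", a made-up
            // name) next to the tokenized field holding the original text.
            keys[i] = of(titles[i]);
        }
        Arrays.sort(keys);
        System.out.println(Arrays.toString(keys));
    }
}
```

Keeping the sort key in its own field means the analyzer used for searching never touches it, so each document contributes exactly one term to sort on.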
Re: Extracting Lucene onto Tomcat
On Jul 21, 2004, at 8:10 AM, Ian McDonnell wrote:
> Are the package information and import paths ready to deploy on a
> Tomcat server? I tried extracting Lucene on the server, but when I
> compile files, it just throws numerous no class definition errors and
> errors relating to the package.

Huh? Lucene certainly deploys just fine in Tomcat web applications (in a WAR under WEB-INF/lib). Could you elaborate on what you mean here?

Erik
RE: speeding up lucene search
Has anyone tried splitting up an index into smaller chunks, without putting the different indices on a different physical disk/box? What sort of performance gain do you get from it?

Anson

-----Original Message-----
From: John Wang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 7:43 PM
To: Lucene Users List
Subject: Re: speeding up lucene search

In general, yes. By splitting up a large index into smaller indices, you are linearizing the search time. Furthermore, that allows you to make your search distributable.

-John
Extracting Lucene onto Tomcat
Are the package information and import paths ready to deploy on a Tomcat server? I tried extracting Lucene on the server, but when I compile files, it just throws numerous no class definition errors and errors relating to the package.

Ian
Re: Lucene vs. MySQL Full-Text
Interestingly (and ironically) enough, the project I'm currently working on requires full-text searching of Word and PDF resumes. SQL Server is already the required database as well, so we are leveraging the full-text indexing capabilities it has. There is a special trick to drop a BLOB into a table which also has file extension and MIME type columns, and have SQL Server index it with its Index Server capabilities. Lucene was not needed, and we made the pragmatic (simplest that worked well) choice.

My recommendation would be to implement something rather than debate it - and if it is good enough, leave it alone; if not, then try a different approach :)

Erik

On Jul 21, 2004, at 7:29 AM, Anson Lau wrote:
> Depending on what MySQL full-text search supports, you will probably
> lose some of the advanced things you get for free from Lucene, such as
> proximity search, wildcard search, search term and search field
> boosting, scoring of the documents, etc.
>
> After all, it depends on what you need to do. In our dev team we are
> actually currently having a mini debate over whether to use Lucene for
> our project or write something from scratch that's based on a DB. We
> need really good performance. I feel Lucene can do our job very well;
> some of our guys feel using a DB-based search can give us greater
> performance on the type of search we do.
>
> Anson
RE: Lucene vs. MySQL Full-Text
Depending on what MySQL full-text search supports, you will probably lose some of the advanced things you get for free from Lucene, such as proximity search, wildcard search, search term and search field boosting, scoring of the documents, etc.

After all, it depends on what you need to do. In our dev team we are actually currently having a mini debate over whether to use Lucene for our project or write something from scratch that's based on a DB. We need really good performance. I feel Lucene can do our job very well; some of our guys feel using a DB-based search can give us greater performance on the type of search we do.

Anson

-----Original Message-----
From: Florian Sauvin [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 8:55 AM
To: Lucene Users List
Subject: Re: Lucene vs. MySQL Full-Text

On Jul 20, 2004, at 12:29 PM, Tim Brennan wrote:
> Someone came into my office today and asked me about the project I am
> trying to use Lucene for -- "why aren't you just using a MySQL
> full-text index to do that" -- after thinking about it for a few
> minutes, I realized I don't have a great answer.
>
> MySQL builds inverted indexes for (in theory) doing the same type of
> lookup that Lucene does. You'd maybe have to build some kind of a
> layer on the front to mimic Lucene's analyzers, but that wouldn't be
> too hard.
>
> My only experience with MySQL full-text is trivial test apps -- but
> the MySQL world does have some significant advantages (it's a known
> quantity from an operations perspective, etc). Does anyone out there
> have anything more concrete they can add?
>
> --tim

I'd say that MySQL full text is much slower if you have a lot of data... that is one of the reasons we started using Lucene (we had a MySQL db to do the search), it's way faster!

--
Florian
Re: Can I retrieve token offsets from Hits?
On Jul 21, 2004, at 6:59 AM, Stepan Mik wrote:

> Is it possible to retrieve token offsets (Token.startOffset(),
> Token.endOffset()) later, when a document is found and returned in the
> hit collection?

No, offsets are not stored in the index. In fact, the only place they are currently used is with the Highlighter code.

> I need these values for highlighting. I've already looked at the
> Highlighter in the sandbox, but it actually re-analyzes the original
> document's field. However, this is not the preferred way when using a
> complicated (performance-demanding) analyzer. So my question is: is it
> possible to store (somehow) the token offsets and get them later
> without re-analyzing the document?

There has been lots of discussion on this topic in the past. Perhaps you could dig up those threads to get a feel for what the latest thinking on this is.

Erik
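A minimal sketch of the idea asked about above: record each token's start and end offsets once, while the text is analyzed, and reuse them for highlighting so that the analyzer does not have to run again. This is not Lucene's API. The class StoredOffsets, its simple tokenizer (runs of letters and digits, lowercased) and the <b> markers are all hypothetical, and the offsets are kept in memory rather than in an index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: not a Lucene class.
public class StoredOffsets {
    // term -> list of {startOffset, endOffset} pairs, in document order
    private final Map<String, List<int[]>> offsets = new HashMap<>();
    private final String text;

    // "Analysis" happens exactly once, here; offsets are kept for later use.
    public StoredOffsets(String text) {
        this.text = text;
        int len = text.length();
        int i = 0;
        while (i < len) {
            // skip separators
            while (i < len && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            // consume one token
            while (i < len && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                offsets.computeIfAbsent(term, k -> new ArrayList<>())
                       .add(new int[] {start, i});
            }
        }
    }

    // Highlights every occurrence of the term using only the stored offsets.
    public String highlight(String term) {
        List<int[]> spans = offsets.getOrDefault(
                term.toLowerCase(Locale.ROOT), new ArrayList<>());
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] span : spans) {
            out.append(text, last, span[0]);
            out.append("<b>").append(text, span[0], span[1]).append("</b>");
            last = span[1];
        }
        out.append(text, last, len());
        return out.toString();
    }

    private int len() {
        return text.length();
    }

    public static void main(String[] args) {
        StoredOffsets doc = new StoredOffsets("Fast search, fast index.");
        System.out.println(doc.highlight("fast"));
    }
}
```

The trade-off is extra storage: every token occurrence costs two integers, which is the price of skipping an expensive analyzer at display time.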
Can I retrieve token offsets from Hits?
Hi,

Is it possible to retrieve token offsets (Token.startOffset(), Token.endOffset()) later, when a document is found and returned in the hit collection?

I need these values for highlighting. I've already looked at the Highlighter in the sandbox, but it actually re-analyzes the original document's field. However, this is not the preferred way when using a complicated (performance-demanding) analyzer. So my question is: is it possible to store (somehow) the token offsets and get them later without re-analyzing the document?

Thanks
Stepan
Re: speeding up lucene search
In general, yes. By splitting up a large index into smaller indices, you are linearizing the search time. Furthermore, that allows you to make your search distributable.

-John

On Wed, 21 Jul 2004 13:00:28 +1000, Anson Lau <[EMAIL PROTECTED]> wrote:
> Hello guys,
>
> What are some general techniques to make Lucene search faster?
>
> I'm thinking about splitting up the index. My current index has approx 1.8
> million documents (small documents) and the index size is about 550MB. Am I
> likely to get much gain out of splitting it up and using a
> ParallelMultiSearcher?
>
> Most of my search queries search on 5-10 fields.
>
> Are there other things I should look at?
>
> Thanks to all,
> Anson
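The splitting approach discussed above can be sketched without any Lucene classes: each sub-index is searched by its own task and the per-shard hits are merged afterwards. Everything here is an assumption made for illustration. ShardedSearch, searchShard and searchAll are hypothetical names, a shard is just a list of strings, and a linear substring scan stands in for a real index lookup. Whether the split is actually faster depends on how many cores and how much I/O bandwidth are available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: not a Lucene class.
public class ShardedSearch {

    // Stand-in for searching one sub-index: returns the documents containing the term.
    public static List<String> searchShard(List<String> shard, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String doc : shard) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Searches all shards concurrently and merges the hits in shard order.
    public static List<String> searchAll(List<List<String>> shards, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (List<String> shard : shards) {
                Callable<List<String>> task = () -> searchShard(shard, term);
                pending.add(pool.submit(task));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : pending) {
                merged.addAll(future.get()); // waits for that shard to finish
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> shards = Arrays.asList(
                Arrays.asList("lucene in action", "mysql manual"),
                Arrays.asList("tuning Lucene search"));
        System.out.println(searchAll(shards, "lucene"));
    }
}
```

Because each shard is an independent unit of work, the same tasks could run on separate machines, which is the "distributable" part of the reply.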
Re: lucene customized indexing
Hi Eric and Grant:

Thanks for the replies; this is certainly encouraging. As suggested, I will post further such discussions to the dev list.

Thanks
-John

On Tue, 20 Jul 2004 15:37:35 -0400, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> It seems to me the answer to this is not necessarily to open up the API, but to
> provide a mechanism for adding Writers and Readers to the indexing/searching process
> at the application level. These readers and writers could be passed to Lucene and
> used to read and write to separate files (thus not harming the index file format).
> They could be used to read/write an arbitrary amount of metadata at the term,
> document and/or index level w/o affecting the core Lucene index. Furthermore,
> previous versions could still work b/c they would just ignore the new files, and the
> indexes could be used by other applications as well.
>
> This is just a thought in the infancy stage, but it seems like it would solve the
> problem. Of course, the trick is figuring out how it fits into the API (or maybe it
> becomes a part of 2.0). Not sure if it is even feasible, but it seems like you
> could define interfaces for Readers and Writers that met the requirements to do this.
>
> This may be better discussed on the dev list.
>
> >>> [EMAIL PROTECTED] 07/20/04 11:28AM >>>
>
> Hi:
> I am trying to store some database-like field values in Lucene.
> I have my own way of storing field values in a customized format.
>
> I guess my question is whether we can make the Reader/Writer
> classes, e.g. the FieldReader, FieldWriter and DocumentReader/Writer classes,
> non-final?
>
> I have asked to make the Lucene API less restrictive many many many
> times but got no replies. Is this request feasible?
>
> Thanks
>
> -John
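A rough sketch of the application-level reader/writer idea quoted above: per-document metadata is written to a stream of its own, keyed by document id, so the core index format is never touched. None of this is Lucene API. SidecarMetadata, its write and read methods and the record layout (a count, then id/value pairs) are assumptions made for illustration, and an in-memory stream stands in for a separate file next to the index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not a Lucene class.
public class SidecarMetadata {

    // Writes a record count followed by (docId, value) pairs.
    public static void write(OutputStream target, Map<Integer, String> metadataByDocId) {
        try (DataOutputStream out = new DataOutputStream(target)) {
            out.writeInt(metadataByDocId.size());
            for (Map.Entry<Integer, String> entry : metadataByDocId.entrySet()) {
                out.writeInt(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back what write() produced, preserving record order.
    public static Map<Integer, String> read(InputStream source) {
        Map<Integer, String> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(source)) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int docId = in.readInt();
                String value = in.readUTF();
                result.put(docId, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> metadata = new LinkedHashMap<>();
        metadata.put(7, "price=19.99");
        metadata.put(42, "price=5.00");

        // In practice the target would be a file stored beside the index.
        ByteArrayOutputStream sidecar = new ByteArrayOutputStream();
        write(sidecar, metadata);
        System.out.println(read(new ByteArrayInputStream(sidecar.toByteArray())));
    }
}
```

An application that does not know about the sidecar simply never opens it, which matches the point that older versions could ignore the new files.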