SpanNotQuery.hashCode cut/paste error?
SpanNotQuery's hashCode method makes two references to include.hashCode(), but none to exclude.hashCode() ... this is a mistake, yes/no? -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-569) NearSpans skipTo bug
[ http://issues.apache.org/jira/browse/LUCENE-569?page=comments#action_12411904 ] paul.elschot commented on LUCENE-569: - > I tried to make sense of the existing NearSpans implementation over the > weekend ... I did not succeed. > I still haven't had a chance to look at the new one in LUCENE-413, but I want > to clarify something you said. For the unordered case the priority queue implementation over the subspans in the current NearSpans is fine. For the ordered case I could not figure out how to deal with the priority queue and the restriction on ordering at the same time. This is precisely what the bug above shows. > >>> The NearSpansOrdered there differs from the current version in that it > >>> does not > >>> match overlapping subspans, but it passes all current test cases > >>> including TestNearSpans here. > > ...should I understand you to mean then that the current implementation of > NearSpans does work > correctly with overlapping sub-spans ... there just isn't a test for it? For ordered queries, it might work with overlapping sub-spans in some cases. However, I'd expect any test to run into the bug above for some other ordered cases. > that seems like important enough behavior that we wouldn't want to break it > to fix this bug. Given the bug, I hope nothing depends on it. > Even if matching on overlapping subspans wasn't an intentional feature of > NearSpans -- the fact that it > currently works and the documentation is silent on the issue suggests to me > that it should remain supported. That can probably be done by modifying the NearSpansOrdered of LUCENE-413 at lines 133-138 and at line 167, where the end of the previous (possibly matching) subspans is compared to the start of the next one. This could compare the start with the start instead. I don't know precisely what the intended behaviour is, so I can't say whether these changed comparisons should allow equality or not.
Perhaps the ends should be compared when the starts are equal, just as is done in the priority queue for the unordered case. > NearSpans skipTo bug > > > Key: LUCENE-569 > URL: http://issues.apache.org/jira/browse/LUCENE-569 > Project: Lucene - Java > Type: Bug > Components: Search > Reporter: Hoss Man > Attachments: TestNearSpans.java > > NearSpans appears to have a bug in skipTo that causes it to skip over some > matching documents completely. I discovered this bug while investigating > problems with SpanWeight.explain, but as far as I can tell the bug is not > specific to Explanations ... it seems like it could potentially result in > incorrect matching in some situations where a SpanNearQuery is nested in > another query such that skipTo will be used ... I tried to create a high level > test case to exploit the bug when searching, but I could not. TestCase > exploiting the class using NearSpan and SpanScorer will follow... -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: OpenBitSet
>Weird... I'm not sure how that could be. Are you sure you didn't get >the numbers reversed? That is exactly what happened, sorry for the wrong numbers; now it looks as it should: java -version Java(TM) SE Runtime Environment (build 1.6.0-beta2-b83) Java HotSpot(TM) Client VM (build 1.6.0-beta2-b83, mixed mode, sharing) java -server -Xbatch org.apache.solr.util.BitSetPerf 100 50 1 union 3000 bit ret=0 TIME=21966 java -server -Xbatch org.apache.solr.util.BitSetPerf 100 50 1 union 3000 open ret=0 TIME=19832 I measured also on different densities, and it looks about the same. When I find a few spare minutes I'll make one PerfTest that generates gnuplot diagrams. Would be interesting to see how all key methods behave as a function of density/size.
Re: Jira Convention: Resolved vs Closed
I've historically treated Closed and Resolved as the same thing and have closed resolved issues just to set them to that state. Erik On May 15, 2006, at 9:24 PM, Chris Hostetter wrote: Is there a documented or unspoken policy about the "Resolved" vs "Closed" bug statuses? How/when should a resolved bug be closed? (In my experience policy has tended towards the person fixing the bug to "resolve" it, and the person who opened the bug to "close" it once they've verified the fix -- but that's not really possible with the way the Lucene Jira project is set up, since anyone can open a bug, but only developers can close them) -Hoss
Re: SpanNotQuery.hashCode cut/paste error?
Yes, this is a mistake. I'm happy to fix it, but it looks like you have other patches in progress. Erik On May 16, 2006, at 3:33 AM, Chris Hostetter wrote: SpanNotQuery's hashCode method makes two references to include.hashCode(), but none to exclude.hashCode() ... this is a mistake, yes/no? -Hoss
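For reference, the fix presumably just swaps one of the two include.hashCode() calls for exclude.hashCode(). A minimal standalone sketch of the before/after (SpanNotSketch, its fields, and the rotate-and-xor layout are hypothetical stand-ins, not Lucene's actual class):

```java
// Hypothetical stand-in for SpanNotQuery; plain Objects replace the
// include/exclude SpanQuery clauses.
public class SpanNotSketch {
    private final Object include;
    private final Object exclude;

    public SpanNotSketch(Object include, Object exclude) {
        this.include = include;
        this.exclude = exclude;
    }

    // Buggy version: mixes in include.hashCode() twice, so queries that
    // differ only in their exclude clause collide unnecessarily.
    public int buggyHashCode() {
        int h = include.hashCode();
        h = (h << 1) | (h >>> 31); // rotate left
        h ^= include.hashCode();   // should have been exclude
        return h;
    }

    // Fixed version: mixes in exclude.hashCode() instead.
    public int fixedHashCode() {
        int h = include.hashCode();
        h = (h << 1) | (h >>> 31); // rotate left
        h ^= exclude.hashCode();
        return h;
    }
}
```

With two queries differing only in the exclude clause, buggyHashCode always collides while fixedHashCode distinguishes them.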
RE: Nio File Caching & Performance Test
My tests still hold that the NioFile I submitted is significantly faster than the standard FSDirectory. BUT, the memory mapped implementation is significantly faster than NioFile. I attribute this to the overhead of managing the soft references, and possible GC interaction. SO, I would like to use a memory mapped reader, but I encounter OOM errors when mapping large files, due to running out of address space. Has anyone found a solution for this? (A 2 gig index is not all that large...). -Original Message- From: Murat Yakici [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 1:55 AM To: java-dev@lucene.apache.org Subject: Re: Nio File Caching & Performance Test Hi, According to my humble tests, there is no significant improvement either. NIO has buffer creation time costs compared to other Buffered IOs. However, a testbed would be ideal for benchmarks. Murat Doug Cutting wrote: > Robert Engels wrote: > >> The most important statistic is that the reading via the local cache, vs. >> going to the OS (where the block is cached) is 3x faster (22344 vs. >> 68578). >> With random reads, when the block may not be in the OS cache, it is 8x >> faster (72766 vs. 586391). > > [ ... ] > >> This test only demonstrates improvements in the low-level IO layer, >> but one >> could infer significant performance improvements for common searches >> and/or >> document retrievals. > > > That is not an inference I would make. There should be some > improvement, but whether it is significant is not clear to me. > >> Is there a standard Lucene search performance I could run both with and >> without the NioFSDirectory to demonstrate real world performance >> improvements? I have some internal tests that I am collating, but I would >> rather use a standard test if possible. > > > No, we don't have a standard benchmark suite. Folks have talked about > developing one, but I don't think one yet exists. > > Report what you have. 
Describe the collection, how it is indexed, how > you've selected queries, and the improvement in average response time. > > Doug
Re: Nio File Caching & Performance Test
Robert Engels wrote: SO, I would like to use a memory mapped reader, but I encounter OOM errors when mapping large files, due to running out of address space. Has anyone found a solution for this? (A 2 gig index is not all that large...). 64-bit hardware, OS, and JVM solve this nicely. On 32-bit systems it is hard for the OS to allocate the large, contiguous regions of address space required to memory-map a 2GB index. Doug
Re: Nio File Caching & Performance Test
On 5/16/06, Robert Engels <[EMAIL PROTECTED]> wrote: SO, I would like to use a memory mapped reader, but I encounter OOM errors when mapping large files, due to running out of address space. Pretty much all x86 servers sold are 64 bit capable now. Run a 64 bit OS if you can :-) Has anyone found a solution for this? (A 2 gig index is not all that large...). I guess one could try a hybrid approach... only mmap certain index files that are critical for performance. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server
Re: Jira Convention: Resolved vs Closed
Chris Hostetter wrote: How/when should a resolved bug be closed? I close bugs after their "fix version" is released. The distinction between "resolved" and "closed" is intended for projects with a formal QA process. An engineer fixes a bug and marks it "resolved", and then a tester verifies the fix and either closes it or re-opens it if it has not been fixed. In our case, we're all testers. Released, closed issues should generally not be re-opened. If there are further problems related to an issue after a release is made and the issue has been closed, then a new issue should be created. Why? If a project has a Jira issue for every commit, and includes the issue name in the commit message, then Jira's change log can fully document the release, including links to subversion diffs, etc. But re-opening a closed bug messes this up. It's better to add a new bug that links to the old, closed bug. We're trying to operate this way on Hadoop. Issues are entered for most planned changes and assigned a "fix release". Then Jira's "road map" feature can be used to see what features are planned for various upcoming releases. This isn't perfect, since issues dropped from one release are pushed to the next, and the list of issues per release becomes unrealistically large (at least for the monthly release schedule we're on). But on Hadoop we currently have dedicated resources who can be assigned bugs and will work hard to fix them by a release date. I'm not sure whether this would work on Lucene, which currently lacks such dedicated resources, but it might be interesting to try.
(In my experience policy has tended towards the person fixing the bug to "resolve" it, and the person who opened the bug to "close" it once they've verified the fix -- but that's not really possible with the way the Lucene Jira project is set up, since anyone can open a bug, but only developers can close them) Note that I think it's okay to add folks to the "lucene-developers" Jira group who are not Lucene committers. Some folks are very involved with Lucene, but don't submit so many patches that they need to be a committer. For such people it can make sense to have them able to help manage Jira issues. Doug
Phrase IDF and collection frequency !
Hi, Are there any ideas on how to compute the "document frequency" and "collection frequency" of phrases? Document frequency is the number of documents containing the phrase. Collection frequency is the frequency of the phrase in the whole collection. Thanks in advance for any help. Samir
FieldsReader synchronized access vs. ThreadLocal ?
In SegmentReader, the access to FieldsReader.doc(n) is currently synchronized (which it must be). Does it not make sense to use a ThreadLocal implementation similar to the TermInfosReader? It seems that in a highly multi-threaded server this synchronized method could lead to significant blocking when documents are being retrieved.
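The TermInfosReader pattern referred to here can be sketched roughly as follows -- each thread lazily gets its own private clone of the reader, so per-document reads need no lock. Resource and its doc() method are made-up stand-ins for FieldsReader, and ThreadLocal.withInitial is a modern Java idiom, not the 1.4-era code:

```java
// Per-thread clone pattern (hypothetical stand-in for FieldsReader).
public class PerThreadResource {
    // Stand-in for FieldsReader: pretend each instance owns its own
    // file handle and so must not be shared between threads.
    static class Resource {
        String doc(int n) { return "doc-" + n; }
    }

    // Each thread lazily gets its own private clone, so doc(n) can run
    // without any synchronization.
    private final ThreadLocal<Resource> local =
        ThreadLocal.withInitial(Resource::new);

    public String doc(int n) {
        return local.get().doc(n); // no lock taken
    }
}
```

The trade-off (one clone per thread, plus ThreadLocal lookup cost) is the same one Doug weighs in his reply.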
Re: OpenBitSet
: I measured also on different densities, and it looks about the same. : When I find a few spare minutes I'll make one PerfTest that generates : gnuplot diagrams. Would be interesting to see how all key methods behave : as a function of density/size. I was thinking the same thing ... I just haven't had time to play with it. It might also be useful to check how the distribution of the set bits affects things -- I suspect that for some "Filters" there is some amount of clustering, as many people index their documents in a particular order and then filter on ranges of that order (i.e. index documents as they are created, and then filter on create date) ... using Random.nextGaussian() to pick which bits to set might be interesting. -Hoss
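Hoss's nextGaussian() suggestion can be sketched directly against java.util.BitSet -- the clustered pattern approximates a filter that matches a contiguous range of insertion order. Class name, mean, and spread here are arbitrary choices for illustration:

```java
import java.util.BitSet;
import java.util.Random;

// Generate a clustered pattern of set bits via Random.nextGaussian(),
// approximating e.g. a create-date range filter over docs indexed in
// creation order.
public class ClusteredBits {
    public static BitSet clustered(int size, int count, long seed) {
        Random rnd = new Random(seed);
        BitSet bits = new BitSet(size);
        // Cluster around the middle of the doc-id range; the spread
        // (size/8) is an arbitrary choice.
        double mean = size / 2.0, stddev = size / 8.0;
        while (bits.cardinality() < count) {
            int bit = (int) Math.round(mean + stddev * rnd.nextGaussian());
            if (bit >= 0 && bit < size) {
                bits.set(bit); // out-of-range draws are simply retried
            }
        }
        return bits;
    }
}
```

Feeding such sets into the BitSetPerf-style union benchmark would show whether clustering changes the bit/open comparison.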
query question
I am not sure if this is the right place for the question, but could anybody tell me whether the query syntax can do the select...from...where job as in a traditional database? I have checked the Lucene query syntax, but it seems less expressive than SQL... correct me if I'm wrong, or perhaps there is no such requirement for a search engine? best, Dedian
Re: OpenBitSet
Yeah, good hint. We actually made such measurements on a TreeIntegerSet implementation, and it is totally astonishing what you get as a result (I remember 6Meg against 2k memory consumption for "predominantly sorted bit vectors" like zip codes; conjunction/disjunction speed an order of magnitude faster, as it walks a shallow tree in that case). If you have any possibility to sort your indexes, do so; even Lucene's on-disk representation appreciates this, I guess (skips are faster, bit vectors on disk better compressed/decompressed?). We even made one small visualizer of bit vectors that visualizes (generates an image of) HitCollector results for any specified query (a gray image where every pixel represents 8-32 successive bits from the bit vector; higher density => darker color). I like to see the enemy first. While we are in this area, just a curiosity: a friend of mine has one head-spinning idea, to utilize graphics card HW to do super fast bit vector operations. These thingies today are really optimized for basic bit operations. I am just curious to see what he comes up with. I hope I will have some time next week or so to polish some tests for OpenBitSet a bit and drop them somewhere on Jira if anybody has interest to play with them. A bit off topic: is there anybody doing a ChainedFilter version that uses docNrSkipper? As I recall, you wrote the BitSet version :) - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org; eks dev <[EMAIL PROTECTED]> Sent: Tuesday, 16 May, 2006 8:13:53 PM Subject: Re: OpenBitSet : I measured also on different densities, and it looks about the same. : When I find a few spare minutes I'll make one PerfTest that generates : gnuplot diagrams. Would be interesting to see how all key methods behave : as a function of density/size. I was thinking the same thing ... I just haven't had time to play with it.
It might also be useful to check how the distribution of the set bits affects things -- I suspect that for some "Filters" there is some amount of clustering, as many people index their documents in a particular order and then filter on ranges of that order (i.e. index documents as they are created, and then filter on create date) ... using Random.nextGaussian() to pick which bits to set might be interesting. -Hoss
Re: Nio File Caching & Performance Test
Hi Robert, I might easily be wrong, but I believe I saw something on JIRA (or was it bugzilla?) a long, long time ago, where somebody made an MMAP implementation for really big indexes that works on 32 bit. I guess it is worth checking. - Original Message From: Yonik Seeley <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org; [EMAIL PROTECTED] Sent: Tuesday, 16 May, 2006 6:10:07 PM Subject: Re: Nio File Caching & Performance Test On 5/16/06, Robert Engels <[EMAIL PROTECTED]> wrote: > SO, I would like to use a memory mapped reader, but I encounter OOM errors > when mapping large files, due to running out of address space. Pretty much all x86 servers sold are 64 bit capable now. Run a 64 bit OS if you can :-) > Has anyone found a solution for this? (A 2 gig index is not all that > large...). I guess one could try a hybrid approach... only mmap certain index files that are critical for performance. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server
java.lang.IndexOutOfBoundsException when querying Lucene
Hi! I am having quite a complex query that gets executed against the JCR content (which uses Lucene for indexing/searching). From time to time I am seeing this exception:

[trace]
java.lang.IndexOutOfBoundsException: Index: 99, Size: 27
    at java.util.ArrayList.RangeCheck(ArrayList.java:546)
    at java.util.ArrayList.get(ArrayList.java:321)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
    at org.apache.lucene.index.FilterIndexReader.document(FilterIndexReader.java:103)
    at org.apache.lucene.index.FilterIndexReader.document(FilterIndexReader.java:103)
    at org.apache.lucene.index.FilterIndexReader.document(FilterIndexReader.java:103)
    at org.apache.lucene.index.MultiReader.document(MultiReader.java:108)
    at org.apache.jackrabbit.core.query.lucene.ChildAxisQuery$ChildAxisScorer.calculateChildren(ChildAxisQuery.java:308)
    at org.apache.jackrabbit.core.query.lucene.ChildAxisQuery$ChildAxisScorer.next(ChildAxisQuery.java:250)
    at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:87)
    at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:44)
    at org.apache.lucene.search.Scorer.score(Scorer.java:37)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:121)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:51)
    at org.apache.lucene.search.Searcher.search(Searcher.java:41)
    at org.apache.jackrabbit.core.query.lucene.SearchIndex.executeQuery(SearchIndex.java:374)
    at org.apache.jackrabbit.core.query.lucene.QueryImpl.execute(QueryImpl.java:174)
    at org.apache.jackrabbit.core.query.QueryImpl.execute(QueryImpl.java:130)
[/trace]

I really don't have any idea why this is happening. Do you have any pointers?
I would like to understand what may go wrong so that I can prevent it, at least in my application (which is based on the Jackrabbit JCR implementation, and so on Lucene), or at least reliably catch the exception and understand what I have to do when it occurs. The ML contains a couple of IndexOutOfBoundsException reports, but all of them are about index merging. Same on JIRA. Any help, ideas, or hints are highly appreciated. Thanks very much in advance, ./alex -- .w( the_mindstorm )p.
RE: Nio File Caching & Performance Test
The MMapDirectory works for really big indexes (larger than 2 gig), BUT if the JVM does not have enough address space (32-bit JVM) it will not work. -Original Message- From: eks dev [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 2:20 PM To: java-dev@lucene.apache.org Subject: Re: Nio File Caching & Performance Test Hi Robert, I might easily be wrong, but I believe I saw something on JIRA (or was it bugzilla?) a long, long time ago, where somebody made an MMAP implementation for really big indexes that works on 32 bit. I guess it is worth checking. - Original Message From: Yonik Seeley <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org; [EMAIL PROTECTED] Sent: Tuesday, 16 May, 2006 6:10:07 PM Subject: Re: Nio File Caching & Performance Test On 5/16/06, Robert Engels <[EMAIL PROTECTED]> wrote: > SO, I would like to use a memory mapped reader, but I encounter OOM errors > when mapping large files, due to running out of address space. Pretty much all x86 servers sold are 64 bit capable now. Run a 64 bit OS if you can :-) > Has anyone found a solution for this? (A 2 gig index is not all that > large...). I guess one could try a hybrid approach... only mmap certain index files that are critical for performance. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server
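One known workaround for the contiguous-address-space problem is to map a large file in fixed-size chunks rather than as one huge region. A minimal sketch of the idea (ChunkedMmap is a hypothetical class, not Lucene's MMapDirectory; modern try-with-resources syntax used for brevity):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Map a file in fixed-size chunks so no single mapping needs a huge
// contiguous address range.
public class ChunkedMmap {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;

    public ChunkedMmap(File f, long chunkSize) throws IOException {
        this.chunkSize = chunkSize;
        try (RandomAccessFile raf = new RandomAccessFile(f, "r");
             FileChannel ch = raf.getChannel()) {
            long len = ch.size();
            int n = (int) ((len + chunkSize - 1) / chunkSize);
            chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long off = i * chunkSize;
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY,
                                   off, Math.min(chunkSize, len - off));
            }
        }
    }

    // Translate an absolute position into (chunk, offset-within-chunk).
    public byte readByte(long pos) {
        return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
    }
}
```

On a 32-bit JVM this still consumes address space proportional to the total mapped size, so it helps mainly when combined with the hybrid approach Yonik suggests (mapping only the performance-critical files).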
non indexed field searching?
I know I've (and others have) brought this up before, but maybe now with lazy field loading (seemingly due to larger documents being stored) it is time to revisit. It seems that a query could be separated into Filter and Query clauses (similar to how the query optimizer works in Nutch). Clauses based on non-indexed fields would be converted to a Filter. The problem is that something like (indexed:somevalue OR nonindexed:somevalue) would require a complete visit to every document. But something like (indexed:somevalue AND nonindexed:somevalue) would be very efficient. I understand that this is moving Lucene closer to a database, but it is just very difficult to perform some complex queries efficiently without it. *** As an aside, I still don't understand why Filter is not an interface:

interface Filter {
    boolean include(IndexReader reader, int doc) throws IOException;
}

and then you would have:

class NonIndexedFilter implements Filter {
    private final String fieldname;
    private final String expression;

    NonIndexedFilter(String fieldname, String expression) {
        this.fieldname = fieldname;
        this.expression = expression;
    }

    public boolean include(IndexReader reader, int doc) throws IOException {
        Document d = reader.document(doc);
        String val = d.get(fieldname);
        return evaluate(expression, val); // evaluate the expression against val
    }
}

Filter being an interface should incur very little overhead in the common case where it is backed by a BitSet, as a modern JVM will inline it.
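The proposal above can be exercised without Lucene at all -- a runnable analogue where a List of Maps stands in for the index's stored documents, and a BiPredicate stands in for the unspecified expression evaluator (all names here are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Standalone analogue of the proposed Filter-as-interface design.
public class FilterDemo {
    interface Filter {
        boolean include(List<Map<String, String>> reader, int doc);
    }

    // Filter over a non-indexed (stored-only) field: fetch the document
    // and evaluate a predicate against the stored value.
    static class NonIndexedFilter implements Filter {
        private final String fieldname;
        private final String expression;
        private final BiPredicate<String, String> test;

        NonIndexedFilter(String fieldname, String expression,
                         BiPredicate<String, String> test) {
            this.fieldname = fieldname;
            this.expression = expression;
            this.test = test;
        }

        public boolean include(List<Map<String, String>> reader, int doc) {
            String val = reader.get(doc).get(fieldname);
            return val != null && test.test(val, expression);
        }
    }
}
```

The performance point stands: with a single Filter implementation hot in a loop, the JIT can devirtualize and inline the include() call, so the interface costs little in the BitSet-backed common case.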
Re: FieldsReader synchronized access vs. ThreadLocal ?
Robert Engels wrote: It seems that in a highly multi-threaded server this synchronized method could lead to significant blocking when the documents are being retrieved? Perhaps, but I'd prefer to wait for someone to demonstrate this as a performance bottleneck before adding another ThreadLocal. Peter Keegan has recently demonstrated pretty good concurrency using mmap directory on four and eight CPU systems: http://www.mail-archive.com/java-user@lucene.apache.org/msg05074.html Peter also wondered if the SegmentReader.document(int) method might be a bottleneck, and tried patching it to run unsynchronized: http://www.mail-archive.com/java-user@lucene.apache.org/msg05891.html Unfortunately that did not improve his performance: http://www.mail-archive.com/java-user@lucene.apache.org/msg06163.html Doug
Re: non indexed field searching?
On May 16, 2006, at 3:37 PM, Robert Engels wrote: It seems that maybe a query could be separated into Filter and Query clauses (similar to how the query optimizer works in Nutch). Clauses that were based on non-indexed fields would be converted to a Filter. The problem is if you have some thing like (indexed:somevalue OR nonindexed:somevalue) would require a complete visit to every document. Not necessarily. A query optimizer could extract these term query clauses, look up cached doc sets (bit sets) and union them. Scoring is the trickier part - I'm now curious to dig into Solr and see how it handles this. I understand that this is moving Lucene closer to a database, but it is just very difficult to perform some complex queries efficiently without it. Check out Solr - I think you'll find it fits this niche nicely. *** As an aside, I still don't understand why Filter is not an interface I saw that Paul Elschot has just done some refactoring work attached to a JIRA issue on this very topic. Erik
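The optimizer step Erik describes -- pull the term clauses out, look up their cached doc sets, and union them -- is a one-method sketch over java.util.BitSet (the String-keyed cache is a simplification; a real cache would key on field+term):

```java
import java.util.BitSet;
import java.util.Map;

// Union cached per-term doc sets instead of visiting every document.
public class DocSetUnion {
    public static BitSet union(Map<String, BitSet> cache, String... terms) {
        BitSet result = new BitSet();
        for (String t : terms) {
            BitSet cached = cache.get(t);
            if (cached != null) {
                result.or(cached); // docs matching any of the terms
            }
        }
        return result;
    }
}
```

This gives matching cheaply; as the email notes, scoring is the trickier part, since a bit set retains no per-document weight.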
Re: Phrase IDF and collection frequency !
--- ABDOU Samir <[EMAIL PROTECTED]> wrote: > Hi, > > Are there any ideas on how to compute the "document > frequency" and "collection frequency" of phrases? Tokenize your input as phrases (instead of words), and you'll get this the same way you normally get stats for single-word tokens (Terms). I did that for bigram frequency analysis. Of course, the problem is hardly getting these stats; the problem is finding what constitutes a phrase. ;-) -+ Tatu +-
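Tatu's approach -- tokenize as phrases and count exactly as for single terms -- in a toy form, using word bigrams as the "phrases" (class and field names are made up; real phrase detection is the hard part, as he says):

```java
import java.util.HashMap;
import java.util.Map;

// Count document frequency (docs containing the phrase) and collection
// frequency (total occurrences) of bigram "phrases".
public class PhraseStats {
    final Map<String, Integer> docFreq = new HashMap<>();
    final Map<String, Integer> collFreq = new HashMap<>();

    void addDocument(String text) {
        String[] words = text.toLowerCase().split("\\s+");
        // First count occurrences within this one document...
        Map<String, Integer> inDoc = new HashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            inDoc.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        // ...then fold into the global stats: df once per document,
        // cf by the number of occurrences.
        for (Map.Entry<String, Integer> e : inDoc.entrySet()) {
            docFreq.merge(e.getKey(), 1, Integer::sum);
            collFreq.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }
}
```

For example, over the two documents "the quick fox" and "the quick dog the quick cat", the phrase "the quick" ends up with document frequency 2 and collection frequency 3.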
Hacking Luke for bytecount-based strings
Greets, There does not seem to be a lot of demand for one implementation of Lucene to read indexes generated by another implementation of Lucene for the purposes of indexing or searching. However, there is a demand for index browsing via Luke. It occurred to me today that if Luke were powered by a version of Lucene with my bytecount-based-strings patch applied, it would be able to read indexes generated by Ferret. Ironically, it wouldn't be able to read KinoSearch indexes unless I reverted the change which causes the term vectors to be stored in the .fdt file. I'd probably do that. Luke is great. One possibility for distributing such a beast is to offer a patched jar for download from my website. Before I start down that road, though, I thought I'd bring up the subject here. Thoughts? Marvin Humphrey Rectangular Research http://www.rectangular.com/
RE: Hacking Luke for bytecount-based strings
While you're at it, why not rewrite Luke in Perl as well... Seems like a great use of your time. -Original Message- From: Marvin Humphrey [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 11:36 PM To: java-dev@lucene.apache.org Cc: Andrzej Bialecki Subject: Hacking Luke for bytecount-based strings Greets, There does not seem to be a lot of demand for one implementation of Lucene to read indexes generated by another implementation of Lucene for the purposes of indexing or searching. However, there is a demand for index browsing via Luke. It occurred to me today that if Luke were powered by a version of Lucene with my bytecount-based-strings patch applied, it would be able to read indexes generated by Ferret. Ironically, it wouldn't be able to read KinoSearch indexes unless I reverted the change which causes the term vectors to be stored in the .fdt file. I'd probably do that. Luke is great. One possibility for distributing such a beast is to offer a patched jar for download from my website. Before I start down that road, though, I thought I'd bring up the subject here. Thoughts? Marvin Humphrey Rectangular Research http://www.rectangular.com/
Re: Hacking Luke for bytecount-based strings
On Wednesday 17 May 2006 06:35, Marvin Humphrey wrote: > Greets, > > There does not seem to be a lot of demand for one implementation of > Lucene to read indexes generated by another implementation of Lucene > for the purposes of indexing or searching. However, there is a > demand for index browsing via Luke. > > It occurred to me today that if Luke were powered by a version of > Lucene with my bytecount-based-strings patch applied, it would be > able to read indexes generated by Ferret. Ironically, it wouldn't be > able to read KinoSearch indexes unless I reverted the change which > causes the term vectors to be stored in the .fdt file. I'd probably > do that. Luke is great. > > One possibility for distributing such a beast is to offer a patched > jar for download from my website. Before I start down that road, > though, I thought I'd bring up the subject here. > > Thoughts? Try invoking Luke with a Lucene jar of your choice on the classpath before Luke itself: java -cp lucene-core-1.9-rc1-dev.jar:lukeall.jar org.getopt.luke.Luke Regards, Paul Elschot