Re: is the classes ended with PerThread(*PerThread) multithread
Hey there,

What you are looking at are classes that are created per thread rather than shared between threads. Lucene internally rarely creates threads or subclasses Thread, Runnable, or Callable (ParallelMultiSearcher and some of the merging code are exceptions). Instead, inside the indexer, when you add (update) a document Lucene uses the caller's thread rather than spawning a new one.

If you look at DocumentsWriter.java there should be a method called getThreadState. Each indexing thread, let's say in updateDocument, gets its thread-private DocumentsWriterThreadState. This thread state holds a DocConsumerPerThread obtained from the DocumentsWriter's DocConsumer (see the indexing chain). DocConsumerPerThread in that case is a kind of decorator that holds other DocConsumerPerThread instances, like TermsHashPerThread etc. The general pattern: for each DocConsumer you can get a DocConsumerPerThread for your indexing thread, which then consumes the document you are processing right now.

I hope that helps,
simon

On Tue, Dec 28, 2010 at 4:19 AM, xu cheng xcheng@gmail.com wrote:
hi all:
I'm new to dev. These days I'm reading the source code in the index package and I was confused. There are classes with the suffix PerThread, such as DocFieldProcessorPerThread, DocInverterPerThread, TermsHashPerThread, FreqProxTermsWriterPerThread. On this mailing list I was told that they are multithreaded. However, I have some difficulty understanding this: I see no sign that they inherit from Thread, implement Runnable, or anything else. How do they map to OS threads?
thanks ^_^

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
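The pattern Simon describes, where a shared writer hands each calling thread its own private state object, can be sketched in plain Java. This is a simplified analogy, not Lucene's actual code; all class and method names below are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch (not Lucene's actual code) of the *PerThread pattern:
// the caller's thread asks a shared writer for its private state object,
// so no Thread subclass or Runnable is ever needed.
class SketchDocumentsWriter {
    // One state object per calling thread; names here are illustrative only.
    private final Map<Thread, ThreadState> states = new ConcurrentHashMap<>();

    static class ThreadState {
        // In Lucene this would hold a DocConsumerPerThread chain.
        int docsProcessed = 0;
    }

    ThreadState getThreadState() {
        // Lazily create a state for whichever thread is calling us.
        return states.computeIfAbsent(Thread.currentThread(), t -> new ThreadState());
    }

    void updateDocument(String doc) {
        // Thread-private state: no locking needed while consuming the document.
        ThreadState state = getThreadState();
        state.docsProcessed++;
    }

    int stateCount() { return states.size(); }
}
```

With four application threads calling updateDocument, four ThreadState objects come into existence, exactly as described for DocumentsWriterThreadState.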
Re: is the classes ended with PerThread(*PerThread) multithread
hi simon,

Thanks very much for replying. After reading the source code with your suggestion, here's my understanding, and I don't know whether it's right: DocumentsWriter actually doesn't create threads, but the code that uses DocumentsWriter can be multithreaded (say, several threads call updateDocument). Each thread has its own DocumentsWriterThreadState, and each DocumentsWriterThreadState has its own objects (the *PerThread classes such as DocFieldProcessorPerThread, DocInverterPerThread, and so on). So when the methods of DocumentsWriter are called by, for example, 4 threads, there are 4 DocumentsWriterThreadState objects and 4 indexing chains (each chain with its own *PerThread objects to process the document). Am I right?

Thanks again!

2010/12/28 Simon Willnauer simon.willna...@googlemail.com
[quoted text snipped]
Re: is the classes ended with PerThread(*PerThread) multithread
On Tue, Dec 28, 2010 at 10:57 AM, xu cheng xcheng@gmail.com wrote:
[quoted text snipped]

that sounds about right

simon
Re: Improving String Distance calculation performance
Hi Robert,

Thanks for your hint about LevenshteinAutomata. Are AutomatonQueries planned for an upcoming release?

At the moment, we build the reference data in order to boost, at query time, documents that contain fuzzily matched, seldom-used tokens within a queried region: in a manner of speaking, a fuzzified, localised idf(). The boosts are injected via payloads. Since Levenshtein must be calculated within a (fuzzified) region only, O(mn) applies to each region separately; on the outside, we have O(#regions).

The problem could equivalently be solved at query time, but this would mean counting the matched documents of each fuzzy query within a more complex query. In release 3.0.2 it looks quite complicated to me to incorporate a different scoring model that first counts the matches of each fuzzy sub-query and then applies the boosts to the matched tokens. I haven't seen a Scorer doing this so far. Furthermore, we are sensitive about query time. Do you have any ideas?

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, 27 December 2010 17:11
To: dev@lucene.apache.org
Subject: Re: Improving String Distance calculation performance

On Mon, Dec 27, 2010 at 10:31 AM, Biedermann,S.,Fa. Post Direkt s.biederm...@postdirekt.de wrote:
As for our problem: we are trying to build reference data against which requests shall be matched. In this case we need quite a huge amount of string distance measurements for preparing this reference.

If this is your problem, I wouldn't recommend using the StringDistance directly. As I mentioned, it's not designed for your use case: the way it's used by the spellchecker, it only needs something like 20-50 comparisons. If you try to use it the way you describe, it will be very slow; it must do O(k) comparisons, where k is the number of strings, and each comparison is O(mn), where m and n are the lengths of the input string and the string being compared, respectively.

Easier would be to index your terms and simply do a FuzzyQuery (with trunk), specifying the exact max edit distance you want. Or, if you care about getting all exact results within Levenshtein distance of some degree N, use an AutomatonQuery built from LevenshteinAutomata. This will give you a sublinear number of comparisons, something complicated but more like O(sqrt(k)), where k is the number of strings, and each comparison is O(n), where n is the length of the target string.
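The O(mn) per-comparison cost Robert mentions is that of the classic dynamic-programming edit-distance computation. A minimal self-contained sketch (not Lucene's StringDistance implementation) makes the cost concrete:

```java
// Classic dynamic-programming Levenshtein distance: O(m*n) time per pair,
// where m and n are the two string lengths. Comparing one query string
// against k reference strings this way costs O(k*m*n) overall, which is
// exactly why a brute-force scan over a large reference set is slow.
class Levenshtein {
    static int distance(String a, String b) {
        int m = a.length(), n = b.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        for (int j = 0; j <= n; j++) prev[j] = j; // distance from empty prefix
        for (int i = 1; i <= m; i++) {
            curr[0] = i;
            for (int j = 1; j <= n; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                // min of: insertion, deletion, substitution/match
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[n];
    }
}
```

An AutomatonQuery avoids this per-pair cost by compiling the distance bound into an automaton once and intersecting it with the term dictionary.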
[jira] Commented: (LUCENE-2836) FieldCache rewrite method for MultiTermQueries
[ https://issues.apache.org/jira/browse/LUCENE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975477#action_12975477 ]

Uwe Schindler commented on LUCENE-2836:
---

Hah, cool! The question is, does it really work correctly with multivalued fields? I have to recapitulate the TermsIndex, but doesn't the method fcsi.getOrd(doc) return only the term ord of the first term found in the index for that document? For numeric queries on single-valued fields that's fine, but for wildcards on analyzed fields? Maybe I'm missing something, but I am not sure it works correctly... Robert: Help me please :-) *g*

FieldCache rewrite method for MultiTermQueries
--
Key: LUCENE-2836
URL: https://issues.apache.org/jira/browse/LUCENE-2836
Project: Lucene - Java
Issue Type: New Feature
Reporter: Robert Muir
Fix For: 4.0
Attachments: LUCENE-2836.patch

For some MultiTermQueries, like RangeQuery, we have a FieldCacheRangeFilter etc. (in this case it's particularly optimized). But in the general case, since LUCENE-2784 we can now have a rewrite method to rewrite any MultiTermQuery using the FieldCache, because MultiTermQuery's getEnum no longer takes an IndexReader but Terms, and all the FilteredTermsEnums are now just real TermsEnum decorators. In cases like low-frequency queries this is actually slower (I think this has been shown for numeric ranges before too), but for the really high-frequency cases, like especially ugly wildcards, regexes, fuzzies, etc., this can be several times faster using the FieldCache instead, since all the terms are in RAM and the automaton can blast through them quicker.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
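Uwe's concern can be illustrated with a toy model. A FieldCache-style ords array stores exactly one term ordinal per document, so rewriting a range or wildcard to a per-document ord comparison is fast, but it structurally cannot represent a document holding several terms in the field. This sketch is hypothetical and not Lucene's FieldCache API:

```java
// Toy model (not Lucene's actual FieldCache API) of ord-based filtering:
// each document maps to a SINGLE ordinal into the sorted term list, so
// range matching is one comparison per doc. A multivalued/analyzed field
// would need several ords per doc, which this representation cannot hold.
class OrdRangeFilter {
    static boolean[] match(int[] ordByDoc, int lowerOrd, int upperOrd) {
        boolean[] hits = new boolean[ordByDoc.length];
        for (int doc = 0; doc < ordByDoc.length; doc++) {
            int ord = ordByDoc[doc]; // only one ord per doc is available
            hits[doc] = ord >= lowerOrd && ord <= upperOrd;
        }
        return hits;
    }
}
```

If a document actually contained both the first and last term of the dictionary, only one of the two ords survives in the array, so some matching documents would be missed.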
Re: Improving String Distance calculation performance
On Tue, Dec 28, 2010 at 5:26 AM, Biedermann,S.,Fa. Post Direkt s.biederm...@postdirekt.de wrote:
Hi Robert, Thanks for your hint about LevenshteinAutomata. Are AutomatonQueries planned for an upcoming release?

Yes, but it's in trunk, so you can use it now...

[quoted description of the payload-based region boosting snipped]

Not sure I fully understand what your app needs to do, but you can take a look at using a different rewrite method. For example, it seems like rewriting to span queries (see SpanMultiTermQueryWrapper) might be close to what you want, except it suffers from the problem that boosting is completely broken in Lucene's span queries (since they don't combine with real Scorers but instead Spans)...
[jira] Commented: (LUCENE-2836) FieldCache rewrite method for MultiTermQueries
[ https://issues.apache.org/jira/browse/LUCENE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975496#action_12975496 ]

Robert Muir commented on LUCENE-2836:
-

bq. The question is, does it really work correctly with multivalued fields?

Of course not; it's no different from any of the other FieldCache*Filter stuff we have now. Except that stuff is an awful lot more code... do we really need all those specializations in FieldCacheRangeFilter?
[jira] Commented: (SOLR-2276) Support for cologne phonetic
[ https://issues.apache.org/jira/browse/SOLR-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975510#action_12975510 ]

Robert Muir commented on SOLR-2276:
---

bq. Seems ColognePhonetic will be supported in Apache Commons Codec 1.4.1.

Thanks for your patch, Marc. Has there been any discussion on a tentative release date for 1.4.1? When this happens I'll be happy to add it.

{quote}
Besides, do you think it is a good idea to allow a fully qualified class name as encoder in PhoneticFilterFactory? Extending Solr by a custom phonetic filter could be much easier for developers.
{quote}

I think the reason it's not done with reflection might be historical, from before all tokenstreams were reused? But I still think it's a good idea to avoid reflection when possible, so I think we should keep the statically built map. However, if you supply a string that's not in this map, I don't think it would hurt to try to reflect the name before throwing an exception, as in this case you would only get an exception anyway.

Support for cologne phonetic
Key: SOLR-2276
URL: https://issues.apache.org/jira/browse/SOLR-2276
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 1.4.1
Environment: Apache Commons Codec 1.5
Reporter: Marc Pompl
Fix For: 4.0
Attachments: ColognePhonetic-patch.txt
Original Estimate: 2h
Remaining Estimate: 2h

As soon as Apache Commons Codec 1.5 is released, please support the new encoder ColognePhonetic. See JIRA issue CODEC-106. It is fundamental for phonetic searches if you are indexing German names. Other encoders are optimized for English (words).
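Robert's suggestion, trying the static map first and falling back to reflection only for unknown names, could look roughly like this. The class, map contents, and method names are illustrative, not Solr's actual PhoneticFilterFactory code (the registry entry here is a stand-in stdlib class so the sketch is self-contained):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch (not Solr's actual code): resolve a short encoder
// name via a statically built map; only if the name is unknown, try it
// as a fully qualified class name via reflection before giving up.
class EncoderRegistry {
    private static final Map<String, Class<?>> REGISTRY = new HashMap<>();
    static {
        // In Solr this map would hold commons-codec encoder classes;
        // a stdlib class stands in here so the sketch runs standalone.
        REGISTRY.put("arraylist", java.util.ArrayList.class);
    }

    static Object newEncoder(String name) {
        Class<?> clazz = REGISTRY.get(name.toLowerCase(Locale.ROOT));
        if (clazz == null) {
            try {
                clazz = Class.forName(name); // fallback: treat name as a FQCN
            } catch (ClassNotFoundException e) {
                // Without the fallback we would throw here anyway, so
                // nothing is lost by having tried reflection first.
                throw new IllegalArgumentException("Unknown encoder: " + name, e);
            }
        }
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate: " + name, e);
        }
    }
}
```

Known short names keep their fast, reflection-free path; custom encoders become pluggable by FQCN without any new configuration syntax.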
Re: Geospatial search in Lucene/Solr
(I was emailing Grant re: geospatial search and it occurred to me I should be doing this on the Lucene dev list, so I've taken it there)

On Dec 28, 2010, at 8:31 AM, Grant Ingersoll wrote:
On Dec 27, 2010, at 11:52 PM, Smiley, David W. wrote:

Hi Grant. I saw your latest blog post at Lucid Imagination in which you mentioned you were going to work on adding polygon search. FWIW, I finished polygon search last week in my code base, based on https://issues.apache.org/jira/browse/SOLR-2155 (the geohash prefix implementation).

I've scanned this, but haven't had time to look in depth.

I used the JTS library to do the heavy lifting. I'd be happy to release the code. I've iterated more on SOLR-2155 in my local project code and plan to re-integrate it with the patch at some point. There were almost 2 months between me releasing SOLR-2155 and having other priorities, but I'm back at it.

Cool. The problem w/ JTS is it is LGPL.

At least it's not GPL. Can you simply use JTS and make it an optional library, like Solr does for some other libs? There's a lot of expertise in that library that's been refined over the last 10 years. Even if you refuse to touch anything non-Apache-licensed, I highly recommend you look through its code to see how geospatial point-in-polygon is done efficiently. It has a concept of a prepared geometry object that is optimized to make large numbers of point-in-polygon queries more efficient; it is implemented by putting the line segments of the polygon in an in-memory R-tree index. If you'd like me to point you at specific classes, I'd be happy to. Better yet, I could release an update to SOLR-2155 and you could debug-step through. FWIW, I used a separate class for the polygon search that implements my GeoShape interface. If a user doesn't need to do a polygon search (which is not a common requirement of geospatial search), then the JTS library need not ship with Lucene/Solr.
Presently, I'm working on Lucene's benchmark contrib module to evaluate the performance of SOLR-2155 compared to the LatLon type (i.e. a pair of lat-lon range queries), and then I'll work on a more efficient, probably non-geohash implementation based on the same underlying concept of a hierarchical grid. I'm using the geonames.org data set. Unfortunately, the benchmark code seems very oriented towards a generic title-body document, whereas I'm looking to create lat-lon pairs... and furthermore to create documents containing multiple lat-lon pairs, and even furthermore a query generator that generates random box queries centered on a random location from the data set. I seem to be stretching the benchmark framework beyond the use case it was designed for, and so perhaps it won't be committable, but at least I'll have a patch for other geospatial birds-of-a-feather like you to use.

Stretch away. The Title/Body orientation is just a relic of what we have done in the past; it doesn't have to stay that way.

I am interested in your thoughts on evaluating the performance of geospatial queries. Since reading LIA2, Lucene's benchmark contrib module seems like the ideal way to test Lucene. I thought about programmatically generating points, but I've warmed to the idea of using geonames.org data as the space of possible points. Geonames has 7.5M unique lat-lon pairs [1], including population. In SOLR-2155, multiple points per document is a key feature. For testing, I want to be able to configure 1 point per document, for comparison to algorithms that only support that, but I must support a random variable number of them too. Consequently, each place does NOT correspond 1-1 to a document. The number of documents indexed should be configurable; they will be generated by randomly picking one or more points from the input data set. The number of points available from the input data set should be configurable too (i.e. 7.5M). Assuming that a more populous place is more likely to be referenced than a less populous one, I want to skew the choice of place weighted by the place's population. This creates a much more realistic skew of documents mapped to points than an evenly distributed one, which is important.

On the query side, I'd like to generate random queries centered on a random one of these points, with a radius of either 1 km, 10 km, 100 km, or 1000 km, in order to match a wide variety of documents. For reporting, I'd like to see a chart of response time vs. number of documents matched. I'm perhaps half done with implementing all this. Because I need to randomly choose points in the data set, I can't stream it in; I need to read it all into memory as a singleton object used by both the indexing side and the query side (since the queries need to pick a random point).

[1] http://www.geonames.org/about.html

~ David Smiley
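The population-weighted skew described above is commonly implemented with a cumulative-weight array plus binary search, so each draw costs O(log n). This is a generic sketch with hypothetical names, not code from the SOLR-2155 benchmark patch:

```java
import java.util.Random;

// Sketch of population-weighted random selection: build a cumulative-weight
// array once, then each draw is a binary search over it. Heavily populated
// places are picked proportionally more often, giving a realistic skew.
class WeightedPicker {
    private final long[] cumulative; // cumulative[i] = sum of weights 0..i
    private final Random random;

    WeightedPicker(long[] weights, long seed) {
        cumulative = new long[weights.length];
        long sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cumulative[i] = sum;
        }
        random = new Random(seed);
    }

    /** Returns an index, with probability proportional to its weight. */
    int pick() {
        long total = cumulative[cumulative.length - 1];
        long target = (long) (random.nextDouble() * total); // in [0, total)
        // find the first index whose cumulative weight exceeds target
        int lo = 0, hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] <= target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Feeding it the geonames population column would yield the document-to-point skew described, with zero-population places effectively never chosen unless given a floor weight.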
Re: Geospatial search in Lucene/Solr
On Tue, Dec 28, 2010 at 10:47 AM, Smiley, David W. dsmi...@mitre.org wrote:
[quoted text snipped]

Just for reference, a couple of us are using a python front-end to contrib/benchmark that Mike developed: http://code.google.com/p/luceneutil/

This is nice as it's designed for you to just declare 'competitors' (2 checkouts of solrcene), and then you run the python script and it gives you the relative comparison... Because they are 2 different checkouts, it's simple to compare different approaches, and each checkout can run with a different index (e.g. different codecs, or to test index format changes). I thought it might be interesting to you because there's a variety of queries tested here, like numeric range, sorting, primary-key lookup, span queries, etc., beyond the standard set of queries.

The framework also ensures that you are bringing back the same results in the same order, runs multiple iterations (including iterations in new JVMs), makes it easy to test optimized, optimized with deletions, multi-segment, and multi-segment with deletions, and can output to txt, html, or jira format for convenience.

Currently we are generally testing with a line-file format from wikipedia, but besides geonames I wanted to point out that wikipedia does include lat/long information for many articles (this is a major source for much of geonames' place data!). It would definitely be cool if we could test spatial queries with this as well... e.g. by parsing the lat/long out of the wikipedia XML and adding it to the line files, and adding some spatial queries to the default list of queries being tested.
Re: is the classes ended with PerThread(*PerThread) multithread
There is a single index chain, with a single instance of each chain component, except those ending in -PerThread. Though that's going to change with https://issues.apache.org/jira/browse/LUCENE-2324

On Tue, Dec 28, 2010 at 13:10, Simon Willnauer simon.willna...@googlemail.com wrote:
[quoted text snipped]

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785
Re: Geospatial search in Lucene/Solr
Thanks for letting me know about this, Rob. I think geonames is much simpler (and much less data) to work with than wikipedia. It's plain tab-delimited, and I like that it includes the population. I'll press forward with my benchmark-module-based patch. I can relatively easily switch between the lat-lon type and my geohash type since they both conform to the SpatialQueriable interface, so consequently I don't need two complete Lucene checkouts. I had to add Solr spatial as a dependency of the benchmark module, but it's worth it to me.

~ David

On Dec 28, 2010, at 11:18 AM, Robert Muir wrote:
[quoted text snipped]
Re: Geospatial search in Lucene/Solr
On Tue, Dec 28, 2010 at 11:59 AM, Smiley, David W. dsmi...@mitre.org wrote:
[quoted text snipped]

Well, in my opinion there isn't a reason why benchmark couldn't be moved to /modules and depend on Solr... Others might disagree, but I would prefer that benchmark be a module where you can benchmark everything.
[jira] Updated: (SOLR-2299) improve test-running from eclipse
[ https://issues.apache.org/jira/browse/SOLR-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2299: -- Attachment: SOLR-2299.patch Same patch, I only fixed a couple tests that explicitly used File to use resources instead. I looked at all the failing ones; it's a little bit of work but I think it's definitely feasible that we get the solr tests working so that: # you put test resources in the classpath # tests are independent of the current working directory. I think this is just a good general simplification (nothing to do with eclipse) and is good for the ant build too, so that tests don't have to run with a CWD of src/test/test-files in ant. The problem with the way we do this now (even in ant) is that I've sporadically seen tests actually create files in this CWD, which means we are creating leftovers in a src directory that could accidentally be committed, among other problems. I'd like to commit this now and iterate on the remaining individual tests so that they open their test resources all as resources... it's not that many left but some are tricky. improve test-running from eclipse - Key: SOLR-2299 URL: https://issues.apache.org/jira/browse/SOLR-2299 Project: Solr Issue Type: Test Components: Build Reporter: Robert Muir Fix For: 3.1, 4.0 Attachments: SOLR-2299.patch, SOLR-2299.patch In eclipse, it's currently difficult to get a solr development environment working. One big thing that would help would be to make it easier to run the tests. When loading resources, if we checked the config dir + file directory from the resource path, then users could simply add src/test/test-files to their eclipse build classpath, and tests would just work from the IDE.
I gather that this might make things easier for other IDEs too, though I'm aware that ones like IntelliJ let you configure the test 'working directory' on a project basis, but eclipse doesn't (you have to make a custom run configuration every time you run the tests) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
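The fix proposed in the issue, opening test files as classpath resources rather than via the current working directory, can be sketched in plain Java. The class and method names below are illustrative only, not Solr's actual API; the point is that classpath lookup works the same from ant, eclipse, or IntelliJ, while `new File("src/test/test-files/...")` depends on the CWD:

```java
// Sketch: CWD-independent test-resource loading.
// Fragile (works only when CWD is right): new File("src/test/test-files/solrconfig.xml")
// Robust (works from any CWD once test-files is on the classpath): getResourceAsStream
import java.io.InputStream;

public class ResourceLoadingSketch {
    // Returns true if the named resource is reachable on the classpath.
    static boolean canLoadFromClasspath(String name) {
        try (InputStream in = ResourceLoadingSketch.class.getResourceAsStream(name)) {
            return in != null;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Object.class is always resolvable via the class loader, regardless
        // of the working directory -- a stand-in for a test config file.
        System.out.println(canLoadFromClasspath("/java/lang/Object.class"));
    }
}
```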
[jira] Commented: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975618#action_12975618 ] Adam Estrada commented on SOLR-2301: Thanks Carl, I heard somewhere that Manifold or the Connector Framework were all going to be integrated in to Lucene/Solr. Any thoughts on that? Adam RSS Feed URL Breaking - Key: SOLR-2301 URL: https://issues.apache.org/jira/browse/SOLR-2301 Project: Solr Issue Type: Bug Components: clients - C# Affects Versions: 1.4.1, 4.0 Environment: Windows 7 Reporter: Adam Estrada This is an odd one. I am trying to index RSS feeds and have come across several issues. Some are more pressing than others. Referring to SOLR-2286 ;-) Anyway, the CDC has a list of RSS feeds that the Solr dataimporter can't work with. Home page: http://emergency.cdc.gov/rss/ Page to Index: http://www2a.cdc.gov/podcasts/createrss.asp?t=r&c=19 The console reports the following and as you can see it's because it does not like the param c. Any ideas on how to fix this? INFO: Processing configuration from solrconfig.xml: {config=./solr/conf/dataimporthandler/rss.xml} [Fatal Error] :18:63: The reference to entity c must end with the ';' delimiter.
Dec 28, 2010 2:39:46 PM org.apache.solr.handler.dataimport.DataImportHandler inform SEVERE: Exception while loading DataImporter org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:193) at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:100) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:112) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:539) at org.apache.solr.core.SolrCore.init(SolrCore.java:596) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
[jira] Created: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher --- Key: LUCENE-2837 URL: https://issues.apache.org/jira/browse/LUCENE-2837 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Fix For: 4.0 We've discussed cleaning up our *Searcher stack for some time... I think we should try to do this before releasing 4.0. So I'm attaching an initial patch which: * Removes Searcher, Searchable, absorbing all their methods into IndexSearcher * Removes contrib/remote * Removes MultiSearcher * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now pass useThreads=true, or a custom ES to the ctor) The patch is rough -- I just ripped stuff out, did search/replace to IndexSearcher, etc. EG nothing is directly testing using threads with IndexSearcher, but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher. I think MultiSearcher has a useful purpose, but as it is today it's too low-level, eg it shouldn't be involved in rewriting queries: the Query.combine method is scary. Maybe in its place we make a higher level class, with limited API, that's able to federate search across multiple IndexSearchers? It'd also be able to optionally use thread per IndexSearcher.
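The "absorb ParallelMultiSearcher into IndexSearcher" idea above boils down to: submit one task per sub-reader to an ExecutorService and merge the per-slice results on the calling thread. A toy, self-contained sketch of that shape, where a "segment" is just an int[] of per-doc scores (none of these types are Lucene's):

```java
// Toy sketch of parallel per-slice search plus a merge step.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSearchSketch {
    // Returns the global index of the highest-scoring "doc" across all slices.
    static int searchBest(int[][] slices, ExecutorService es) throws Exception {
        List<Future<int[]>> futures = new ArrayList<>();
        int base = 0;
        for (int[] slice : slices) {
            final int docBase = base;   // like ReaderUtil's per-segment docBase
            final int[] s = slice;
            // one task per slice, analogous to one task per sub-searcher
            futures.add(es.submit(() -> {
                int best = 0;
                for (int i = 1; i < s.length; i++) if (s[i] > s[best]) best = i;
                return new int[] { docBase + best, s[best] };
            }));
            base += slice.length;
        }
        // merge step: pick the overall winner from the per-slice winners
        int bestDoc = -1, bestScore = Integer.MIN_VALUE;
        for (Future<int[]> f : futures) {
            int[] r = f.get();
            if (r[1] > bestScore) { bestScore = r[1]; bestDoc = r[0]; }
        }
        return bestDoc;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService es = Executors.newFixedThreadPool(2);
        int[][] slices = { {1, 5, 2}, {9, 3}, {4} };
        System.out.println(searchBest(slices, es)); // global doc 3 holds score 9
        es.shutdown();
    }
}
```

With no executor supplied, the same loop could simply run the tasks inline, which mirrors a plain single-threaded IndexSearcher.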
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975649#action_12975649 ] Robert Muir commented on LUCENE-2837: - {quote} but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher. {quote} I did this on LUCENE-2751, but the tests won't all pass until we fix the FieldCache autodetect synchronization bug (the Numerics tests will fail with multiple threads)...
[jira] Updated: (SOLR-1566) Allow components to add fields to outgoing documents
[ https://issues.apache.org/jira/browse/SOLR-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-1566: Attachment: SOLR-1566-rm.patch updated patch to work with trunk note, this does not do anything yet, but points towards a direction Allow components to add fields to outgoing documents Key: SOLR-1566 URL: https://issues.apache.org/jira/browse/SOLR-1566 Project: Solr Issue Type: New Feature Components: search Reporter: Noble Paul Assignee: Grant Ingersoll Fix For: Next Attachments: SOLR-1566-gsi.patch, SOLR-1566-rm.patch, SOLR-1566-rm.patch, SOLR-1566-rm.patch, SOLR-1566.patch, SOLR-1566.patch, SOLR-1566.patch, SOLR-1566.patch Currently it is not possible for components to add fields to outgoing documents which are not in the stored fields of the document. This makes it cumbersome to add computed fields/metadata.
[jira] Created: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores
ConstantScoreQuery should directly support wrapping Query and simply strip off scores - Key: LUCENE-2838 URL: https://issues.apache.org/jira/browse/LUCENE-2838 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Especially in MultiTermQuery rewrite modes we often simply need to strip off scores from Queries and make them constant score. Currently the code to do this looks quite ugly: new ConstantScoreQuery(new QueryWrapperFilter(query)) The name says: ConstantScoreQuery should make any other Query constant score, so why does it not take a Query as ctor param? This question was also asked quite often by my customers and is simply stupid. Looking closer into the code, it is clear that this would also speed up MTQs: - One additional wrapping and its method calls can be removed - Maybe we can even deprecate QueryWrapperFilter in 3.1 now (it's now only used in tests and the use-case for this class is not really available) and LUCENE-2831 does not need the stupid hack to make Simon's assertions pass - CSQ now supports out-of-order scoring and topLevel scoring, so a CSQ on top-level now directly feeds the Collector. For that a small trick is used: The score(Collector) calls are directly delegated and the scores are stripped by wrapping the setScorer() method in Collector During that I found a visibility bug in Scorer: The method boolean score(Collector collector, int max, int firstDocID) should be public, not protected, as it's not solely intended to be overridden by subclasses and is called from other classes, too! This leads to no compiler bugs as the other classes that call it are mainly BooleanScorer(2) and that's in the same package, but visibility is wrong. I will open an issue for that and fix it at least in trunk where we have no backwards-requirement.
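The score-stripping trick described in the issue (delegate the collect calls, but wrap setScorer() so downstream only ever sees a constant score) is a small decorator. The sketch below uses toy single-method interfaces to illustrate the idea; these are not Lucene's actual Scorer/Collector classes:

```java
// Decorator that strips real scores and substitutes a constant.
import java.util.ArrayList;
import java.util.List;

public class ConstantScoreSketch {
    interface Scorer { float score(); }
    interface Collector { void setScorer(Scorer s); void collect(int doc); }

    // collect() is delegated untouched; setScorer() is intercepted so the
    // inner collector is handed a constant Scorer instead of the real one.
    static Collector constantScore(final Collector inner, final float boost) {
        return new Collector() {
            public void setScorer(Scorer s) { inner.setScorer(() -> boost); }
            public void collect(int doc) { inner.collect(doc); }
        };
    }

    // Drives the wrapper with "real" per-doc scores; returns what the inner
    // collector actually observed.
    static List<Float> collectedScores(float boost, float... realScores) {
        final List<Float> out = new ArrayList<>();
        final Scorer[] cur = new Scorer[1];
        Collector sink = new Collector() {
            public void setScorer(Scorer s) { cur[0] = s; }
            public void collect(int doc) { out.add(cur[0].score()); }
        };
        Collector wrapped = constantScore(sink, boost);
        for (int doc = 0; doc < realScores.length; doc++) {
            final float real = realScores[doc];
            wrapped.setScorer(() -> real);   // the wrapped query's real scorer
            wrapped.collect(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        // whatever the real scores are, the sink sees only the constant
        System.out.println(collectedScores(1.0f, 0.3f, 2.7f));
    }
}
```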
Solr pseudo fields in response
With Yonik's work on SOLR-2297, I figured I would dive back into figuring out how we could do this. I updated SOLR-1566 to compile with trunk, but when I looked into really making it work, there are some issues. It looks like the APIs now support extra fields *BUT* they don't do anything, and there is no clear way how to best make them work. For example TextResponseWriter.writeSolrDocument() includes a parameter Map pseudoFields and TextResponseWriter.writeSolrDocumentList() includes Map otherFields. Any idea how this is supposed to work? I see a few problems with this approach: * this requires each format (XML/JSON/binary/etc) to do its own implementing * only supports SolrDocument, not Document BaseResponseWriter may be a reasonable approach -- it abstracts the Document creation into one place. It may have performance issues since every document would get turned into a SolrDocument before getting written. This only applies to the docs that are written so it may not be a big deal. BUT BaseResponseWriter does not have any concrete implementation and converting the XML/TextResponseWriter to use it is not clear to me. Thoughts? ryan
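For what it's worth, the minimal behavior such a pseudoFields map implies is just a merge of computed values over the stored fields before serialization, done once in a shared place rather than per response format. The sketch below is hypothetical (the method name and types are not Solr's API), only showing that single merge step:

```java
// Hypothetical sketch: merge computed (pseudo) fields into a document's
// stored fields once, before any response writer serializes it.
import java.util.LinkedHashMap;
import java.util.Map;

public class PseudoFieldsSketch {
    static Map<String, Object> withPseudoFields(Map<String, Object> stored,
                                                Map<String, Object> pseudo) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(pseudo);   // computed fields win on name clashes
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        Map<String, Object> pseudo = Map.of("score", 1.3f);
        // the XML/JSON/binary writers would all serialize this merged view
        System.out.println(withPseudoFields(doc, pseudo).keySet());
    }
}
```

Doing the merge in one place is exactly what avoids the "each format implements it itself" problem noted above.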
[jira] Updated: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores
[ https://issues.apache.org/jira/browse/LUCENE-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2838: -- Attachment: LUCENE-2838.patch
[jira] Created: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong
Visibility of Scorer.score(Collector, int, int) is wrong Key: LUCENE-2839 URL: https://issues.apache.org/jira/browse/LUCENE-2839 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Uwe Schindler Fix For: 4.0 The method for scoring subsets in Scorer has wrong visibility: it's marked protected, but protected methods should not be called from other classes. Protected methods are intended for methods that should be overridden by subclasses and are called by (often) final methods of the same class. They should never be called from foreign classes. This method is called from another class out-of-scope: BooleanScorer(2) - so it must be public, but it's protected. This does not lead to a compiler error because BS(2) is in the same package, but may lead to problems if subclasses from other packages override it. When implementing LUCENE-2838 I hit a trap, as I thought this method should only be called from the class or Scorer itself, but in fact it's called from outside, leading to bugs, because I had not overridden it. As ConstantScorer did not use it I have overridden it with throw UOE and suddenly BooleanQuery was broken, which made it clear that it's called from outside (which is not the intention of protected methods). We cannot fix this in 3.x, as it would break backwards for classes that overwrite this method, but we can fix visibility in trunk.
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975670#action_12975670 ] Uwe Schindler commented on LUCENE-2837: --- {quote} I think MultiSearcher has a useful purpose, but as it is today it's too low-level, eg it shouldn't be involved in rewriting queries: the Query.combine method is scary. Maybe in its place we make a higher level class, with limited API, that's able to federate search across multiple IndexSearchers? It'd also be able to optionally use thread per IndexSearcher. {quote} Query.combine is simply broken, this is another issue. It violates De Morgan's laws...: LUCENE-2756
[jira] Updated: (SOLR-2299) improve test-running from eclipse
[ https://issues.apache.org/jira/browse/SOLR-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2299: -- Attachment: SOLR-2299_part2.patch here's a patch fixing a lot of tests, only a few core tests left.
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975676#action_12975676 ] Robert Muir commented on LUCENE-2837: - bq. Query.combine is simply broken, this is another issue. I agree, but with this issue we don't need Query.combine anymore, so it's then fixed. This method only exists for MultiSearcher (and there is some other dead code in Query.java related to it, that we could even delete now, totally unused today!)
Lucene-3.x - Build # 225 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/225/ All tests passed. Build Log (for compile errors): [...truncated 20926 lines...]
Lucene-trunk - Build # 1409 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1409/ All tests passed. Build Log (for compile errors): [...truncated 17903 lines...]
[jira] Updated: (LUCENE-2611) IntelliJ IDEA setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2611: Attachment: LUCENE-2611_eclipse.patch here's the eclipse part (following the same conventions of your patch). basically for eclipse, getting things to work is just setting up the classpath and setting the whole project to UTF-8... but this takes a while even if you know everything you need to do. IntelliJ IDEA setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming. The attached patch adds a new top level directory {{dev-tools/}} with sub-dir {{idea/}} containing basic setup files for trunk, as well as a top-level ant target named idea that copies these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit test run per module is included. Once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. If this patch is committed, Subversion svn:ignore properties should be added/modified to ignore the destination module files (*.iml) in each module's directory. 
Iam Jambour has written up on the Lucene wiki a detailed set of instructions for applying the 3.X branch patch: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
Lucene-Solr-tests-only-3.x - Build # 3097 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/3097/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability Error Message: No live SolrServers available to handle this request Stack Trace: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:220) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) at org.apache.solr.client.solrj.TestLBHttpSolrServer.waitForServer(TestLBHttpSolrServer.java:188) at org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:181) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:956) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:894) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:204) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:146) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read(BufferedInputStream.java:254) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427) Build Log (for compile errors): [...truncated 9798 lines...]
any issues about the *perthread classes
hi all I noticed that there are plenty of *PerThread classes in the trunk http://svn.apache.org/repos/asf/lucene/dev/trunk/ while in the realtime_search branch http://svn.apache.org/repos/asf/lucene/dev/branches/realtime_search/ the *PerThread classes are gone! this just confused me, cos I'm new here. what's the purpose of such a design? what's the advantage? any issues refer to this?? any suggestions or references are appreciated! regards. xu
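For context, as answered earlier in the thread: the *PerThread classes exist because the indexer never spawns threads itself; each *calling* thread is lazily handed its own private state, so that thread can consume a document without synchronization. Below is a minimal, self-contained stand-in for that pattern, loosely modeled on DocumentsWriter.getThreadState(); it is a sketch, not the real code:

```java
// Sketch of the per-caller-thread state pattern used by the indexing chain.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PerThreadStateSketch {
    static class ThreadState {           // stands in for DocumentsWriterThreadState
        int docsProcessed;
    }

    private final ConcurrentMap<Thread, ThreadState> states = new ConcurrentHashMap<>();

    // Each indexing thread gets (or lazily creates) its own private state.
    ThreadState getThreadState() {
        return states.computeIfAbsent(Thread.currentThread(), t -> new ThreadState());
    }

    // Runs on the caller's thread -- no new thread is ever started here.
    void updateDocument() {
        getThreadState().docsProcessed++;
    }

    int distinctStates() { return states.size(); }

    public static void main(String[] args) throws Exception {
        PerThreadStateSketch w = new PerThreadStateSketch();
        Thread a = new Thread(w::updateDocument);
        Thread b = new Thread(w::updateDocument);
        a.start(); b.start(); a.join(); b.join();
        System.out.println(w.distinctStates()); // two callers, two private states
    }
}
```

Note the classes holding per-thread state never extend Thread or implement Runnable; they are plain objects keyed by whichever OS thread happens to call in.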
Solr-3.x - Build # 211 - Still Failing
Build: https://hudson.apache.org/hudson/job/Solr-3.x/211/ All tests passed. Build Log (for compile errors): [...truncated 20482 lines...]