[jira] [Updated] (LUCENE-3419) Resolve JUnit assert deprecations
[ https://issues.apache.org/jira/browse/LUCENE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3419: --- Attachment: LUCENE-3419.patch Patch which adds appropriate epsilons to the float and double assertions and converts array assertions to assertArrayEquals. Everything passes. Once this is committed, I want to nuke the deprecated assert* methods from LuceneTestCase, as they're no longer used. > Resolve JUnit assert deprecations > - > > Key: LUCENE-3419 > URL: https://issues.apache.org/jira/browse/LUCENE-3419 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3419.patch > > > Many tests use assertEquals methods which have been deprecated. The culprits > are assertEquals(float, float), assertEquals(double, double) and > assertEquals(Object[], Object[]). Although not a big issue, they annoy me > every time I see them so I'm going to fix them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
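For illustration, a minimal sketch of the migration the patch performs across the test sources. The class and test names here are invented; only the JUnit 4 assert overloads are real:

{code}
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class AssertMigrationExample {

  @Test
  public void floatingPointAsserts() {
    float actual = 0.1f + 0.2f;
    // Deprecated in JUnit 4: assertEquals(0.3f, actual);
    // Replacement: supply an explicit epsilon for floating-point comparison.
    assertEquals(0.3f, actual, 0.00001f);
    assertEquals(0.3d, (double) actual, 0.00001d);
  }

  @Test
  public void arrayAsserts() {
    String[] expected = {"a", "b"};
    String[] actual = {"a", "b"};
    // Deprecated: assertEquals(Object[], Object[]).
    // Replacement: assertArrayEquals compares length and elements.
    assertArrayEquals(expected, actual);
  }
}
{code}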
[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2959: Attachment: LUCENE-2959_nocommits.patch Patch removing all nocommits. For the fake IDF/phrase issue, I thought it best not to "fake" statistics to SimilarityBase, since the whole point is to make it simpler to implement and test ranking models; instead it sums scores across terms (much like a BooleanQuery). For DFR P and D, I don't think there are really any great practical ways out of the fundamental problem; I added notes to both of these. I think the workaround for Dirichlet is fine; I looked around and found another implementation of this smoothing by Hiemstra (http://mirex.sourceforge.net, http://trec.nist.gov/pubs/trec19/papers/univ.twente.web.rev.pdf) and it had the same workaround. All the other similarities seem to work fine when randomly swapped into Lucene's tests. > [GSoC] Implementing State of the Art Ranking for Lucene > --- > > Key: LUCENE-2959 > URL: https://issues.apache.org/jira/browse/LUCENE-2959 > Project: Lucene - Java > Issue Type: New Feature > Components: core/query/scoring, general/javadocs, modules/examples >Reporter: David Mark Nemeskey >Assignee: Robert Muir > Labels: gsoc2011, lucene-gsoc-11, mentor > Fix For: flexscoring branch > > Attachments: LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, > implementation_plan.pdf, proposal.pdf > > > Lucene employs the Vector Space Model (VSM) to rank documents, which compares > unfavorably to state of the art algorithms, such as BM25. Moreover, the > architecture is tailored specifically to VSM, which makes the addition of new > ranking functions a non-trivial task. > This project aims to bring state of the art ranking methods to Lucene and to > implement a query architecture with pluggable ranking functions. > The wiki page for the project can be found at > http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
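To give a sense of why SimilarityBase wants real per-term statistics rather than faked aggregate ones, here is a hypothetical minimal ranking model on top of it. This reflects the API roughly as it later shipped in Lucene 4.x (method signatures vary by version); the class name and scoring formula are invented for illustration:

{code}
import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.SimilarityBase;

// SimilarityBase asks only for a per-term score computed from real statistics,
// which is why faking statistics would defeat its purpose.
public class SimpleLogTfSimilarity extends SimilarityBase {

  @Override
  protected float score(BasicStats stats, float freq, float docLen) {
    // Saturating term frequency; ignores docLen and collection stats
    // for simplicity.
    return (float) (1 + Math.log(1 + freq));
  }

  @Override
  public String toString() {
    return "SimpleLogTf";
  }
}
{code}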
[jira] [Created] (LUCENE-3419) Resolve JUnit assert deprecations
Resolve JUnit assert deprecations - Key: LUCENE-3419 URL: https://issues.apache.org/jira/browse/LUCENE-3419 Project: Lucene - Java Issue Type: Improvement Reporter: Chris Male Priority: Minor Many tests use assertEquals methods which have been deprecated. The culprits are assertEquals(float, float), assertEquals(double, double) and assertEquals(Object[], Object[]). Although not a big issue, they annoy me every time I see them so I'm going to fix them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable
[ https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male resolved LUCENE-3410. Resolution: Fixed Fix Version/s: 4.0 Assignee: Chris Male Committed revision 1165995. > Make WordDelimiterFilter's instantiation more readable > -- > > Key: LUCENE-3410 > URL: https://issues.apache.org/jira/browse/LUCENE-3410 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3410.patch, LUCENE-3410.patch, LUCENE-3410.patch > > > Currently WordDelimiterFilter's constructor is: > {code} > public WordDelimiterFilter(TokenStream in, >byte[] charTypeTable, >int generateWordParts, >int generateNumberParts, >int catenateWords, >int catenateNumbers, >int catenateAll, >int splitOnCaseChange, >int preserveOriginal, >int splitOnNumerics, >int stemEnglishPossessive, >CharArraySet protWords) { > {code} > which means its instantiation is an unreadable combination of 1s and 0s. > We should improve this by either using a Builder, 'int flags' or an EnumSet. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
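For reference, a sketch of the 'int flags' alternative the issue proposes. The constant names and values below mirror what later Lucene releases adopted for WordDelimiterFilter, but treat them as illustrative here rather than the committed API:

{code}
public final class WordDelimiterFlags {
  public static final int GENERATE_WORD_PARTS = 1;
  public static final int GENERATE_NUMBER_PARTS = 2;
  public static final int CATENATE_WORDS = 4;
  public static final int CATENATE_NUMBERS = 8;
  public static final int CATENATE_ALL = 16;
  public static final int PRESERVE_ORIGINAL = 32;
  public static final int SPLIT_ON_CASE_CHANGE = 64;
  public static final int SPLIT_ON_NUMERICS = 128;
  public static final int STEM_ENGLISH_POSSESSIVE = 256;

  private WordDelimiterFlags() {}
}
{code}

A call site then reads as named options instead of 1s and 0s, e.g. {code}int flags = WordDelimiterFlags.GENERATE_WORD_PARTS | WordDelimiterFlags.SPLIT_ON_CASE_CHANGE | WordDelimiterFlags.PRESERVE_ORIGINAL;{code} and the filter tests each option with {code}(flags & WordDelimiterFlags.CATENATE_WORDS) != 0{code}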
[jira] [Updated] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable
[ https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3410: --- Attachment: LUCENE-3410.patch Better patch. > Make WordDelimiterFilter's instantiation more readable > -- > > Key: LUCENE-3410 > URL: https://issues.apache.org/jira/browse/LUCENE-3410 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3410.patch, LUCENE-3410.patch, LUCENE-3410.patch > > > Currently WordDelimiterFilter's constructor is: > {code} > public WordDelimiterFilter(TokenStream in, >byte[] charTypeTable, >int generateWordParts, >int generateNumberParts, >int catenateWords, >int catenateNumbers, >int catenateAll, >int splitOnCaseChange, >int preserveOriginal, >int splitOnNumerics, >int stemEnglishPossessive, >CharArraySet protWords) { > {code} > which means its instantiation is an unreadable combination of 1s and 0s. > We should improve this by either using a Builder, 'int flags' or an EnumSet. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable
[ https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3410: --- Attachment: LUCENE-3410.patch Patch with the Iterator back to using booleans. Going to commit. > Make WordDelimiterFilter's instantiation more readable > -- > > Key: LUCENE-3410 > URL: https://issues.apache.org/jira/browse/LUCENE-3410 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3410.patch, LUCENE-3410.patch > > > Currently WordDelimiterFilter's constructor is: > {code} > public WordDelimiterFilter(TokenStream in, >byte[] charTypeTable, >int generateWordParts, >int generateNumberParts, >int catenateWords, >int catenateNumbers, >int catenateAll, >int splitOnCaseChange, >int preserveOriginal, >int splitOnNumerics, >int stemEnglishPossessive, >CharArraySet protWords) { > {code} > which means its instantiation is an unreadable combination of 1s and 0s. > We should improve this by either using a Builder, 'int flags' or an EnumSet. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-2308: --- Attachment: LUCENE-2308-FT-interface.patch Patch updated following Yonik's advice. I've removed the freeze() calls from Field so that it can now accept a FieldType instance. If freezing is important, it's up to the creator of the CoreFieldType. > Separately specify a field's type > - > > Key: LUCENE-2308 > URL: https://issues.apache.org/jira/browse/LUCENE-2308 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless > Labels: gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, > LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, > LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, > LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, > LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, > LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, > LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, > LUCENE-2308-FT-interface.patch, LUCENE-2308-FT-interface.patch, > LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, > LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, > LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, > LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, > LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch > > > This came up from discussions on IRC. I'm summarizing here... > Today when you make a Field to add to a document you can set things: > indexed or not, stored or not, analyzed or not, details like omitTfAP, > omitNorms, index term vectors (separately controlling > offsets/positions), etc. > I think we should factor these out into a new class (FieldType?). > Then you could re-use this FieldType instance across multiple fields. > The Field instance would still hold the actual value. > We could then do per-field analyzers by adding a setAnalyzer on the > FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise > for per-field codecs (with flex), where we now have > PerFieldCodecWrapper). > This would NOT be a schema! It's just refactoring what we already > specify today. E.g. it's not serialized into the index. > This has been discussed before, and I know Michael Busch opened a more > ambitious (I think?) issue. I think this is a good first baby step. We could > consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold > off on that for starters... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
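For context, a sketch of the FieldType/Field split roughly as it eventually landed in Lucene 4.x. The setter names (setIndexed etc.) changed again in later versions, so treat this as illustrative of the design rather than a fixed API:

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

public class FieldTypeExample {
  public static Document makeDoc() {
    FieldType titleType = new FieldType();
    titleType.setIndexed(true);
    titleType.setTokenized(true);
    titleType.setStored(true);
    titleType.freeze(); // optional: lock the type before sharing it

    Document doc = new Document();
    // The FieldType instance is reusable across many Fields;
    // only the value lives on the Field itself.
    doc.add(new Field("title", "Separately specify a field's type", titleType));
    return doc;
  }
}
{code}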
[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3396: --- Attachment: LUCENE-3396-rab.patch Patch updated so that reset() now returns void. I'll make sure to note this compat break in CHANGES.txt. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but it's time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098580#comment-13098580 ] Chris Male commented on LUCENE-3396: Hmmm, I agree that we should change it to void. If the source cannot be reset, it should throw an Exception. We need to be able to rely on the fact that we are using reusable components. I'll update the patch. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but its time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
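The contract being discussed is the one the standard consumer workflow relies on: a reused stream must be reset before each pass, and with a void return the only failure signal is an exception. A sketch of that workflow (the helper method itself is invented):

{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ConsumeExample {
  static void consume(TokenStream ts) throws IOException {
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset(); // now void: a non-resettable source must throw here
    try {
      while (ts.incrementToken()) {
        System.out.println(term.toString());
      }
      ts.end();
    } finally {
      ts.close();
    }
  }
}
{code}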
[jira] [Commented] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098568#comment-13098568 ] Koji Sekiguchi commented on LUCENE-1824: Uh, I forgot to add testSentenceBoundary(), testLineBoundary(), etc.; so far there is only a word boundary test. Will add them in the next patch. > FastVectorHighlighter truncates words at beginning and end of fragments > --- > > Key: LUCENE-1824 > URL: https://issues.apache.org/jira/browse/LUCENE-1824 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/highlighter > Environment: any >Reporter: Alex Vigdor >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-1824.patch, LUCENE-1824.patch, LUCENE-1824.patch > > > FastVectorHighlighter does not take word boundaries into consideration when > building fragments, so that in most cases the first and last word of a > fragment are truncated. This makes the highlights less legible than they > should be. I will attach a patch to BaseFragmentBuilder that resolves this > by expanding the start and end boundaries of the fragment to the first > whitespace character on either side of the fragment, or the beginning or end > of the source text, whichever comes first. This significantly improves > legibility, at the cost of returning a slightly larger number of characters > than specified for the fragment size. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-1824: --- Attachment: LUCENE-1824.patch I added test cases for BoundaryScanner. Still need to modify FragmentsBuilderTests so that they can pass. > FastVectorHighlighter truncates words at beginning and end of fragments > --- > > Key: LUCENE-1824 > URL: https://issues.apache.org/jira/browse/LUCENE-1824 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/highlighter > Environment: any >Reporter: Alex Vigdor >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-1824.patch, LUCENE-1824.patch, LUCENE-1824.patch > > > FastVectorHighlighter does not take word boundaries into consideration when > building fragments, so that in most cases the first and last word of a > fragment are truncated. This makes the highlights less legible than they > should be. I will attach a patch to BaseFragmentBuilder that resolves this > by expanding the start and end boundaries of the fragment to the first > whitespace character on either side of the fragment, or the beginning or end > of the source text, whichever comes first. This significantly improves > legibility, at the cost of returning a slightly larger number of characters > than specified for the fragment size. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
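The boundary expansion the original report describes is simple to state in plain Java, independent of the BoundaryScanner API the patch adds. A minimal sketch (method name invented): widen [start, end) to the nearest whitespace, or the text limits, on each side:

{code}
public class BoundaryExample {
  static int[] expandToWhitespace(String text, int start, int end) {
    while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
      start--;
    }
    while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
      end++;
    }
    // May return a slightly wider window than requested, which is the
    // legibility/fragment-size trade-off the issue accepts.
    return new int[] {start, end};
  }
}
{code}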
Rollback to old index stored with SolrDeletionPolicy
Apparently this feature does not exist, so I am copying the same mail I sent to "user". What are your opinions on creating a Jira issue requesting this new feature? With SolrDeletionPolicy you can choose the number of "versions" of the index to store (maxCommitsToKeep, which defaults to 1). Well, how can you revert to an arbitrary version that you have stored? Is there anything in Solr or in Lucene to pick the version of the index to load? The idea arose from a discussion with some fellows about "really paranoid users" who want to keep several backup versions of the index and pick one that worked in the past (after the index was corrupted in some way, probably not immediately noticeable, and without the possibility of re-indexing the data). Thank you Emmanuel
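At the Lucene level the building blocks do exist: if the deletion policy kept older commits, they can be listed and a reader opened on any of them. A sketch using the 3.x-era API (wiring this into Solr's core loading is the missing piece the mail asks about; the method name here is invented):

{code}
import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

public class CommitRollbackExample {
  static IndexReader openOldestCommit(Directory dir) throws IOException {
    Collection<IndexCommit> commits = IndexReader.listCommits(dir);
    IndexCommit oldest = null;
    for (IndexCommit commit : commits) {
      if (oldest == null || commit.getGeneration() < oldest.getGeneration()) {
        oldest = commit;
      }
    }
    // Read-only reader pinned to that commit point.
    return IndexReader.open(oldest, true);
  }
}
{code}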
[jira] [Resolved] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation
[ https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2745. Resolution: Duplicate This is a dup of SOLR-2606, fixed in 3x (will be 3.4) > Sorting on a field whose name resembles an integer in scientific notation > - > > Key: SOLR-2745 > URL: https://issues.apache.org/jira/browse/SOLR-2745 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.3 >Reporter: Joey >Priority: Minor > > I have created a schema where the field names are in a uuid format eg: > 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic > fields via the 'star underscore' format eg: *_t. > Whenever I try and sort on a field name that has a format of one or more > integers followed by an 'e', I get a NumberFormatException like the > following: *java.lang.NumberFormatException: For input string: "8e"*. This > particular error comes from trying to sort on a field name > *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with > 12345e, I would get an error *java.lang.NumberFormatException: For input > string: "12345e"*. > I'm not sure if this is a major issue or not but it is something that has > appeared in our testing quite often. You would be surprised at how often > randomly generated uuid's start with a number and then 'e'... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation
[ https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098517#comment-13098517 ] Erick Erickson commented on SOLR-2745: -- That is odd. I just reproduced it with 3.3. Here are the field defs (stock Solr schema): It happens when sorting on either field. I have one doc in my index with these fields, put in this way: stuff and nonsense is rampant in our society 2010-02-03T00:00:00Z One other peculiar thing: when I named the fields 8e32_dt and 8e32 it succeeded, but 832e and 832e_dt produced the number format exception, a little at odds with the original statement, but still. Why the field #name# should show a number format exception is...er...interesting. Note also that my sort fragment was &sort=832e_dt, which seems to be getting the _dt truncated. But it's late, so I may be seeing things. Full stack trace (minus most of the Jetty stuff):
HTTP ERROR 500
Problem accessing /solr/select. Reason: For input string: "832e"
java.lang.NumberFormatException: For input string: "832e"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
    at java.lang.Double.parseDouble(Double.java:510)
    at org.apache.solr.search.QueryParsing$StrParser.getNumber(QueryParsing.java:694)
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:293)
    at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:67)
    at org.apache.solr.search.QParser.getQuery(QParser.java:142)
    at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:303)
    at org.apache.solr.search.QParser.getSort(QParser.java:222)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> Sorting on a field whose name resembles an integer in scientific notation > - > > Key: SOLR-2745 > URL: https://issues.apache.org/jira/browse/SOLR-2745 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.3 >Reporter: Joey >Priority: Minor > > I have created a schema where the field names are in a uuid format eg: > 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic > fields via the 'star underscore' format eg: *_t. > Whenever I try and sort on a field name that has a format of one or more > integers followed by an 'e', I get a NumberFormatException like the > following: *java.lang.NumberFormatException: For input string: "8e"*. This > particular error comes from trying to sort on a field name > *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with > 12345e, I would get an error *java.lang.NumberFormatException: For input > string: "12345e"*. > I'm not sure if this is a major issue or not but it is something that has > appeared in our testing quite often. You would be surprised at how often > randomly generated uuid's start with a number and then 'e'... -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
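The failure mode in the stack trace above is easy to demonstrate in isolation: the sort-spec parser tries to read a numeric constant and hands the leading token to Double.parseDouble. "8e32" happens to be valid scientific notation, while "832e" is not, which matches Erick's observation. A standalone demo (class name invented):

{code}
public class ParseDemo {
  public static void main(String[] args) {
    System.out.println(Double.parseDouble("8e32"));  // 8.0E32 - parses fine
    System.out.println(Double.parseDouble("832e"));  // throws NumberFormatException
  }
}
{code}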
[jira] [Commented] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation
[ https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098507#comment-13098507 ] Joey commented on SOLR-2745: I store data that I assume the dynamic field expects, i.e. *_t is text, *_dt is storing only dates, *_i is storing only integers. I'm not putting unexpected data into these fields. Also, I'm not allowing users from the front-end of my system to sort on text fields, but that's beside the point. I reinstalled 3.3 to make sure I had the latest version and started from scratch with only 3 docs indexed. Sorting on the _dt field (actual field name *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*) still gives me that same error. I can confirm that there are only datestamps in my dynamic date fields in the 3 indexed docs I now have. I am no expert, but from this it seems that the error is not a data one but a field name parsing issue. > Sorting on a field whose name resembles an integer in scientific notation > - > > Key: SOLR-2745 > URL: https://issues.apache.org/jira/browse/SOLR-2745 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.3 >Reporter: Joey >Priority: Minor > > I have created a schema where the field names are in a uuid format eg: > 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic > fields via the 'star underscore' format eg: *_t. > Whenever I try and sort on a field name that has a format of one or more > integers followed by an 'e', I get a NumberFormatException like the > following: *java.lang.NumberFormatException: For input string: "8e"*. This > particular error comes from trying to sort on a field name > *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with > 12345e, I would get an error *java.lang.NumberFormatException: For input > string: "12345e"*. > I'm not sure if this is a major issue or not but it is something that has > appeared in our testing quite often. You would be surprised at how often > randomly generated uuid's start with a number and then 'e'... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation
[ https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098499#comment-13098499 ] Lance Norskog commented on SOLR-2745: - What kind of data do you want to store in these fields? What do you expect to get when you sort on the field? Fields named "*_dt" are datestamps. They are stored in a numerical format. Solr should not even let you index non-dates into this field. Fields named '*_t' are text fields. These are indexed, and sorting on them does not make sense. It used to be that sorting would blow up if there were more term facets (what you sort on) than documents. In recent Solr (trunk) this will not blow up, but it still does not make sense. > Sorting on a field whose name resembles an integer in scientific notation > - > > Key: SOLR-2745 > URL: https://issues.apache.org/jira/browse/SOLR-2745 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.3 >Reporter: Joey >Priority: Minor > > I have created a schema where the field names are in a uuid format eg: > 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic > fields via the 'star underscore' format eg: *_t. > Whenever I try and sort on a field name that has a format of one or more > integers followed by an 'e', I get a NumberFormatException like the > following: *java.lang.NumberFormatException: For input string: "8e"*. This > particular error comes from trying to sort on a field name > *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with > 12345e, I would get an error *java.lang.NumberFormatException: For input > string: "12345e"*. > I'm not sure if this is a major issue or not but it is something that has > appeared in our testing quite often. You would be surprised at how often > randomly generated uuid's start with a number and then 'e'... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation
Sorting on a field whose name resembles an integer in scientific notation - Key: SOLR-2745 URL: https://issues.apache.org/jira/browse/SOLR-2745 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.3 Reporter: Joey Priority: Minor I have created a schema where the field names are in a uuid format eg: 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic fields via the 'star underscore' format eg: *_t. Whenever I try and sort on a field name that has a format of one or more integers followed by an 'e', I get a NumberFormatException like the following: *java.lang.NumberFormatException: For input string: "8e"*. This particular error comes from trying to sort on a field name *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with 12345e, I would get an error *java.lang.NumberFormatException: For input string: "12345e"*. I'm not sure if this is a major issue or not but it is something that has appeared in our testing quite often. You would be surprised at how often randomly generated uuid's start with a number and then 'e'... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: edismax and wildcards
Hmmm, I looked at this a bit more, and I don't think we can legitimately call this a bug. The exact same thing happens with non-edismax. That is, with a default field of "text", entering nonsense * silliness parses to text:nonsens text:* text:silli Doing the same thing with a minimal edismax (qf of text and features) produces: +((title:nonsens | text:nonsens) (title:* | text:*) (title:silli | text:silli)) After all, the asterisk #is# just a token. And we do accept things like field:* So this seems consistent. And nonsense ~ silliness parses to +((title:nonsense~2.0 | text:nonsense~2.0) (title:silli | text:silli)) So I think GIGO is accurate here. Erick On Tue, Sep 6, 2011 at 6:10 PM, Chris Hostetter wrote: > > : GIGO is a valid response... > > I don't think GIGO is a valid attitude for a parser whose whole purpose > is to accept anything an end user might throw at it and "try to do its > best" > > I agree with David: I think it's a bug that 0-length prefix/wildcard > queries are accepted by default with no config option to disable them. > > (FWIW: this kind of scenario is exactly the type of thing I was > worried about when I fought to not map "dismax"=>ExtendedDismaxQParser in 3.x) > > : >> blah blah * blah blah : >> : >> or : >> : >> blah blah ~ blah blah : >> : >> edismax happily spreads these across all of the fields leading : >> to...er...interesting behavior, and 5+ minute query responses. > > > -Hoss > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
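Since there was no config option at the time, one common client-side workaround (illustrative only, not a Solr API; method name invented) is to drop bare '*' and '~' tokens before the query string ever reaches edismax, since a lone wildcard expands against every field in qf:

{code}
import java.util.ArrayList;
import java.util.List;

public class QuerySanitizer {
  static String stripBareWildcards(String userQuery) {
    List<String> kept = new ArrayList<String>();
    for (String token : userQuery.trim().split("\\s+")) {
      // Keep field:* and prefix* forms; drop only the bare operators.
      if (!token.equals("*") && !token.equals("~")) {
        kept.add(token);
      }
    }
    return String.join(" ", kept);
  }
}
{code}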
[jira] [Resolved] (LUCENE-3418) Lucene is not fsync'ing files on commit
[ https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3418. Resolution: Fixed > Lucene is not fsync'ing files on commit > --- > > Key: LUCENE-3418 > URL: https://issues.apache.org/jira/browse/LUCENE-3418 > Project: Lucene - Java > Issue Type: Bug > Components: core/store >Affects Versions: 3.1, 3.2, 3.3, 3.4, 4.0 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Blocker > Fix For: 3.4, 4.0 > > > Thanks to hurricane Irene, when Mark's electricity became unreliable, he > discovered that on power loss Lucene could easily corrupt the index, which > of course should never happen... > I was able to easily repro, by pulling the plug on an Ubuntu box during > indexing. On digging, I discovered, to my horror, that Lucene is failing to > fsync any files, ever! > This bug was unfortunately created when we committed LUCENE-2328... that > issue added tracking, in FSDir, of which files have been closed but not > sync'd, so that when sync is called during IW.commit we only sync those files > that haven't already been sync'd. > That tracking is done via the FSDir.onIndexOutputClosed callback, called when > an FSIndexOutput is closed. The bug is that we only call it on exception > during close: > {noformat} > @Override > public void close() throws IOException { > // only close the file if it has not been closed yet > if (isOpen) { > boolean success = false; > try { > super.close(); > success = true; > } finally { > isOpen = false; > if (!success) { > try { > file.close(); > parent.onIndexOutputClosed(this); > } catch (Throwable t) { > // Suppress so we don't mask original exception > } > } else > file.close(); > } > } > } > {noformat} > And so FSDir thinks no files need syncing when its sync method is called. > I think instead we should call it up-front; better to over-sync than > under-sync. > The fix is trivial (move the callback up-front), but I'd love to somehow have > a test that can catch such a bad regression in the future; still, I think > we can do that test separately and commit this fix first. > Note that even though LUCENE-2328 was backported to 2.9.x and 3.0.x, this bug > wasn't, ie the backport was a much simpler fix (to just address the original > memory leak); it's 3.1, 3.2, 3.3 and trunk when this bug is present. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1165902 - in /lucene/dev/trunk/lucene: CHANGES.txt src/java/org/apache/lucene/store/FSDirectory.java
On Tue, Sep 6, 2011 at 6:12 PM, wrote: > +* LUCENE-3418: Lucene was failing to fsync index files on commit, > + meaning a crash or power loss could easily corrupt the index (Mark > + Miller, Robert Muir, Mike McCandless) Perhaps "crash" should be expanded to "operating system crash"? Some might not realize that the more common JVM "crash", OOM, kill -9, etc, wouldn't be an issue here. -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: edismax and wildcards
: GIGO is a valid response... I don't think GIGO is a valid attitude for a parser whose whole purpose is to accept anything an end user might throw at it and "try to do its best" I agree with David: I think it's a bug that 0-length prefix/wildcard queries are accepted by default with no config option to disable them. (FWIW: this kind of scenario is exactly the type of thing I was worried about when I fought to not map "dismax"=>ExtendedDismaxQParser in 3.x) : >> blah blah * blah blah : >> : >> or : >> : >> blah blah ~ blah blah : >> : >> edismax happily spreads these across all of the fields leading : >> to...er...interesting behavior, and 5+ minute query responses. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3418) Lucene is not fsync'ing files on commit
Lucene is not fsync'ing files on commit --- Key: LUCENE-3418 URL: https://issues.apache.org/jira/browse/LUCENE-3418 Project: Lucene - Java Issue Type: Bug Components: core/store Affects Versions: 3.1, 3.2, 3.3, 3.4, 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Blocker Fix For: 3.4, 4.0 Thanks to hurricane Irene, when Mark's electricity became unreliable, he discovered that on power loss Lucene could easily corrupt the index, which of course should never happen... I was able to easily repro, by pulling the plug on an Ubuntu box during indexing. On digging, I discovered, to my horror, that Lucene is failing to fsync any files, ever! This bug was unfortunately created when we committed LUCENE-2328... that issue added tracking, in FSDir, of which files have been closed but not sync'd, so that when sync is called during IW.commit we only sync those files that haven't already been sync'd. That tracking is done via the FSDir.onIndexOutputClosed callback, called when an FSIndexOutput is closed. The bug is that we only call it on exception during close:
{noformat}
@Override
public void close() throws IOException {
  // only close the file if it has not been closed yet
  if (isOpen) {
    boolean success = false;
    try {
      super.close();
      success = true;
    } finally {
      isOpen = false;
      if (!success) {
        try {
          file.close();
          parent.onIndexOutputClosed(this);
        } catch (Throwable t) {
          // Suppress so we don't mask original exception
        }
      } else
        file.close();
    }
  }
}
{noformat}
And so FSDir thinks no files need syncing when its sync method is called. I think instead we should call it up-front; better to over-sync than under-sync. The fix is trivial (move the callback up-front), but I'd love to somehow have a test that can catch such a bad regression in the future; still, I think we can do that test separately and commit this fix first. Note that even though LUCENE-2328 was backported to 2.9.x and 3.0.x, this bug wasn't, ie the backport was a much simpler fix (to just address the original memory leak); it's 3.1, 3.2, 3.3 and trunk when this bug is present. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
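A sketch of the fix the issue describes, moving the callback up-front so the parent directory always learns about the file regardless of how close() goes. This illustrates the described approach, not necessarily the exact committed diff:

{code}
@Override
public void close() throws IOException {
  // only close the file if it has not been closed yet
  if (isOpen) {
    // Record the file as needing fsync *before* closing, so
    // FSDirectory.sync() always sees it: better to over-sync than under-sync.
    parent.onIndexOutputClosed(this);
    boolean success = false;
    try {
      super.close();
      success = true;
    } finally {
      isOpen = false;
      if (!success) {
        try {
          file.close();
        } catch (Throwable t) {
          // Suppress so we don't mask original exception
        }
      } else
        file.close();
    }
  }
}
{code}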
[jira] [Resolved] (SOLR-2656) realtime get
[ https://issues.apache.org/jira/browse/SOLR-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2656. Resolution: Fixed Fix Version/s: 4.0 I just committed the implementation attached to SOLR-2700. Since the transaction logging does not yet provide durability, realtime-get is the actual feature completed, and hence I used this issue number in CHANGES. > realtime get > > > Key: SOLR-2656 > URL: https://issues.apache.org/jira/browse/SOLR-2656 > Project: Solr > Issue Type: New Feature > Components: search >Reporter: Yonik Seeley >Assignee: Yonik Seeley > Fix For: 4.0 > > Attachments: SOLR-2656.patch, SOLR-2656_test.patch > > > Provide a non point-in-time interface to get a document. > For example, if you add a new document, you will be able to get it, > regardless of if the searcher has been refreshed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098385#comment-13098385 ] Martijn van Groningen edited comment on SOLR-2066 at 9/6/11 9:38 PM: - Updated patch. * Fixes for the errors that Matt reported. * If group.ngroups is specified the groupCount is also merged. It is important that all documents of one group are in the same shard. Otherwise the groupCount will be incorrect. * A lot of renames and refactorings. was (Author: martijn.v.groningen): Updated patch. * Fixes the errors that Matt reported. * If group.ngroups is specified the groupCount is also merged. It is important that all documents of one group are in the same shard. Otherwise the groupCount will be incorrect. * A lot of renames and refactorings. > Search Grouping: support distributed search > --- > > Key: SOLR-2066 > URL: https://issues.apache.org/jira/browse/SOLR-2066 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley > Fix For: 3.4, 4.0 > > Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, > SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch > > > Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2066: Attachment: SOLR-2066.patch Updated patch. * Fixes the errors that Matt reported. * If group.ngroups is specified the groupCount is also merged. It is important that all documents of one group are in the same shard. Otherwise the groupCount will be incorrect. * A lot of renames and refactorings. > Search Grouping: support distributed search > --- > > Key: SOLR-2066 > URL: https://issues.apache.org/jira/browse/SOLR-2066 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley > Fix For: 3.4, 4.0 > > Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, > SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch > > > Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
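The co-location requirement above ("all documents of one group are in the same shard") is usually met by routing documents at index time by their group key instead of their unique id. A plain-Java illustration (method name invented, not a Solr API):

{code}
public class GroupRouting {
  // Route each document to a shard by its group key so all members of a
  // group land together and the merged groupCount stays exact.
  static int shardFor(String groupKey, int numShards) {
    // Mask off the sign bit to avoid a negative index for negative hashes.
    return (groupKey.hashCode() & Integer.MAX_VALUE) % numShards;
  }
}
{code}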
DirectoryReader package protected?
I was browsing code, and noticed DirectoryReader is package protected. Why is this? Ie, SegmentReader is not. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2650) Empty docs array on response with grouping and result pagination
[ https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098333#comment-13098333 ] Mike Lerley edited comment on SOLR-2650 at 9/6/11 8:39 PM: --- I seem to be having the same problem. I've just tried the latest code from branch_3x (r1165749) and it's still a problem. Note that I'm trying to output JSON, not XML. I get a similar exception:
{noformat}
Sep 6, 2011 4:11:31 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 49
    at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:117)
    at org.apache.solr.response.JSONWriter.writeDocList(JSONResponseWriter.java:492)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:129)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:180)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:296)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:93)
    at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:52)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:679)
{noformat}
I seem to be able to trigger it using quoted strings, among other random things. I hope this can get fixed soon. was (Author: mlerley): I seem to be having the same problem. I've just tried the latest code from branch_3x (r1165749) and it's still a problem. I get a similar exception:
{noformat}
Sep 6, 2011 4:11:31 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 49
    at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:117)
    at org.apache.solr.response.JSONWriter.writeDocList(JSONResponseWriter.java:492)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:129)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:180)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:296)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:93)
    at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:52)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:679)
{noformat}
I seem to be able to trigger it using quoted strings, among other random things. I hope this can get fixed soon.
[jira] [Commented] (SOLR-2650) Empty docs array on response with grouping and result pagination
[ https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098333#comment-13098333 ] Mike Lerley commented on SOLR-2650: --- I seem to be having the same problem. I've just tried the latest code from branch_3x (r1165749) and it's still a problem. I get a similar exception:
{noformat}
Sep 6, 2011 4:11:31 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 49
    at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:117)
    at org.apache.solr.response.JSONWriter.writeDocList(JSONResponseWriter.java:492)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:129)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:180)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:296)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:93)
    at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:52)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:679)
{noformat}
I seem to be able to trigger it using quoted strings, among other random things. I hope this can get fixed soon. > Empty docs array on response with grouping and result pagination > > > Key: SOLR-2650 > URL: https://issues.apache.org/jira/browse/SOLR-2650 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.3 >Reporter: Massimo Schiavon > Attachments: grouping_patch.txt > > > Requesting a certain number of rows and setting start parameter to a greater > value returns 0 results with grouping enabled. > For example, requesting: > http://localhost:8080/solr/web/select/?q=*:*&rows=1&start=2 > (grouping and highlighting are enabled by default) > I get this response: > [...] > response: { > numFound: 117852 > start: 2 > docs: [ ] > } > highlighting: { > 0938630598: { > title: [ "..." ] > content: [ "..." ] > } > } > [...] > docs array is empty while the highlighted values of the document are present > Debugging the request in > org.apache.solr.search.Grouping.Command.createSimpleResponse() at row 534 > [...] > int len = Math.min(numGroups, docsGathered); > if (offset > len) { > len = 0; > } > [...]
> The initial vars values are: > numGroups = 1 > docsGathered = 3 > offset = 2 > so after the execution len = 0 > I've tried commenting the if statement and this resolves the issue but could > introduce some other bugs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
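For intuition, this is the shape of an offset-aware length computation: clamp the window [offset, offset + rows) to the documents actually gathered, instead of zeroing the whole slice whenever offset > len. This is not the committed fix, just a sketch of the arithmetic consistent with the reported values (docsGathered=3, offset=2, rows=1 yields 1 instead of 0):

{code}
public class SliceMath {
  static int sliceLength(int docsGathered, int offset, int rows) {
    int end = Math.min(docsGathered, offset + rows);
    return Math.max(0, end - offset);
  }
}
{code}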
[jira] [Updated] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.
[ https://issues.apache.org/jira/browse/LUCENE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Njal Karevoll updated LUCENE-3417: -- Description: Due to an off-by-one error, a subword placed at the end of a compound word will not get a token added to the token stream. For example (from the unit test in the attached patch): Dictionary: {"ab", "cd", "ef"} Input: "abcdef" Created tokens: {"abcdef", "ab", "cd"} Expected tokens: {"abcdef", "ab", "cd", "ef"} Additionally, it could produce tokens that were shorter than the minSubwordSize due to another off-by-one error. For example (again, from the attached patch): Dictionary: {"abc", "d", "efg"} Minimum subword length: 2 Input: "abcdefg" Created tokens: {"abcdef", "abc", "d", "efg"} Expected tokens: {"abcdef", "abc", "efg"} was: Due to an off-by-one error, a subword placed at the end of a compound word will not get a token added to the token stream. Example: Dictionary: {"ab", "cd", "ef"} word: "abcdef" Created tokens: {"abcdef", "ab", "cd"} Expected tokens: {"abcdef", "ab", "cd", "ef"} Additionally, it could produce tokens that were shorter than the minSubwordSize due to another off-by-one error. > DictionaryCompoundWordTokenFilter does not properly add tokens from the end > compound word. > -- > > Key: LUCENE-3417 > URL: https://issues.apache.org/jira/browse/LUCENE-3417 > Project: Lucene - Java > Issue Type: Bug > Components: modules/analysis >Affects Versions: 3.3, 4.0 >Reporter: Njal Karevoll > Attachments: LUCENE-3417.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > Due to an off-by-one error, a subword placed at the end of a compound word > will not get a token added to the token stream. > For example (from the unit test in the attached patch): > Dictionary: {"ab", "cd", "ef"} > Input: "abcdef" > Created tokens: {"abcdef", "ab", "cd"} > Expected tokens: {"abcdef", "ab", "cd", "ef"} > Additionally, it could produce tokens that were shorter than the > minSubwordSize due to another off-by-one error. For example (again, from the > attached patch): > Dictionary: {"abc", "d", "efg"} > Minimum subword length: 2 > Input: "abcdefg" > Created tokens: {"abcdef", "abc", "d", "efg"} > Expected tokens: {"abcdef", "abc", "efg"} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
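A plain-Java illustration of the boundary condition behind the report (not the actual filter code; the method is invented): the scan window must be allowed to reach the end of the compound with a <= bound, otherwise a trailing subword like "ef" in "abcdef" is never tested against the dictionary:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CompoundScanExample {
  static List<String> findSubwords(String word, Set<String> dict,
                                   int minLen, int maxLen) {
    List<String> result = new ArrayList<String>();
    for (int start = 0; start < word.length(); start++) {
      // Buggy variant would use 'start + len < word.length()', which
      // skips the window ending exactly at the last character.
      for (int len = minLen; len <= maxLen && start + len <= word.length(); len++) {
        String candidate = word.substring(start, start + len);
        if (dict.contains(candidate)) {
          result.add(candidate);
        }
      }
    }
    return result;
  }
}
{code}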
RE: [Lucene.Net] 2.9.4
+1 for an official release. DIGY -Original Message- From: Prescott Nasser [mailto:geobmx...@hotmail.com] Sent: Monday, September 05, 2011 9:22 PM To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org Subject: [Lucene.Net] 2.9.4 Hey All, How do people feel about the 2.9.4 code base? I've been using it for some time; for my use cases it's been excellent. Do we feel we are ready to package this up and make it an official release? Or do we have some tasks left to take care of? ~Prescott
[jira] [Commented] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.
[ https://issues.apache.org/jira/browse/LUCENE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098313#comment-13098313 ] Njal Karevoll commented on LUCENE-3417: --- The above patch is trivial to backport for 3.3/3.4. It is similar to LUCENE-3038, but is not duplicated by LUCENE-3022, which deals with issues surrounding the interpretation of onlyLongestMatch. > DictionaryCompoundWordTokenFilter does not properly add tokens from the end > compound word. > -- > > Key: LUCENE-3417 > URL: https://issues.apache.org/jira/browse/LUCENE-3417 > Project: Lucene - Java > Issue Type: Bug > Components: modules/analysis >Affects Versions: 3.3, 4.0 >Reporter: Njal Karevoll > Attachments: LUCENE-3417.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > Due to an off-by-one error, a subword placed at the end of a compound word > will not get a token added to the token stream. > Example: > Dictionary: {"ab", "cd", "ef"} > word: "abcdef" > Created tokens: {"abcdef", "ab", "cd"} > Expected tokens: {"abcdef", "ab", "cd", "ef"} > Additionally, it could produce tokens that were shorter than the > minSubwordSize due to another off-by-one error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.
[ https://issues.apache.org/jira/browse/LUCENE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Njal Karevoll updated LUCENE-3417: -- Attachment: LUCENE-3417.patch Adds two unit tests, one showing each behavior, and a fix for both issues. > DictionaryCompoundWordTokenFilter does not properly add tokens from the end > compound word. > -- > > Key: LUCENE-3417 > URL: https://issues.apache.org/jira/browse/LUCENE-3417 > Project: Lucene - Java > Issue Type: Bug > Components: modules/analysis >Affects Versions: 3.3, 4.0 >Reporter: Njal Karevoll > Attachments: LUCENE-3417.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > Due to an off-by-one error, a subword placed at the end of a compound word > will not get a token added to the token stream. > Example: > Dictionary: {"ab", "cd", "ef"} > word: "abcdef" > Created tokens: {"abcdef", "ab", "cd"} > Expected tokens: {"abcdef", "ab", "cd", "ef"} > Additionally, it could produce tokens that were shorter than the > minSubwordSize due to another off-by-one error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.
DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word. -- Key: LUCENE-3417 URL: https://issues.apache.org/jira/browse/LUCENE-3417 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.3, 4.0 Reporter: Njal Karevoll Due to an off-by-one error, a subword placed at the end of a compound word will not get a token added to the token stream. Example: Dictionary: {"ab", "cd", "ef"} word: "abcdef" Created tokens: {"abcdef", "ab", "cd"} Expected tokens: {"abcdef", "ab", "cd", "ef"} Additionally, it could produce tokens that were shorter than the minSubwordSize due to another off-by-one error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098286#comment-13098286 ] Simon Willnauer commented on LUCENE-3416: - I see, I guess that is kind of overkill here. This patch looks fine to me, though I wonder why this needs to be synchronized, since we don't read it from a synced block. If you want changes to take immediate effect, shouldn't you rather use volatile here? I doubt that is necessary in this context - I'd rather not invalidate a cache line for each IndexOutput creation. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
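To make the synchronized-versus-volatile trade-off concrete, here is a self-contained sketch of the pacing idea behind a shared rate limiter; the class and method names are illustrative, not Lucene's actual API:
{code}
/** Writers that share one instance pause once they exceed the configured rate. */
public class SimpleRateLimiter {
  // volatile so a changed rate is visible to writers without locking on reads
  private volatile double mbPerSec;
  private long lastNs;

  public SimpleRateLimiter(double mbPerSec) { this.mbPerSec = mbPerSec; }

  public void setMbPerSec(double mbPerSec) { this.mbPerSec = mbPerSec; }

  /** Sleep long enough that writing 'bytes' stays under the configured rate. */
  public synchronized void pause(long bytes) throws InterruptedException {
    long nowNs = System.nanoTime();
    long neededNs = (long) (1000000000L * (bytes / (mbPerSec * 1024 * 1024)));
    long targetNs = lastNs + neededNs;
    lastNs = Math.max(nowNs, targetNs);
    long sleepNs = targetNs - nowNs;
    if (sleepNs > 0) {
      Thread.sleep(sleepNs / 1000000, (int) (sleepNs % 1000000));
    }
  }
}
{code}
Several FSDirectory instances sharing one such limiter would pause their merge writes against a common budget, which is the cross-directory control this issue asks for.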
[jira] [Assigned] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-3416: --- Assignee: Simon Willnauer > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098249#comment-13098249 ] Martijn van Groningen commented on SOLR-2066: - Thanks for reporting these issues Matt! I'll update the patch soon. > Search Grouping: support distributed search > --- > > Key: SOLR-2066 > URL: https://issues.apache.org/jira/browse/SOLR-2066 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley > Fix For: 3.4, 4.0 > > Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, > SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch > > > Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2540) CommitWithin as an Update Request parameter
[ https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-2540. --- Resolution: Fixed > CommitWithin as an Update Request parameter > --- > > Key: SOLR-2540 > URL: https://issues.apache.org/jira/browse/SOLR-2540 > Project: Solr > Issue Type: New Feature > Components: update >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: commit, commitWithin > Fix For: 3.4, 4.0 > > Attachments: SOLR-2540.patch, SOLR-2540.patch > > > It would be useful to support commitWithin HTTP GET request param on all > UpdateRequestHandlers. > That way, you could set commitWithin on the request (for XML, JSON, CSV, > Binary and Extracting handlers) with this syntax: > {code} > curl > http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1 >-H "Content-Type: application/pdf" --data-binary @file.pdf > {code} > PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already > support this syntax. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
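For SolrJ users, the same effect as the curl example in the issue description can be had in code; a minimal sketch assuming the 3.x SolrJ API (CommonsHttpSolrServer and UpdateRequest.setCommitWithin):
{code}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "123");
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(10000); // ask Solr to commit within 10 seconds
    req.process(server);
  }
}
{code}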
[jira] [Updated] (SOLR-2540) CommitWithin as an Update Request parameter
[ https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2540: -- Affects Version/s: (was: 3.1) Fix Version/s: 4.0 3.4 > CommitWithin as an Update Request parameter > --- > > Key: SOLR-2540 > URL: https://issues.apache.org/jira/browse/SOLR-2540 > Project: Solr > Issue Type: New Feature > Components: update >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: commit, commitWithin > Fix For: 3.4, 4.0 > > Attachments: SOLR-2540.patch, SOLR-2540.patch > > > It would be useful to support commitWithin HTTP GET request param on all > UpdateRequestHandlers. > That way, you could set commitWithin on the request (for XML, JSON, CSV, > Binary and Extracting handlers) with this syntax: > {code} > curl > http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1 >-H "Content-Type: application/pdf" --data-binary @file.pdf > {code} > PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already > support this syntax. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher reassigned SOLR-2703: -- Assignee: Erik Hatcher > Add support for the Lucene Surround Parser > -- > > Key: SOLR-2703 > URL: https://issues.apache.org/jira/browse/SOLR-2703 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.0 >Reporter: Simon Rosenthal >Assignee: Erik Hatcher >Priority: Minor > Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch > > > The Lucene/contrib surround parser provides support for span queries. This > issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098133#comment-13098133 ] Simon Rosenthal commented on SOLR-2703: --- We should hold off on the commit until the https://issues.apache.org/jira/browse/LUCENE-2945 patch has been committed; otherwise query caching is very broken. I updated the patch for that issue to work with trunk a few weeks ago. > Add support for the Lucene Surround Parser > -- > > Key: SOLR-2703 > URL: https://issues.apache.org/jira/browse/SOLR-2703 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.0 >Reporter: Simon Rosenthal >Assignee: Erik Hatcher >Priority: Minor > Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch > > > The Lucene/contrib surround parser provides support for span queries. This > issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098106#comment-13098106 ] Simon Rosenthal commented on SOLR-2703: --- Wiki page to follow at http://wiki.apache.org/solr/SurroundQueryParser > Add support for the Lucene Surround Parser > -- > > Key: SOLR-2703 > URL: https://issues.apache.org/jira/browse/SOLR-2703 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.0 >Reporter: Simon Rosenthal >Priority: Minor > Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch > > > The Lucene/contrib surround parser provides support for span queries. This > issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-2703: -- Attachment: SOLR-2703.patch New patch. The query parser is no longer registered by default, and a commented-out entry was added to the example solrconfig. Hopefully ready to commit. > Add support for the Lucene Surround Parser > -- > > Key: SOLR-2703 > URL: https://issues.apache.org/jira/browse/SOLR-2703 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.0 >Reporter: Simon Rosenthal >Priority: Minor > Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch > > > The Lucene/contrib surround parser provides support for span queries. This > issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098057#comment-13098057 ] Robert Muir commented on LUCENE-3396: - Patch looks great. One question: should we keep the 'reset can return false and we do not reuse' behavior? This seems like it might be obsolete; though it would introduce an API break, I think maybe we should change it to void? > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but it's time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
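For context, the reuse pattern the issue makes mandatory looks roughly like this; a minimal 3.x-flavored sketch of a ReusableAnalyzerBase subclass (package locations differ on trunk, and the patch collapses this base class into Analyzer itself):
{code}
import java.io.Reader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public final class MyAnalyzer extends ReusableAnalyzerBase {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // Built once; subsequent tokenStream calls reset and reuse these components.
    Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_33, reader);
    TokenStream result = new LowerCaseFilter(Version.LUCENE_33, source);
    return new TokenStreamComponents(source, result);
  }
}
{code}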
[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable
[ https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098053#comment-13098053 ] Chris Male commented on LUCENE-3410: {quote} should the iterator maybe keep the booleans and not use flags? Just an idea, because the iterator doesn't "make use" of all the flags. It's also not a public class, just a helper class to simplify the filter, so I think it's OK for it to take 3 booleans? {quote} Yeah, I thought about this as well. It would make the iterator clearer since it wouldn't rely on people looking at the Filter's flags. I will make the change. > Make WordDelimiterFilter's instantiation more readable > -- > > Key: LUCENE-3410 > URL: https://issues.apache.org/jira/browse/LUCENE-3410 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3410.patch > > > Currently WordDelimiterFilter's constructor is: > {code} > public WordDelimiterFilter(TokenStream in, >byte[] charTypeTable, >int generateWordParts, >int generateNumberParts, >int catenateWords, >int catenateNumbers, >int catenateAll, >int splitOnCaseChange, >int preserveOriginal, >int splitOnNumerics, >int stemEnglishPossessive, >CharArraySet protWords) { > {code} > which means its instantiation is an unreadable combination of 1s and 0s. > We should improve this by either using a Builder, 'int flags' or an EnumSet. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
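To make the 'int flags' option from the issue description concrete, here is a self-contained sketch; the constant names and constructor shape are illustrative, not the committed API:
{code}
public class WordDelimiterFlags {
  public static final int GENERATE_WORD_PARTS   = 1 << 0;
  public static final int GENERATE_NUMBER_PARTS = 1 << 1;
  public static final int CATENATE_WORDS        = 1 << 2;
  public static final int CATENATE_NUMBERS      = 1 << 3;
  public static final int CATENATE_ALL          = 1 << 4;
  public static final int SPLIT_ON_CASE_CHANGE  = 1 << 5;
  public static final int PRESERVE_ORIGINAL     = 1 << 6;

  private final int flags;

  public WordDelimiterFlags(int flags) { this.flags = flags; }

  /** True if the given flag was set in the constructor. */
  public boolean has(int flag) { return (flags & flag) != 0; }

  public static void main(String[] args) {
    // Readable at the call site, unlike a row of eleven 1s and 0s:
    WordDelimiterFlags f =
        new WordDelimiterFlags(GENERATE_WORD_PARTS | SPLIT_ON_CASE_CHANGE);
    System.out.println(f.has(CATENATE_WORDS)); // false
  }
}
{code}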
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098051#comment-13098051 ] Yonik Seeley commented on LUCENE-2308: -- Instead of introducing a dependency on CoreFieldType in many places (only to have to change it back later when some sort of consensus is finally reached), it would seem much cleaner to either: - remove freeze() until we decide on the right approach, or - move freeze() to the FieldType interface temporarily (and remove it later if the approach changes). The other changes in the patch look fine. > Separately specify a field's type > - > > Key: LUCENE-2308 > URL: https://issues.apache.org/jira/browse/LUCENE-2308 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless > Labels: gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, > LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, > LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, > LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, > LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, > LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, > LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, > LUCENE-2308-FT-interface.patch, LUCENE-2308-branch.patch, > LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, > LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, > LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, > LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch > > > This came up from discussions on IRC. I'm summarizing here... > Today when you make a Field to add to a document you can set things > indexed or not, stored or not, analyzed or not, details like omitTfAP, > omitNorms, index term vectors (separately controlling > offsets/positions), etc. > I think we should factor these out into a new class (FieldType?). > Then you could re-use this FieldType instance across multiple fields. > The Field instance would still hold the actual value. > We could then do per-field analyzers by adding a setAnalyzer on the > FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise > for per-field codecs (with flex), where we now have > PerFieldCodecWrapper). > This would NOT be a schema! It's just refactoring what we already > specify today. EG it's not serialized into the index. > This has been discussed before, and I know Michael Busch opened a more > ambitious (I think?) issue. I think this is a good first baby step. We could > consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold > off on that for starters... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
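For readers following along, the freeze() pattern under discussion looks roughly like this; an illustrative sketch, not the actual FieldType code from the patch:
{code}
public class FreezableFieldType {
  private boolean indexed;
  private boolean stored;
  private boolean frozen;

  /** After freeze(), the instance is immutable and safe to share across fields. */
  public void freeze() { this.frozen = true; }

  private void checkFrozen() {
    if (frozen) {
      throw new IllegalStateException("this FieldType is already frozen");
    }
  }

  public void setIndexed(boolean indexed) { checkFrozen(); this.indexed = indexed; }
  public void setStored(boolean stored)   { checkFrozen(); this.stored = stored; }

  public boolean indexed() { return indexed; }
  public boolean stored()  { return stored; }
}
{code}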
[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable
[ https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098046#comment-13098046 ] Robert Muir commented on LUCENE-3410: - looks good overall, a couple tiny nitpicks: * looks like there is some dead code in WordDelimiterIterator (the booleans) * should the iterator maybe keep the booleans and not use flags? Just an idea, because the iterator doesn't "make use" of all the flags. It's also not a public class, just a helper class to simplify the filter, so I think it's OK for it to take 3 booleans? > Make WordDelimiterFilter's instantiation more readable > -- > > Key: LUCENE-3410 > URL: https://issues.apache.org/jira/browse/LUCENE-3410 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3410.patch > > > Currently WordDelimiterFilter's constructor is: > {code} > public WordDelimiterFilter(TokenStream in, >byte[] charTypeTable, >int generateWordParts, >int generateNumberParts, >int catenateWords, >int catenateNumbers, >int catenateAll, >int splitOnCaseChange, >int preserveOriginal, >int splitOnNumerics, >int stemEnglishPossessive, >CharArraySet protWords) { > {code} > which means its instantiation is an unreadable combination of 1s and 0s. > We should improve this by either using a Builder, 'int flags' or an EnumSet. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098045#comment-13098045 ] Chris Male commented on LUCENE-3396: I'm going to commit this soon and then work on the remaining Analyzers. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but it's time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable
[ https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098041#comment-13098041 ] Uwe Schindler commented on LUCENE-3410: --- +1 > Make WordDelimiterFilter's instantiation more readable > -- > > Key: LUCENE-3410 > URL: https://issues.apache.org/jira/browse/LUCENE-3410 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3410.patch > > > Currently WordDelimiterFilter's constructor is: > {code} > public WordDelimiterFilter(TokenStream in, >byte[] charTypeTable, >int generateWordParts, >int generateNumberParts, >int catenateWords, >int catenateNumbers, >int catenateAll, >int splitOnCaseChange, >int preserveOriginal, >int splitOnNumerics, >int stemEnglishPossessive, >CharArraySet protWords) { > {code} > which means its instantiation is an unreadable combination of 1s and 0s. > We should improve this by either using a Builder, 'int flags' or an EnumSet. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable
[ https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098036#comment-13098036 ] Chris Male commented on LUCENE-3410: Plan to commit soon if there are no objections. > Make WordDelimiterFilter's instantiation more readable > -- > > Key: LUCENE-3410 > URL: https://issues.apache.org/jira/browse/LUCENE-3410 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3410.patch > > > Currently WordDelimiterFilter's constructor is: > {code} > public WordDelimiterFilter(TokenStream in, >byte[] charTypeTable, >int generateWordParts, >int generateNumberParts, >int catenateWords, >int catenateNumbers, >int catenateAll, >int splitOnCaseChange, >int preserveOriginal, >int splitOnNumerics, >int stemEnglishPossessive, >CharArraySet protWords) { > {code} > which means its instantiation is an unreadable combination of 1s and 0s. > We should improve this by either using a Builder, 'int flags' or an EnumSet. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098032#comment-13098032 ] Chris Male commented on LUCENE-2308: Anyone else have any thoughts? Any objections to committing this patch as a first step? > Separately specify a field's type > - > > Key: LUCENE-2308 > URL: https://issues.apache.org/jira/browse/LUCENE-2308 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless > Labels: gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, > LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, > LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, > LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, > LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, > LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, > LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, > LUCENE-2308-FT-interface.patch, LUCENE-2308-branch.patch, > LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, > LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, > LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, > LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch > > > This came up from discussions on IRC. I'm summarizing here... > Today when you make a Field to add to a document you can set things > indexed or not, stored or not, analyzed or not, details like omitTfAP, > omitNorms, index term vectors (separately controlling > offsets/positions), etc. > I think we should factor these out into a new class (FieldType?). > Then you could re-use this FieldType instance across multiple fields. > The Field instance would still hold the actual value. > We could then do per-field analyzers by adding a setAnalyzer on the > FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise > for per-field codecs (with flex), where we now have > PerFieldCodecWrapper). > This would NOT be a schema! It's just refactoring what we already > specify today. EG it's not serialized into the index. > This has been discussed before, and I know Michael Busch opened a more > ambitious (I think?) issue. I think this is a good first baby step. We could > consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold > off on that for starters... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098018#comment-13098018 ] Shay Banon commented on LUCENE-3416: It is possible, but requires more work, and depends on overriding the createOutput method (as well as all the other methods in Directory). If rate limiting makes sense as a "feature" exposed at the directory level, I think this small change allows for greater control over it. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-2383. --- Resolution: Fixed Checked in backport to 3.x > Velocity: Generalize range and date facet display > - > > Key: SOLR-2383 > URL: https://issues.apache.org/jira/browse/SOLR-2383 > Project: Solr > Issue Type: Bug > Components: Response Writers >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: facet, range, velocity > Fix For: 3.4, 4.0 > > Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, > SOLR-2383.patch, SOLR-2383.patch > > > Velocity (/browse) GUI has hardcoded price range facet and a hardcoded > manufacturedate_dt date facet. Need general solution which work for any > facet.range and facet.date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2741) Bugs in facet range display in trunk
[ https://issues.apache.org/jira/browse/SOLR-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-2741. --- Resolution: Fixed > Bugs in facet range display in trunk > > > Key: SOLR-2741 > URL: https://issues.apache.org/jira/browse/SOLR-2741 > Project: Solr > Issue Type: Sub-task > Components: web gui >Affects Versions: 4.0 >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Fix For: 4.0 > > Attachments: SOLR-2741.patch, SOLR-2741.patch > > > In SOLR-2383 the hardcoded display of some facet ranges were replaced with > automatic, dynamic display. > There were some shortcomings: > a) Float range to-values were sometimes displayed as int > b) Capitalizing the facet name was a mistake, sometimes looks good, sometimes > not > c) facet.range on a date did not work - dates were displayed in whatever > locale formatting > d) The deprecated facet.date syntax was used in solrconfig.xml instead of the > new facet.range -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Resolved] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-414. Resolution: Fixed Fixed. DIGY > The definition of CharArraySet is dangerously confusing and leads to bugs > when used. > > > Key: LUCENENET-414 > URL: https://issues.apache.org/jira/browse/LUCENENET-414 > Project: Lucene.Net > Issue Type: Bug > Components: Lucene.Net Core >Affects Versions: Lucene.Net 2.9.2 > Environment: Irrelevant >Reporter: Vincent Van Den Berghe >Priority: Minor > Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g > > > Right now, CharArraySet derives from System.Collections.Hashtable, but > doesn't actually use this base type for storing elements. > However, the StandardAnalyzer.STOP_WORDS_SET is exposed as a > System.Collections.Hashtable. The trivial code to build your own stopword set > from StandardAnalyzer.STOP_WORDS_SET, adding your own stopwords like this: > CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, > ignoreCase: false); > foreach (string domainSpecificStopWord in DomainSpecificStopWords) > myStopWords.Add(domainSpecificStopWord); > ... will fail because the CharArraySet accepts an ICollection, which will be > passed the Hashtable instance of STOP_WORDS_SET: the resulting myStopWords > will only contain the DomainSpecificStopWords, and not those from > STOP_WORDS_SET. > One workaround would be to replace the first line with this: > CharArraySet stopWords = new > CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + > DomainSpecificStopWords.Length, ignoreCase: false); > foreach (string domainSpecificStopWord in > (CharArraySet)StandardAnalyzer.STOP_WORDS_SET) > stopWords.Add(domainSpecificStopWord); > ... but this makes use of an implementation detail (the STOP_WORDS_SET is > really an UnmodifiableCharArraySet which is itself a CharArraySet). It works > because it forces the foreach() to use the correct > CharArraySet.GetEnumerator(), which is defined as a "new" method (this has a > bad code smell to it). > At least 2 possibilities exist to solve this problem: > - Make CharArraySet use the Hashtable instance and a custom comparator, > instead of its own implementation. > - Make CharArraySet use HashSet, defined in .NET 4.0. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097945#comment-13097945 ] Robert Muir commented on LUCENE-3414: - I don't think we should do anything with the dictionaries ever; it's much better to make small "test" dictionaries that are actually more like unit tests and test certain things, like what you did in the patch. > Bring Hunspell for Lucene into analysis module > -- > > Key: LUCENE-3414 > URL: https://issues.apache.org/jira/browse/LUCENE-3414 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3414.patch > > > Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the > Hunspell algorithm. It has the benefit of supporting dictionaries for a wide > array of languages. > It seems to still be being used but has fallen out of date. I think it would > benefit from being inside the analysis module where additional features, such > as decompounding support, could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097937#comment-13097937 ] Chris Male commented on LUCENE-3414: Okay, good spotting. So how do we want to proceed? Do we want to bring some of the dictionaries in? Should we address that in a later issue once it's become clearer in OO what they're doing? > Bring Hunspell for Lucene into analysis module > -- > > Key: LUCENE-3414 > URL: https://issues.apache.org/jira/browse/LUCENE-3414 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3414.patch > > > Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the > Hunspell algorithm. It has the benefit of supporting dictionaries for a wide > array of languages. > It seems to still be being used but has fallen out of date. I think it would > benefit from being inside the analysis module where additional features, such > as decompounding support, could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097916#comment-13097916 ] Robert Muir commented on LUCENE-3414: - {quote} Bizarrely, from what I can see in the OpenOffice SVN, they are still under their original license. {quote} I don't think we should read too much into that text file: it's not even obvious which of the many dictionaries in that folder it applies to! I know for a fact that some of the files in there are *NOT* GPL, for example the en_US dictionary: http://svn.apache.org/viewvc/incubator/ooo/trunk/main/dictionaries/en/README_en_US.txt?revision=1162288&view=markup > Bring Hunspell for Lucene into analysis module > -- > > Key: LUCENE-3414 > URL: https://issues.apache.org/jira/browse/LUCENE-3414 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3414.patch > > > Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the > Hunspell algorithm. It has the benefit of supporting dictionaries for a wide > array of languages. > It seems to still be being used but has fallen out of date. I think it would > benefit from being inside the analysis module where additional features, such > as decompounding support, could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
[ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097907#comment-13097907 ] Robert Muir commented on LUCENE-3415: - {quote} The index size increases because we don't have the option of with / without stemming in a single field, and as a result we have to store 2 separate fields. {quote} This is not true; there are just as many postings either way. > Snowball filter to include original word too > > > Key: LUCENE-3415 > URL: https://issues.apache.org/jira/browse/LUCENE-3415 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 3.3 > Environment: All >Reporter: Manish > Labels: features > Fix For: 3.4, 4.0 > > > 1. Currently, the snowball filter deletes the original word and adds the stemmed > word to the index. So, if I want to search with / without stemming, I have > to keep 2 fields, one with stemming and one without it. > 2. Rather than doing this, if we had a configurable option to preserve the original, > it would solve more business problems. > 3. Using a single field, I can search with stemming / without stemming by > changing the query filters. > The same can also be done for phonetic filters too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
[ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097904#comment-13097904 ] Manish commented on LUCENE-3415: The index size increases because we don't have the option of with / without stemming in a single field, and as a result we have to store 2 separate fields. Even with highlighting, we can highlight another field, but since the term vector information is different, it cannot highlight it properly. > Snowball filter to include original word too > > > Key: LUCENE-3415 > URL: https://issues.apache.org/jira/browse/LUCENE-3415 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 3.3 > Environment: All >Reporter: Manish > Labels: features > Fix For: 3.4, 4.0 > > > 1. Currently, the snowball filter deletes the original word and adds the stemmed > word to the index. So, if I want to search with / without stemming, I have > to keep 2 fields, one with stemming and one without it. > 2. Rather than doing this, if we had a configurable option to preserve the original, > it would solve more business problems. > 3. Using a single field, I can search with stemming / without stemming by > changing the query filters. > The same can also be done for phonetic filters too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
[ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097898#comment-13097898 ] Robert Muir commented on LUCENE-3415: - The index size is increasing because you are storing both fields; this has nothing to do with how the analysis is done. I don't think we should modify every tokenfilter to optionally inject things instead of changing terms, or create a hack with KeywordAttribute. Instead, if the problem is the Highlighter, why not propose a modification to the highlighter so it can highlight field A with field B's stored value? > Snowball filter to include original word too > > > Key: LUCENE-3415 > URL: https://issues.apache.org/jira/browse/LUCENE-3415 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 3.3 > Environment: All >Reporter: Manish > Labels: features > Fix For: 3.4, 4.0 > > > 1. Currently, the snowball filter deletes the original word and adds the stemmed > word to the index. So, if I want to search with / without stemming, I have > to keep 2 fields, one with stemming and one without it. > 2. Rather than doing this, if we had a configurable option to preserve the original, > it would solve more business problems. > 3. Using a single field, I can search with stemming / without stemming by > changing the query filters. > The same can also be done for phonetic filters too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
[ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097890#comment-13097890 ] Manish commented on LUCENE-3415: The index size becomes huge (in fact, double). We have 2 fields, both indexed and stored, one with stemming and one without stemming. We thought of removing the stored=true from one of the fields, but highlighting becomes the problem (field 1 won't have the original words, and hence term vectors won't highlight it properly). I have an idea based on Simon's comments; I don't know if it is going to work or not. 1. Create a new Filter Factory which will emit both the stemmed word and the original word. 2. Field 1 -> indexed=true, stored=true, use the above filter. 3. Field 2 -> indexed=true, stored=false, don't use the above filter. I can make searches against the corresponding fields. For highlighting, I can always use Field 1, and since term vectors, offsets and positions are present for the original words too, it will highlight properly. Do let me know your thoughts on this. > Snowball filter to include original word too > > > Key: LUCENE-3415 > URL: https://issues.apache.org/jira/browse/LUCENE-3415 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 3.3 > Environment: All >Reporter: Manish > Labels: features > Fix For: 3.4, 4.0 > > > 1. Currently, the snowball filter deletes the original word and adds the stemmed > word to the index. So, if I want to search with / without stemming, I have > to keep 2 fields, one with stemming and one without it. > 2. Rather than doing this, if we had a configurable option to preserve the original, > it would solve more business problems. > 3. Using a single field, I can search with stemming / without stemming by > changing the query filters. > The same can also be done for phonetic filters too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
[ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097878#comment-13097878 ] Robert Muir commented on LUCENE-3415: - Manish, why not put your content in a different field without stemming? You can use e.g. MultiFieldQueryParser to make this transparent. > Snowball filter to include original word too > > > Key: LUCENE-3415 > URL: https://issues.apache.org/jira/browse/LUCENE-3415 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 3.3 > Environment: All >Reporter: Manish > Labels: features > Fix For: 3.4, 4.0 > > > 1. Currently, the snowball filter deletes the original word and adds the stemmed > word to the index. So, if I want to search with / without stemming, I have > to keep 2 fields, one with stemming and one without it. > 2. Rather than doing this, if we had a configurable option to preserve the original, > it would solve more business problems. > 3. Using a single field, I can search with stemming / without stemming by > changing the query filters. > The same can also be done for phonetic filters too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
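A minimal sketch of that two-field setup, assuming the 3.3 APIs (PerFieldAnalyzerWrapper for per-field analysis, MultiFieldQueryParser to make the query side transparent; the field names are made up):
{code}
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class StemmedAndExactSearch {
  public static void main(String[] args) throws Exception {
    // Index time: "body" is stemmed, "body_exact" keeps the original words.
    PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_33));
    analyzer.addAnalyzer("body", new SnowballAnalyzer(Version.LUCENE_33, "English"));

    // Query time: one user query transparently searches both fields.
    MultiFieldQueryParser parser = new MultiFieldQueryParser(
        Version.LUCENE_33, new String[] { "body", "body_exact" }, analyzer);
    Query q = parser.parse("running");
    System.out.println(q); // e.g. (body:run body_exact:running)
  }
}
{code}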
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097873#comment-13097873 ] Simon Willnauer commented on LUCENE-3416: - Shay, can't you use an Input/Output wrapper on a RateLimitingDirectoryDelegate? With Lucene 4.0 you get the IOContext when opening/creating streams, so you can decide based on that. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3416: --- Attachment: LUCENE-3416.patch > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
[ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097864#comment-13097864 ] Manish commented on LUCENE-3415: How do we handle 2 different analyzers for the query? I guess that's not possible in the current design. > Snowball filter to include original word too > > > Key: LUCENE-3415 > URL: https://issues.apache.org/jira/browse/LUCENE-3415 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 3.3 > Environment: All >Reporter: Manish > Labels: features > Fix For: 3.4, 4.0 > > > 1. Currently, the snowball filter deletes the original word and adds the stemmed > word to the index. So, if I want to search with / without stemming, I have > to keep 2 fields, one with stemming and one without it. > 2. Rather than doing this, if we had a configurable option to preserve the original, > it would solve more business problems. > 3. Using a single field, I can search with stemming / without stemming by > changing the query filters. > The same can also be done for phonetic filters too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097861#comment-13097861 ] Chris Male commented on LUCENE-3414: bq. How is OpenOffice dealing with those dictionaries, since they are now an ASF incubation project? Maybe the dictionaries will be under the ASL eventually? Bizarrely, from what I can see in the OpenOffice [SVN|http://svn.apache.org/viewvc/incubator/ooo/trunk/main/dictionaries/en/license.txt?revision=1162288&view=markup], they are still under their original license. I guess that's something they will have to sort out during incubation. I don't see the licenses changing, since the dictionaries tend to be developed by national language organisations, but maybe the ASF will negotiate. > Bring Hunspell for Lucene into analysis module > -- > > Key: LUCENE-3414 > URL: https://issues.apache.org/jira/browse/LUCENE-3414 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3414.patch > > > Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the > Hunspell algorithm. It has the benefit of supporting dictionaries for a wide > array of languages. > It seems to still be being used but has fallen out of date. I think it would > benefit from being inside the analysis module where additional features, such > as decompounding support, could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097859#comment-13097859 ] Simon Willnauer commented on LUCENE-3414: - bq. ...so it should really be in Lucene, except the dictionaries. How is OpenOffice dealing with those dictionaries, since they are now an ASF incubation project? Maybe the dictionaries will be under the ASL eventually? > Bring Hunspell for Lucene into analysis module > -- > > Key: LUCENE-3414 > URL: https://issues.apache.org/jira/browse/LUCENE-3414 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3414.patch > > > Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the > Hunspell algorithm. It has the benefit of supporting dictionaries for a wide > array of languages. > It seems to still be being used but has fallen out of date. I think it would > benefit from being inside the analysis module where additional features, such > as decompounding support, could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too
[ https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097857#comment-13097857 ] Simon Willnauer commented on LUCENE-3415: - Instead of modifying the Snowball filter, you could write a filter that buffers the term and emits it twice. First you simply pass on the term, and the second time you set KeywordAttribute#setKeyword(boolean) to true. This will force the stemmer to ignore this term and pass it along the tokenstream pipeline without modification. Would that solve your problem? I am not sure we should actually provide such a filter, but others have more insight into this, Robert? > Snowball filter to include original word too > > > Key: LUCENE-3415 > URL: https://issues.apache.org/jira/browse/LUCENE-3415 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 3.3 > Environment: All >Reporter: Manish > Labels: features > Fix For: 3.4, 4.0 > > > 1. Currently, the snowball filter deletes the original word and adds the stemmed > word to the index. So, if I want to search with / without stemming, I have > to keep 2 fields, one with stemming and one without it. > 2. Rather than doing this, if we had a configurable option to preserve the original, > it would solve more business problems. > 3. Using a single field, I can search with stemming / without stemming by > changing the query filters. > The same can also be done for phonetic filters too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
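Simon's suggestion as a sketch, assuming the 3.x TokenFilter and KeywordAttribute APIs (this is illustrative, not a filter Lucene ships): each token is emitted once for stemming, then once more at the same position, marked as a keyword so downstream stemmers leave it unchanged.
{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class RepeatAsKeywordFilter extends TokenFilter {
  private final KeywordAttribute keywordAtt = addAttribute(KeywordAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private State saved;

  public RepeatAsKeywordFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (saved != null) {
      // Second emission: same term at the same position, marked as a keyword
      // so a downstream stemmer passes it through unchanged.
      restoreState(saved);
      saved = null;
      posIncAtt.setPositionIncrement(0);
      keywordAtt.setKeyword(true);
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    keywordAtt.setKeyword(false); // first emission: the stemmer may change this one
    saved = captureState();
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    saved = null;
  }
}
{code}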
[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097855#comment-13097855 ] Matt Beaumont commented on SOLR-2066: - Found two issues with this patch: 1. When faceting is used in combination with sharding and grouping in our queries, an error occurs. 2. When one shard returns no results and the other shards do, an error occurs. Thanks, Matt. > Search Grouping: support distributed search > --- > > Key: SOLR-2066 > URL: https://issues.apache.org/jira/browse/SOLR-2066 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley > Fix For: 3.4, 4.0 > > Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, > SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch > > > Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097852#comment-13097852 ] Uwe Schindler commented on LUCENE-3414: --- Thanks Chris for adding this to the Lucene analysis module. We did lots of work on Google Code, so it should really be in Lucene, except for the dictionaries. We should only add links to the web pages where the dictionaries can be obtained. > Bring Hunspell for Lucene into analysis module > -- > > Key: LUCENE-3414 > URL: https://issues.apache.org/jira/browse/LUCENE-3414 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3414.patch > > > Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the > Hunspell algorithm. It has the benefit of supporting dictionaries for a wide > array of languages. > It still seems to be in use but has fallen out of date. I think it would > benefit from being inside the analysis module, where additional features such > as decompounding support could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3415) Snowball filter to include original word too
Snowball filter to include original word too Key: LUCENE-3415 URL: https://issues.apache.org/jira/browse/LUCENE-3415 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.3 Environment: All Reporter: Manish Fix For: 3.4, 4.0 1. Currently, the snowball filter deletes the original word and adds the stemmed word to the index. So, if I want to search with or without stemming, I have to keep 2 fields, one with stemming and one without it. 2. Rather than doing this, a configurable option to preserve the original word would solve more business problems. 3. Using a single field, I could search with or without stemming by changing the query filters. The same could also be done for the phonetic filters. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3412: Attachment: LUCENE-3412.patch I am able to see this inconsistent behavior! The attached patch contains a test that fails on this. The test currently prints the trial number; the first loop always passes in all 30 trials (expected), while the second loop always fails (for me) but is inconsistent about when it fails. Sometimes it fails on the first iteration; other times it fails on the 3rd, 9th, etc. Quite peculiar... investigating... > SloppyPhraseScorer returns non-deterministic results for queries with many > repeats > -- > > Key: LUCENE-3412 > URL: https://issues.apache.org/jira/browse/LUCENE-3412 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.1, 3.2, 3.3, 4.0 >Reporter: Michael Ryan >Assignee: Doron Cohen > Attachments: LUCENE-3412.patch > > > Proximity queries with many repeats (four or more, based on my testing) > return non-deterministic results. I run the same query multiple times with > the same data set and get different results. > So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 > trunk. > Steps to reproduce (using the Solr example): > 1) In solrconfig.xml, set queryResultCache size to 0. > 2) Add some documents with text "dog dog dog" and "dog dog dog dog". > http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true > 3) Do a "dog dog dog dog"~1 query. > http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1 > 4) Repeat step 3 many times. > Expected results: The document with id 2 should be returned. > Actual results: The document with id 2 is always returned. The document with > id 1 is sometimes returned. > Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog > dog dog"~100, etc show the same behavior. > So far I've traced it down to the "repeats" array in > SloppyPhraseScorer.initPhrasePositions() - depending on the order of the > elements in this array, the document may or may not match. I think the > HashSet may be to blame, but I'm not sure - that at least seems to be where > the non-determinism is coming from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
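[Editor's sketch] The HashSet suspicion above is plausible: if the set's elements rely on the default identity hashCode() (as objects without a hashCode() override do), iteration order depends on JVM-assigned hash values and can vary from run to run. The following standalone demo illustrates the effect; the Pos class is a hypothetical stand-in, not Lucene code:
{code:java}
import java.util.HashSet;
import java.util.Set;

// Stand-in for an object (like a phrase-position entry) that does not
// override hashCode(): the HashSet then uses JVM-assigned identity hashes.
final class Pos {
  final int offset;
  Pos(int offset) { this.offset = offset; }
  @Override public String toString() { return "pos@" + offset; }
}

public class HashSetOrderDemo {
  public static void main(String[] args) {
    Set<Pos> repeats = new HashSet<Pos>();
    for (int i = 0; i < 4; i++) {
      repeats.add(new Pos(i));
    }
    // Iteration order depends on identity hash codes, which the JVM may
    // assign differently on each run, so this line can print the four
    // elements in a different order every time the program is executed.
    System.out.println(repeats);
  }
}
{code}
Filling the "repeats" array from such a set makes any order-sensitive logic downstream non-deterministic; imposing a stable order before use (for example, sorting by a field, or using a LinkedHashSet so insertion order is preserved) removes the run-to-run variation.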
[jira] [Assigned] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-3412: --- Assignee: Doron Cohen > SloppyPhraseScorer returns non-deterministic results for queries with many > repeats > -- > > Key: LUCENE-3412 > URL: https://issues.apache.org/jira/browse/LUCENE-3412 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.1, 3.2, 3.3, 4.0 >Reporter: Michael Ryan >Assignee: Doron Cohen > > Proximity queries with many repeats (four or more, based on my testing) > return non-deterministic results. I run the same query multiple times with > the same data set and get different results. > So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 > trunk. > Steps to reproduce (using the Solr example): > 1) In solrconfig.xml, set queryResultCache size to 0. > 2) Add some documents with text "dog dog dog" and "dog dog dog dog". > http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true > 3) Do a "dog dog dog dog"~1 query. > http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1 > 4) Repeat step 3 many times. > Expected results: The document with id 2 should be returned. > Actual results: The document with id 2 is always returned. The document with > id 1 is sometimes returned. > Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog > dog dog"~100, etc show the same behavior. > So far I've traced it down to the "repeats" array in > SloppyPhraseScorer.initPhrasePositions() - depending on the order of the > elements in this array, the document may or may not match. I think the > HashSet may be to blame, but I'm not sure - that at least seems to be where > the non-determinism is coming from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org