[jira] [Updated] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3414:
-------------------------------
    Attachment: LUCENE-3414.patch

Patch with a port of the code. Because most of the dictionaries are L/GPL, I've written my own dummy dictionary for test purposes. During testing I discovered a long-standing bug to do with recursive application of rules; this has now been fixed. The code is now also version-aware, as required by the CharArray* data structures.

> Bring Hunspell for Lucene into analysis module
> ----------------------------------------------
>
>                 Key: LUCENE-3414
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3414
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3414.patch
>
> Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the
> Hunspell algorithm. It has the benefit of supporting dictionaries for a wide
> array of languages.
> It seems to still be in use but has fallen out of date. I think it would
> benefit from being inside the analysis module, where additional features,
> such as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
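The recursive-rule bug mentioned above is a classic hazard in Hunspell-style stemming: suffix rules marked for cross-product can re-trigger one another on the stripped stem. A minimal plain-Java sketch of recursive suffix stripping with a depth guard (the rule representation and names here are hypothetical, not the LUCENE-3414 code):

```java
import java.util.ArrayList;
import java.util.List;

public class RecursiveStemSketch {
    /** A hypothetical suffix rule: strip 'suffix' from the end of a word. */
    record SuffixRule(String suffix, boolean crossProduct) {}

    static final int MAX_RECURSION = 2; // guard against rules re-triggering each other forever

    /** Collect candidate stems by recursively applying suffix rules. */
    static void stem(String word, List<SuffixRule> rules, int depth, List<String> out) {
        if (depth > MAX_RECURSION) return;
        for (SuffixRule rule : rules) {
            if (word.endsWith(rule.suffix()) && word.length() > rule.suffix().length()) {
                String stripped = word.substring(0, word.length() - rule.suffix().length());
                out.add(stripped);
                if (rule.crossProduct()) {
                    stem(stripped, rules, depth + 1, out); // recursive application of further rules
                }
            }
        }
    }

    public static void main(String[] args) {
        List<SuffixRule> rules = List.of(new SuffixRule("s", true), new SuffixRule("ing", true));
        List<String> stems = new ArrayList<>();
        stem("walkings", rules, 0, stems);
        System.out.println(stems); // [walking, walk]
    }
}
```

Real Hunspell rules also carry conditions and continuation flags; the point here is only the depth cap that keeps the recursion from running away.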
[jira] [Commented] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting
[ https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097737#comment-13097737 ]

Chris A. Mattmann commented on LUCENE-3413:
-------------------------------------------

BTW, I couldn't get it to work by removing the firstCall variable using Simon's suggestion, so I left it in there. If you guys want to figure it out, go for it, but the patch I attached right now is working... thanks!

> CombiningFilter to recombine tokens into a single token for sorting
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3413
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3413
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 2.9.3
>            Reporter: Chris A. Mattmann
>            Priority: Minor
>         Attachments: LUCENE-3413.Mattmann.090311.patch.txt, LUCENE-3413.Mattmann.090511.patch.txt
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of, e.g., books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword
> analysis (e.g., to remove "The") and synonym filtering (e.g., for grouping),
> etc. I created an analysis chain in Solr for this that was based off of
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or
> synonyms, because those operate on individual tokens instead of the
> full strings produced by the KeywordTokenizer (which does not do
> tokenization). I needed a filter that would allow me to change alphaOnlySort
> and its analysis chain from using KeywordTokenizer to using
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So,
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super
> efficiently, I'm guessing (since I used a StringBuffer), but I'm open to
> suggestions on how to make it better.
> One other thing: apparently this analyzer works fine for analysis
> (e.g., it produces the desired tokens); however, for sorting in Solr I'm
> getting null sort tokens. Need to figure out why.
> Here ya go!
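The chain described in the issue (whitespace tokenization, stopword removal, letter-only normalization, then recombination) can be simulated outside Lucene with plain string handling. A sketch assuming a one-word stopword set, matching the "The Grapes of Wrath" example; the actual patch does this as a TokenFilter buffering term text into a StringBuffer:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class CombiningSketch {
    static final Set<String> STOPWORDS = Set.of("the"); // illustrative stopword set

    /** Mimics WhitespaceTokenizer -> lowercase -> pattern-replace -> stop removal -> combine. */
    static String sortKey(String title) {
        return Arrays.stream(title.split("\\s+"))
                .map(t -> t.toLowerCase().replaceAll("[^a-z]", "")) // keep letters only, like the pattern-replace step
                .filter(t -> !t.isEmpty() && !STOPWORDS.contains(t)) // stopword removal on individual tokens
                .collect(Collectors.joining());                      // recombine into a single sort token
    }

    public static void main(String[] args) {
        System.out.println(sortKey("The Grapes of Wrath")); // grapesofwrath
    }
}
```

Sorting on the combined key then ignores leading stopwords, which is exactly what alphaOnlySort alone cannot do.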
[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting
[ https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated LUCENE-3413:
--------------------------------------
    Attachment: LUCENE-3413.Mattmann.090511.patch.txt

- final updated patch
[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting
[ https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated LUCENE-3413:
--------------------------------------
    Attachment: (was: LUCENE-3413.Mattmann.090311.2.patch)
[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting
[ https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated LUCENE-3413:
--------------------------------------
    Attachment: (was: LUCENE-3413.Mattmann.090511.patch.txt)
[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting
[ https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated LUCENE-3413:
--------------------------------------
    Attachment: (was: LUCENE-3413.Mattmann.090511.patch.txt)
[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting
[ https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated LUCENE-3413:
--------------------------------------
    Attachment: LUCENE-3413.Mattmann.090511.patch.txt

- updated patch to fix package names. This patch applies against the latest trunk.
[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting
[ https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated LUCENE-3413:
--------------------------------------
    Attachment: LUCENE-3413.Mattmann.090511.patch.txt

- updated patch addressing comments from Simon. Chris Male suggested renaming it, but I couldn't come up with a better name. Maybe we could call it CombiningTokenFilter or something, for specificity, but I'll leave that part up to you guys.
[jira] [Commented] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097727#comment-13097727 ]

Koji Sekiguchi commented on LUCENE-1824:
----------------------------------------

Forgot one comment: I've not taken care of Solr yet in the patch.

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/highlighter
>         Environment: any
>            Reporter: Alex Vigdor
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-1824.patch, LUCENE-1824.patch
>
> FastVectorHighlighter does not take word boundaries into consideration when
> building fragments, so that in most cases the first and last word of a
> fragment are truncated. This makes the highlights less legible than they
> should be. I will attach a patch to BaseFragmentBuilder that resolves this
> by expanding the start and end boundaries of the fragment to the first
> whitespace character on either side of the fragment, or the beginning or end
> of the source text, whichever comes first. This significantly improves
> legibility, at the cost of returning a slightly larger number of characters
> than specified for the fragment size.
[jira] [Updated] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-1824:
-----------------------------------
    Attachment: LUCENE-1824.patch

First draft. I introduced a BoundaryScanner interface and two implementations of it, Simple and BreakIterator. SimpleBoundaryScanner uses the following default boundary chars:

{code}
public static final Character[] DEFAULT_BOUNDARY_CHARS = {'.', ',', '!', '?', '(', '[', '{', '\t', '\n'};
{code}

These are used by SimpleBoundaryScanner to find word/sentence boundaries. BreakIteratorBoundaryScanner can also be used to find char/word/sentence/line breaks.

I made BaseFragmentsBuilder boundary-aware, rather than creating a new FragmentsBuilder, something like BoundaryAwareFragmentsBuilder. As a result, every FragmentsBuilder is now natively boundary-aware, as long as it uses an appropriate BoundaryScanner.

I've not touched the tests yet. Because this patch changes fragment boundaries, the existing tests are expected to fail!
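The SimpleBoundaryScanner idea described above amounts to walking from a tentative fragment edge toward the nearest boundary character. A standalone sketch with hypothetical method names (the patch's real API lives in its BoundaryScanner implementations); here space is included among the boundary chars so the scan snaps to word edges:

```java
import java.util.Set;

public class BoundaryScanSketch {
    // Mirrors the DEFAULT_BOUNDARY_CHARS idea, plus space for word-level snapping
    static final Set<Character> BOUNDARY =
            Set.of('.', ',', '!', '?', '(', '[', '{', '\t', '\n', ' ');

    /** Scan left from 'start' until just after a boundary char, at most maxScan chars. */
    static int findStartBoundary(String text, int start, int maxScan) {
        for (int i = start; i > Math.max(0, start - maxScan); i--) {
            if (BOUNDARY.contains(text.charAt(i - 1))) return i;
        }
        return 0; // no boundary within maxScan: snap to the beginning of the text
    }

    /** Scan right from 'end' until a boundary char, at most maxScan chars. */
    static int findEndBoundary(String text, int end, int maxScan) {
        for (int i = end; i < Math.min(text.length(), end + maxScan); i++) {
            if (BOUNDARY.contains(text.charAt(i))) return i;
        }
        return text.length(); // no boundary within maxScan: snap to the end of the text
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. It is fast.";
        // A tentative fragment edge lands mid-word at offset 15 ("sea|rch"):
        int s = findStartBoundary(text, 15, 20);
        int e = findEndBoundary(text, 15, 40);
        System.out.println(text.substring(s, e)); // search
    }
}
```

Expanding both edges this way is what avoids the truncated first/last words the issue reports, at the cost of fragments slightly longer than requested.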
[jira] [Updated] (SOLR-2204) Cross-version replication broken by new javabin format
[ https://issues.apache.org/jira/browse/SOLR-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Sokolov updated SOLR-2204:
-------------------------------
    Attachment: SOLR-2204.patch

> Cross-version replication broken by new javabin format
> ------------------------------------------------------
>
>                 Key: SOLR-2204
>                 URL: https://issues.apache.org/jira/browse/SOLR-2204
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 3.1
>         Environment: Linux idxst0-a 2.6.18-194.3.1.el5.centos.plusxen #1 SMP Wed May 19 09:59:34 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>            Reporter: Shawn Heisey
>             Fix For: 3.4, 4.0
>
>         Attachments: SOLR-2204.patch, SOLR-2204.patch
>
> Slave server is branch_3x, revision 1027974. Master server is 1.4.1.
> Replication fails because of the new javabin format.
> SEVERE: Master at: http://HOST:8983/solr/live/replication is not available. Index fetch failed. Exception: Invalid version or the data in not in 'javabin' format
> Switching Solr's internally generated requests to XML, or adding support for
> both javabin versions, would get rid of this problem. I do not know how to do
> either of these things.
[jira] [Commented] (SOLR-2204) Cross-version replication broken by new javabin format
[ https://issues.apache.org/jira/browse/SOLR-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097676#comment-13097676 ]

Mike Sokolov commented on SOLR-2204:
------------------------------------

I'm posting a more fully-realized patch now. This is an important issue for us, not just because of replication, but also because we may support a bunch of different apps on a single server, would like to upgrade such a server, but can't upgrade all the apps at once. Some might be stuck on an old version for some time, since we are locked into our clients' update schedules. We could set up old and new servers and migrate the apps one by one, but it seemed to me that the flexibility of being able to mix versions was worth some degree of pain.

This patch restores support for version 1 utf-8 encoding to JavaBinCodec, to be used as a fallback when communicating with older peers. When a v2 server detects a v1 client, it responds using v1. The javabin version is inferred from the version byte read when unmarshalling binary content. However, non-update requests won't carry any such version info, so I increased the version passed on every HTTP request from 2.2 to 3.4 and also use this string to detect older peers. I may have missed the significance of this value and broken something else: wiser heads, please review!

The SolrJ client behaves a bit differently, since it has no way of knowing in advance what version the server is. With this patch, v2 clients detect a version-mismatch error by parsing the HTTP response text, retry, and then fall back to v1 for all future requests by recording the server javabin version in the RequestWriter.

Testing this requires simulating the old behavior (i.e., forcing either the client or the server into v1 mode). Doing this via jetty seemed to require a built-in hook (in BinaryUpdateRequestHandler) used only for testing, which would be nice to avoid, but I didn't see how. Also, JettySolrRunner offers a configfile param, but it didn't seem to have any effect, so I added a check for the system property in CoreContainer; maybe I missed something and there is a better way to do this.
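The fallback described above hinges on every javabin stream leading with a single version byte, which the reader inspects before choosing a decode path. A toy illustration in plain java.io (hypothetical methods and payload handling; the real logic lives in JavaBinCodec's marshal/unmarshal):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class JavabinVersionSketch {
    static final int V1 = 1, V2 = 2;

    /** Serialize: the stream always leads with the codec version byte. */
    static byte[] marshal(int version, String payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(version);
        out.write(payload.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Deserialize: peek at the version byte and pick the matching decode path. */
    static String unmarshal(byte[] data) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int version = in.read();
        if (version != V1 && version != V2) {
            // the error string Solr logs, quoted from the issue report
            throw new IOException("Invalid version or the data in not in 'javabin' format");
        }
        // A v2 reader falling back to v1 decoding would branch on 'version' here.
        return "v" + version + ":" + new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(unmarshal(marshal(V1, "doc"))); // v1:doc
    }
}
```

The asymmetry Mike notes is visible even in this toy: the reader can adapt per-stream, but the writer must guess (or be told) the peer's version before the first byte goes out.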
[jira] [Commented] (SOLR-2700) transaction logging
[ https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097593#comment-13097593 ]

Yonik Seeley commented on SOLR-2700:
------------------------------------

bq. OK, I think we're getting close to committing now.

Urggg - scratch that. At some point in the past, some of the asserts were commented out to aid in debugging, and I never re-enabled them. The realtime-get test now fails, so I need to dig into that again.

> transaction logging
> -------------------
>
>                 Key: SOLR-2700
>                 URL: https://issues.apache.org/jira/browse/SOLR-2700
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch
>
> A transaction log is needed for durability of updates, for a more performant
> realtime-get, and for replaying updates to recovering peers.
RE: [Lucene.Net] 2.9.4
Not a bad idea, but I would prefer the community's feedback instead of testing against all projects using Lucene.Net.

DIGY

-----Original Message-----
From: Matt Warren [mailto:mattd...@gmail.com]
Sent: Monday, September 05, 2011 11:09 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] 2.9.4

If you want to test it against a large project, you could take a look at how RavenDB uses it. At the moment it's using 2.9.2
( https://github.com/ayende/ravendb/tree/master/SharedLibs/Sources/Lucene2.9.2 ),
but if you were to recompile it against 2.9.4 and check that all its unit tests still run, that would give you quite a large test case.

On 5 September 2011 19:22, Prescott Nasser wrote:

> Hey All,
>
> How do people feel about the 2.9.4 code base? I've been using it for
> some time; for my use cases it's been excellent. Do we feel we are ready to
> package this up and make it an official release? Or do we have some tasks
> left to take care of?
>
> ~Prescott
[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
[ https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097318#comment-13097318 ] Jan Høydahl commented on LUCENE-3414: - +1 We now use Lucene Hunspell for a few customer deployments, and it would be great to have it in the analysis module, since it supports some 70-80 languages out of the box and gives great flexibility: you can edit - or augment - the dictionaries to change behaviour and fix stemming bugs. As a side benefit, I also expect that as the Ooo dictionaries get more use in Lucene, users will over time be able to extend and improve the dictionaries and contribute their changes back, also benefiting Ooo users. > Bring Hunspell for Lucene into analysis module > -- > > Key: LUCENE-3414 > URL: https://issues.apache.org/jira/browse/LUCENE-3414 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Chris Male > > Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the > Hunspell algorithm. It has the benefit of supporting dictionaries for a wide > array of languages. > It still seems to be in use, but has fallen out of date. I think it would > benefit from being inside the analysis module, where additional features such > as decompounding support could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097315#comment-13097315 ] Jan Høydahl commented on SOLR-2383: --- @Bill Yes, the exclusive upper range syntax [x TO y} only works on 4.0, and I haven't found a good way to emulate the same behaviour in 3.x. This means that you'll sometimes see more hits when clicking a facet than the number presented, being the values exactly on the upper bound. Do you have a suggestion? > Velocity: Generalize range and date facet display > - > > Key: SOLR-2383 > URL: https://issues.apache.org/jira/browse/SOLR-2383 > Project: Solr > Issue Type: Bug > Components: Response Writers >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: facet, range, velocity > Fix For: 3.4, 4.0 > > Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, > SOLR-2383.patch, SOLR-2383.patch > > > Velocity (/browse) GUI has hardcoded price range facet and a hardcoded > manufacturedate_dt date facet. Need general solution which work for any > facet.range and facet.date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2280) commitWithin ignored for a delete query
[ https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Grande updated SOLR-2280: -- Attachment: SOLR-2280.patch I'm submitting a patch that implements commitWithin on deletes. The patch is for the 3x branch. Two things should be noted: # The commit is fired even if the delete doesn't really delete any document. # When using the BinaryUpdateRequestHandler, the params of the UpdateRequest are loaded when parsing the docs. If the request doesn't include a docs list, then the params aren't loaded. I added a workaround for this, but SOLR-1164 should solve this problem definitively. > commitWithin ignored for a delete query > --- > > Key: SOLR-2280 > URL: https://issues.apache.org/jira/browse/SOLR-2280 > Project: Solr > Issue Type: Bug > Components: clients - java >Reporter: David Smiley >Priority: Minor > Fix For: 3.4, 4.0 > > Attachments: SOLR-2280.patch > > > The commitWithin option on an UpdateRequest is only honored for requests > containing new documents. It does not, for example, work with a delete > query. The following doesn't work as expected: > {code:java} > UpdateRequest request = new UpdateRequest(); > request.deleteById("id123"); > request.setCommitWithin(1000); > solrServer.request(request); > {code} > In my opinion, the commitWithin attribute should be permitted on the > <delete> xml tag as well as <add>. Such a change would go in > XMLLoader.java and it would have some ramifications elsewhere too. Once > this is done, UpdateRequest.getXml() can be updated to generate the > right XML. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
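The semantics in point 1 above - a delete arms the commit timer even when it matches nothing - can be illustrated with a tiny stand-alone scheduler. This is an illustrative sketch only, not Solr's actual update-handler code; all names here are invented for the example.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of commitWithin semantics (not Solr's implementation): every
// update, add or delete, schedules a commit to run no later than the given
// deadline, regardless of whether the operation matched any documents.
public class CommitWithinSketch {
    final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger commits = new AtomicInteger();
    private ScheduledFuture<?> pending;

    synchronized void onUpdate(long commitWithinMs) {
        // Arm the timer only if no commit is already pending for this window.
        if (pending == null || pending.isDone()) {
            pending = scheduler.schedule(commits::incrementAndGet, commitWithinMs, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        CommitWithinSketch s = new CommitWithinSketch();
        s.onUpdate(50); // a delete that matched nothing still schedules a commit
        s.onUpdate(50); // a second update within the window does not re-arm the timer
        Thread.sleep(200);
        System.out.println(s.commits.get()); // prints 1
        s.scheduler.shutdown();
    }
}
```

The key design point the patch note highlights is visible here: the scheduler has no idea whether the delete removed anything, so the commit fires either way.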
Re: [jira] [Commented] (SOLR-1834) Document level security
Yes, there has been much work and discussion on doc-level security in Solr. The main problem with building application-level security into Solr is that there are myriad ways to approach it, depending on requirements, as well as plenty of issues to address generally regarding security - e.g. where do the permissions come from, how to verify the caller, etc. Currently, there are three patches available to this end: SOLR-1834, SOLR-1895 and SOLR-1872. SOLR-1834 and SOLR-1895 use LCF to provide the security permissions; SOLR-1872 uses a Solr-local ACL file to deliver permissions. The current trunk status quo is to leave security up to the web container (e.g. Tomcat). This makes sense, as the approaches above are relevant (or not) depending on your specific requirements. HTH Peter On Mon, Sep 5, 2011 at 11:18 AM, Ravish Bhagdev (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/SOLR-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097108#comment-13097108 > ] > > Ravish Bhagdev commented on SOLR-1834: > -- > > are there any plans for adding this or other document level or other search > security solutions into solr? This requirement is quite critical for most > enterprise search apps I would have thought? Has this been discussed in > detail elsewhere? > >> Document level security >> --- >> >> Key: SOLR-1834 >> URL: https://issues.apache.org/jira/browse/SOLR-1834 >> Project: Solr >> Issue Type: New Feature >> Components: SearchComponents - other >> Affects Versions: 1.4 >> Reporter: Anders Rask >> Attachments: SOLR-1834-with-LCF.patch, SOLR-1834.patch, html.rar >> >> >> Attached to this issue is a patch that includes a framework for enabling >> document level security in Solr as a search component. I did this as a >> Master thesis project at Findwise in Stockholm and Findwise has now decided >> to contribute it back to the community. 
The component was developed in >> spring 2009 and has been in use at a customer since autumn the same year. >> There is a simple demo application up at >> http://demo.findwise.se:8880/SolrSecurity/ which also explains more about >> the component and how to set it up. > > -- > This message is automatically generated by JIRA. > For more information on JIRA, see: http://www.atlassian.com/software/jira > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3199) Add non-destructive sort to BytesRefHash
[ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097257#comment-13097257 ] Jason Rutherglen commented on LUCENE-3199: -- Ok, solved the above comment by taking the sorted ord array and building a new reverse array from that... > Add non-desctructive sort to BytesRefHash > - > > Key: LUCENE-3199 > URL: https://issues.apache.org/jira/browse/LUCENE-3199 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, > LUCENE-3199.patch > > > Currently the BytesRefHash is destructive. We can add a method that returns > a non-destructively generated int[]. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
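The reverse array Jason describes can be sketched in a few lines of plain Java (an illustration with assumed names, not the actual LUCENE-3199 patch): if sortedOrds[rank] gives the term id at a sorted position, the inverse maps each term id back to its sorted position, enabling the doc id -> term id -> sorted ord chain.

```java
// Hypothetical sketch of building a reverse ord map from a sorted ord array.
public class ReverseOrdMap {
    // sortedOrds[rank] = term id at that sorted position;
    // returns ordToRank[termId] = sorted position of that term.
    static int[] reverse(int[] sortedOrds) {
        int[] ordToRank = new int[sortedOrds.length];
        for (int rank = 0; rank < sortedOrds.length; rank++) {
            ordToRank[sortedOrds[rank]] = rank;
        }
        return ordToRank;
    }

    public static void main(String[] args) {
        // term ids 0..3, sorted order: term 2 < term 0 < term 3 < term 1
        int[] sortedOrds = {2, 0, 3, 1};
        System.out.println(java.util.Arrays.toString(reverse(sortedOrds))); // prints [1, 3, 0, 2]
    }
}
```

A single linear pass over the sorted array suffices, which is why building it from the already-sorted ords (rather than re-sorting) is the cheap solution.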
[jira] [Commented] (LUCENE-3199) Add non-destructive sort to BytesRefHash
[ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097246#comment-13097246 ] Jason Rutherglen commented on LUCENE-3199: -- I started integrating the patch into LUCENE-2312. I think the main functionality missing is a reverse int[] that points from a term id to the sorted ords array. That array would be used for implementing the RT version of DocTermsIndex, where a doc id -> term id -> sorted term id index. > Add non-desctructive sort to BytesRefHash > - > > Key: LUCENE-3199 > URL: https://issues.apache.org/jira/browse/LUCENE-3199 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, > LUCENE-3199.patch > > > Currently the BytesRefHash is destructive. We can add a method that returns > a non-destructively generated int[]. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097197#comment-13097197 ] Bill Bell commented on SOLR-2383: - Does this mean the [0 TO 8} will not work? popularity:[3 TO 6} ? Thanks. > Velocity: Generalize range and date facet display > - > > Key: SOLR-2383 > URL: https://issues.apache.org/jira/browse/SOLR-2383 > Project: Solr > Issue Type: Bug > Components: Response Writers >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: facet, range, velocity > Fix For: 3.4, 4.0 > > Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, > SOLR-2383.patch, SOLR-2383.patch > > > Velocity (/browse) GUI has hardcoded price range facet and a hardcoded > manufacturedate_dt date facet. Need general solution which work for any > facet.range and facet.date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Solr Wiki] Update of "NewSolrCloudDesign" by YonikSeeley
I'm wondering if we shouldn't ditch the new term "partition" here and just use "replica"? In the past, we've sort of used "shard" to mean both a single physical index, and the logical piece of the larger collection. In practice, this ambiguity normally isn't much of a problem as it's normally clear by context, and when it's not we sometimes throw in the word "replica". Examples: "Doc X belongs on Shard Z", "Shard Z on this node is corrupt". Refreshing my memory on our ZK layout, it seems like we are using "shards" in the logical sense there.
/COLLECTIONS (v=6 children=1)
  COLLECTION1 (v=0 children=1) "configName=myconf"
    SHARDS (v=0 children=1)
      SHARD1 (v=0 children=1)
        ROGUE.LOCAL:8983_SOLR_ (v=0) "node_name=Rogue.local:8983_solr url=http://Rogue.local:8983/solr/"
So perhaps we should just continue that, and change "partition" to "replica" when necessary to prevent ambiguity? -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097182#comment-13097182 ] Varun Thacker commented on LUCENE-3178: --- bq. If we pass down IOContext to NMapIndexInput and in the ctor use mmap and then use madvise with the appropriate flag ( depending on the Context). Is that the correct way to go about it ? Any suggestions on this? > Native MMapDir > -- > > Key: LUCENE-3178 > URL: https://issues.apache.org/jira/browse/LUCENE-3178 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless > > Spinoff from LUCENE-2793. > Just like we will create native Dir impl (UnixDirectory) to pass the right OS > level IO flags depending on the IOContext, we could in theory do something > similar with MMapDir. > The problem is MMap is apparently quite hairy... and to pass the flags the > native code would need to invoke mmap (I think?), unlike UnixDir where the > code "only" has to open the file handle. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097168#comment-13097168 ] Robert Muir commented on LUCENE-3390: - +1 to revisit how this was done in trunk. > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Fix For: 3.4 > > Attachments: LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097164#comment-13097164 ] Michael McCandless commented on LUCENE-3390: Also, can we use FastBitSet, not OpenBitSet, here? > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Fix For: 3.4 > > Attachments: LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097161#comment-13097161 ] Michael McCandless commented on LUCENE-3390: I like how we solved this in 3.x! Ie, a whole separate entry for holding a bitset indicating if the doc has a value. This is generally useful, alone, ie one can just pull this bitset and use it directly. It's also nice because it's one source that computes this, vs N copies (one per value) that we have on trunk. I guess the downside is it takes 2 passes over the terms (one to get the values, another to fill this bitset), but maybe that tradeoff is worth not duplicating the code all over... maybe we should take a similar approach in trunk? > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Fix For: 3.4 > > Attachments: LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. 
> Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
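The 3.x approach described above - a separate bitset recording which documents have a value - can be illustrated with a small stand-alone sketch. This is plain Java with invented names, not the actual LUCENE-3390 patch; it shows one policy the issue suggests, pushing value-less documents to the end of a descending sort instead of letting them compare as 0.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: sort doc ids descending by field value, with docs
// that have no value for the field (per the bitset) always sorted last.
public class SortMissingLast {
    static Integer[] sortDesc(double[] values, BitSet hasValue) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        Arrays.sort(docs, (a, b) -> {
            boolean ha = hasValue.get(a), hb = hasValue.get(b);
            if (ha != hb) return ha ? -1 : 1;            // missing values sort last
            return Double.compare(values[b], values[a]); // otherwise descending
        });
        return docs;
    }

    public static void main(String[] args) {
        // The issue's scenario: doc 0 = 3.5, doc 1 = -10; doc 2 has no value
        // (its 0.0 here is only the uninitialized default, flagged by the bitset).
        double[] values = {3.5, -10.0, 0.0};
        BitSet hasValue = new BitSet();
        hasValue.set(0);
        hasValue.set(1);
        System.out.println(Arrays.toString(sortDesc(values, hasValue))); // prints [0, 1, 2]
    }
}
```

Without the bitset, doc 2's default 0.0 would sort between 3.5 and -10, reproducing the 0, 2, 1 ordering the issue reports as a bug.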
[jira] [Created] (LUCENE-3414) Bring Hunspell for Lucene into analysis module
Bring Hunspell for Lucene into analysis module -- Key: LUCENE-3414 URL: https://issues.apache.org/jira/browse/LUCENE-3414 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Chris Male Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the Hunspell algorithm. It has the benefit of supporting dictionaries for a wide array of languages. It still seems to be in use, but has fallen out of date. I think it would benefit from being inside the analysis module, where additional features such as decompounding support could be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Heads up for a few planned commit's
Jan, I haven't looked at these issues but you should go ahead and commit if you are comfortable with the changes! Nobody responding despite pleas for review means "lazy" consensus, ie it means others are OK with the change. Mike McCandless http://blog.mikemccandless.com On Mon, Sep 5, 2011 at 6:34 AM, Jan Høydahl wrote: > Hi, > > As I'm quite new as a committer, I want to make sure I follow the right > procedures. > I have several Jira's with patches that I feel are ready for commit. > They have tests which pass, but there has been limited peer review despite > requests for such in the issues themselves. > > These are the issues I plan to commit shortly. Would be great to get thumbs > up/down from more senior committers: > > SOLR-2741: Bugs in facet range display in trunk > These are bug-fixes on previously committed SOLR-2383 code in trunk. > > SOLR-2383: Velocity: Generalize range and date facet display > I plan to commit the patch SOLR-2383-branch_3x.patch which is a backport to > 3x, including the improvements from SOLR-2741 > > SOLR-2540: CommitWithin as an Update Request parameter > This gives &commitWithin=xxx capabilities to XML-URH, CSV-URH and > Extracting-URH (similar to what's in Binary-URH and JSON-URH already) > I plan to commit this both to trunk and 3x > > SOLR-2742: Add commitWithin to convenience signatures for SolrServer.add(..) > This one simply introduces convenience signatures in SolrJ to more easily > specify commitWithin on ADDs > I plan to commit this both to trunk and 3x > > Thanks for any feedback on any of these! > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > Solr Training - www.solrtraining.com > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)
[ https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2742: -- Attachment: SOLR-2742.patch Made better JavaDocs for all public methods in SolrServer, including @param tags. > Add commitWithin to convenience signatures for SolrServer.add(..) > - > > Key: SOLR-2742 > URL: https://issues.apache.org/jira/browse/SOLR-2742 > Project: Solr > Issue Type: Improvement > Components: clients - java >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: SolrJ, commitWithin > Fix For: 3.4, 4.0 > > Attachments: SOLR-2742.patch, SOLR-2742.patch, SOLR-2742.patch > > > Today you need to manually create an UpdateRequest in order to set the > commitWithin value. > We should provide an optional commitWithin parameter on all > SolrServer.add(..) methods as a convenience -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)
[ https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097123#comment-13097123 ] Chris Male commented on SOLR-2742: -- Hey Jan, Looks great! +1 to committing to trunk and back porting. Just one personal nitpick, if we're going to add Javadocs to the SolrServer methods, can we add full javadocs? > Add commitWithin to convenience signatures for SolrServer.add(..) > - > > Key: SOLR-2742 > URL: https://issues.apache.org/jira/browse/SOLR-2742 > Project: Solr > Issue Type: Improvement > Components: clients - java >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: SolrJ, commitWithin > Fix For: 3.4, 4.0 > > Attachments: SOLR-2742.patch, SOLR-2742.patch > > > Today you need to manually create an UpdateRequest in order to set the > commitWithin value. > We should provide an optional commitWithin parameter on all > SolrServer.add(..) methods as a convenience -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3396: --- Attachment: LUCENE-3396-rab.patch Patch updated to trunk. The generic type parameter has been removed from ReuseStrategy. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but it's time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
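The ReuseStrategy abstraction the issue describes can be sketched roughly as follows. This is plain Java with assumed names and shapes (the real Lucene API differs, and the cached components are stand-in strings here): a global strategy shares one cached components object across all fields, while a per-field strategy keeps one per field name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two reuse policies: global vs per-field caching
// of the analyzer's reusable token-stream components.
public class ReuseSketch {
    interface ReuseStrategy {
        Object getReusable(String fieldName);
        void setReusable(String fieldName, Object components);
    }

    // One cached components object, shared regardless of field.
    static class GlobalReuse implements ReuseStrategy {
        private Object cached;
        public Object getReusable(String fieldName) { return cached; }
        public void setReusable(String fieldName, Object components) { cached = components; }
    }

    // One cached components object per field name.
    static class PerFieldReuse implements ReuseStrategy {
        private final Map<String, Object> cache = new HashMap<>();
        public Object getReusable(String fieldName) { return cache.get(fieldName); }
        public void setReusable(String fieldName, Object components) { cache.put(fieldName, components); }
    }

    public static void main(String[] args) {
        ReuseStrategy global = new GlobalReuse();
        global.setReusable("title", "components-A");
        System.out.println(global.getReusable("body"));   // shared across fields

        ReuseStrategy perField = new PerFieldReuse();
        perField.setReusable("title", "components-A");
        System.out.println(perField.getReusable("body")); // null: each field gets its own
    }
}
```

The per-field variant matters for analyzers whose configuration differs by field; the global variant is cheaper when one components instance can serve every field.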
Heads up for a few planned commit's
Hi, As I'm quite new as a committer, I want to make sure I follow the right procedures. I have several Jira's with patches that I feel are ready for commit. They have tests which pass, but there has been limited peer review despite requests for such in the issues themselves. These are the issues I plan to commit shortly. Would be great to get thumbs up/down from more senior committers: SOLR-2741: Bugs in facet range display in trunk These are bug-fixes on previously committed SOLR-2383 code in trunk. SOLR-2383: Velocity: Generalize range and date facet display I plan to commit the patch SOLR-2383-branch_3x.patch which is a backport to 3x, including the improvements from SOLR-2741 SOLR-2540: CommitWithin as an Update Request parameter This gives &commitWithin=xxx capabilities to XML-URH, CSV-URH and Extracting-URH (similar to what's in Binary-URH and JSON-URH already) I plan to commit this both to trunk and 3x SOLR-2742: Add commitWithin to convenience signatures for SolrServer.add(..) This one simply introduces convenience signatures in SolrJ to more easily specify commitWithin on ADDs I plan to commit this both to trunk and 3x Thanks for any feedback on any of these! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)
[ https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097115#comment-13097115 ] Jan Høydahl commented on SOLR-2742: --- Plan to commit this to both trunk and 3x branch in a couple of days > Add commitWithin to convenience signatures for SolrServer.add(..) > - > > Key: SOLR-2742 > URL: https://issues.apache.org/jira/browse/SOLR-2742 > Project: Solr > Issue Type: Improvement > Components: clients - java >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: SolrJ, commitWithin > Fix For: 3.4, 4.0 > > Attachments: SOLR-2742.patch, SOLR-2742.patch > > > Today you need to manually create an UpdateRequest in order to set the > commitWithin value. > We should provide an optional commitWithin parameter on all > SolrServer.add(..) methods as a convenience -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097109#comment-13097109 ] Jan Høydahl commented on SOLR-2383: --- Plan to commit this in a day or two, if no objections > Velocity: Generalize range and date facet display > - > > Key: SOLR-2383 > URL: https://issues.apache.org/jira/browse/SOLR-2383 > Project: Solr > Issue Type: Bug > Components: Response Writers >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: facet, range, velocity > Fix For: 3.4, 4.0 > > Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, > SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, > SOLR-2383.patch, SOLR-2383.patch > > > Velocity (/browse) GUI has hardcoded price range facet and a hardcoded > manufacturedate_dt date facet. Need general solution which work for any > facet.range and facet.date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2741) Bugs in facet range display in trunk
[ https://issues.apache.org/jira/browse/SOLR-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097110#comment-13097110 ] Jan Høydahl commented on SOLR-2741: --- Plan to commit this in a day or two if no objections > Bugs in facet range display in trunk > > > Key: SOLR-2741 > URL: https://issues.apache.org/jira/browse/SOLR-2741 > Project: Solr > Issue Type: Sub-task > Components: web gui >Affects Versions: 4.0 >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Fix For: 4.0 > > Attachments: SOLR-2741.patch, SOLR-2741.patch > > > In SOLR-2383 the hardcoded display of some facet ranges were replaced with > automatic, dynamic display. > There were some shortcomings: > a) Float range to-values were sometimes displayed as int > b) Capitalizing the facet name was a mistake, sometimes looks good, sometimes > not > c) facet.range on a date did not work - dates were displayed in whatever > locale formatting > d) The deprecated facet.date syntax was used in solrconfig.xml instead of the > new facet.range -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1834) Document level security
[ https://issues.apache.org/jira/browse/SOLR-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097108#comment-13097108 ] Ravish Bhagdev commented on SOLR-1834: -- Are there any plans for adding this or another document-level or other search security solution into Solr? I would have thought this requirement is quite critical for most enterprise search apps. Has this been discussed in detail elsewhere? > Document level security > --- > > Key: SOLR-1834 > URL: https://issues.apache.org/jira/browse/SOLR-1834 > Project: Solr > Issue Type: New Feature > Components: SearchComponents - other >Affects Versions: 1.4 >Reporter: Anders Rask > Attachments: SOLR-1834-with-LCF.patch, SOLR-1834.patch, html.rar > > > Attached to this issue is a patch that includes a framework for enabling > document level security in Solr as a search component. I did this as a Master's > thesis project at Findwise in Stockholm and Findwise has now decided to > contribute it back to the community. The component was developed in spring > 2009 and has been in use at a customer since autumn the same year. > There is a simple demo application up at > http://demo.findwise.se:8880/SolrSecurity/ which also explains more about the > component and how to set it up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2540) CommitWithin as an Update Request parameter
[ https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2540: -- Attachment: SOLR-2540.patch Updated patch with more tests. Will commit in a day or two > CommitWithin as an Update Request parameter > --- > > Key: SOLR-2540 > URL: https://issues.apache.org/jira/browse/SOLR-2540 > Project: Solr > Issue Type: New Feature > Components: update >Affects Versions: 3.1 >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: commit, commitWithin > Attachments: SOLR-2540.patch, SOLR-2540.patch > > > It would be useful to support commitWithin HTTP GET request param on all > UpdateRequestHandlers. > That way, you could set commitWithin on the request (for XML, JSON, CSV, > Binary and Extracting handlers) with this syntax: > {code} > curl > http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1 >-H "Content-Type: application/pdf" --data-binary @file.pdf > {code} > PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already > support this syntax. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
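The commitWithin parameter simply rides along on the update URL, as in the curl example in the issue description. A small illustrative sketch (a hypothetical helper, not the SolrJ API) composing such a URL:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: composing an extract-handler update URL that carries commitWithin. */
public class CommitWithinUrl {
    static String buildExtractUrl(String solrBase, String id, long commitWithinMs) {
        return solrBase + "/update/extract"
                + "?literal.id=" + URLEncoder.encode(id, StandardCharsets.UTF_8)
                + "&commitWithin=" + commitWithinMs;  // ask Solr to commit within this many ms
    }

    public static void main(String[] args) {
        // Same shape as the curl example, with a 10-second commitWithin:
        System.out.println(buildExtractUrl("http://localhost:8983/solr", "123", 10000));
    }
}
```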
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097083#comment-13097083 ] Uwe Schindler commented on LUCENE-3396: --- I agree it's somewhat overkill. But if not at the class level, I would even remove the T parameter from the getter method, because it does not really fit: it is only used there, not even on the setter. There is no type enforcement anywhere, so the extra T just removes the casting at the caller of the protected method while adding a SuppressWarnings on the implementor's side. So either make it all T-typed or use Object everywhere. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but its time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097076#comment-13097076 ] Chris Male edited comment on LUCENE-3396 at 9/5/11 9:44 AM: Hi Uwe, I originally had ReuseStrategy with a generic type but then decided it was overkill since it only benefits implementations, not users of ReuseStrategy. If we want the extra type safety, I'll happily make the change. was (Author: cmale): Hi Uwe, I originally had ReuseStrategy with a generic type but then decided it was overkill. If we want the extra type safety, I'll happily make the change. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but its time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097076#comment-13097076 ] Chris Male commented on LUCENE-3396: Hi Uwe, I originally had ReuseStrategy with a generic type but then decided it was overkill. If we want the extra type safety, I'll happily make the change. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but its time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097072#comment-13097072 ] Uwe Schindler commented on LUCENE-3396: --- Hi, the ReuseStrategies look fine. I am just confused about the generics. Why not make the whole abstract ReuseStrategy T-typed? Then a ThreadLocal is also used and no casting is needed anywhere. The subclasses for PerField and global reuse would then be typed to the correct class (Map<> or TSComponents). > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but its time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
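Uwe's suggestion, typing the whole abstract ReuseStrategy on the kind of state it stores, could look roughly like this. The class names mirror the discussion, but the bodies are purely illustrative (a String stands in for TokenStreamComponents), not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: a ReuseStrategy typed on the per-thread state it stores. */
abstract class ReuseStrategy<T> {
    private final ThreadLocal<T> stored = new ThreadLocal<>();

    /** Typed getter: no cast at the call site, no @SuppressWarnings in subclasses. */
    protected final T getStoredValue() { return stored.get(); }

    protected final void setStoredValue(T value) { stored.set(value); }

    /** Returns the reusable components for a field. */
    abstract String components(String fieldName);
}

/** Global reuse: one shared value per thread, regardless of field. */
class GlobalReuseStrategy extends ReuseStrategy<String> {
    @Override String components(String fieldName) {
        if (getStoredValue() == null) setStoredValue("shared-components");
        return getStoredValue();
    }
}

/** Per-field reuse: stores a Map from field name to that field's components. */
class PerFieldReuseStrategy extends ReuseStrategy<Map<String, String>> {
    @Override String components(String fieldName) {
        if (getStoredValue() == null) setStoredValue(new HashMap<>());
        return getStoredValue().computeIfAbsent(fieldName, f -> "components-for-" + f);
    }
}

public class ReuseStrategyDemo {
    public static void main(String[] args) {
        ReuseStrategy<?> global = new GlobalReuseStrategy();
        ReuseStrategy<?> perField = new PerFieldReuseStrategy();
        // Same instance for every field under global reuse:
        System.out.println(global.components("title") == global.components("body"));          // true
        // Distinct components per field under per-field reuse:
        System.out.println(perField.components("title").equals(perField.components("body"))); // false
    }
}
```

The tradeoff being debated: with T on the class, each subclass pins its own storage type; without it, the getter either returns Object (casts at every call site) or carries a dangling method-level T that enforces nothing.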
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097069#comment-13097069 ] Simon Willnauer commented on LUCENE-3396: - This patch looks good to me! I like the reuse strategy and how you factored out the thread-local stuff. I think we should commit this and let it bake in. > Make TokenStream Reuse Mandatory for Analyzers > -- > > Key: LUCENE-3396 > URL: https://issues.apache.org/jira/browse/LUCENE-3396 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, > LUCENE-3396-rab.patch, LUCENE-3396-rab.patch > > > In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having > to return reusable TokenStreams. This is a big chunk of work, but its time > to bite the bullet. > I plan to attack this in the following way: > - Collapse the logic of ReusableAnalyzerBase into Analyzer > - Add a ReuseStrategy abstraction to Analyzer which controls whether the > TokenStreamComponents are reused globally (as they are today) or per-field. > - Convert all Analyzers over to using TokenStreamComponents. I've already > seen that some of the TokenStreams created in tests need some work to be > reusable (even if they aren't reused). > - Remove Analyzer.reusableTokenStream and convert everything over to using > .tokenStream (which will now be returning reusable TokenStreams). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org