[jira] [Updated] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-05 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3414:
---

Attachment: LUCENE-3414.patch

Patch with a port of the code.

Because most of the dictionaries are L/GPL, I've written my own minimal 
dictionary for test purposes.

During testing I discovered a long-standing bug involving the recursive 
application of rules. This has now been fixed.
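The recursive-rule pitfall can be illustrated with a toy affix stripper. The rules, class, and method names below are hypothetical, purely for illustration, and are not the actual code of the Hunspell port; real Hunspell rules come from an .aff file and carry conditions and continuation flags:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AffixDemo {
    // Toy suffix rules: strip key, append value. Hypothetical data.
    static final Map<String, String> SUFFIXES = Map.of("ings", "ing", "ing", "");

    // Recursively strip suffixes, collecting candidate stems. The depth
    // limit guards against runaway recursive rule application -- the kind
    // of bug that can lurk when rules feed back into each other.
    static void stems(String word, int depth, List<String> out) {
        if (depth == 0) return;
        for (Map.Entry<String, String> e : SUFFIXES.entrySet()) {
            if (word.endsWith(e.getKey())) {
                String stem = word.substring(0, word.length() - e.getKey().length())
                        + e.getValue();
                out.add(stem);
                stems(stem, depth - 1, out); // recursive application, bounded
            }
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        stems("workings", 3, out);
        System.out.println(out); // [working, work]
    }
}
```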

The code is now also version-aware, as required by the CharArray* data structures.

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3414.patch
>
>
> Some time ago, along with Robert and Uwe, I wrote a stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.
> It seems to still be in use but has fallen out of date.  I think it would 
> benefit from being inside the analysis module, where additional features, such 
> as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting

2011-09-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097737#comment-13097737
 ] 

Chris A. Mattmann commented on LUCENE-3413:
---

BTW, I couldn't get it to work by removing the firstCall variable using Simon's 
suggestion, so I left it in there. If you guys want to figure it out, go for 
it, but the patch I attached right now is working...thanks!

> CombiningFilter to recombine tokens into a single token for sorting
> ---
>
> Key: LUCENE-3413
> URL: https://issues.apache.org/jira/browse/LUCENE-3413
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 2.9.3
>Reporter: Chris A. Mattmann
>Priority: Minor
> Attachments: LUCENE-3413.Mattmann.090311.patch.txt, 
> LUCENE-3413.Mattmann.090511.patch.txt
>
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of e.g., Books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword 
> analysis (e.g., to remove "The"), and synonym filtering (e.g., for grouping), 
> etc. I created an analysis chain in Solr for this that was based off of 
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" 
>       pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or 
> synonyms because those are based on the original token level instead of the 
> full strings produced by the KeywordTokenizer (which does not do 
> tokenization). I needed a filter that would allow me to change alphaOnlySort 
> and its analysis chain from using KeywordTokenizer to using 
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So, 
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super 
> efficiently I'm guessing (since I used a StringBuffer), but I'm open to 
> suggestions on how to make it better. 
> One other thing is that apparently this analyzer works fine for analysis 
> (e.g., it produces the desired tokens), however, for sorting in Solr I'm 
> getting null sort tokens. Need to figure out why. 
> Here ya go!
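For illustration, the end-to-end transformation described above (whitespace tokenize, lowercase, drop the stopword, strip non-letters, recombine into one sort key) can be sketched in plain Java. The stopword set, class, and method names are assumptions for this sketch, not the attached CombiningFilter patch:

```java
import java.util.Set;

public class CombineDemo {
    // Stopword set assumed for this example; a real chain would load
    // something like Solr's stopwords.txt.
    static final Set<String> STOPWORDS = Set.of("the");

    // Mimics the intended chain: split on whitespace, lowercase, drop
    // stopwords, strip non-letters, then append the survivors with no
    // separator -- the CombiningFilter step.
    static String sortKey(String title) {
        StringBuilder sb = new StringBuilder();
        for (String tok : title.split("\\s+")) {
            String t = tok.toLowerCase().replaceAll("[^a-z]", "");
            if (t.isEmpty() || STOPWORDS.contains(t)) continue;
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortKey("The Grapes of Wrath")); // grapesofwrath
    }
}
```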




[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting

2011-09-05 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated LUCENE-3413:
--

Attachment: LUCENE-3413.Mattmann.090511.patch.txt

- final updated patch

> CombiningFilter to recombine tokens into a single token for sorting
> ---
>
> Key: LUCENE-3413
> URL: https://issues.apache.org/jira/browse/LUCENE-3413
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 2.9.3
>Reporter: Chris A. Mattmann
>Priority: Minor
> Attachments: LUCENE-3413.Mattmann.090311.patch.txt, 
> LUCENE-3413.Mattmann.090511.patch.txt
>
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of e.g., Books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword 
> analysis (e.g., to remove "The"), and synonym filtering (e.g., for grouping), 
> etc. I created an analysis chain in Solr for this that was based off of 
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" 
>       pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or 
> synonyms because those are based on the original token level instead of the 
> full strings produced by the KeywordTokenizer (which does not do 
> tokenization). I needed a filter that would allow me to change alphaOnlySort 
> and its analysis chain from using KeywordTokenizer to using 
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So, 
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super 
> efficiently I'm guessing (since I used a StringBuffer), but I'm open to 
> suggestions on how to make it better. 
> One other thing is that apparently this analyzer works fine for analysis 
> (e.g., it produces the desired tokens), however, for sorting in Solr I'm 
> getting null sort tokens. Need to figure out why. 
> Here ya go!




[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting

2011-09-05 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated LUCENE-3413:
--

Attachment: (was: LUCENE-3413.Mattmann.090311.2.patch)

> CombiningFilter to recombine tokens into a single token for sorting
> ---
>
> Key: LUCENE-3413
> URL: https://issues.apache.org/jira/browse/LUCENE-3413
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 2.9.3
>Reporter: Chris A. Mattmann
>Priority: Minor
> Attachments: LUCENE-3413.Mattmann.090311.patch.txt
>
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of e.g., Books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword 
> analysis (e.g., to remove "The"), and synonym filtering (e.g., for grouping), 
> etc. I created an analysis chain in Solr for this that was based off of 
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" 
>       pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or 
> synonyms because those are based on the original token level instead of the 
> full strings produced by the KeywordTokenizer (which does not do 
> tokenization). I needed a filter that would allow me to change alphaOnlySort 
> and its analysis chain from using KeywordTokenizer to using 
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So, 
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super 
> efficiently I'm guessing (since I used a StringBuffer), but I'm open to 
> suggestions on how to make it better. 
> One other thing is that apparently this analyzer works fine for analysis 
> (e.g., it produces the desired tokens), however, for sorting in Solr I'm 
> getting null sort tokens. Need to figure out why. 
> Here ya go!




[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting

2011-09-05 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated LUCENE-3413:
--

Attachment: (was: LUCENE-3413.Mattmann.090511.patch.txt)

> CombiningFilter to recombine tokens into a single token for sorting
> ---
>
> Key: LUCENE-3413
> URL: https://issues.apache.org/jira/browse/LUCENE-3413
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 2.9.3
>Reporter: Chris A. Mattmann
>Priority: Minor
> Attachments: LUCENE-3413.Mattmann.090311.patch.txt
>
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of e.g., Books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword 
> analysis (e.g., to remove "The"), and synonym filtering (e.g., for grouping), 
> etc. I created an analysis chain in Solr for this that was based off of 
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" 
>       pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or 
> synonyms because those are based on the original token level instead of the 
> full strings produced by the KeywordTokenizer (which does not do 
> tokenization). I needed a filter that would allow me to change alphaOnlySort 
> and its analysis chain from using KeywordTokenizer to using 
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So, 
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super 
> efficiently I'm guessing (since I used a StringBuffer), but I'm open to 
> suggestions on how to make it better. 
> One other thing is that apparently this analyzer works fine for analysis 
> (e.g., it produces the desired tokens), however, for sorting in Solr I'm 
> getting null sort tokens. Need to figure out why. 
> Here ya go!




[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting

2011-09-05 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated LUCENE-3413:
--

Attachment: (was: LUCENE-3413.Mattmann.090511.patch.txt)

> CombiningFilter to recombine tokens into a single token for sorting
> ---
>
> Key: LUCENE-3413
> URL: https://issues.apache.org/jira/browse/LUCENE-3413
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 2.9.3
>Reporter: Chris A. Mattmann
>Priority: Minor
> Attachments: LUCENE-3413.Mattmann.090311.2.patch, 
> LUCENE-3413.Mattmann.090311.patch.txt, LUCENE-3413.Mattmann.090511.patch.txt
>
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of e.g., Books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword 
> analysis (e.g., to remove "The"), and synonym filtering (e.g., for grouping), 
> etc. I created an analysis chain in Solr for this that was based off of 
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" 
>       pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or 
> synonyms because those are based on the original token level instead of the 
> full strings produced by the KeywordTokenizer (which does not do 
> tokenization). I needed a filter that would allow me to change alphaOnlySort 
> and its analysis chain from using KeywordTokenizer to using 
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So, 
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super 
> efficiently I'm guessing (since I used a StringBuffer), but I'm open to 
> suggestions on how to make it better. 
> One other thing is that apparently this analyzer works fine for analysis 
> (e.g., it produces the desired tokens), however, for sorting in Solr I'm 
> getting null sort tokens. Need to figure out why. 
> Here ya go!




[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting

2011-09-05 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated LUCENE-3413:
--

Attachment: LUCENE-3413.Mattmann.090511.patch.txt

- updated patch fixing package names. This patch applies against the latest trunk.

> CombiningFilter to recombine tokens into a single token for sorting
> ---
>
> Key: LUCENE-3413
> URL: https://issues.apache.org/jira/browse/LUCENE-3413
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 2.9.3
>Reporter: Chris A. Mattmann
>Priority: Minor
> Attachments: LUCENE-3413.Mattmann.090311.2.patch, 
> LUCENE-3413.Mattmann.090311.patch.txt, LUCENE-3413.Mattmann.090511.patch.txt
>
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of e.g., Books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword 
> analysis (e.g., to remove "The"), and synonym filtering (e.g., for grouping), 
> etc. I created an analysis chain in Solr for this that was based off of 
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" 
>       pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or 
> synonyms because those are based on the original token level instead of the 
> full strings produced by the KeywordTokenizer (which does not do 
> tokenization). I needed a filter that would allow me to change alphaOnlySort 
> and its analysis chain from using KeywordTokenizer to using 
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So, 
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super 
> efficiently I'm guessing (since I used a StringBuffer), but I'm open to 
> suggestions on how to make it better. 
> One other thing is that apparently this analyzer works fine for analysis 
> (e.g., it produces the desired tokens), however, for sorting in Solr I'm 
> getting null sort tokens. Need to figure out why. 
> Here ya go!




[jira] [Updated] (LUCENE-3413) CombiningFilter to recombine tokens into a single token for sorting

2011-09-05 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated LUCENE-3413:
--

Attachment: LUCENE-3413.Mattmann.090511.patch.txt

- updated patch addressing comments from Simon. Chris Male suggested renaming 
it, but I couldn't come up with a better name. Maybe we could call it 
CombiningTokenFilter or something more specific, but I'll leave that part up 
to you guys. 

> CombiningFilter to recombine tokens into a single token for sorting
> ---
>
> Key: LUCENE-3413
> URL: https://issues.apache.org/jira/browse/LUCENE-3413
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 2.9.3
>Reporter: Chris A. Mattmann
>Priority: Minor
> Attachments: LUCENE-3413.Mattmann.090311.2.patch, 
> LUCENE-3413.Mattmann.090311.patch.txt, LUCENE-3413.Mattmann.090511.patch.txt
>
>
> I whipped up this CombiningFilter for the following use case:
> I've got a bunch of titles of e.g., Books, such as:
> The Grapes of Wrath
> Tommy Tommerson saves the World
> Top of the World
> The Tales of Beedle the Bard
> Born Free
> etc.
> I want to sort these titles using a String field that includes stopword 
> analysis (e.g., to remove "The"), and synonym filtering (e.g., for grouping), 
> etc. I created an analysis chain in Solr for this that was based off of 
> *alphaOnlySort*, which looks like this:
> {code:xml}
> <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" 
> omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" 
>       pattern="([^a-z])" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> {code}
> The issue with alphaOnlySort is that it doesn't support stopword removal or 
> synonyms because those are based on the original token level instead of the 
> full strings produced by the KeywordTokenizer (which does not do 
> tokenization). I needed a filter that would allow me to change alphaOnlySort 
> and its analysis chain from using KeywordTokenizer to using 
> WhitespaceTokenizer, and then a way to recombine the tokens at the end. So, 
> take "The Grapes of Wrath". I needed a way for it to get turned into:
> {noformat}
> grapes of wrath
> {noformat}
> And then to combine those tokens into a single token:
> {noformat}
> grapesofwrath
> {noformat}
> The attached CombiningFilter takes care of that. It doesn't do it super 
> efficiently I'm guessing (since I used a StringBuffer), but I'm open to 
> suggestions on how to make it better. 
> One other thing is that apparently this analyzer works fine for analysis 
> (e.g., it produces the desired tokens), however, for sorting in Solr I'm 
> getting null sort tokens. Need to figure out why. 
> Here ya go!




[jira] [Commented] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2011-09-05 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097727#comment-13097727
 ] 

Koji Sekiguchi commented on LUCENE-1824:


Forgot one comment. I've not taken care of Solr yet in the patch.

> FastVectorHighlighter truncates words at beginning and end of fragments
> ---
>
> Key: LUCENE-1824
> URL: https://issues.apache.org/jira/browse/LUCENE-1824
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1824.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when 
> building fragments, so that in most cases the first and last word of a 
> fragment are truncated.  This makes the highlights less legible than they 
> should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
> by expanding the start and end boundaries of the fragment to the first 
> whitespace character on either side of the fragment, or the beginning or end 
> of the source text, whichever comes first.  This significantly improves 
> legibility, at the cost of returning a slightly larger number of characters 
> than specified for the fragment size.




[jira] [Updated] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2011-09-05 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-1824:
---

Attachment: LUCENE-1824.patch

First draft. I introduced a BoundaryScanner interface and two implementations of 
it: Simple and BreakIterator.

SimpleBoundaryScanner uses the following default boundary chars:

{code}
public static final Character[] DEFAULT_BOUNDARY_CHARS = {'.', ',', '!', '?', 
'(', '[', '{', '\t', '\n'};
{code}

And they are used by SimpleBoundaryScanner to find word/sentence boundaries.

BreakIteratorBoundaryScanner can also be used to find the break of 
char/word/sentence/line.

I made BaseFragmentsBuilder boundary-aware, rather than creating a new 
FragmentsBuilder such as a BoundaryAwareFragmentsBuilder. As a result, every 
FragmentsBuilder is now natively boundary-aware, as long as an appropriate 
BoundaryScanner is used.

I've not touched the tests yet. Because this patch changes fragment boundaries, 
the existing tests should fail!
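As a rough sketch of the scanning idea, not the actual patch, a simple boundary scanner can walk outward from a fragment edge to the nearest boundary character. The class and method names here are hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BoundaryDemo {
    // Same default boundary chars as the snippet quoted above.
    static final Set<Character> BOUNDARY = new HashSet<>(Arrays.asList(
        '.', ',', '!', '?', '(', '[', '{', '\t', '\n'));

    // Scan left from start, returning the offset just after the nearest
    // boundary char (or 0 if none is found before the buffer start).
    static int findStartOffset(char[] buffer, int start) {
        for (int i = start; i > 0; i--) {
            if (BOUNDARY.contains(buffer[i - 1])) return i;
        }
        return 0;
    }

    // Scan right from end, returning the offset of the nearest boundary
    // char (or buffer.length if none is found).
    static int findEndOffset(char[] buffer, int end) {
        for (int i = end; i < buffer.length; i++) {
            if (BOUNDARY.contains(buffer[i])) return i;
        }
        return buffer.length;
    }

    public static void main(String[] args) {
        char[] text = "Hello world. Lucene highlights fragments, fast!".toCharArray();
        System.out.println(findStartOffset(text, 20)); // 12 (just after the '.')
    }
}
```

Expanding a fragment this way trades slightly longer snippets for never cutting a word or sentence in half.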

> FastVectorHighlighter truncates words at beginning and end of fragments
> ---
>
> Key: LUCENE-1824
> URL: https://issues.apache.org/jira/browse/LUCENE-1824
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1824.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when 
> building fragments, so that in most cases the first and last word of a 
> fragment are truncated.  This makes the highlights less legible than they 
> should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
> by expanding the start and end boundaries of the fragment to the first 
> whitespace character on either side of the fragment, or the beginning or end 
> of the source text, whichever comes first.  This significantly improves 
> legibility, at the cost of returning a slightly larger number of characters 
> than specified for the fragment size.




[jira] [Updated] (SOLR-2204) Cross-version replication broken by new javabin format

2011-09-05 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated SOLR-2204:
---

Attachment: SOLR-2204.patch

> Cross-version replication broken by new javabin format
> --
>
> Key: SOLR-2204
> URL: https://issues.apache.org/jira/browse/SOLR-2204
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 3.1
> Environment: Linux idxst0-a 2.6.18-194.3.1.el5.centos.plusxen #1 SMP 
> Wed May 19 09:59:34 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>Reporter: Shawn Heisey
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2204.patch, SOLR-2204.patch
>
>
> Slave server is branch_3x, revision 1027974.  Master server is 1.4.1.  
> Replication fails because of the new javabin format.
> SEVERE: Master at: http://HOST:8983/solr/live/replication is not available. 
> Index fetch failed. Exception: Invalid version or the data in not in 
> 'javabin' format
> Switching Solr's internally generated requests to XML, or adding support for 
> both javabin versions would get rid of this problem.  I do not know how to do 
> either of these things.




[jira] [Commented] (SOLR-2204) Cross-version replication broken by new javabin format

2011-09-05 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097676#comment-13097676
 ] 

Mike Sokolov commented on SOLR-2204:


I'm posting a more fully-realized patch now.  This is an important issue for 
us, not just because of replication, but also because we may support a bunch of 
different apps on a single server, would like to upgrade such a server, but 
can't upgrade all the apps at once.  Some might be stuck on an old version for 
some time since we are locked into our client's update schedules.  We could set 
up old and new servers and migrate the apps one by one, but it just seemed to 
me that the flexibility of being able to mix versions was worth some degree of 
pain.

This patch restores support for version 1 utf-8 encoding to JavaBinCodec to be 
used as a fallback when communicating with older peers.

When a v2 server detects a v1 client, it responds using v1. The javabin version 
is inferred from the version byte read when unmarshalling binary content.  
However, non-update requests won't have any such version info, so I increased 
the version passed on every HTTP request from 2.2 to 3.4, and also use this 
string to detect older peers.  I may have missed the significance of this value 
and broken something else: wiser heads, please review!

The SolrJ client behaves a bit differently since it has no way of knowing in 
advance what version the server is.  With this patch, v2 clients detect a 
version mismatch error by parsing the HTTP response text, retry and then fall 
back to v1 for all future requests by recording the server javabin version in 
the RequestWriter.

Testing this requires simulating the old behavior (i.e. forcing either the client 
or server into v1 mode).  To do this via Jetty seemed to require a built-in 
hook (in BinaryUpdateRequestHandler) for that, used only for testing, which 
would be nice to avoid, but I didn't see how.  Also - JettySolrRunner offers a 
configfile param, but it didn't seem to have any effect, so I added a check for 
the system property in CoreContainer, but maybe I missed something and there is 
a better way to do this.
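The version-byte dispatch described above can be sketched as follows. The constants, class, and method are hypothetical illustrations, not the real JavaBinCodec API:

```java
public class JavabinVersionDemo {
    // Hypothetical constants: javabin payloads begin with a single
    // version byte (1 for the legacy encoding, 2 for the newer one).
    static final int V1 = 1, V2 = 2;

    // Inspect the leading version byte and choose a decoder accordingly,
    // mirroring the fallback behavior the patch describes.
    static String decoderFor(byte[] payload) {
        int version = payload.length > 0 ? payload[0] : -1;
        switch (version) {
            case V1: return "v1-utf8"; // legacy UTF-8 string encoding
            case V2: return "v2";      // current format
            default:
                throw new IllegalArgumentException("Invalid version: " + version);
        }
    }

    public static void main(String[] args) {
        System.out.println(decoderFor(new byte[]{1})); // v1-utf8
    }
}
```

A client has no version byte to inspect before its first request, which is why the patch instead retries after a version-mismatch error and remembers the negotiated version.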


> Cross-version replication broken by new javabin format
> --
>
> Key: SOLR-2204
> URL: https://issues.apache.org/jira/browse/SOLR-2204
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 3.1
> Environment: Linux idxst0-a 2.6.18-194.3.1.el5.centos.plusxen #1 SMP 
> Wed May 19 09:59:34 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>Reporter: Shawn Heisey
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2204.patch, SOLR-2204.patch
>
>
> Slave server is branch_3x, revision 1027974.  Master server is 1.4.1.  
> Replication fails because of the new javabin format.
> SEVERE: Master at: http://HOST:8983/solr/live/replication is not available. 
> Index fetch failed. Exception: Invalid version or the data in not in 
> 'javabin' format
> Switching Solr's internally generated requests to XML, or adding support for 
> both javabin versions would get rid of this problem.  I do not know how to do 
> either of these things.




[jira] [Commented] (SOLR-2700) transaction logging

2011-09-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097593#comment-13097593
 ] 

Yonik Seeley commented on SOLR-2700:


bq. OK, I think we're getting close to committing now.

Urggg - scratch that.  At some point in the past, some of the asserts were 
commented out to aid in debugging and I never re-enabled them.  The 
realtime-get test now fails, so I need to dig into that again.

> transaction logging
> ---
>
> Key: SOLR-2700
> URL: https://issues.apache.org/jira/browse/SOLR-2700
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, 
> SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch
>
>
> A transaction log is needed for durability of updates, for a more performant 
> realtime-get, and for replaying updates to recovering peers.
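A toy sketch of the append-and-replay idea; this is a hypothetical in-memory illustration, not Solr's actual transaction log, which appends to a durable file and fsyncs:

```java
import java.util.ArrayList;
import java.util.List;

public class TLogDemo {
    // Records are kept strictly in arrival order so a recovering peer
    // can replay them and converge on the same state.
    private final List<String> records = new ArrayList<>();

    // Append an update and return its position, which could serve as a
    // pointer for a realtime-get lookup of a not-yet-committed doc.
    long append(String update) {
        records.add(update);
        return records.size() - 1;
    }

    // Replay every logged update in order, e.g. to a recovering replica.
    void replay(List<String> target) {
        target.addAll(records);
    }

    public static void main(String[] args) {
        TLogDemo log = new TLogDemo();
        log.append("add doc1");
        log.append("delete doc0");
        List<String> replica = new ArrayList<>();
        log.replay(replica);
        System.out.println(replica.size()); // 2
    }
}
```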




RE: [Lucene.Net] 2.9.4

2011-09-05 Thread Digy
Not a bad idea, but I would prefer the community's feedback to testing
against all projects using Lucene.Net
DIGY

-Original Message-
From: Matt Warren [mailto:mattd...@gmail.com] 
Sent: Monday, September 05, 2011 11:09 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] 2.9.4

If you want to test it against a large project you could take a look at how
RavenDB uses it?

At the moment it's using 2.9.2 (
https://github.com/ayende/ravendb/tree/master/SharedLibs/Sources/Lucene2.9.2
)
but if you were to recompile it against 2.9.4 and check that all its
unit-tests still run, that would give you quite a large test case.

On 5 September 2011 19:22, Prescott Nasser  wrote:

>
> Hey All,
>
> How do people feel about the 2.9.4 code base? I've been using it for
> some time; for my use cases it's been excellent. Do we feel we are ready to
> package this up and make it an official release? Or do we have some tasks
> left to take care of?
>
> ~Prescott




[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097318#comment-13097318
 ] 

Jan Høydahl commented on LUCENE-3414:
-

+1

We now use Lucene Hunspell for a few customer deployments, and it would be 
great to have it in the analysis module, since it supports some 70-80 languages 
out of the box, and gives great flexibility since you can edit - or augment - 
the dictionaries to change behaviour and fix stemming bugs.

As a side benefit I also expect that when the Ooo dictionaries get more use in 
Lucene, users will over time be able to extend and improve the dictionaries, 
and contribute their changes back, benefiting also Ooo users.

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
>
> Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.
> It still seems to be in use but has fallen out of date.  I think it would 
> benefit from being inside the analysis module, where additional features such 
> as decompounding support could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display

2011-09-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097315#comment-13097315
 ] 

Jan Høydahl commented on SOLR-2383:
---

@Bill
Yes, the exclusive upper range syntax [x TO y} only works on 4.0, and I haven't 
found a good way to emulate the same behaviour in 3.x. This means that you'll 
sometimes see more hits when clicking a facet than the number presented, the 
extra hits being values that fall exactly on the upper bound. Do you have a 
suggestion?
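The boundary double-counting can be shown with a small plain-Java sketch comparing inclusive and exclusive upper bounds (this only models the counting behaviour, not Solr's faceting code; names are illustrative):

```java
public class RangeBuckets {
    // Count values in [lo, hi) when inclusiveUpper is false, or [lo, hi]
    // when it is true — i.e. the 4.0 [x TO y} syntax vs the 3.x [x TO y].
    static int count(double[] vals, double lo, double hi, boolean inclusiveUpper) {
        int n = 0;
        for (double v : vals) {
            if (v >= lo && (inclusiveUpper ? v <= hi : v < hi)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        double[] popularity = {3, 4.5, 6, 6, 7};
        // Exclusive upper bound, 4.0-style [3 TO 6}:
        System.out.println(count(popularity, 3, 6, false)); // 2
        // Inclusive upper bound, the 3.x fallback [3 TO 6]:
        System.out.println(count(popularity, 3, 6, true));  // 4
    }
}
```

The two extra hits in the inclusive case are exactly the documents sitting on the upper bound, which is the mismatch between the presented facet count and the click-through result count described above.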


> Velocity: Generalize range and date facet display
> -
>
> Key: SOLR-2383
> URL: https://issues.apache.org/jira/browse/SOLR-2383
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: facet, range, velocity
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, 
> SOLR-2383.patch, SOLR-2383.patch
>
>
> Velocity (/browse) GUI has hardcoded price range facet and a hardcoded 
> manufacturedate_dt date facet. Need a general solution which works for any 
> facet.range and facet.date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2280) commitWithin ignored for a delete query

2011-09-05 Thread Juan Grande (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Grande updated SOLR-2280:
--

Attachment: SOLR-2280.patch

I'm submitting a patch that implements commitWithin on deletes. The patch is 
for the 3x branch.

Two things should be noted:
# The commit is fired even if the delete doesn't really delete any document.
# When using the BinaryUpdateRequestHandler the params of the UpdateRequest are 
loaded when parsing the docs. If the request doesn't include a docs list, then 
the params aren't loaded. I added a workaround for this, but SOLR-1164 should 
solve this problem definitively.


> commitWithin ignored for a delete query
> ---
>
> Key: SOLR-2280
> URL: https://issues.apache.org/jira/browse/SOLR-2280
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Reporter: David Smiley
>Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2280.patch
>
>
> The commitWithin option on an UpdateRequest is only honored for requests 
> containing new documents.  It does not, for example, work with a delete 
> query.  The following doesn't work as expected:
> {code:java}
> UpdateRequest request = new UpdateRequest();
> request.deleteById("id123");
> request.setCommitWithin(1000);
> solrServer.request(request);
> {code}
> In my opinion, the commitWithin attribute should be permitted on the 
> <delete> xml tag as well as <add>.  Such a change would go in 
> XMLLoader.java and it would have some ramifications elsewhere too.  Once 
> this is done, then UpdateRequest.getXml() can be updated to generate the 
> right XML.
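For illustration, a delete message carrying the proposed attribute might look like this (hypothetical placement — it assumes the XMLLoader change described above is made):

```xml
<delete commitWithin="1000">
  <id>id123</id>
</delete>
```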

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (SOLR-1834) Document level security

2011-09-05 Thread Peter Sturge
Yes, there has been much work and discussions on doc-level security in Solr.
The main problem with building application-level security into Solr
is that there are myriad ways to approach it, depending on
requirements, as well as plenty of issues to address generally
regarding security - e.g. where do the permissions come from, how to
verify the caller, etc. etc.

Currently, there are 3 patches available to this end:
SOLR-1834
SOLR-1895
SOLR-1872

1834 and 1895 use LCF to provide the security permissions. 1872 uses a
solr-local ACL file to deliver permissions.

The current trunk status quo is to leave security up to the web
container (e.g. Tomcat).
This makes sense, as the approaches above are relevant (or not)
depending on your specific requirements.

HTH

Peter



On Mon, Sep 5, 2011 at 11:18 AM, Ravish Bhagdev (JIRA)  wrote:
>
>    [ 
> https://issues.apache.org/jira/browse/SOLR-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097108#comment-13097108
>  ]
>
> Ravish Bhagdev commented on SOLR-1834:
> --
>
> Are there any plans for adding this or other document-level search security 
> solutions into Solr? This requirement is quite critical for most enterprise 
> search apps, I would have thought. Has this been discussed in detail 
> elsewhere?
>
>> Document level security
>> ---
>>
>>                 Key: SOLR-1834
>>                 URL: https://issues.apache.org/jira/browse/SOLR-1834
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: SearchComponents - other
>>    Affects Versions: 1.4
>>            Reporter: Anders Rask
>>         Attachments: SOLR-1834-with-LCF.patch, SOLR-1834.patch, html.rar
>>
>>
>> Attached to this issue is a patch that includes a framework for enabling 
>> document level security in Solr as a search component. I did this as a 
>> Master thesis project at Findwise in Stockholm and Findwise has now decided 
>> to contribute it back to the community. The component was developed in 
>> spring 2009 and has been in use at a customer since autumn the same year.
>> There is a simple demo application up at 
>> http://demo.findwise.se:8880/SolrSecurity/ which also explains more about 
>> the component and how to set it up.
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3199) Add non-destructive sort to BytesRefHash

2011-09-05 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097257#comment-13097257
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

Ok, solved the above comment by taking the sorted ord array and building a new 
reverse array from that... 
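The reverse array described here is just the inverse permutation of the sorted ord array; a minimal self-contained sketch of the construction (illustrative names, not Lucene's actual API):

```java
public class ReverseOrds {
    // Given sortedOrds[pos] = term id at sorted position pos, build
    // reverse[termId] = sorted position of that term (the inverse permutation).
    static int[] reverse(int[] sortedOrds) {
        int[] rev = new int[sortedOrds.length];
        for (int pos = 0; pos < sortedOrds.length; pos++) {
            rev[sortedOrds[pos]] = pos;
        }
        return rev;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 0, 1}; // term 2 sorts first, then term 0, then term 1
        int[] rev = reverse(sorted);
        System.out.println(java.util.Arrays.toString(rev)); // [1, 2, 0]
    }
}
```

With this in hand, the doc id → term id lookup can be composed with reverse[] to get the sorted term id needed by a DocTermsIndex-style structure.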

> Add non-destructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, 
> LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3199) Add non-destructive sort to BytesRefHash

2011-09-05 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097246#comment-13097246
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

I started integrating the patch into LUCENE-2312.  I think the main 
functionality missing is a reverse int[] that points from a term id to the 
sorted ords array.  That array would be used for implementing the RT version of 
DocTermsIndex, where a doc id maps to a term id, which maps to a sorted term id.

> Add non-destructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, 
> LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display

2011-09-05 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097197#comment-13097197
 ] 

Bill Bell commented on SOLR-2383:
-

Does this mean the [0 TO 8} will not work?

popularity:[3 TO 6} ?

Thanks.



> Velocity: Generalize range and date facet display
> -
>
> Key: SOLR-2383
> URL: https://issues.apache.org/jira/browse/SOLR-2383
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: facet, range, velocity
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, 
> SOLR-2383.patch, SOLR-2383.patch
>
>
> Velocity (/browse) GUI has hardcoded price range facet and a hardcoded 
> manufacturedate_dt date facet. Need general solution which work for any 
> facet.range and facet.date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Solr Wiki] Update of "NewSolrCloudDesign" by YonikSeeley

2011-09-05 Thread Yonik Seeley
I'm wondering if we shouldn't ditch the new term "partition" here and
just use "replica"?

In the past, we've sort of used "shard" to mean both a single physical
index, and the logical piece of the larger collection.  In practice,
this ambiguity isn't much of a problem, as it's normally clear
from context, and when it's not we sometimes throw in the word "replica".
 Examples:  "Doc X belongs on Shard Z", "Shard Z on this node is
corrupt".

Refreshing my memory on our ZK layout, it seems like we are using
"shards" in the logical sense there.

 /COLLECTIONS (v=6 children=1)
  COLLECTION1 (v=0 children=1) "configName=myconf"
   SHARDS (v=0 children=1)
    SHARD1 (v=0 children=1)
     ROGUE.LOCAL:8983_SOLR_ (v=0)
      "node_name=Rogue.local:8983_solr url=http://Rogue.local:8983/solr/"

So perhaps we should just continue that, and change "partition" to
"replica" when necessary to prevent ambiguity?

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3178) Native MMapDir

2011-09-05 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097182#comment-13097182
 ] 

Varun Thacker commented on LUCENE-3178:
---

bq. If we pass down IOContext to MMapIndexInput and in the ctor use mmap and 
then use madvise with the appropriate flag (depending on the Context). Is that 
the correct way to go about it?

Any suggestions on this?

> Native MMapDir
> --
>
> Key: LUCENE-3178
> URL: https://issues.apache.org/jira/browse/LUCENE-3178
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>
> Spinoff from LUCENE-2793.
> Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
> level IO flags depending on the IOContext, we could in theory do something 
> similar with MMapDir.
> The problem is MMap is apparently quite hairy... and to pass the flags the 
> native code would need to invoke mmap (I think?), unlike UnixDir where the 
> code "only" has to open the file handle.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097168#comment-13097168
 ] 

Robert Muir commented on LUCENE-3390:
-

+1 to revisit how this was done in trunk.

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Fix For: 3.4
>
> Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?
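The scenario in the description can be reproduced with a plain-Java sketch, using null for a missing value (this only models the comparator behaviour, not Lucene's FieldCache; names are illustrative):

```java
import java.util.*;

public class MissingValueSort {
    // Current behavior described in the issue: a missing value sorts as 0.0.
    static Integer[] missingAsZero(Double[] values) {
        Integer[] docs = ids(values.length);
        Arrays.sort(docs, Comparator.comparingDouble(
            (Integer d) -> values[d] == null ? 0.0 : values[d]).reversed());
        return docs;
    }

    // Proposed alternative: docs without a value always sort last.
    static Integer[] missingLast(Double[] values) {
        Integer[] docs = ids(values.length);
        Arrays.sort(docs, Comparator.comparing((Integer d) -> values[d],
            Comparator.nullsLast(Comparator.<Double>reverseOrder())));
        return docs;
    }

    static Integer[] ids(int n) {
        Integer[] docs = new Integer[n];
        for (int i = 0; i < n; i++) docs[i] = i;
        return docs;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null}; // doc 2 has no value
        System.out.println(Arrays.toString(missingAsZero(values))); // [0, 2, 1]
        System.out.println(Arrays.toString(missingLast(values)));   // [0, 1, 2]
    }
}
```

The first ordering is the surprising 0, 2, 1 result from the description; the second shows the "missing values last" behaviour the reporter asks for.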

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097164#comment-13097164
 ] 

Michael McCandless commented on LUCENE-3390:


Also, can we use FastBitSet, not OpenBitSet, here?

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Fix For: 3.4
>
> Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097161#comment-13097161
 ] 

Michael McCandless commented on LUCENE-3390:


I like how we solved this in 3.x!  Ie, a whole separate entry for holding a 
bitset indicating if the doc has a value.

This is generally useful, alone, ie one can just pull this bitset and use it 
directly.

It's also nice because it's one source that computes this, vs N copies (one per 
value) that we have on trunk.

I guess the downside is it takes 2 passes over the terms (one to get the 
values, another to fill this bitset), but maybe that tradeoff is worth not 
duplicating the code all over... maybe we should take a similar approach in 
trunk?
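The "one bitset recording which docs have a value" idea can be sketched with java.util.BitSet as a stand-in for Lucene's OpenBitSet (a toy illustration, not the 3.x FieldCache code):

```java
import java.util.BitSet;

public class DocsWithField {
    // One bit per doc: set means the doc has a value for the sort field,
    // so a 0.0 in the values array can be told apart from "no value at all".
    static String render(double[] values, BitSet docsWithField, int doc) {
        return docsWithField.get(doc) ? Double.toString(values[doc]) : "missing";
    }

    public static void main(String[] args) {
        double[] values = {3.5, -10.0, 0.0}; // slot 2 was never written
        BitSet docsWithField = new BitSet(3);
        docsWithField.set(0);
        docsWithField.set(1);                // doc 2 deliberately left unset
        for (int doc = 0; doc < 3; doc++) {
            System.out.println(doc + ": " + render(values, docsWithField, doc));
        }
        // prints: 0: 3.5 / 1: -10.0 / 2: missing
    }
}
```

The bitset is computed once per field entry, which is the "one source" advantage mentioned above, at the cost of a second pass over the terms.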

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Fix For: 3.4
>
> Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-05 Thread Chris Male (JIRA)
Bring Hunspell for Lucene into analysis module
--

 Key: LUCENE-3414
 URL: https://issues.apache.org/jira/browse/LUCENE-3414
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Chris Male


Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the 
Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
array of languages.

It still seems to be in use but has fallen out of date.  I think it would 
benefit from being inside the analysis module, where additional features such as 
decompounding support could be added.
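The dictionary-plus-affix-rules idea behind Hunspell can be sketched in a few lines (a deliberately tiny illustration of recursive suffix stripping — not the Hunspell .aff/.dic format or the patch's API; the dictionary and rules here are made up):

```java
import java.util.*;

public class ToyAffixStemmer {
    // A dictionary of known stems plus suffix rules that may be applied
    // recursively until a dictionary word is reached.
    static final Set<String> DICT = new HashSet<>(Arrays.asList("walk", "talk"));
    static final String[] SUFFIXES = {"ing", "ed", "s"};

    // Returns the stem if one can be derived, or null if the word is unknown.
    static String stem(String word) {
        if (DICT.contains(word)) return word;
        for (String suf : SUFFIXES) {
            if (word.endsWith(suf)) {
                String candidate = stem(word.substring(0, word.length() - suf.length()));
                if (candidate != null) return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(stem("walking")); // walk
        System.out.println(stem("talked"));  // talk
    }
}
```

Real Hunspell affix rules carry conditions and continuation classes, but the shape is the same: dictionary lookup plus rule-driven (possibly recursive) affix removal.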

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Heads up for a few planned commits

2011-09-05 Thread Michael McCandless
Jan,

I haven't looked at these issues but you should go ahead and commit if
you are comfortable with the changes!

Nobody responding despite pleas for review means "lazy" consensus, ie
it means others are OK with the change.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Sep 5, 2011 at 6:34 AM, Jan Høydahl  wrote:
> Hi,
>
> As I'm quite new as a committer, I want to make sure I follow the right 
> procedures.
> I have several JIRA issues with patches that I feel are ready for commit.
> They have tests which pass, but there has been limited peer review despite 
> requests for such in the issues themselves.
>
> These are the issues I plan to commit shortly. Would be great to get thumbs 
> up/down from more senior committers:
>
> SOLR-2741: Bugs in facet range display in trunk
> These are bug-fixes on previously committed SOLR-2383 code in trunk.
>
> SOLR-2383: Velocity: Generalize range and date facet display
> I plan to commit the patch SOLR-2383-branch_3x.patch which is a backport to 
> 3x, including the improvements from SOLR-2741
>
> SOLR-2540: CommitWithin as an Update Request parameter
> This gives &commitWithin=xxx capabilities to XML-URH, CSV-URH and 
> Extracting-URH (similar to what's in Binary-URH and JSON-URH already)
> I plan to commit this both to trunk and 3x
>
> SOLR-2742: Add commitWithin to convenience signatures for SolrServer.add(..)
> This one simply introduces convenience signatures in SolrJ to more easily 
> specify commitWithin on ADDs
> I plan to commit this both to trunk and 3x
>
> Thanks for any feedback on any of these!
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)

2011-09-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2742:
--

Attachment: SOLR-2742.patch

Improved the JavaDocs for all public methods in SolrServer, including @param 
tags.

> Add commitWithin to convenience signatures for SolrServer.add(..)
> -
>
> Key: SOLR-2742
> URL: https://issues.apache.org/jira/browse/SOLR-2742
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: SolrJ, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2742.patch, SOLR-2742.patch, SOLR-2742.patch
>
>
> Today you need to manually create an UpdateRequest in order to set the 
> commitWithin value.
> We should provide an optional commitWithin parameter on all 
> SolrServer.add(..) methods as a convenience

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)

2011-09-05 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097123#comment-13097123
 ] 

Chris Male commented on SOLR-2742:
--

Hey Jan,

Looks great! +1 to committing to trunk and back porting.

Just one personal nitpick, if we're going to add Javadocs to the SolrServer 
methods, can we add full javadocs?

> Add commitWithin to convenience signatures for SolrServer.add(..)
> -
>
> Key: SOLR-2742
> URL: https://issues.apache.org/jira/browse/SOLR-2742
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: SolrJ, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2742.patch, SOLR-2742.patch
>
>
> Today you need to manually create an UpdateRequest in order to set the 
> commitWithin value.
> We should provide an optional commitWithin parameter on all 
> SolrServer.add(..) methods as a convenience

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-05 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3396:
---

Attachment: LUCENE-3396-rab.patch

Patch updated to trunk.

The generic type parameter is removed from ReuseStrategy.

> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but it's time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).
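The per-field reuse strategy described above boils down to caching one components instance per field name instead of one per call; a toy sketch of that caching pattern (illustrative names, not the proposed Analyzer API — a StringBuilder stands in for TokenStreamComponents):

```java
import java.util.*;

public class ReuseSketch {
    // One cached "components" instance per field name, so repeated requests
    // for the same field reuse the same object rather than creating a new one.
    static final Map<String, StringBuilder> perField = new HashMap<>();

    static StringBuilder componentsFor(String field) {
        return perField.computeIfAbsent(field, f -> new StringBuilder());
    }

    public static void main(String[] args) {
        StringBuilder a = componentsFor("title");
        StringBuilder b = componentsFor("title");
        System.out.println(a == b);                      // true: reused
        System.out.println(componentsFor("body") == a);  // false: per-field
    }
}
```

A global strategy would use a single cached instance for every field; the per-field variant trades a little memory for not resetting state between fields with different chains.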

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Heads up for a few planned commits

2011-09-05 Thread Jan Høydahl
Hi,

As I'm quite new as a committer, I want to make sure I follow the right 
procedures.
I have several JIRA issues with patches that I feel are ready for commit.
They have tests which pass, but there has been limited peer review despite 
requests for such in the issues themselves.

These are the issues I plan to commit shortly. Would be great to get thumbs 
up/down from more senior committers:

SOLR-2741: Bugs in facet range display in trunk
These are bug-fixes on previously committed SOLR-2383 code in trunk.

SOLR-2383: Velocity: Generalize range and date facet display
I plan to commit the patch SOLR-2383-branch_3x.patch which is a backport to 3x, 
including the improvements from SOLR-2741

SOLR-2540: CommitWithin as an Update Request parameter
This gives &commitWithin=xxx capabilities to XML-URH, CSV-URH and 
Extracting-URH (similar to what's in Binary-URH and JSON-URH already)
I plan to commit this both to trunk and 3x

SOLR-2742: Add commitWithin to convenience signatures for SolrServer.add(..)
This one simply introduces convenience signatures in SolrJ to more easily 
specify commitWithin on ADDs
I plan to commit this both to trunk and 3x

Thanks for any feedback on any of these!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)

2011-09-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097115#comment-13097115
 ] 

Jan Høydahl commented on SOLR-2742:
---

Plan to commit this to both trunk and 3x branch in a couple of days

> Add commitWithin to convenience signatures for SolrServer.add(..)
> -
>
> Key: SOLR-2742
> URL: https://issues.apache.org/jira/browse/SOLR-2742
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: SolrJ, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2742.patch, SOLR-2742.patch
>
>
> Today you need to manually create an UpdateRequest in order to set the 
> commitWithin value.
> We should provide an optional commitWithin parameter on all 
> SolrServer.add(..) methods as a convenience

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display

2011-09-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097109#comment-13097109
 ] 

Jan Høydahl commented on SOLR-2383:
---

Plan to commit this in a day or two, if no objections

> Velocity: Generalize range and date facet display
> -
>
> Key: SOLR-2383
> URL: https://issues.apache.org/jira/browse/SOLR-2383
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: facet, range, velocity
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, 
> SOLR-2383.patch, SOLR-2383.patch
>
>
> Velocity (/browse) GUI has hardcoded price range facet and a hardcoded 
> manufacturedate_dt date facet. Need general solution which work for any 
> facet.range and facet.date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2741) Bugs in facet range display in trunk

2011-09-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097110#comment-13097110
 ] 

Jan Høydahl commented on SOLR-2741:
---

Plan to commit this in a day or two if no objections

> Bugs in facet range display in trunk
> 
>
> Key: SOLR-2741
> URL: https://issues.apache.org/jira/browse/SOLR-2741
> Project: Solr
>  Issue Type: Sub-task
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0
>
> Attachments: SOLR-2741.patch, SOLR-2741.patch
>
>
> In SOLR-2383 the hardcoded display of some facet ranges were replaced with 
> automatic, dynamic display.
> There were some shortcomings:
> a) Float range to-values were sometimes displayed as int
> b) Capitalizing the facet name was a mistake, sometimes looks good, sometimes 
> not
> c) facet.range on a date did not work - dates were displayed in whatever 
> locale formatting happened to be in effect
> d) The deprecated facet.date syntax was used in solrconfig.xml instead of the 
> new facet.range

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1834) Document level security

2011-09-05 Thread Ravish Bhagdev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097108#comment-13097108
 ] 

Ravish Bhagdev commented on SOLR-1834:
--

Are there any plans for adding this patch, or some other document-level search 
security solution, into Solr? I would have thought this requirement is critical 
for most enterprise search apps. Has this been discussed in detail elsewhere?

> Document level security
> ---
>
> Key: SOLR-1834
> URL: https://issues.apache.org/jira/browse/SOLR-1834
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Anders Rask
> Attachments: SOLR-1834-with-LCF.patch, SOLR-1834.patch, html.rar
>
>
> Attached to this issue is a patch that includes a framework for enabling 
> document level security in Solr as a search component. I did this as a Master 
> thesis project at Findwise in Stockholm and Findwise has now decided to 
> contribute it back to the community. The component was developed in spring 
> 2009 and has been in use at a customer since autumn the same year.
> There is a simple demo application up at 
> http://demo.findwise.se:8880/SolrSecurity/ which also explains more about the 
> component and how to set it up.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2540) CommitWithin as an Update Request parameter

2011-09-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2540:
--

Attachment: SOLR-2540.patch

Updated patch with more tests. Will commit in a day or two

> CommitWithin as an Update Request parameter
> ---
>
> Key: SOLR-2540
> URL: https://issues.apache.org/jira/browse/SOLR-2540
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 3.1
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: commit, commitWithin
> Attachments: SOLR-2540.patch, SOLR-2540.patch
>
>
> It would be useful to support commitWithin HTTP GET request param on all 
> UpdateRequestHandlers.
> That way, you could set commitWithin on the request (for XML, JSON, CSV, 
> Binary and Extracting handlers) with this syntax:
> {code}
>   curl 
> http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1
>-H "Content-Type: application/pdf" --data-binary @file.pdf
> {code}
> PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already 
> support this syntax.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097083#comment-13097083
 ] 

Uwe Schindler commented on LUCENE-3396:
---

I agree it's somewhat overkill. But if not at the class level, I would even 
remove the T parameter from the getter method, because it does not really fit: 
it is only used there, not even on the setter. There is no type enforcement 
anywhere, so the extra T only removes the cast for the caller of the protected 
method while adding a SuppressWarnings on the implementor's side. So either 
make everything T-typed or use Object everywhere.
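
The trade-off can be illustrated with a self-contained sketch (class names are illustrative, not the actual Lucene API): a method-level T only hides the cast behind a SuppressWarnings, while a class-level T forces the setter and getter to agree on one type.

```java
// Illustrative sketch of the generics trade-off; not the actual Lucene API.
abstract class ReuseStrategyUntyped {
    private final ThreadLocal<Object> stored = new ThreadLocal<>();

    // Method-level T: the cast moves from the caller into this method,
    // but nothing enforces that set and get use the same type.
    @SuppressWarnings("unchecked")
    protected <T> T getStoredValue() { return (T) stored.get(); }

    protected void setStoredValue(Object value) { stored.set(value); }
}

// Class-level T: set and get are tied to one type,
// and no cast or SuppressWarnings is needed anywhere.
abstract class ReuseStrategyTyped<T> {
    private final ThreadLocal<T> stored = new ThreadLocal<>();

    protected T getStoredValue() { return stored.get(); }

    protected void setStoredValue(T value) { stored.set(value); }
}

// A concrete subclass simply picks its stored type.
class GlobalStrategy extends ReuseStrategyTyped<String> { }
```

With the class-level variant, a subclass such as GlobalStrategy fixes the stored type once and both accessors stay cast-free.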

> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but it's time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-05 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097076#comment-13097076
 ] 

Chris Male edited comment on LUCENE-3396 at 9/5/11 9:44 AM:


Hi Uwe,

I originally had ReuseStrategy with a generic type but then decided it was 
overkill since it only benefits implementations, not users of ReuseStrategy.  
If we want the extra type safety, I'll happily make the change.

  was (Author: cmale):
Hi Uwe,

I originally had ReuseStrategy with a generic type but then decided it was 
overkill.  If we want the extra type safety, I'll happily make the change.
  
> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but it's time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-05 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097076#comment-13097076
 ] 

Chris Male commented on LUCENE-3396:


Hi Uwe,

I originally had ReuseStrategy with a generic type but then decided it was 
overkill.  If we want the extra type safety, I'll happily make the change.

> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but it's time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097072#comment-13097072
 ] 

Uwe Schindler commented on LUCENE-3396:
---

Hi, the ReuseStrategies look fine. I am just confused about the generics. Why 
not make the whole abstract ReuseStrategy T-typed? Then the ThreadLocal is also 
typed and there is no casting anywhere. The per-field and global subclasses are 
then typed to the correct class (Map<> or TSComponents).

> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but it's time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097069#comment-13097069
 ] 

Simon Willnauer commented on LUCENE-3396:
-

This patch looks good to me! I like the reuse strategy and how you factored out 
the thread-local handling. 
I think we should commit this and let it bake in?

> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but it's time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).
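
The global vs per-field reuse idea in the plan above could be sketched along these lines (a simplified model with illustrative names, not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Lucene's TokenStreamComponents.
class Components {
    final String field;
    Components(String field) { this.field = field; }
}

// Sketch of the ReuseStrategy idea: one value stored per thread, interpreted
// either as a single Components (global reuse) or a per-field map of them.
abstract class Strategy {
    private final ThreadLocal<Object> stored = new ThreadLocal<>();
    protected Object get() { return stored.get(); }
    protected void set(Object v) { stored.set(v); }

    abstract Components reusable(String field);
    abstract void store(String field, Components c);
}

// Global reuse: the same components are handed back for every field.
class GlobalReuse extends Strategy {
    Components reusable(String field) { return (Components) get(); }
    void store(String field, Components c) { set(c); }
}

// Per-field reuse: each field keeps its own components in a map.
class PerFieldReuse extends Strategy {
    @SuppressWarnings("unchecked")
    Components reusable(String field) {
        Map<String, Components> m = (Map<String, Components>) get();
        return m == null ? null : m.get(field);
    }

    @SuppressWarnings("unchecked")
    void store(String field, Components c) {
        Map<String, Components> m = (Map<String, Components>) get();
        if (m == null) { m = new HashMap<>(); set(m); }
        m.put(field, c);
    }
}
```

The Analyzer would then only ask its strategy for reusable components, leaving the global-vs-per-field decision to the strategy instance.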

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org