[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter

2011-01-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983149#action_12983149
 ] 

Robert Muir commented on LUCENE-2295:
-

Shai, the patch looks good to me. 

I would also say that with your trunk patch we can resolve SOLR-2086,
because then the maxFieldLength is totally implemented in the analyzer,
so tools like analysis.jsp will automatically work with it.


> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the 
> same functionality as MaxFieldLength provided on IndexWriter
> ---
>
> Key: LUCENE-2295
> URL: https://issues.apache.org/jira/browse/LUCENE-2295
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Shai Erera
>Assignee: Uwe Schindler
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2295-2-3x.patch, LUCENE-2295-2-trunk.patch, 
> LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify on 
> IndexWriter his requested MFL limit, we can get rid of this setting entirely 
> by providing an Analyzer which will wrap any other Analyzer and its 
> TokenStream with a TokenFilter that keeps track of the number of tokens 
> produced and stop when the limit has reached.
> This will remove any count tracking in IW's indexing, which is done even if I 
> specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter

2011-01-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982594#action_12982594
 ] 

Robert Muir commented on LUCENE-2295:
-

Hi Shai, that sounds like the right solution to me!


> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the 
> same functionality as MaxFieldLength provided on IndexWriter
> ---
>
> Key: LUCENE-2295
> URL: https://issues.apache.org/jira/browse/LUCENE-2295
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Shai Erera
>Assignee: Uwe Schindler
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify on 
> IndexWriter his requested MFL limit, we can get rid of this setting entirely 
> by providing an Analyzer which will wrap any other Analyzer and its 
> TokenStream with a TokenFilter that keeps track of the number of tokens 
> produced and stop when the limit has reached.
> This will remove any count tracking in IW's indexing, which is done even if I 
> specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter

2011-01-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982529#action_12982529
 ] 

Shai Erera commented on LUCENE-2295:


I think the changes to 3x are less complicated than they seem - we don't need 
to deprecate anything, more than we already did. IndexWriterConfig is 
introduced in 3.1 and all IW ctors are already deprecated. So we can just 
remove the get/setMaxFieldLength from IWC and be done with it + some jdocs.

Is that the intention behind the reopening of the issue?

> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the 
> same functionality as MaxFieldLength provided on IndexWriter
> ---
>
> Key: LUCENE-2295
> URL: https://issues.apache.org/jira/browse/LUCENE-2295
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Shai Erera
>Assignee: Uwe Schindler
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify on 
> IndexWriter his requested MFL limit, we can get rid of this setting entirely 
> by providing an Analyzer which will wrap any other Analyzer and its 
> TokenStream with a TokenFilter that keeps track of the number of tokens 
> produced and stop when the limit has reached.
> This will remove any count tracking in IW's indexing, which is done even if I 
> specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter

2010-08-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903279#action_12903279
 ] 

Uwe Schindler commented on LUCENE-2295:
---

+1, I missed Mike's comment after resolving this issue!

> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the 
> same functionality as MaxFieldLength provided on IndexWriter
> ---
>
> Key: LUCENE-2295
> URL: https://issues.apache.org/jira/browse/LUCENE-2295
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Shai Erera
>Assignee: Uwe Schindler
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify on 
> IndexWriter his requested MFL limit, we can get rid of this setting entirely 
> by providing an Analyzer which will wrap any other Analyzer and its 
> TokenStream with a TokenFilter that keeps track of the number of tokens 
> produced and stop when the limit has reached.
> This will remove any count tracking in IW's indexing, which is done even if I 
> specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter

2010-05-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873397#action_12873397
 ] 

Michael McCandless commented on LUCENE-2295:


bq. Further investigantions showed, that there is some difference between using 
this filter/analyzer and the current setting in IndexWriter. IndexWriter uses 
the given MaxFieldLength as maximum value for all instances of the same field 
name. So if you add 100 fields "foo" (with each 1,000 terms) and have the 
default of 10,000 tokens, DocInverter will index 10 of these field instances 
(10,000 terms in total) and the rest will be supressed.

In LUCENE-2450 I'm experimenting with having multi-valued fields be handled 
entirely by an analyzer stage, ie, the logical concatenation of tokens (with 
gaps) would "hidden" to IW, and IW would think its dealing with a single token 
stream.  In this model, if you then appended the new LimitTokenCountFilter to 
the end, I think it'd result in the same behavior as maxFieldLength today.

But, even before we eventually switch to that model... can't we still deprecate 
(on 3x) IW's maxFieldLength (remove from trunk) now?  I realize the limiting is 
different (applying the limit pre vs post concatenation), but I think the 
javadocs can explain this difference?  I think it's unlikely apps are relying 
on this specific interaction of truncation and multi-valued fields...

> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the 
> same functionality as MaxFieldLength provided on IndexWriter
> ---
>
> Key: LUCENE-2295
> URL: https://issues.apache.org/jira/browse/LUCENE-2295
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Shai Erera
>Assignee: Uwe Schindler
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify on 
> IndexWriter his requested MFL limit, we can get rid of this setting entirely 
> by providing an Analyzer which will wrap any other Analyzer and its 
> TokenStream with a TokenFilter that keeps track of the number of tokens 
> produced and stop when the limit has reached.
> This will remove any count tracking in IW's indexing, which is done even if I 
> specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org