[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
    [ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983149#action_12983149 ]

Robert Muir commented on LUCENE-2295:
-------------------------------------

Shai, the patch looks good to me.

I would also say that with your trunk patch we can resolve SOLR-2086, because maxFieldLength is then implemented entirely in the analyzer, so tools like analysis.jsp will automatically work with it.

> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
> ---
>
>                 Key: LUCENE-2295
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2295
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Shai Erera
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2295-2-3x.patch, LUCENE-2295-2-trunk.patch, LUCENE-2295-trunk.patch, LUCENE-2295.patch
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify the desired MFL limit on IndexWriter, we can get rid of this setting entirely by providing an Analyzer which wraps any other Analyzer and its TokenStream with a TokenFilter that keeps track of the number of tokens produced and stops once the limit has been reached.
> This will remove any count tracking in IW's indexing, which is done even if I specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
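As a rough illustration of the idea in the issue description quoted above (a wrapper that counts tokens and stops at the limit), here is a minimal Java sketch. It deliberately does not use Lucene's API: a plain `Iterator<String>` stands in for a TokenStream, and `TokenLimitSketch`, `LimitingIterator`, `analyze`, and `analyzeWithLimit` are hypothetical names invented for this example, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Illustrative only: plain iterators stand in for Lucene's TokenStream/TokenFilter. */
final class TokenLimitSketch {

    /** Wraps a token source and stops handing out tokens after maxTokens of them. */
    static final class LimitingIterator implements Iterator<String> {
        private final Iterator<String> input;
        private final int maxTokens;
        private int count = 0; // number of tokens emitted so far

        LimitingIterator(Iterator<String> input, int maxTokens) {
            this.input = input;
            this.maxTokens = maxTokens;
        }

        @Override
        public boolean hasNext() {
            // Stop as soon as the limit is reached, even if the input has more tokens.
            return count < maxTokens && input.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException("token limit reached or input exhausted");
            }
            count++;
            return input.next();
        }
    }

    /** Hypothetical stand-in for the wrapped analyzer: splits text on whitespace. */
    static Iterator<String> analyze(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyIterator();
        }
        return Arrays.asList(trimmed.split("\\s+")).iterator();
    }

    /** Plays the role of the wrapping analyzer: delegate, then apply the limit. */
    static List<String> analyzeWithLimit(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        Iterator<String> limited = new LimitingIterator(analyze(text), maxTokens);
        while (limited.hasNext()) {
            tokens.add(limited.next());
        }
        return tokens;
    }
}
```

For instance, `TokenLimitSketch.analyzeWithLimit("a b c d e", 3)` keeps only the tokens a, b and c. Because the limit lives in the wrapper, the consumer of the tokens (IndexWriter, in the issue's terms) needs no counting logic of its own.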
[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
    [ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982594#action_12982594 ]

Robert Muir commented on LUCENE-2295:
-------------------------------------

Hi Shai, that sounds like the right solution to me!
[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
    [ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982529#action_12982529 ]

Shai Erera commented on LUCENE-2295:
------------------------------------

I think the changes to 3x are less complicated than they seem: we don't need to deprecate anything beyond what we already did. IndexWriterConfig is introduced in 3.1 and all IW ctors are already deprecated. So we can just remove get/setMaxFieldLength from IWC, add some javadocs, and be done with it.

Is that the intention behind the reopening of the issue?
[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
    [ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903279#action_12903279 ]

Uwe Schindler commented on LUCENE-2295:
---------------------------------------

+1, I missed Mike's comment after resolving this issue!
[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
    [ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873397#action_12873397 ]

Michael McCandless commented on LUCENE-2295:
--------------------------------------------

bq. Further investigations showed that there is some difference between using this filter/analyzer and the current setting in IndexWriter. IndexWriter uses the given MaxFieldLength as the maximum value for all instances of the same field name. So if you add 100 fields "foo" (each with 1,000 terms) and have the default of 10,000 tokens, DocInverter will index 10 of these field instances (10,000 terms in total) and the rest will be suppressed.

In LUCENE-2450 I'm experimenting with having multi-valued fields be handled entirely by an analyzer stage, i.e., the logical concatenation of tokens (with gaps) would be hidden from IW, and IW would think it's dealing with a single token stream. In this model, if you then appended the new LimitTokenCountFilter to the end, I think it'd result in the same behavior as maxFieldLength today.

But, even before we eventually switch to that model... can't we still deprecate IW's maxFieldLength on 3x (and remove it from trunk) now? I realize the limiting is different (applying the limit pre vs. post concatenation), but I think the javadocs can explain this difference? I think it's unlikely apps are relying on this specific interaction of truncation and multi-valued fields...
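The pre- versus post-concatenation difference discussed in the comment above can be made concrete with a little arithmetic. The Java sketch below is only a model, not Lucene code: `FieldLimitComparison` and its methods are hypothetical names, field instances are reduced to their token counts, and position gaps between instances are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative model only: field instances are represented by their token counts. */
final class FieldLimitComparison {

    /** Limit applied to each field instance separately (pre-concatenation, filter style). */
    static int tokensKeptPerInstance(List<Integer> instanceLengths, int limit) {
        int kept = 0;
        for (int length : instanceLengths) {
            kept += Math.min(length, limit);
        }
        return kept;
    }

    /** One budget shared by all instances of a field name (post-concatenation, IndexWriter style). */
    static int tokensKeptShared(List<Integer> instanceLengths, int limit) {
        int kept = 0;
        for (int length : instanceLengths) {
            int remaining = limit - kept;
            if (remaining <= 0) {
                break; // budget used up: later instances are dropped entirely
            }
            kept += Math.min(length, remaining);
        }
        return kept;
    }

    /** Convenience helper: the given number of instances, each with tokensEach tokens. */
    static List<Integer> uniform(int instances, int tokensEach) {
        return new ArrayList<>(Collections.nCopies(instances, tokensEach));
    }
}
```

With the numbers from the quoted example (100 instances of "foo", 1,000 tokens each, limit 10,000), `tokensKeptShared` returns 10,000, matching the 10 indexed instances described above, while `tokensKeptPerInstance` returns 100,000 because no single instance reaches the limit on its own.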