[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters
[ https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618512#comment-14618512 ]

Michael McCandless commented on LUCENE-6667:

bq. when you insert new tokens, restore the state instead of clearAttributes()

But if, e.g., the syn filter matched "domain name system" and wants to insert "dns", which token's attributes is it supposed to clone for the "dns" token?

Custom attributes get cleared by filters

Key: LUCENE-6667
URL: https://issues.apache.org/jira/browse/LUCENE-6667
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 4.10.4
Reporter: Oliver Becker

I believe the Lucene API enables users to define their custom attributes (by extending {{AttributeImpl}}) which may be added by custom Tokenizers. It seems that the {{clear}} and {{copyTo}} methods must be implemented to clear and restore the state of this custom attribute.

However, some filters (in our case the SynonymFilter) simply call {{AttributeSource.clearAttributes}} without invoking {{copyTo}}. Instead, the filter just resets some known attributes, ignoring all other custom attributes. In the end, our custom attribute value is lost.

Is this a bug in {{SynonymFilter}} (and others), or are we using the API in the wrong way? A solution might of course be to provide empty implementations of {{clear}} and {{copyTo}}, but I'm not sure whether this has other unwanted effects.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters
[ https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618515#comment-14618515 ]

Uwe Schindler commented on LUCENE-6667:

bq. But if, e.g., the syn filter matched "domain name system" and wants to insert "dns", which token's attributes is it supposed to clone for the "dns" token?

That's the problem with multi-word synonyms... It has to be defined (first, last, ...). But I am not sure what the right thing to do is!
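To make the "first, last, ..." choice concrete, here is a minimal sketch of what such a policy could look like. It is a toy model only: {{CloneFrom}} and {{SynonymStatePicker}} are hypothetical names, the captured states are plain maps rather than Lucene's {{AttributeSource.State}}, and nothing in this thread indicates that Lucene offers such a setting.

```java
import java.util.List;
import java.util.Map;

// Hypothetical policy names, invented for illustration; not part of Lucene.
enum CloneFrom { FIRST, LAST }

final class SynonymStatePicker {
    // Chooses which matched token's captured state an inserted synonym inherits.
    // Each state is a toy name->value snapshot standing in for a captured
    // attribute state; it is not a Lucene AttributeSource.State.
    static Map<String, String> pick(List<Map<String, String>> matched, CloneFrom policy) {
        if (matched.isEmpty()) {
            throw new IllegalArgumentException("no matched tokens");
        }
        return policy == CloneFrom.FIRST
                ? matched.get(0)
                : matched.get(matched.size() - 1);
    }
}
```

For the "domain name system" match, {{FIRST}} would hand the "dns" token the custom attributes of "domain" and {{LAST}} those of "system". Neither is obviously correct, which is the open question raised here.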
[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters
[ https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618312#comment-14618312 ]

Uwe Schindler commented on LUCENE-6667:

I have not looked at SynonymFilter, but maybe there is a bug. In general, the approach from my previous comment is how all filters should behave. Maybe we should somehow add assertions that filters never call clearAttributes(), but this is hard because state is shared between the filters and the root TokenStream.
[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters
[ https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618309#comment-14618309 ]

Uwe Schindler commented on LUCENE-6667:

In filters the approach should be the following:
- capture the state of the original token
- when you insert new tokens, restore the state instead of clearAttributes()
- set the changed attributes

This approach is used by stemmers that insert stemmed tokens (preserving the original), so the original attributes are kept alive. clearAttributes() should only be called in Tokenizers or root TokenStreams.
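The three steps above can be sketched with a small stand-alone model. This is a sketch under stated assumptions, not Lucene code: {{ToyAttributeSource}} and {{InsertTokenDemo}} are hypothetical classes that only mimic the idea behind {{captureState}}, {{restoreState}} and {{clearAttributes}}, and the attribute names ("term", "posInc", "custom") are made up for the example. One simplification to note: the toy {{clearAttributes}} removes values entirely, whereas Lucene's resets each attribute to its default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for an attribute source: a named bag of attribute values.
// NOT Lucene's API; it only mirrors the capture/restore/clear idea.
final class ToyAttributeSource {
    private final Map<String, String> attrs = new LinkedHashMap<>();

    void set(String name, String value) { attrs.put(name, value); }

    String get(String name) { return attrs.get(name); }

    // Analogue of captureState(): snapshot every attribute, known or custom.
    Map<String, String> captureState() { return new LinkedHashMap<>(attrs); }

    // Analogue of restoreState(): replace current values with the snapshot.
    void restoreState(Map<String, String> state) {
        attrs.clear();
        attrs.putAll(state);
    }

    // Analogue of clearAttributes(): blank slate (here: values are removed).
    void clearAttributes() { attrs.clear(); }
}

final class InsertTokenDemo {
    // Inserts a token by clearing first: any custom attribute is lost.
    static void insertWithClear(ToyAttributeSource src, String newTerm) {
        src.clearAttributes();
        src.set("term", newTerm);
        src.set("posInc", "0");
    }

    // Inserts a token by restoring the captured original state and then
    // overriding only the attributes that actually change.
    static void insertWithRestore(ToyAttributeSource src,
                                  Map<String, String> original,
                                  String newTerm) {
        src.restoreState(original);
        src.set("term", newTerm);
        src.set("posInc", "0");
    }
}
```

With an original token carrying a custom attribute, {{insertWithRestore}} leaves that attribute on the inserted token, while {{insertWithClear}} drops it, which matches the behaviour reported in this issue.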
[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters
[ https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618292#comment-14618292 ]

Michael McCandless commented on LUCENE-6667:

Hmm, {{SynonymFilter}} tries to preserve all attributes of the original incoming tokens (it uses {{capture/restoreState}} to do this).

But for the new tokens it inserts, it does use {{clearAttributes}} to make a completely blank slate, and then sets the term, offset, posInc/Length etc.

Which tokens (original input tokens vs. the inserted ones) are missing your custom attribute?