[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters

2015-07-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618512#comment-14618512
 ] 

Michael McCandless commented on LUCENE-6667:


bq. when you insert new tokens, restore the state instead of clearAttributes()

But, e.g., if the synonym filter matched "domain name system" and wants to 
insert "dns", which token's attributes is it supposed to clone for the "dns" 
token?

 Custom attributes get cleared by filters
 

 Key: LUCENE-6667
 URL: https://issues.apache.org/jira/browse/LUCENE-6667
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.10.4
Reporter: Oliver Becker

 I believe the Lucene API enables users to define their custom attributes (by 
 extending {{AttributeImpl}}) which may be added by custom Tokenizers. 
 It seems that the {{clear}} and {{copyTo}} methods must be implemented to 
 clear and restore the state of this custom attribute.
 However, some filters (in our case the SynonymFilter) simply call 
 {{AttributeSource.clearAttributes}} without invoking {{copyTo}}. Instead the 
 filter just resets some known attributes, simply ignoring all other custom 
 attributes. In the end our custom attribute value is lost.
 Is this a bug in {{SynonymFilter}} (and others) or are we using the API in 
 the wrong way?
 A solution might be of course to provide empty implementations of {{clear}} 
 and {{copyTo}}, but I'm not sure if this has other unwanted effects.
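 On the empty-{{clear}} idea, here is a minimal sketch of one effect it could 
 have. It is a plain-Java toy model, not Lucene code: {{FlagAttribute}}, 
 {{StickyFlagAttribute}} and {{flagOnSecondToken}} are hypothetical names, and 
 the only assumption carried over from Lucene is that a tokenizer clears all 
 attributes before producing each token. Under that assumption, a no-op 
 {{clear}} lets a value set for one token leak into the next one.

```java
public class EmptyClearSketch {

    /** Toy custom attribute with a conventional clear()/copyTo() pair. */
    static class FlagAttribute {
        String flag;

        void clear() { flag = null; }

        void copyTo(FlagAttribute target) { target.flag = flag; }
    }

    /** Variant with the empty clear() floated in the report. */
    static final class StickyFlagAttribute extends FlagAttribute {
        @Override
        void clear() {
            // intentionally empty: the value is never reset
        }
    }

    /**
     * Simulates a tokenizer that clears the attribute before each token
     * and sets the flag on the first token only.
     * Returns the flag value seen on the second token.
     */
    static String flagOnSecondToken(FlagAttribute att) {
        att.clear();             // start of token 1
        att.flag = "important";  // tokenizer marks token 1
        att.clear();             // start of token 2, nothing is set for it
        return att.flag;
    }

    public static void main(String[] args) {
        System.out.println(flagOnSecondToken(new FlagAttribute()));       // null
        System.out.println(flagOnSecondToken(new StickyFlagAttribute())); // important
    }
}
```

 In this model the second token reports the first token's flag, so an empty 
 {{clear}} trades "value lost on inserted tokens" for "stale value on later 
 tokens" unless every producer sets the attribute for every token.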



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters

2015-07-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618515#comment-14618515
 ] 

Uwe Schindler commented on LUCENE-6667:
---

bq. But, e.g., if the synonym filter matched "domain name system" and wants to 
insert "dns", which token's attributes is it supposed to clone for the "dns" 
token?

That's the problem with multi-word synonyms: which token to clone has to be 
defined (first, last, ...). But I am not sure what the right thing to do is!
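
To make the "first, last, ..." choice concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene code: the {{Policy}} enum, the map-based "state", and the {{token}} and {{inheritFor}} helpers are all hypothetical, and nothing here says which policy a real filter should use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymStatePolicySketch {

    /** Hypothetical policy naming which matched token the synonym inherits from. */
    enum Policy { FIRST, LAST }

    /** Builds a toy captured state: a term plus one custom attribute value. */
    static Map<String, String> token(String term, String custom) {
        Map<String, String> state = new HashMap<>();
        state.put("term", term);
        state.put("custom", custom);
        return state;
    }

    /** Copies the chosen token's state and overrides only the term. */
    static Map<String, String> inheritFor(String synonym,
                                          List<Map<String, String>> matched,
                                          Policy policy) {
        if (matched.isEmpty()) {
            throw new IllegalArgumentException("a synonym match needs at least one token");
        }
        int index = (policy == Policy.FIRST) ? 0 : matched.size() - 1;
        Map<String, String> inserted = new HashMap<>(matched.get(index));
        inserted.put("term", synonym);
        return inserted;
    }

    public static void main(String[] args) {
        List<Map<String, String>> matched = Arrays.asList(
            token("domain", "a"), token("name", "b"), token("system", "c"));
        System.out.println(inheritFor("dns", matched, Policy.FIRST).get("custom")); // a
        System.out.println(inheritFor("dns", matched, Policy.LAST).get("custom"));  // c
    }
}
```

Either policy keeps the custom attribute alive on the inserted token; the open question is only which source token's value is the meaningful one.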




[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters

2015-07-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618312#comment-14618312
 ] 

Uwe Schindler commented on LUCENE-6667:
---

I have not looked at SynonymFilter, but maybe there is a bug. In general, the 
approach described in my previous comment is how all filters should work. 
Maybe we should add assertions that filters never call clearAttributes(), but 
this is hard because attribute state is shared between the filters and the 
root TokenStream.




[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters

2015-07-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618309#comment-14618309
 ] 

Uwe Schindler commented on LUCENE-6667:
---

In filters, the approach should be the following:
- capture the state of the original token
- when you insert new tokens, restore the state instead of calling 
clearAttributes()
- set the changed attributes

This approach is used by stemmers that insert stemmed tokens while preserving 
the original, so the original attributes are kept alive.

clearAttributes should only be called in Tokenizers or root TokenStreams.
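
The difference between the two ways of inserting a token can be sketched in plain Java. This is a toy model, not Lucene code: the {{Attributes}} class and the map-based state are hypothetical stand-ins, and only the method names clearAttributes, captureState and restoreState are borrowed from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

public class CaptureRestoreSketch {

    /** Toy stand-in for a token's attribute set (not Lucene's AttributeSource). */
    static final class Attributes {
        private final Map<String, String> values = new HashMap<>();

        void set(String name, String value) { values.put(name, value); }

        String get(String name) { return values.get(name); }

        /** Resets every attribute, known or custom, to "unset". */
        void clearAttributes() { values.clear(); }

        /** Takes a snapshot of all current attribute values. */
        Map<String, String> captureState() { return new HashMap<>(values); }

        /** Overwrites the current values with a previously captured snapshot. */
        void restoreState(Map<String, String> state) {
            values.clear();
            values.putAll(state);
        }
    }

    /** Inserts a token by clearing first: only the attributes set here survive. */
    static String insertWithClear(Attributes atts, String newTerm) {
        atts.clearAttributes();
        atts.set("term", newTerm);
        atts.set("posInc", "0");
        return atts.get("custom");
    }

    /** Inserts a token by restoring the original state, then changing what differs. */
    static String insertWithRestore(Attributes atts, Map<String, String> original,
                                    String newTerm) {
        atts.restoreState(original);
        atts.set("term", newTerm);
        atts.set("posInc", "0");
        return atts.get("custom");
    }

    public static void main(String[] args) {
        Attributes atts = new Attributes();
        atts.set("term", "running");
        atts.set("posInc", "1");
        atts.set("custom", "lang=en");
        Map<String, String> original = atts.captureState();

        System.out.println(insertWithClear(atts, "run"));             // null
        System.out.println(insertWithRestore(atts, original, "run")); // lang=en
    }
}
```

In this model the filter never needs to know that the custom attribute exists: restoring the captured state carries it over, and the filter only overrides the attributes it actually changes.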




[jira] [Commented] (LUCENE-6667) Custom attributes get cleared by filters

2015-07-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618292#comment-14618292
 ] 

Michael McCandless commented on LUCENE-6667:


Hmm, {{SynonymFilter}} tries to preserve all attributes of the original 
incoming tokens (it uses {{capture/restoreState}} to do this).

But for the new tokens it inserts, it does use {{clearAttributes}} to make a 
completely blank slate, and then sets the term, offset, posInc/Length etc.

Which tokens (original input tokens vs. the inserted ones) are missing your 
custom attribute?
