Re: /trunk ClassCastException with BaseTokenizerFactory
Is it possible to mark out backwards-incompatible changes with deprecation? So at least we get warnings in Eclipse/Netbeans etc? On Fri, Aug 21, 2009 at 9:50 AM, Mark Miller markrmil...@gmail.com wrote: Ryan McKinley wrote: Ahh, I see: Tokenizer extends TokenStream So if this is going to break everything that implements TokenStream rather then Tokenizer, it seems we should change the TokenizerFactory API from: public Tokenizer create( Reader input ) rather then: public TokenStream create( Reader input ); I would WAY rather have my compiler tell me something is wrong then get an error and then find some documentation about the tokenizer. - - - - - Personally, I think lucene/solr just need to fess up and admit that 2.9 is *not* totally back compatible. I don't think anyone contends that Lucene is totally backcompat - and insofarasthatgoes there is no way Solr totally is - . it exposes a lot of Lucene. We admit our breaks in this release in the back compat breaks section. There is no way we will release claiming total back compat. Not even in the realm of possibility. No way is the Multireader change back-compatible! Personally, pure API wise - I think it was. Its a stickier issue on the possible more RAM usage - but too me, thats more of a Runtime change. Certain methods have always changed over time in their resource usage, and I think thats within back compat. This was a steep one to swallow though, I'll admit. Basically we just thought it was way worth it long term. And Hoss came up with some great ideas to help ease the possible pain. ryan On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote: On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote: I'm fine upgrading, but it seems we should the 'back compatibility' notice more explicit. Yeah... that should be fun for expert-use plugins in general. In Lucene-land, this is the release of the break... I think we've covered the changes reasonably well in our external APIs, but people can always use pretty much the full Lucene API when writing Solr plugins. I think we'll need to document that things in tokenizer tags need to inherit from Tokenizer classes. It is technically a back-compat break, but I assume it will affect very few users? -Yonik http://www.lucidimagination.com -- - Mark http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
Re: /trunk ClassCastException with BaseTokenizerFactory
not sure what you are asking for. The patch in SOLR-1377 (now /trunk) breaks the TokenizerFactory API. You will get an compiler error (or runtime error) with a TokenizerFactory that creates a TokenStream rather then a Tokenizer. I don't see any clean way to do this via deprecation -- if you have ideas, shout them out. ryan On Aug 24, 2009, at 1:49 PM, Lance Norskog wrote: Is it possible to mark out backwards-incompatible changes with deprecation? So at least we get warnings in Eclipse/Netbeans etc? On Fri, Aug 21, 2009 at 9:50 AM, Mark Miller markrmil...@gmail.com wrote: Ryan McKinley wrote: Ahh, I see: Tokenizer extends TokenStream So if this is going to break everything that implements TokenStream rather then Tokenizer, it seems we should change the TokenizerFactory API from: public Tokenizer create( Reader input ) rather then: public TokenStream create( Reader input ); I would WAY rather have my compiler tell me something is wrong then get an error and then find some documentation about the tokenizer. - - - - - Personally, I think lucene/solr just need to fess up and admit that 2.9 is *not* totally back compatible. I don't think anyone contends that Lucene is totally backcompat - and insofarasthatgoes there is no way Solr totally is - . it exposes a lot of Lucene. We admit our breaks in this release in the back compat breaks section. There is no way we will release claiming total back compat. Not even in the realm of possibility. No way is the Multireader change back-compatible! Personally, pure API wise - I think it was. Its a stickier issue on the possible more RAM usage - but too me, thats more of a Runtime change. Certain methods have always changed over time in their resource usage, and I think thats within back compat. This was a steep one to swallow though, I'll admit. Basically we just thought it was way worth it long term. And Hoss came up with some great ideas to help ease the possible pain. ryan On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote: On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote: I'm fine upgrading, but it seems we should the 'back compatibility' notice more explicit. Yeah... that should be fun for expert-use plugins in general. In Lucene-land, this is the release of the break... I think we've covered the changes reasonably well in our external APIs, but people can always use pretty much the full Lucene API when writing Solr plugins. I think we'll need to document that things in tokenizer tags need to inherit from Tokenizer classes. It is technically a back-compat break, but I assume it will affect very few users? -Yonik http://www.lucidimagination.com -- - Mark http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
Re: /trunk ClassCastException with BaseTokenizerFactory
Actually I think there may be something wrong here. BaseTokenizerFactory does not make a Tokenizer, it creates a TokenStream, so it should never be cast to Tokenizer My custom TokenizerFactory now looks the same as: o.a.s.analysis.PatternTokenizerFactory Not sure what to look at next... ideas? thanks ryan On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote: Just updated to /trunk and am now seeing this exception: Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to org.apache.lucene.analysis.Tokenizer at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141) ... 15 more Caused by: java.lang.ClassCastException: xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to org.apache.lucene.analysis.Tokenizer at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74) at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:124) at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772) Looks like SolrIndexAnalyzer now assumes everything uses the new TokenStream API... I'm fine upgrading, but it seems we should the 'back compatibility' notice more explicit. FYI, this is what the TokenizerFactory looks like: public class JSONKeyValueTokenizerFactory extends BaseTokenizerFactory { ... public TokenStream create(Reader input) { final JSONParser js = new JSONParser( input ); final StackString keystack = new StackString(); return new TokenStream() { ...
Re: /trunk ClassCastException with BaseTokenizerFactory
On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinleyryan...@gmail.com wrote: Actually I think there may be something wrong here. BaseTokenizerFactory does not make a Tokenizer, it creates a TokenStream, so it should never be cast to Tokenizer My custom TokenizerFactory now looks the same as: o.a.s.analysis.PatternTokenizerFactory Urg... looks like there's no end-to-end (index then search) test for PatternTokenizerFactory, so we never caught this. It seems like when something is specified as a tokenizer in schema.xml it should in fact be a tokenizer - it's the only way tokenstream reuse works. -Yonik
Re: /trunk ClassCastException with BaseTokenizerFactory
On Aug 21, 2009, at 10:49 AM, Yonik Seeley wrote: On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinleyryan...@gmail.com wrote: Actually I think there may be something wrong here. BaseTokenizerFactory does not make a Tokenizer, it creates a TokenStream, so it should never be cast to Tokenizer My custom TokenizerFactory now looks the same as: o.a.s.analysis.PatternTokenizerFactory Urg... looks like there's no end-to-end (index then search) test for PatternTokenizerFactory, so we never caught this. I guess we need to add one :) It seems like when something is specified as a tokenizer in schema.xml it should in fact be a tokenizer - it's the only way tokenstream reuse works. I don't see anything in Solr that creates a Tokenizer. The TokenizerFactory just creates a TokenStream. It seems that TokenizerFactory really needs to be: public Tokenizer create( Reader input ) rather then: public TokenStream create( Reader input ); I don't see any backwards compatible way to make this change! ideas? ryan
Re: /trunk ClassCastException with BaseTokenizerFactory
On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote: I'm fine upgrading, but it seems we should the 'back compatibility' notice more explicit. Yeah... that should be fun for expert-use plugins in general. In Lucene-land, this is the release of the break... I think we've covered the changes reasonably well in our external APIs, but people can always use pretty much the full Lucene API when writing Solr plugins. I think we'll need to document that things in tokenizer tags need to inherit from Tokenizer classes. It is technically a back-compat break, but I assume it will affect very few users? -Yonik http://www.lucidimagination.com
Re: /trunk ClassCastException with BaseTokenizerFactory
Ahh, I see: Tokenizer extends TokenStream So if this is going to break everything that implements TokenStream rather then Tokenizer, it seems we should change the TokenizerFactory API from: public Tokenizer create( Reader input ) rather then: public TokenStream create( Reader input ); I would WAY rather have my compiler tell me something is wrong then get an error and then find some documentation about the tokenizer. - - - - - Personally, I think lucene/solr just need to fess up and admit that 2.9 is *not* totally back compatible. No way is the Multireader change back-compatible! ryan On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote: On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote: I'm fine upgrading, but it seems we should the 'back compatibility' notice more explicit. Yeah... that should be fun for expert-use plugins in general. In Lucene-land, this is the release of the break... I think we've covered the changes reasonably well in our external APIs, but people can always use pretty much the full Lucene API when writing Solr plugins. I think we'll need to document that things in tokenizer tags need to inherit from Tokenizer classes. It is technically a back-compat break, but I assume it will affect very few users? -Yonik http://www.lucidimagination.com
Re: /trunk ClassCastException with BaseTokenizerFactory
On Fri, Aug 21, 2009 at 12:22 PM, Ryan McKinleyryan...@gmail.com wrote: Ahh, I see: Tokenizer extends TokenStream So if this is going to break everything that implements TokenStream rather then Tokenizer, it seems we should change the TokenizerFactory API from: public Tokenizer create( Reader input ) rather then: public TokenStream create( Reader input ); I would WAY rather have my compiler tell me something is wrong then get an error and then find some documentation about the tokenizer. +1 Absolutely. -Yonik http://www.lucidimagination.com
Re: /trunk ClassCastException with BaseTokenizerFactory
Ryan McKinley wrote: Ahh, I see: Tokenizer extends TokenStream So if this is going to break everything that implements TokenStream rather then Tokenizer, it seems we should change the TokenizerFactory API from: public Tokenizer create( Reader input ) rather then: public TokenStream create( Reader input ); I would WAY rather have my compiler tell me something is wrong then get an error and then find some documentation about the tokenizer. - - - - - Personally, I think lucene/solr just need to fess up and admit that 2.9 is *not* totally back compatible. I don't think anyone contends that Lucene is totally backcompat - and insofarasthatgoes there is no way Solr totally is - . it exposes a lot of Lucene. We admit our breaks in this release in the back compat breaks section. There is no way we will release claiming total back compat. Not even in the realm of possibility. No way is the Multireader change back-compatible! Personally, pure API wise - I think it was. Its a stickier issue on the possible more RAM usage - but too me, thats more of a Runtime change. Certain methods have always changed over time in their resource usage, and I think thats within back compat. This was a steep one to swallow though, I'll admit. Basically we just thought it was way worth it long term. And Hoss came up with some great ideas to help ease the possible pain. ryan On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote: On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote: I'm fine upgrading, but it seems we should the 'back compatibility' notice more explicit. Yeah... that should be fun for expert-use plugins in general. In Lucene-land, this is the release of the break... I think we've covered the changes reasonably well in our external APIs, but people can always use pretty much the full Lucene API when writing Solr plugins. I think we'll need to document that things in tokenizer tags need to inherit from Tokenizer classes. It is technically a back-compat break, but I assume it will affect very few users? -Yonik http://www.lucidimagination.com -- - Mark http://www.lucidimagination.com