Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-24 Thread Lance Norskog
Is it possible to mark out backwards-incompatible changes with deprecation?
So at least we get warnings in Eclipse/Netbeans etc?

On Fri, Aug 21, 2009 at 9:50 AM, Mark Miller markrmil...@gmail.com wrote:

 Ryan McKinley wrote:
  Ahh, I see:
   Tokenizer extends TokenStream
 
  So if this is going to break everything that implements TokenStream
  rather then Tokenizer, it seems we should change the TokenizerFactory
  API from:
public Tokenizer create( Reader input )
  rather then:
public TokenStream create( Reader input );
 
  I would WAY rather have my compiler tell me something is wrong then
  get an error and then find some documentation about the tokenizer.
 
  - - - - -
 
  Personally, I think lucene/solr just need to fess up and admit that
  2.9 is *not* totally back compatible.
 I don't think anyone contends that Lucene is totally backcompat - and
 insofarasthatgoes there is no way Solr totally is - . it exposes a lot
 of Lucene.

 We admit our breaks in this release in the back compat breaks section.
 There is no way we will release claiming total back compat. Not even in
 the realm of possibility.
  No way is the Multireader change back-compatible!

 Personally, pure API wise - I think it was. Its a stickier issue on the
 possible more RAM usage - but too me, thats more of a Runtime change.
 Certain methods have always changed over time in their resource usage,
 and I think thats within back compat. This was a steep one to swallow
 though, I'll admit. Basically we just thought it was way worth it long
 term. And Hoss came up with some great ideas to help ease the possible
 pain.
 
  ryan
 
 
  On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote:
 
  On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com
  wrote:
  I'm fine upgrading, but it seems we should the 'back compatibility'
  notice more explicit.
 
  Yeah... that should be fun for expert-use plugins in general.  In
  Lucene-land, this is the release of the break... I think we've
  covered the changes reasonably well in our external APIs, but people
  can always use pretty much the full Lucene API when writing Solr
  plugins.
 
  I think we'll need to document that things in tokenizer tags need to
  inherit from Tokenizer classes.  It is technically a back-compat
  break, but I assume it will affect very few users?
 
  -Yonik
  http://www.lucidimagination.com
 


 --
 - Mark

 http://www.lucidimagination.com






-- 
Lance Norskog
goks...@gmail.com


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-24 Thread Ryan McKinley

not sure what you are asking for.

The patch in SOLR-1377 (now /trunk) breaks the TokenizerFactory API.

You will get an compiler error (or runtime error) with a  
TokenizerFactory that creates a TokenStream rather then a Tokenizer.


I don't see any clean way to do this via deprecation -- if you have  
ideas, shout them out.


ryan



On Aug 24, 2009, at 1:49 PM, Lance Norskog wrote:

Is it possible to mark out backwards-incompatible changes with  
deprecation?

So at least we get warnings in Eclipse/Netbeans etc?

On Fri, Aug 21, 2009 at 9:50 AM, Mark Miller markrmil...@gmail.com  
wrote:



Ryan McKinley wrote:

Ahh, I see:
Tokenizer extends TokenStream

So if this is going to break everything that implements TokenStream
rather then Tokenizer, it seems we should change the  
TokenizerFactory

API from:
 public Tokenizer create( Reader input )
rather then:
 public TokenStream create( Reader input );

I would WAY rather have my compiler tell me something is wrong then
get an error and then find some documentation about the tokenizer.

- - - - -

Personally, I think lucene/solr just need to fess up and admit that
2.9 is *not* totally back compatible.

I don't think anyone contends that Lucene is totally backcompat - and
insofarasthatgoes there is no way Solr totally is - . it exposes a  
lot

of Lucene.

We admit our breaks in this release in the back compat breaks  
section.
There is no way we will release claiming total back compat. Not  
even in

the realm of possibility.

No way is the Multireader change back-compatible!


Personally, pure API wise - I think it was. Its a stickier issue on  
the

possible more RAM usage - but too me, thats more of a Runtime change.
Certain methods have always changed over time in their resource  
usage,

and I think thats within back compat. This was a steep one to swallow
though, I'll admit. Basically we just thought it was way worth it  
long
term. And Hoss came up with some great ideas to help ease the  
possible

pain.


ryan


On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote:


On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com
wrote:
I'm fine upgrading, but it seems we should the 'back  
compatibility'

notice more explicit.


Yeah... that should be fun for expert-use plugins in general.  In
Lucene-land, this is the release of the break... I think we've
covered the changes reasonably well in our external APIs, but  
people

can always use pretty much the full Lucene API when writing Solr
plugins.

I think we'll need to document that things in tokenizer tags  
need to

inherit from Tokenizer classes.  It is technically a back-compat
break, but I assume it will affect very few users?

-Yonik
http://www.lucidimagination.com





--
- Mark

http://www.lucidimagination.com







--
Lance Norskog
goks...@gmail.com




Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Ryan McKinley
Actually I think there may be something wrong here.

BaseTokenizerFactory does not make a Tokenizer, it creates a
TokenStream, so it should never be cast to Tokenizer

My custom TokenizerFactory now looks the same as:
o.a.s.analysis.PatternTokenizerFactory

Not sure what to look at next...  ideas?

thanks
ryan


On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote:
 Just updated to /trunk and am now seeing this exception:

 Caused by: org.apache.solr.client.solrj.SolrServerException:
 java.lang.ClassCastException:
 xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to
 org.apache.lucene.analysis.Tokenizer
        at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
        ... 15 more
 Caused by: java.lang.ClassCastException:
 xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to
 org.apache.lucene.analysis.Tokenizer
        at 
 org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69)
        at 
 org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74)
        at 
 org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364)
        at 
 org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:124)
        at 
 org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
        at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)


 Looks like SolrIndexAnalyzer now assumes everything uses the new
 TokenStream API...

 I'm fine upgrading, but it seems we should the 'back compatibility'
 notice more explicit.


 FYI, this is what the TokenizerFactory looks like:

 public class JSONKeyValueTokenizerFactory extends BaseTokenizerFactory
 {
  ...

  public TokenStream create(Reader input) {
    final JSONParser js = new JSONParser( input );
    final StackString keystack = new StackString();

    return new TokenStream()
    {
      ...



Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinleyryan...@gmail.com wrote:
 Actually I think there may be something wrong here.

 BaseTokenizerFactory does not make a Tokenizer, it creates a
 TokenStream, so it should never be cast to Tokenizer

 My custom TokenizerFactory now looks the same as:
 o.a.s.analysis.PatternTokenizerFactory

Urg... looks like there's no end-to-end (index then search) test for
PatternTokenizerFactory, so we never caught this.

It seems like when something is specified as a tokenizer in
schema.xml it should in fact be a tokenizer - it's the only way
tokenstream reuse works.

-Yonik


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Ryan McKinley


On Aug 21, 2009, at 10:49 AM, Yonik Seeley wrote:

On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinleyryan...@gmail.com  
wrote:

Actually I think there may be something wrong here.

BaseTokenizerFactory does not make a Tokenizer, it creates a
TokenStream, so it should never be cast to Tokenizer

My custom TokenizerFactory now looks the same as:
o.a.s.analysis.PatternTokenizerFactory


Urg... looks like there's no end-to-end (index then search) test for
PatternTokenizerFactory, so we never caught this.


I guess we need to add one :)



It seems like when something is specified as a tokenizer in
schema.xml it should in fact be a tokenizer - it's the only way
tokenstream reuse works.



I don't see anything in Solr that creates a Tokenizer.  The  
TokenizerFactory just creates a TokenStream.


It seems that TokenizerFactory really needs to be:
  public Tokenizer create( Reader input )
rather then:
  public TokenStream create( Reader input );

I don't see any backwards compatible way to make this change!

ideas?
ryan


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com wrote:
 I'm fine upgrading, but it seems we should the 'back compatibility'
 notice more explicit.

Yeah... that should be fun for expert-use plugins in general.  In
Lucene-land, this is the release of the break... I think we've
covered the changes reasonably well in our external APIs, but people
can always use pretty much the full Lucene API when writing Solr
plugins.

I think we'll need to document that things in tokenizer tags need to
inherit from Tokenizer classes.  It is technically a back-compat
break, but I assume it will affect very few users?

-Yonik
http://www.lucidimagination.com


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Ryan McKinley

Ahh, I see:
 Tokenizer extends TokenStream

So if this is going to break everything that implements TokenStream  
rather then Tokenizer, it seems we should change the TokenizerFactory  
API from:

  public Tokenizer create( Reader input )
rather then:
  public TokenStream create( Reader input );

I would WAY rather have my compiler tell me something is wrong then  
get an error and then find some documentation about the tokenizer.


- - - - -

Personally, I think lucene/solr just need to fess up and admit that  
2.9 is *not* totally back compatible.  No way is the Multireader  
change back-compatible!


ryan


On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote:

On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com  
wrote:

I'm fine upgrading, but it seems we should the 'back compatibility'
notice more explicit.


Yeah... that should be fun for expert-use plugins in general.  In
Lucene-land, this is the release of the break... I think we've
covered the changes reasonably well in our external APIs, but people
can always use pretty much the full Lucene API when writing Solr
plugins.

I think we'll need to document that things in tokenizer tags need to
inherit from Tokenizer classes.  It is technically a back-compat
break, but I assume it will affect very few users?

-Yonik
http://www.lucidimagination.com




Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 12:22 PM, Ryan McKinleyryan...@gmail.com wrote:
 Ahh, I see:
  Tokenizer extends TokenStream

 So if this is going to break everything that implements TokenStream rather
 then Tokenizer, it seems we should change the TokenizerFactory API from:
  public Tokenizer create( Reader input )
 rather then:
  public TokenStream create( Reader input );

 I would WAY rather have my compiler tell me something is wrong then get an
 error and then find some documentation about the tokenizer.

+1
Absolutely.

-Yonik
http://www.lucidimagination.com


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Mark Miller
Ryan McKinley wrote:
 Ahh, I see:
  Tokenizer extends TokenStream

 So if this is going to break everything that implements TokenStream
 rather then Tokenizer, it seems we should change the TokenizerFactory
 API from:
   public Tokenizer create( Reader input )
 rather then:
   public TokenStream create( Reader input );

 I would WAY rather have my compiler tell me something is wrong then
 get an error and then find some documentation about the tokenizer.

 - - - - -

 Personally, I think lucene/solr just need to fess up and admit that
 2.9 is *not* totally back compatible.  
I don't think anyone contends that Lucene is totally backcompat - and
insofarasthatgoes there is no way Solr totally is - . it exposes a lot
of Lucene.

We admit our breaks in this release in the back compat breaks section.
There is no way we will release claiming total back compat. Not even in
the realm of possibility.
 No way is the Multireader change back-compatible!

Personally, pure API wise - I think it was. Its a stickier issue on the
possible more RAM usage - but too me, thats more of a Runtime change.
Certain methods have always changed over time in their resource usage,
and I think thats within back compat. This was a steep one to swallow
though, I'll admit. Basically we just thought it was way worth it long
term. And Hoss came up with some great ideas to help ease the possible pain.

 ryan


 On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote:

 On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinleyryan...@gmail.com
 wrote:
 I'm fine upgrading, but it seems we should the 'back compatibility'
 notice more explicit.

 Yeah... that should be fun for expert-use plugins in general.  In
 Lucene-land, this is the release of the break... I think we've
 covered the changes reasonably well in our external APIs, but people
 can always use pretty much the full Lucene API when writing Solr
 plugins.

 I think we'll need to document that things in tokenizer tags need to
 inherit from Tokenizer classes.  It is technically a back-compat
 break, but I assume it will affect very few users?

 -Yonik
 http://www.lucidimagination.com



-- 
- Mark

http://www.lucidimagination.com