Re: Why release 3.0?

Mark Miller Mon, 16 Nov 2009 12:16:48 -0800

This is a big deal, whether it's JDK or Lucene related. We are forcing
those on 1.4 to move to 1.5 - any problems you face with the JDK in that
move are Lucene problems if they affect Lucene. We need big, clear
warnings about this - we should have had them before we pushed users
to 1.5 as well, if I am reading this right.


If it matters what JVM runs jflex, that is also a big deal. Even if it
hasn't been regenerated yet, it likely will be before long. Will we
break then? Perhaps it's better to break now?

I've only read through this thread quickly, but to me, this is all a big
deal. Think of it from a user's perspective. It's not okay to just say,
"well, this stuff screws up Lucene, but it's just because the user is
switching from 1.4 to 1.5 - that's not our concern - they should know the
consequences." I think that is our concern - very much so.
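
For anyone who wants to see the difference on their own JVM, the check
Uwe ran further down the thread is easy to repeat. This is only a
minimal sketch: the class name CharTypeCheck and the helper typeOf are
made up for illustration and are not part of Lucene. The only real API
it relies on is java.lang.Character.getType(char).

```java
public class CharTypeCheck {

    // Hypothetical helper (not a Lucene API): returns the general category
    // code that the running JVM assigns to a char, e.g. Character.FORMAT.
    static int typeOf(char c) {
        return Character.getType(c);
    }

    public static void main(String[] args) {
        // The two characters discussed in the thread:
        // U+00AD soft hyphen and U+06DD Arabic end of ayah.
        char[] samples = { '\u00AD', '\u06DD' };

        System.out.println("java.version = " + System.getProperty("java.version"));
        for (int i = 0; i < samples.length; i++) {
            System.out.println("U+" + Integer.toHexString(samples[i]).toUpperCase()
                    + " -> " + typeOf(samples[i]));
        }
        // Values reported in the thread: 16 (Character.FORMAT) for both on
        // Java 5; 20 (Character.DASH_PUNCTUATION) and 7
        // (Character.ENCLOSING_MARK) on Java 1.4.
    }
}
```

Run it once on each JRE you index or search with. If the numbers
differ, any tokenizer that keys off these properties can split the same
text differently, which is the case for reindexing.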

Robert Muir wrote:
> i suppose we are ok then, except for the fact that now
> StandardTokenizer is working with a unicode 3.0 definition, instead of
> the unicode version (4.0) that corresponds to our required minimum jre
> (1.5)...
>
> sorry if i raised a stink about nothing, but you see my concerns maybe?
>
> On Mon, Nov 16, 2009 at 3:01 PM, Uwe Schindler <[email protected]
> <mailto:[email protected]>> wrote:
>
>     JFlex was not regenerated as far as I know, but if somebody did,
>     its already broken…
>
>      
>
>     -----
>     Uwe Schindler
>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     http://www.thetaphi.de
>     eMail: [email protected] <mailto:[email protected]>
>
>     ------------------------------------------------------------------------
>
>     *From:* Robert Muir [mailto:[email protected]
>     <mailto:[email protected]>]
>     *Sent:* Monday, November 16, 2009 8:53 PM
>
>     *To:* [email protected] <mailto:[email protected]>
>     *Subject:* Re: Why release 3.0?
>
>      
>
>     btw, so heres a great example. you are backwards broken regardless
>     of JVM for StandardTokenizer, because we used 1.4 JRE to run jflex
>     in 2.9, but 1.5 in 3.0, right?
>
>     On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir <[email protected]
>     <mailto:[email protected]>> wrote:
>
>     Uwe, that's probably a good solution I think, just as long as we
>     document it somewhere.
>     I think there is some warning verbiage in StandardTokenizer already
>     about this.
>
>     NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>           the tokenizer, remember to use JRE 1.4 to run jflex (before
>           Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>           :letter:) whose meaning can vary according to the JRE used to
>           run jflex.  See
>           https://issues.apache.org/jira/browse/LUCENE-1126 for details.
>
>      
>
>     On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <[email protected]
>     <mailto:[email protected]>> wrote:
>
>     But it is a general warning that should be placed in the Wiki: If
>     you upgrade from Java 1.4 to Java 5, think about reindexing.
>
>      
>
>     It definitely has nothing to do with 3.0, because users could have
>     changed (and most of them have) before.
>
>     -----
>     Uwe Schindler
>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     http://www.thetaphi.de
>     eMail: [email protected] <mailto:[email protected]>
>
>     ------------------------------------------------------------------------
>
>     *From:* Robert Muir [mailto:[email protected]
>     <mailto:[email protected]>]
>     *Sent:* Monday, November 16, 2009 8:45 PM
>
>
>     *To:* [email protected] <mailto:[email protected]>
>     *Subject:* Re: Why release 3.0?
>
>      
>
>     right, my point is its true its nothing to do with Lucene at all,
>     really.
>
>     but the reality is we should clarify this to users I think.
>
>     Its especially complex in the current StandardTokenizer, which
>     uses a mix of hardcoded ranges and properties, can you tell me if
>     you should reindex for given language X?
>     I wouldn't want to answer that question right now.
>
>     On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <[email protected]
>     <mailto:[email protected]>> wrote:
>
>     We tried out: Character.getType() for these two chars:
>
>      
>
>     Java 5:
>     '\u00AD' = 16
>     '\u06DD' = 16
>
>     Java 1.4:
>     '\u00AD' = 20
>     '\u06DD' = 7
>
>      
>
>     The first is the soft hyphen.
>
>     -----
>     Uwe Schindler
>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     http://www.thetaphi.de
>     eMail: [email protected] <mailto:[email protected]>
>
>     ------------------------------------------------------------------------
>
>     *From:* Robert Muir [mailto:[email protected]
>     <mailto:[email protected]>]
>     *Sent:* Monday, November 16, 2009 8:37 PM
>
>
>     *To:* [email protected] <mailto:[email protected]>
>     *Subject:* Re: Why release 3.0?
>
>      
>
>     right, its nothing to do with lucene, instead due to property
>     changes, etc.
>
>     i just think we should inform users on java 1.4/2.9 that if they
>     upgrade to java 1.5/3.0, they should reindex.
>
>     the reason i say this about properties, is there are some that
>     change that will affect tokenizers, i give two examples, a hyphen
>     that changes from punctuation to format (might affect
>     SolrWordDelimiterFilter),
>     and arabic ayah which changes from NSM to format, which surely
>     affects ArabicLetterTokenizer.
>
>     On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <[email protected]
>     <mailto:[email protected]>> wrote:
>
>     Hi Robert,
>
>     I agree that the Unicode version supported by the JVM, as you say,
>     really has nothing to do with Lucene.
>
>     The disruption here is users' upgrading from Java 1.4 to 1.5+, not
>     when they upgrade Lucene.  I'd guess with few exceptions that most
>     people have been using Lucene with 1.5+ for a couple of years now,
>     though.
>
>     But even the upgrade from Java 1.4 to 1.5+ will have (had) zero
>     impact on most Lucene users, assuming that most use Latin-1
>     exclusively; although I haven't looked, I'd be surprised if
>     Latin-1 characters changed much, if at all, from Unicode 3.0 to 4.0.
>
>     It would be useful, I think, to include (a pointer to?) a
>     description of the details of the Unicode 3.0->4.0 differences in
>     the Lucene 3.0 release notes, since the minimum required Java
>     version, and so also the supported Unicode version, changes then.
>
>     Steve
>
>
>     On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>     > the problem is that the properties have changed for various
>     > characters, and new characters were added.
>     >
>     > it really has nothing to do with lucene, but the idea you can go from
>     > jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not
>     > true.
>     >
>     >
>     > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <[email protected]
>     <mailto:[email protected]>> wrote:
>     >
>     >
>     >       But an UTF-8 stream from Java 4 can still be read with Java 5,
>     > what is the problem? Java 5 extended Unicode support, but an index
>     > created with older versions can still be read. UTF-8 is standardized…
>     >
>     >
>     >
>     >       -----
>     >       Uwe Schindler
>     >       H.-H.-Meier-Allee 63, D-28213 Bremen
>     >       http://www.thetaphi.de
>     >       eMail: [email protected] <mailto:[email protected]>
>     >
>     >
>     > ________________________________
>     >
>     >
>     >       From: Robert Muir [mailto:[email protected]
>     <mailto:[email protected]>]
>     >       Sent: Monday, November 16, 2009 8:09 PM
>     >
>     >       To: [email protected]
>     <mailto:[email protected]>
>     >       Subject: Re: Why release 3.0?
>     >
>     >
>     >
>     >       uwe, on topic please read my comment on LUCENE-1689, because
>     > unicode version was bumped in jdk 1.5, i believe this index backwards
>     > compatibility is only theoretical
>     >
>     >       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler
>     <[email protected] <mailto:[email protected]>> wrote:
>     >
>     >       2.9 has *not* the same format as 3.0, an index created with 3.0
>     > cannot be read with 2.9. This is because compressed field support was
>     > removed and therefore the version number of the stored fields file was
>     > upgraded. But indexes from 2.9 can be read with 3.0 and support may get
>     > removed in 4.0. 3.0 Indexes can be read until version 4.9.
>     >
>     >
>     >
>     >       Uwe
>     >
>     >       -----
>     >       Uwe Schindler
>     >       H.-H.-Meier-Allee 63, D-28213 Bremen
>     >       http://www.thetaphi.de
>     >       eMail: [email protected] <mailto:[email protected]>
>     >
>     >
>     > ________________________________
>     >
>     >
>     >       From: Jake Mannix [mailto:[email protected]
>     <mailto:[email protected]>]
>     >       Sent: Monday, November 16, 2009 7:15 PM
>     >
>     >
>     >       To: [email protected]
>     <mailto:[email protected]>
>     >
>     >       Subject: Re: Why release 3.0?
>     >
>     >
>     >
>     >       Don't users need to upgrade to 3.0 because 3.1 won't be
>     > necessarily able to read your 2.4 index file formats?  I suppose if
>     > you've already upgraded to 2.9, then all is well because 2.9 is the
>     > same format as 3.0, but we can't assume all users upgraded from 2.4
>     > to 2.9.
>     >
>     >       If you've done that already, then 3.0 might not be necessary,
>     > but if you're on 2.4 right now, you will be in for a bad surprise if
>     > you try to upgrade to 3.1.
>     >
>     >         -jake
>     >
>     >       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
>     > <[email protected] <mailto:[email protected]>> wrote:
>     >
>     >       One of my "specialties" is asking obvious questions just to see
>     > if everyone's assumptions are aligned. So with the discussion about
>     > branching 3.0 I have to ask "Is there going to be any 3.0 release
>     > intended for *production*?". And if not, would we save a lot of
>     > work by just not worrying about retrofitting fixes to a 3.0 branch
>     > and carrying on with 3.1 as the first *supported* 3.x release?
>     >
>     >       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
>     > sure *as a user* I see a good reason to upgrade to 3.0. Getting a
>     > "beta/snapshot" release to get a head start on cleaning up my code
>     > does seem worthwhile, if I have the spare time. And having a base
>     > 3.0 version that's not changing all over the place would be useful
>     > for that.
>     >
>     >       That said, I'm also not terribly comfortable with a "release"
>     > that's out there and unsupported.
>     >
>     >       Apologies if this has already been discussed, but I don't
>     > remember it. Although my memory isn't what it used to be (but
>     > some would claim it never was<G>)...
>     >
>     >       Erick
>
> -- 
> Robert Muir
> [email protected] <mailto:[email protected]>


-- 
- Mark

http://www.lucidimagination.com



