On Mon, Apr 22, 2013 at 3:42 PM, Steve Rowe <sar...@gmail.com> wrote:
> I just committed the edge-ngrams fix on the 4.3 release branch.
>
> I will not -1 RC2 for this, but if we're respinning anyway for SOLR-4746, 
> including the edge-ngrams fix in the respin shouldn't be a problem.
>
> Steve
>
> On Apr 22, 2013, at 9:27 AM, Robert Muir <rcm...@gmail.com> wrote:
>
>> If I were the RM, I would not respin for this edge-ngrams filter.

My take on this is more community-oriented. I really want to encourage
folks to test our releases. It's a lot of work to upgrade existing apps
to run with an RC, and if somebody does that and finds a bug, I think
that is worth rolling a new RC. I'm in no rush here; the quality of the
release is what matters most. If this gives even one more person running
our RC against their app a chance to catch a bug that would prevent them
from upgrading, it's worth the effort.

I will catch up with Yonik and see how long he needs for SOLR-4746.

simon


>>
>> We already have tests to find such bugs, but these tests are currently 
>> disabled (!) because the filter is basically rotting.
>>
>> So I can't see how something can be important enough to respin a release
>> candidate for, yet not important enough for anyone to care whether its
>> unit tests actually work.
>>
>> On Mon, Apr 22, 2013 at 9:17 AM, Simon Willnauer <simon.willna...@gmail.com> 
>> wrote:
>> I think we can add this to 4.3; I can roll another RC for that.
>>
>> simon
>>
>> On Mon, Apr 22, 2013 at 3:11 PM, Jack Krupansky <j...@basetechnology.com> 
>> wrote:
>> > Is this a fix to 4.3 (RC3?) or for a 4.3.1?
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message----- From: Steve Rowe
>> > Sent: Monday, April 22, 2013 2:07 AM
>> >
>> > To: dev@lucene.apache.org
>> > Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
>> >
>> > I've reopened LUCENE-4810 and attached a patch with a test and fix for this
>> > problem. - Steve
>> >
>> > On Apr 22, 2013, at 1:09 AM, Steve Rowe <sar...@gmail.com> wrote:
>> >
>> >> Actually, Walter, I misspoke: Morfologik is a lemmatizer, so it produces
>> >> surface forms.  Not really so incompatible, I think.
>> >>
>> >> Regardless of the choice to use this particular sequence of filters,
>> >> EdgeNGramTokenFilter shouldn't produce a bad stream.
>> >>
>> >> Steve
>> >>
>> >> On Apr 21, 2013, at 8:34 PM, Walter Underwood <wun...@wunderwood.org>
>> >> wrote:
>> >>
>> >>> Don't use a stemmer with edge ngrams.
>> >>>
>> >>> Edge ngrams are a tool for matching the surface word. Stemmers are a tool
>> >>> for matching the root. Those are logically incompatible transforms.
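For readers less familiar with edge n-grams, here is a minimal sketch of what the transform does to a single term: it emits the leading prefixes whose lengths fall between a minimum and a maximum gram size. The class and method names (EdgeNGramSketch, edgeNGrams) are hypothetical, this is a simplified model rather than Lucene's EdgeNGramTokenFilter, and the gram sizes 3 and 5 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSketch {
    // Hypothetical simplified model: returns the leading prefixes of `term`
    // whose lengths lie in [minGram, maxGram]. A term shorter than minGram
    // yields no grams at all.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("tona", 3, 5)); // [ton, tona]
        System.out.println(edgeNGrams("to", 3, 5));   // []
    }
}
```

The grams are prefixes of whatever form reaches the filter, so they only help with prefix matching on the surface word when the surface word is what gets n-grammed.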
>> >>>
>> >>> wunder
>> >>>
>> >>> On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:
>> >>>
>> >>>> Karol has uncovered a bug introduced by LUCENE-4810
>> >>>> <https://issues.apache.org/jira/browse/LUCENE-4810>, included in
>> >>>> Lucene/Solr 4.3.0.
>> >>>>
>> >>>> The problem is an interaction between the Morfologik stemmer, which can
>> >>>> produce multiple stems per input term, all but the first having a
>> >>>> position increment of zero, and EdgeNGramTokenFilter, which only outputs
>> >>>> ngrams for input terms that are at least as long as the configured
>> >>>> minimum length, and which passes the input term's position increment
>> >>>> through unchanged to the first ngram it outputs for that term.
>> >>>>
>> >>>> So what happens in Karol's case is that "T." has the period stripped by
>> >>>> StandardTokenizer, then is stemmed by Morfologik to produce the terms
>> >>>> "to", "tom" and "tona".  The first term, "to", has a position increment
>> >>>> of 1, but is not output by EdgeNGramTokenFilter, because its length is
>> >>>> below the configured minimum of 3.  The second term, "tom", is given a
>> >>>> position increment of 0 by Morfologik and meets EdgeNGramTokenFilter's
>> >>>> minimum length, so it gets output; and since it's the first output term
>> >>>> for the input term "tom", the input position increment is left as-is in
>> >>>> the output term: 0.  That's how the first output term gets a position
>> >>>> increment of 0.
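The walk-through above can be replayed with a small, self-contained model of the token stream. This is a sketch under stated assumptions, not Lucene code: tokens are plain (term, position increment) pairs, the stems for "T." are the three given in the message, minGram is 3 as in Karol's configuration, maxGram is an assumed 10, and later grams of the same input term are assumed to get an increment of 0. The names PosIncSketch, Token and edgeNGramFilter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PosIncSketch {
    // A token as a downstream consumer sees it: text plus position increment.
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return term + "(+" + posInc + ")"; }
    }

    // Hypothetical model of the RC2 behaviour: input terms shorter than
    // minGram are dropped, the first gram of a kept term inherits that term's
    // increment unchanged, and later grams get 0 (assumption).
    static List<Token> edgeNGramFilter(List<Token> input, int minGram, int maxGram) {
        List<Token> out = new ArrayList<>();
        for (Token t : input) {
            int upper = Math.min(maxGram, t.term.length());
            for (int len = minGram; len <= upper; len++) {
                int inc = (len == minGram) ? t.posInc : 0;
                out.add(new Token(t.term.substring(0, len), inc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Morfologik's stems for "T." as described in the message.
        List<Token> stems = List.of(new Token("to", 1), new Token("tom", 0), new Token("tona", 0));
        System.out.println(edgeNGramFilter(stems, 3, 10)); // [tom(+0), ton(+0), tona(+0)]
    }
}
```

Only "to" carried the increment of 1, and it is exactly the term the filter drops, so every surviving token sits at increment 0.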
>> >>>>
>> >>>> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0,
>> >>>> EdgeNGramTokenFilter indiscriminately set all output terms' position
>> >>>> increments to 1, so that explains why this behavior didn't occur with
>> >>>> previously released versions.
>> >>>>
>> >>>> I think the fix is a check in EdgeNGramTokenFilter, when outputting the
>> >>>> first term, that the position increment is greater than 0; if it's not,
>> >>>> it should be set to 1.
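A sketch of the proposed guard, using a simplified (term, position increment) model rather than Lucene's real TokenStream API. It reads "the first term" as the first token the filter emits for the whole stream, which is our interpretation. The names PosIncFixSketch, Token and edgeNGramFilterFixed are hypothetical, maxGram of 10 is an assumption, and later grams of a term are assumed to get an increment of 0.

```java
import java.util.ArrayList;
import java.util.List;

public class PosIncFixSketch {
    // A token as a downstream consumer sees it: text plus position increment.
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return term + "(+" + posInc + ")"; }
    }

    // Hypothetical edge-ngram model: terms shorter than minGram are dropped,
    // the first gram of a term inherits its increment, later grams get 0.
    // Added guard: the very first token emitted for the stream never keeps
    // an increment of 0; it is bumped to 1.
    static List<Token> edgeNGramFilterFixed(List<Token> input, int minGram, int maxGram) {
        List<Token> out = new ArrayList<>();
        for (Token t : input) {
            int upper = Math.min(maxGram, t.term.length());
            for (int len = minGram; len <= upper; len++) {
                int inc = (len == minGram) ? t.posInc : 0;
                if (out.isEmpty() && inc <= 0) {
                    inc = 1; // proposed fix: first output term must have posInc > 0
                }
                out.add(new Token(t.term.substring(0, len), inc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> stems = List.of(new Token("to", 1), new Token("tom", 0), new Token("tona", 0));
        System.out.println(edgeNGramFilterFixed(stems, 3, 10)); // [tom(+1), ton(+0), tona(+0)]
    }
}
```

Forcing the increment to 1 is what the message proposes. An alternative would be to carry a dropped term's increment forward to the next emitted token; for the "to", "tom", "tona" input both approaches give the first output an increment of 1.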
>> >>>>
>> >>>> Does anybody know if this could also be an issue for other filters?
>> >>>>
>> >>>> I'll work on a patch for EdgeNGramTokenFilter.
>> >>>>
>> >>>> Steve
>> >>>>
>> >>>> On Apr 21, 2013, at 9:21 AM, Karol Sikora <karol.sik...@laboratorium.ee>
>> >>>> wrote:
>> >>>>
>> >>>>> hi,
>> >>>>>
>> >>>>> I extracted a minimal failing example; the Solr configs (schema.xml,
>> >>>>> solrconfig.xml) and data are in the attached archive.
>> >>>>> I try to import these simple documents:
>> >>>>> [
>> >>>>>   {
>> >>>>>     "publisher": [
>> >>>>>       "T. Gl\u00fccksberg"
>> >>>>>     ],
>> >>>>>     "uid": "1000881"
>> >>>>>   },
>> >>>>>   {
>> >>>>>     "publisher": [
>> >>>>>       "Ala a kota"
>> >>>>>     ],
>> >>>>>     "uid": "1000894"
>> >>>>>   }
>> >>>>> ]
>> >>>>> The first fails on the copyField destination publisher_hl with an
>> >>>>> exception (trace: https://gist.github.com/anonymous/5429558); the second
>> >>>>> is added without any problems.
>> >>>>> schema.xml is here: https://gist.github.com/anonymous/5429562
>> >>>>>
>> >>>>> If you try to reproduce this behaviour, remember to copy the libs for
>> >>>>> the Morfologik and ICU filters.
>> >>>>>
>> >>>>> This extracted example works fine with Solr 4.0 through 4.2.1.
>> >>>>>
>> >>>>> Regards,
>> >>>>> Karol
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 21.04.2013 09:03, Simon Willnauer wrote:
>> >>>>>>
>> >>>>>> Hey Karol,
>> >>>>>>
>> >>>>>> Can you reproduce this behaviour in a small test case (a curl command or
>> >>>>>> something like that) that we can run ourselves?
>> >>>>>>
>> >>>>>> @Solr guys: any idea what this could be?
>> >>>>>>
>> >>>>>> simon
>> >>>>>>
>> >>>>>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
>> >>>>>> <karol.sik...@laboratorium.ee> wrote:
>> >>>>>>
>> >>>>>>> Hi all,
>> >>>>>>>
>> >>>>>>> I have a problem with Solr 4.3 RC2 on the test data for a search
>> >>>>>>> application I'm developing.
>> >>>>>>> Many imported records fail with the exception
>> >>>>>>> "java.lang.IllegalArgumentException: first position increment must be > 0 (got 0)".
>> >>>>>>> On versions from early 4.0 through 4.2.1 all documents were added
>> >>>>>>> successfully, so I think something is broken in the new release.
>> >>>>>>> I'll try to find out tomorrow what is broken.
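The exception text suggests the indexer rejects a field whose first token does not advance the position. A minimal model of that invariant, assuming the token stream is reduced to its list of position increments; FirstPosIncCheck and checkFirstIncrement are hypothetical names, and this is not Lucene's actual indexing code.

```java
import java.util.List;

public class FirstPosIncCheck {
    // Hypothetical model of the check behind the reported exception: the first
    // token of a field's stream must advance the position by at least 1.
    static void checkFirstIncrement(List<Integer> posIncs) {
        if (!posIncs.isEmpty() && posIncs.get(0) <= 0) {
            throw new IllegalArgumentException(
                "first position increment must be > 0 (got " + posIncs.get(0) + ")");
        }
    }

    public static void main(String[] args) {
        checkFirstIncrement(List.of(1, 0, 0)); // accepted: stacked tokens after a real first position
        try {
            checkFirstIncrement(List.of(0, 0, 0)); // a stream whose first token has increment 0
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // first position increment must be > 0 (got 0)
        }
    }
}
```

Later tokens may legitimately have an increment of 0 (stacked at the same position); only the first token is constrained in this model.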
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> Karol
>> >>>>>>>
>> >>>>>>> On 20.04.2013 21:07, Andi Vajda wrote:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> Here is the RC:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> happy voting...
>> >>>>>>>>>
>> >>>>>>>>> here is my +1
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> PyLucene 4.3 builds and passes its tests.
>> >>>>>>>>
>> >>>>>>>> +1 !
>> >>>>>>>>
>> >>>>>>>> Andi..
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> ---------------------------------------------------------------------
>> >>>>>>>> To unsubscribe, e-mail:
>> >>>>>>>> dev-unsubscr...@lucene.apache.org
>> >>>>>>>>
>> >>>>>>>> For additional commands, e-mail:
>> >>>>>>>> dev-h...@lucene.apache.org
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>> --
>> >>>>>>> Karol Sikora
>> >>>>>>> +48 781 493 788
>> >>>>>>>
>> >>>>>>> Laboratorium EE
>> >>>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>> >>>>>>>
>> >>>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Karol Sikora
>> >>>>> IT Manager, CBN Project - Interfejs 2.0
>> >>>>> +48 781 493 788
>> >>>>>
>> >>>>> Laboratorium EE
>> >>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>> >>>>>
>> >>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>> Walter Underwood
>> >>> wun...@wunderwood.org
>> >>>
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>>
>>
>>
>
>
>
