On Mon, Apr 22, 2013 at 3:42 PM, Steve Rowe <sar...@gmail.com> wrote:
> I just committed the edge-ngrams fix on the 4.3 release branch.
>
> I will not -1 RC2 for this, but if we're respinning anyway for SOLR-4746,
> including the edge-ngrams fix in the respin shouldn't be a problem.
>
> Steve
>
> On Apr 22, 2013, at 9:27 AM, Robert Muir <rcm...@gmail.com> wrote:
>
>> If I was the RM, I would not respin for this edge-ngrams filter.

My take on this is more community oriented. I really want to encourage
folks to test our releases. It's a lot of work to upgrade existing apps to
run with an RC, and if somebody does that and finds a bug, I think it is
worth rolling a new RC. I'm not in a rush here; the quality of the release
matters most. If this gives one more person running our RC against their
app a chance to catch a bug that would prevent them from upgrading, it's
worth the effort. I will catch up with Yonik and see how long he needs for
SOLR-4746.

simon

>>
>> We already have tests to find such bugs, but these tests are currently
>> disabled (!) because the filter is basically rotting.
>>
>> So I can't see how something can be important enough to respin a release
>> candidate for, but not important in the sense no one cares if its unit tests
>> are really working.
>>
>> On Mon, Apr 22, 2013 at 9:17 AM, Simon Willnauer <simon.willna...@gmail.com>
>> wrote:
>> I think we can add this to 4.3. I can roll another RC for that.
>>
>> simon
>>
>> On Mon, Apr 22, 2013 at 3:11 PM, Jack Krupansky <j...@basetechnology.com>
>> wrote:
>> > Is this a fix to 4.3 (RC3?) or for a 4.3.1?
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message-----
>> > From: Steve Rowe
>> > Sent: Monday, April 22, 2013 2:07 AM
>> > To: dev@lucene.apache.org
>> > Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
>> >
>> > I've reopened LUCENE-4810 and attached a patch with a test and fix for this
>> > problem. - Steve
>> >
>> > On Apr 22, 2013, at 1:09 AM, Steve Rowe <sar...@gmail.com> wrote:
>> >
>> >> Actually, Walter, I misspoke: Morfologik is a lemmatizer: it produces
>> >> surface forms. Not really so incompatible, I think.
>> >>
>> >> Regardless of the choice to use this particular sequence of filters,
>> >> EdgeNGramTokenFilter shouldn't produce a bad stream.
>> >>
>> >> Steve
>> >>
>> >> On Apr 21, 2013, at 8:34 PM, Walter Underwood <wun...@wunderwood.org>
>> >> wrote:
>> >>
>> >>> Don't use a stemmer with edge ngrams.
>> >>>
>> >>> Edge ngrams are a tool for matching the surface word. Stemmers are a tool
>> >>> for matching the root. Those are logically incompatible transforms.
>> >>>
>> >>> wunder
>> >>>
>> >>> On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:
>> >>>
>> >>>> Karol has uncovered a bug introduced by LUCENE-4810
>> >>>> <https://issues.apache.org/jira/browse/LUCENE-4810>, included in
>> >>>> Lucene/Solr 4.3.0.
>> >>>>
>> >>>> The problem is an interaction between the Morfologik stemmer, which can
>> >>>> produce multiple stems per input term, all but the first having a
>> >>>> position increment of zero, and EdgeNGramTokenFilter, which only outputs
>> >>>> ngrams for input terms that are at least as long as the minimum
>> >>>> configured length, and passes through unchanged the position increment
>> >>>> for the first ngram output for any given input term.
>> >>>>
>> >>>> So what happens in Karol's case is that "T." has the period stripped by
>> >>>> StandardTokenizer, then is stemmed by Morfologik to produce terms "to",
>> >>>> "tom" and "tona". The first term "to" has a position increment of 1,
>> >>>> but is not output by EdgeNGramTokenFilter, because its length is below
>> >>>> the configured minimum of 3. The second term "tom" is given a position
>> >>>> increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum
>> >>>> length, so gets output, and since it's the first output term for the
>> >>>> input term "tom", the input position increment is left as-is in the
>> >>>> output term: 0. That's how the first output term gets a position
>> >>>> increment of 0.
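The interaction described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Lucene's code: `Token`, `edge_ngrams`, `check_first_increment`, the `clamp_first` flag and the maximum gram size of 5 are hypothetical names and values made up for illustration. Only the minimum length of 3, the stems "to", "tom" and "tona" with increments 1, 0 and 0, and the error message come from the thread. The model also assumes that the later grams of one input term get a position increment of 0, and it raises a Python `ValueError` where Solr reported a `java.lang.IllegalArgumentException`.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Token:
    term: str
    pos_inc: int  # position increment relative to the previous token


def edge_ngrams(tokens: Iterable[Token], min_gram: int, max_gram: int,
                clamp_first: bool = False) -> Iterator[Token]:
    """Toy front-edge n-gram filter (hypothetical model, not Lucene's implementation)."""
    emitted_any = False
    for tok in tokens:
        if len(tok.term) < min_gram:
            # Too short: nothing is emitted, so this term's increment is lost.
            continue
        longest = min(max_gram, len(tok.term))
        for n in range(min_gram, longest + 1):
            # The first gram of a term inherits the input increment.
            # Assumption: the remaining grams share its position (increment 0).
            inc = tok.pos_inc if n == min_gram else 0
            if clamp_first and not emitted_any and inc < 1:
                inc = 1  # hypothetical guard: never start a stream at increment 0
            emitted_any = True
            yield Token(tok.term[:n], inc)


def check_first_increment(tokens: Iterable[Token]) -> None:
    """Mimics the indexing-time check whose message Karol reported."""
    for tok in tokens:
        if tok.pos_inc <= 0:
            raise ValueError(
                f"first position increment must be > 0 (got {tok.pos_inc})")
        return  # only the first token is checked


# Stems reported for "T." after the period is stripped.
stems = [Token("to", 1), Token("tom", 0), Token("tona", 0)]

buggy = list(edge_ngrams(stems, min_gram=3, max_gram=5))
guarded = list(edge_ngrams(stems, min_gram=3, max_gram=5, clamp_first=True))
```

In this model `check_first_increment(buggy)` raises, because "to" is dropped and "tom" arrives first with an increment of 0, while `check_first_increment(guarded)` passes. The `clamp_first` guard is only one possible shape for a fix, and the actual patch attached to LUCENE-4810 may differ.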
>> >>>>
>> >>>> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0,
>> >>>> EdgeNGramTokenFilter indiscriminately set all output terms' position
>> >>>> increments to 1, so that explains why this behavior didn't occur with
>> >>>> previously released versions.
>> >>>>
>> >>>> I think the fix is a check in EdgeNGramTokenFilter when outputting the
>> >>>> first term, that the position increment is greater than 0, and if it's
>> >>>> not, then it should be set to 1.
>> >>>>
>> >>>> Does anybody know if this could also be an issue for other filters?
>> >>>>
>> >>>> I'll work on a patch for EdgeNGramTokenFilter.
>> >>>>
>> >>>> Steve
>> >>>>
>> >>>> On Apr 21, 2013, at 9:21 AM, Karol Sikora <karol.sik...@laboratorium.ee>
>> >>>> wrote:
>> >>>>
>> >>>>> hi,
>> >>>>>
>> >>>>> I extracted a minimal failing example; the Solr configs (schema,
>> >>>>> solrconfig.xml) and data are in the attached archive.
>> >>>>> I try to import these simple documents:
>> >>>>> [
>> >>>>>   {
>> >>>>>     "publisher": [
>> >>>>>       "T. Gl\u00fccksberg"
>> >>>>>     ],
>> >>>>>     "uid": "1000881"
>> >>>>>   },
>> >>>>>   {
>> >>>>>     "publisher": [
>> >>>>>       "Ala a kota"
>> >>>>>     ],
>> >>>>>     "uid": "1000894"
>> >>>>>   }
>> >>>>> ]
>> >>>>> The first fails on the copyfield destination publisher_hl with an
>> >>>>> exception (trace: https://gist.github.com/anonymous/5429558); the second
>> >>>>> is added without any problems.
>> >>>>> schema.xml is here: https://gist.github.com/anonymous/5429562
>> >>>>>
>> >>>>> Anyone trying to reproduce this behaviour should remember to copy the
>> >>>>> libs related to the morfologik and icu filters.
>> >>>>>
>> >>>>> This extracted example works fine with solr 4.0 - 4.2.1.
>> >>>>>
>> >>>>> Regards,
>> >>>>> Karol
>> >>>>>
>> >>>>> On 21.04.2013 09:03, Simon Willnauer wrote:
>> >>>>>>
>> >>>>>> hey karol,
>> >>>>>>
>> >>>>>> can you reproduce this behaviour in a small test-case (curl command or
>> >>>>>> something like this) that we can reproduce?
>> >>>>>>
>> >>>>>> @solr guys any idea what this could be?
>> >>>>>>
>> >>>>>> simon
>> >>>>>>
>> >>>>>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
>> >>>>>> <karol.sik...@laboratorium.ee> wrote:
>> >>>>>>
>> >>>>>>> Hi all,
>> >>>>>>>
>> >>>>>>> I have a problem with solr 4.3 RC2 on the test data for a search
>> >>>>>>> application which I'm developing.
>> >>>>>>> A lot of records fail to import with the exception
>> >>>>>>> "java.lang.IllegalArgumentException: first position increment must be > 0 (got 0)".
>> >>>>>>> On versions from early 4.0 to 4.2.1 all documents were added
>> >>>>>>> successfully, so I think something is broken in the new release.
>> >>>>>>> I'll try to examine tomorrow what is broken.
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> Karol
>> >>>>>>>
>> >>>>>>> On 20.04.2013 21:07, Andi Vajda wrote:
>> >>>>>>>
>> >>>>>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>> >>>>>>>>
>> >>>>>>>>> Here is the RC:
>> >>>>>>>>>
>> >>>>>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>> >>>>>>>>>
>> >>>>>>>>> happy voting...
>> >>>>>>>>>
>> >>>>>>>>> here is my +1
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> PyLucene 4.3 builds and passes its tests.
>> >>>>>>>>
>> >>>>>>>> +1 !
>> >>>>>>>>
>> >>>>>>>> Andi..
>> >>>>>>>>
>> >>>>>>>> ---------------------------------------------------------------------
>> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>>>>>>>
>> >>>>>>> --
>> >>>>>>> Karol Sikora
>> >>>>>>> +48 781 493 788
>> >>>>>>>
>> >>>>>>> Laboratorium EE
>> >>>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>> >>>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>> >>>>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Karol Sikora
>> >>>>> IT Manager, CBN Project - Interfejs 2.0
>> >>>>> +48 781 493 788
>> >>>>>
>> >>>>> Laboratorium EE
>> >>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>> >>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>> >>>>
>> >>>
>> >>> --
>> >>> Walter Underwood
>> >>> wun...@wunderwood.org
>> >>>