I think Sun's String.getBytes probably does CodingErrorAction.REPLACE (inserts 0x3F, the question mark char) and IBM's probably does CodingErrorAction.IGNORE (drops it).
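For anyone who wants to see the three actions side by side, here is a minimal sketch that pushes the same three chars from Shai's mail through an explicit CharsetEncoder. The class and method names are made up for illustration, and it says nothing about what either JVM's String code does internally; it only shows what each CodingErrorAction yields for an unpaired surrogate.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene or of either JVM.
public class LoneSurrogateEncoding {

    // Encodes s as UTF-8 under the given error policy. Returns the bytes
    // as text, or the exception's simple name if the encoder refuses.
    static String describe(String s, CodingErrorAction action) {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return Arrays.toString(bytes);
        } catch (CharacterCodingException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // 'l', then the unpaired high surrogate U+D9FF, then 'E'
        String s = "l\uD9FFE";
        System.out.println("REPLACE: " + describe(s, CodingErrorAction.REPLACE));
        System.out.println("IGNORE:  " + describe(s, CodingErrorAction.IGNORE));
        System.out.println("REPORT:  " + describe(s, CodingErrorAction.REPORT));
    }
}
```

REPLACE should give [108, 63, 69] and IGNORE should give [108, 69], which are the two byte arrays reported in the thread, while REPORT fails with a MalformedInputException instead of silently changing the data.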
I don't know who is right; both suck in my opinion. I like CodingErrorAction.REPORT (throw an exception).

On Mon, Jul 26, 2010 at 3:41 PM, Shai Erera <ser...@gmail.com> wrote:

> From here: http://www.fileformat.info/info/unicode/char/d9ff/index.htm
>
> Looks like that character is not a valid Unicode character, and perhaps the
> IBM's JVM behaves correctly? Robert - you're the Unicode expert :).
>
> Shai
>
>
> On Mon, Jul 26, 2010 at 10:40 PM, Shai Erera <ser...@gmail.com> wrote:
>
>> I don't know what was the thing w/ the strings generated before, but now I
>> ran the test again w/ the same seed and it generates the same strings. So at
>> least it seems there are no problems w/ the Random class :).
>>
>> However, the string l.E fails w/ the IBM JVM and succeeds w/ SUN's. Any
>> ideas why? What does the test check anyway?
>>
>> I ran TRR2, and set the regexp to always be "l.E" and the test passes. The
>> failure comes from
>>
>> junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:199)
>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:171)
>>
>> I've set regexp to "l.E", and also 'string' inside assertAutomaton to
>> "\u006C\uD9FF\u0045". The byte[] returned from string.getBytes("UTF-8") are
>> [108, 69]. It just ignores the middle character. Perhaps that's why the test
>> fails?
>>
>> When I run this w/ SUN's JVM, the bytes returned are [108, 63, 69].
>>
>> If I manually set the bytes, using IBM's, to [108, 63, 69], then the test
>> passes.
>>
>> Interestingly, Googling for \uD9FF brings back LUCENE-2019 as the first
>> result :). I'll dig some more into this character, and why the IBM and SUN
>> JVMs return different byte[] representation for the same sequence of
>> characters. If you already spot the problem, please let me know.
>>
>> BTW, the test calls _TestUtil.getRandomMultiplier on every iteration loop,
>> which goes and checks a system property. Perhaps we can extract it to a
>> variable, or include a static constant in LuceneTestCase(J4) or something?
>>
>> Shai
>>
>>
>> On Mon, Jul 26, 2010 at 9:22 PM, Robert Muir <rcm...@gmail.com> wrote:
>>
>>> maybe there is a bug in ibm's random generator :)
>>>
>>>
>>> On Mon, Jul 26, 2010 at 11:50 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>
>>>> That's VERY spooky that w/ a fixed seed you see different random
>>>> regexps being made.
>>>>
>>>> Mike
>>>>
>>>> On Mon, Jul 26, 2010 at 11:40 AM, Shai Erera <ser...@gmail.com> wrote:
>>>> > Ok I've dug deeper into the test. I set the random seed to
>>>> > -9029631602016965389L in setUp(), and discovered that on the 4th iteration
>>>> > it breaks. For some reason though, AutomatonTestUtil.randomRegex generates
>>>> > different strings every time I run the test, even though it uses the same
>>>> > Random object w/ the same seed ...
>>>> >
>>>> > Anyway, one of the regex that failed was this "l.E" (w/o the quotes) and I
>>>> > think it's a lowercase L, '.' (dot) and 'E' (uppercase). Hope this helps.
>>>> >
>>>> > Shai
>>>> >
>>>> > On Mon, Jul 26, 2010 at 6:23 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>> >>
>>>> >> sounds nasty... its good you are running the tests with this different
>>>> >> jvm...
>>>> >>
>>>> >> On Mon, Jul 26, 2010 at 11:21 AM, Shai Erera <ser...@gmail.com> wrote:
>>>> >>>
>>>> >>> Tried to run it w/ SUN JRE6 and it succeeds ! I've tried several times
>>>> >>> and it succeeds every time. However, when I revert back to IBM's, it fail
>>>> >>> immediately.
>>>> >>>
>>>> >>> I can help w/ the debug, if you give me a hint where to look :).
>>>> >>>
>>>> >>> Shai
>>>> >>>
>>>> >>> On Mon, Jul 26, 2010 at 5:57 PM, Shai Erera <ser...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Sorry for the delayed response.
>>>> >>>>
>>>> >>>> I ran it a couple more times, from Eclipse and Ant, and each time it
>>>> >>>> fails (amazing !), w/ different seeds. More seeds that fail:
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -4244174191361080127
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -7059086272401721644
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -1314734215611104147
>>>> >>>>
>>>> >>>> I use IBM JVM, tried w/ both 1.5 and 1.6 ...
>>>> >>>>
>>>> >>>> Mike, can we use LUCENE-2565 to track this, or would you prefer that I
>>>> >>>> open a separate one?
>>>> >>>>
>>>> >>>> Shai
>>>> >>>>
>>>> >>>> On Mon, Jul 26, 2010 at 3:26 PM, Michael McCandless
>>>> >>>> <luc...@mikemccandless.com> wrote:
>>>> >>>>>
>>>> >>>>> On a more general note...
>>>> >>>>>
>>>> >>>>> Any time any of you out there hit an "odd" test failure, please please
>>>> >>>>> please do just what Shai did: take it to the dev list!
>>>> >>>>>
>>>> >>>>> Think of Lucene's unit tests like SETI :) We are desperately seeking
>>>> >>>>> bugs, and you and your machine may just be lucky enough to find one...
>>>> >>>>> go forth and buy expensive new power hungry computers just so you can
>>>> >>>>> run the random tests over and over, seeking the bugs!
>>>> >>>>>
>>>> >>>>> But be sure to include that random seed when you do hit a failure...
>>>> >>>>>
>>>> >>>>> Mike
>>>> >>>>>
>>>> >>>>> On Mon, Jul 26, 2010 at 8:23 AM, Robert Muir <rcm...@gmail.com> wrote:
>>>> >>>>> > I agree, Shai can you open a bug? I cannot reproduce, did you use an
>>>> >>>>> > IBM JVM or another environment that might help us figure it out?
>>>> >>>>> >
>>>> >>>>> > On Mon, Jul 26, 2010 at 6:29 AM, Michael McCandless
>>>> >>>>> > <luc...@mikemccandless.com> wrote:
>>>> >>>>> >>
>>>> >>>>> >> Hmmm this means a bug is lurking.
>>>> >>>>> >> This is the power of random testing
>>>> >>>>> >> (that every time we all run tests, we're testing different "paths"
>>>> >>>>> >> through the code)....
>>>> >>>>> >>
>>>> >>>>> >> It seems exceptionally unlikely that LUCENE-2537's changes would cause
>>>> >>>>> >> this!
>>>> >>>>> >>
>>>> >>>>> >> But, unfortunately, when I plug that seed in I don't see it fail,
>>>> >>>>> >> which is odd. I'll run a stress test to see if I can tickle the
>>>> >>>>> >> bug... can you open a Jira issue so we don't lose track?
>>>> >>>>> >>
>>>> >>>>> >> Mike
>>>> >>>>> >>
>>>> >>>>> >> On Mon, Jul 26, 2010 at 2:57 AM, Shai Erera <ser...@gmail.com> wrote:
>>>> >>>>> >> > Hi
>>>> >>>>> >> >
>>>> >>>>> >> > I was running tests on trunk (after merging the changes from
>>>> >>>>> >> > LUCENE-2537) and received this error message:
>>>> >>>>> >> >
>>>> >>>>> >> > expected:<true> but was:<false>
>>>> >>>>> >> >
>>>> >>>>> >> > junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:197)
>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:170)
>>>> >>>>> >> >     at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:285)
>>>> >>>>> >> >
>>>> >>>>> >> > NOTE: random seed of testcase 'testRandomRegexes' was: 3510820306304573866
>>>> >>>>> >> >
>>>> >>>>> >> > I'm sure it's related to my changes. Has anyone else seen this before?
>>>> >>>>> >> >
>>>> >>>>> >> > Shai
>>>> >>>>> >> >
>>>> >>>>> >>
>>>> >>>>> >> ---------------------------------------------------------------------
>>>> >>>>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> >>>>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>> >>>>> >
>>>> >>>>> > --
>>>> >>>>> > Robert Muir
>>>> >>>>> > rcm...@gmail.com
>>>> >>
>>>> >> --
>>>> >> Robert Muir
>>>> >> rcm...@gmail.com
>>>
>>> --
>>> Robert Muir
>>> rcm...@gmail.com

--
Robert Muir
rcm...@gmail.com