I think Sun's String ctor probably uses CodingErrorAction.REPLACE (inserting
0x3f, the question mark char) and IBM's probably uses
CodingErrorAction.IGNORE (dropping the char).

I don't know who is right. Both suck in my opinion; I
like CodingErrorAction.REPORT (throw an exception).
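
For illustration, here is a minimal sketch of the three actions applied to the
string from the test, using an explicit java.nio CharsetEncoder. The class name
LoneSurrogateEncoding and the helper encode() are made up for this example. It
only shows what each CodingErrorAction does with an unpaired surrogate; it does
not establish what either JVM's String methods do internally, which is still a
guess.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene or its tests.
class LoneSurrogateEncoding {

    // Encodes s as UTF-8, applying the given action to malformed input
    // such as an unpaired surrogate.
    static byte[] encode(String s, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        // 'l', the unpaired high surrogate U+D9FF, 'E': the string from the test.
        String s = "\u006C\uD9FF\u0045";
        try {
            // REPLACE substitutes the encoder's replacement byte, '?' (63).
            System.out.println(Arrays.toString(encode(s, CodingErrorAction.REPLACE))); // [108, 63, 69]
            // IGNORE silently drops the malformed char.
            System.out.println(Arrays.toString(encode(s, CodingErrorAction.IGNORE)));  // [108, 69]
            // REPORT refuses to encode and throws instead.
            encode(s, CodingErrorAction.REPORT);
            System.out.println("REPORT did not throw");
        } catch (CharacterCodingException e) {
            System.out.println("REPORT threw " + e.getClass().getSimpleName());
        }
    }
}
```

The REPLACE and IGNORE results match the byte arrays seen on the two JVMs in
the quoted message below. A test that wants identical behavior everywhere could
build its bytes through an encoder configured like this instead of relying on
getBytes.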

On Mon, Jul 26, 2010 at 3:41 PM, Shai Erera <ser...@gmail.com> wrote:

> From here: http://www.fileformat.info/info/unicode/char/d9ff/index.htm
>
> Looks like that character is not a valid Unicode character, and perhaps
> IBM's JVM behaves correctly? Robert - you're the Unicode expert :).
>
> Shai
>
>
> On Mon, Jul 26, 2010 at 10:40 PM, Shai Erera <ser...@gmail.com> wrote:
>
>> I don't know what happened w/ the strings generated before, but now I
>> ran the test again w/ the same seed and it generates the same strings. So at
>> least it seems there are no problems w/ the Random class :).
>>
>> However, the string l.E fails w/ the IBM JVM and succeeds w/ SUN's. Any
>> ideas why? What does the test check anyway?
>>
>> I ran TRR2, and set the regexp to always be "l.E" and the test passes. The
>> failure comes from
>>
>> junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:199)
>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:171)
>>
>> I've set regexp to "l.E", and also 'string' inside assertAutomaton to
>> "\u006C\uD9FF\u0045". The bytes returned from string.getBytes("UTF-8") are
>> [108, 69]. It just ignores the middle character. Perhaps that's why the test
>> fails?
>>
>> When I run this w/ SUN's JVM, the bytes returned are [108, 63, 69].
>>
>> If I manually set the bytes, using IBM's, to [108, 63, 69], then the test
>> passes.
>>
>> Interestingly, Googling for \uD9FF brings back LUCENE-2019 as the first
>> result :). I'll dig some more into this character, and why the IBM and SUN
>> JVMs return different byte[] representations for the same sequence of
>> characters. If you've already spotted the problem, please let me know.
>>
>> BTW, the test calls _TestUtil.getRandomMultiplier on every iteration loop,
>> which goes and checks a system property. Perhaps we can extract it to a
>> variable, or include a static constant in LuceneTestCase(J4) or something?
>>
>> Shai
>>
>>
>> On Mon, Jul 26, 2010 at 9:22 PM, Robert Muir <rcm...@gmail.com> wrote:
>>
>>> maybe there is a bug in ibm's random generator :)
>>>
>>>
>>> On Mon, Jul 26, 2010 at 11:50 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>
>>>> That's VERY spooky that w/ a fixed seed you see different random
>>>> regexps being made.
>>>>
>>>> Mike
>>>>
>>>> On Mon, Jul 26, 2010 at 11:40 AM, Shai Erera <ser...@gmail.com> wrote:
>>>> > Ok I've dug deeper into the test. I set the random seed to
>>>> > -9029631602016965389L in setUp(), and discovered that on the 4th
>>>> > iteration it breaks. For some reason though,
>>>> > AutomatonTestUtil.randomRegex generates different strings every time
>>>> > I run the test, even though it uses the same Random object w/ the
>>>> > same seed ...
>>>> >
>>>> > Anyway, one of the regexes that failed was "l.E" (w/o the quotes), and
>>>> > I think it's a lowercase 'l', '.' (dot) and 'E' (uppercase). Hope this
>>>> > helps.
>>>> >
>>>> > Shai
>>>> >
>>>> > On Mon, Jul 26, 2010 at 6:23 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>> >>
>>>> >> sounds nasty... it's good you are running the tests with this
>>>> >> different jvm...
>>>> >>
>>>> >> On Mon, Jul 26, 2010 at 11:21 AM, Shai Erera <ser...@gmail.com> wrote:
>>>> >>>
>>>> >>> Tried to run it w/ SUN JRE6 and it succeeds! I've tried several times
>>>> >>> and it succeeds every time. However, when I revert back to IBM's, it
>>>> >>> fails immediately.
>>>> >>>
>>>> >>> I can help w/ the debug, if you give me a hint where to look :).
>>>> >>>
>>>> >>> Shai
>>>> >>>
>>>> >>> On Mon, Jul 26, 2010 at 5:57 PM, Shai Erera <ser...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Sorry for the delayed response.
>>>> >>>>
>>>> >>>> I ran it a couple more times, from Eclipse and Ant, and each time it
>>>> >>>> fails (amazing !), w/ different seeds. More seeds that fail:
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -4244174191361080127
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -7059086272401721644
>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -1314734215611104147
>>>> >>>>
>>>> >>>> I use IBM JVM, tried w/ both 1.5 and 1.6 ...
>>>> >>>>
>>>> >>>> Mike, can we use LUCENE-2565 to track this, or would you prefer that
>>>> >>>> I open a separate one?
>>>> >>>>
>>>> >>>> Shai
>>>> >>>>
>>>> >>>> On Mon, Jul 26, 2010 at 3:26 PM, Michael McCandless
>>>> >>>> <luc...@mikemccandless.com> wrote:
>>>> >>>>>
>>>> >>>>> On a more general note...
>>>> >>>>>
>>>> >>>>> Any time any of you out there hit an "odd" test failure, please
>>>> >>>>> please please do just what Shai did: take it to the dev list!
>>>> >>>>>
>>>> >>>>> Think of Lucene's unit tests like SETI :)  We are desperately
>>>> >>>>> seeking bugs, and you and your machine may just be lucky enough to
>>>> >>>>> find one... go forth and buy expensive new power hungry computers
>>>> >>>>> just so you can run the random tests over and over, seeking the
>>>> >>>>> bugs!
>>>> >>>>>
>>>> >>>>> But be sure to include that random seed when you do hit a
>>>> >>>>> failure...
>>>> >>>>>
>>>> >>>>> Mike
>>>> >>>>>
>>>> >>>>> On Mon, Jul 26, 2010 at 8:23 AM, Robert Muir <rcm...@gmail.com> wrote:
>>>> >>>>> > I agree. Shai, can you open a bug? I cannot reproduce it; did you
>>>> >>>>> > use an IBM JVM or another environment that might help us figure
>>>> >>>>> > it out?
>>>> >>>>> >
>>>> >>>>> > On Mon, Jul 26, 2010 at 6:29 AM, Michael McCandless
>>>> >>>>> > <luc...@mikemccandless.com> wrote:
>>>> >>>>> >>
>>>> >>>>> >> Hmmm this means a bug is lurking.  This is the power of random
>>>> >>>>> >> testing (that every time we all run tests, we're testing
>>>> >>>>> >> different "paths" through the code)....
>>>> >>>>> >>
>>>> >>>>> >> It seems exceptionally unlikely that LUCENE-2537's changes would
>>>> >>>>> >> cause this!
>>>> >>>>> >>
>>>> >>>>> >> But, unfortunately, when I plug that seed in I don't see it fail,
>>>> >>>>> >> which is odd.  I'll run a stress test to see if I can tickle the
>>>> >>>>> >> bug... can you open a Jira issue so we don't lose track?
>>>> >>>>> >>
>>>> >>>>> >> Mike
>>>> >>>>> >>
>>>> >>>>> >> On Mon, Jul 26, 2010 at 2:57 AM, Shai Erera <ser...@gmail.com>
>>>> >>>>> >> wrote:
>>>> >>>>> >> > Hi
>>>> >>>>> >> >
>>>> >>>>> >> > I was running tests on trunk (after merging the changes from
>>>> >>>>> >> > LUCENE-2537)
>>>> >>>>> >> > and received this error message:
>>>> >>>>> >> >
>>>> >>>>> >> > expected:<true> but was:<false>
>>>> >>>>> >> >
>>>> >>>>> >> > junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:197)
>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:170)
>>>> >>>>> >> >     at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:285)
>>>> >>>>> >> >
>>>> >>>>> >> > NOTE: random seed of testcase 'testRandomRegexes' was:
>>>> >>>>> >> > 3510820306304573866
>>>> >>>>> >> >
>>>> >>>>> >> > I'm sure it's related to my changes. Has anyone else seen this
>>>> >>>>> >> > before?
>>>> >>>>> >> >
>>>> >>>>> >> > Shai
>>>> >>>>> >> >
>>>> >>>>> >>
>>>> >>>>> >>
>>>> >>>>> >>
>>>> ---------------------------------------------------------------------
>>>> >>>>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> >>>>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>> >>>>> >>
>>>> >>>>> >
>>>> >>>>> >
>>>> >>>>> >
>>>> >>>>> > --
>>>> >>>>> > Robert Muir
>>>> >>>>> > rcm...@gmail.com
>>>> >>>>> >
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Robert Muir
>>>> >> rcm...@gmail.com
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcm...@gmail.com
>>>
>>
>>
>


-- 
Robert Muir
rcm...@gmail.com
