Hey Martin,

So far it sounds like we've discovered two different issues.

Tracking ticket is here:

https://issues.apache.org/jira/browse/SAMZA-16

Original issue discovered by Tejas is here:

https://issues.apache.org/jira/browse/SAMZA-10

And now there's TJ's issue, which we should probably update SAMZA-16 with.

Cheers,
Chris
________________________________________
From: Martin Kleppmann [[email protected]]
Sent: Thursday, March 06, 2014 1:40 PM
To: <[email protected]>
Subject: Re: TestStatefulTask failures

I've been developing with Java 7 on Mac OS (1.7.0_51-b13) and have not noticed
any particular issues -- at least it builds, the tests pass and hello-samza runs.
If anything breaks, I'm happy to try to track it down. Is there anything in
particular I should be watching out for?

On 6 Mar 2014, at 20:18, Jakob Homan <[email protected]> wrote:
> The problems were with the build.  To my knowledge, no one has yet run
> Samza on 7.
>
>
> On Thu, Mar 6, 2014 at 11:23 AM, TJ Giuli <[email protected]> wrote:
>
>> Ok, yes, things look good from my end when I compile 12594fb710 with Java
>> 6 on Mac OS X.  Do you guys have more of a sense of whether the issues with
>> Java 7 are confined to the build, or is runtime stability of Samza on Java
>> 7 in question?  Should I exclusively run my Samza tasks with 6?  Thanks for
>> looking into this!
>> —T
>>
>> On Mar 5, 2014, at 1:10 PM, Garry Turkington <
>> [email protected]> wrote:
>>
>>> Woot, yes, I can confirm this patch fixes things on my host. Thanks Chris!
>>>
>>> Regarding the failures in TestTopicMetadataCache: this is Java 7 related,
>>> something I need to update SAMZA-16 about. It is an intermittent build
>>> failure that we don't see on JDK6.
>>>
>>> Garry
>>>
>>> -----Original Message-----
>>> From: Chris Riccomini [mailto:[email protected]]
>>> Sent: 05 March 2014 20:10
>>> To: [email protected]
>>> Subject: Re: TestStatefulTask failures
>>>
>>> Hey Guys,
>>>
>>> I have a patch up at:
>>>
>>> https://issues.apache.org/jira/browse/SAMZA-166
>>>
>>>
>>> Could you please apply and see if this fixes your problem? I ran the
>>> TestStatefulTask test for an hour, and it passed every time.
>>>
>>> TJ, regarding your cache issue, can you try running with Java 1.6
>>> instead of 1.7, and see if that fixes the issue? Samza has had known issues
>>> with Java 1.7.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 3/4/14 4:12 PM, "Jakob Homan" <[email protected]> wrote:
>>>
>>>> Hey TJ-
>>>> Java 1.7 is known to be flaky right now.  Garry had planned on
>>>> taking a look at the issue.  Not sure where he is on this.  We
>>>> definitely want to get better 1.7 support.
>>>> -jg
>>>>
>>>>
>>>>
>>>> On Tue, Mar 4, 2014 at 2:27 PM, TJ Giuli <[email protected]>
>>>> wrote:
>>>>
>>>>> Great, thanks Chris.
>>>>>
>>>>> Also, I should mention that when I build on my Mac, this is sprinkled
>>>>> throughout the build output:
>>>>>
>>>>> objc[52666]: Class JavaLaunchHelper is implemented in both
>>>>> /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java
>>>>> and
>>>>> /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre/lib/libinstrument.dylib.
>>>>> One of the two will be used. Which one is undefined.
>>>>>
>>>>> —T
>>>>> On Mar 4, 2014, at 2:11 PM, Chris Riccomini <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hey Guys,
>>>>>>
>>>>>> Able to reproduce the change log issue. Opening a JIRA and
>>>>>> investigating.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/SAMZA-166
>>>>>>
>>>>>>
>>>>>> TJ, I'll dig into the cache issue afterwards.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On 3/4/14 2:04 PM, "TJ Giuli" <[email protected]> wrote:
>>>>>>
>>>>>>> Sure, Chris,
>>>>>>>
>>>>>>> 1.)  d38277ff83956f5885dd6596db9c0e15761964c7
>>>>>>> 2.)  ./gradlew clean test
>>>>>>> 3.)  It doesn’t happen every time.  I just ran three consecutive tests,
>>>>>>> 2 failed with different failures and one succeeded.
>>>>>>> Failure 1: http://pastebin.com/YG7KBjJz
>>>>>>> Failure 2: http://pastebin.com/7NqES1rS
>>>>>>>
>>>>>>> Thanks for getting on this!
>>>>>>> —T
>>>>>>>
>>>>>>> On Mar 4, 2014, at 1:37 PM, Chris Riccomini
>>>>>>> <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey Guys,
>>>>>>>>
>>>>>>>> Having a look, but nothing yet.
>>>>>>>>
>>>>>>>> Regarding the TestStatefulTask bugs, Martin did find a bug this morning
>>>>>>>> in the SAMZA-142 commit. The issue is that KafkaSystemAdmin can
>>>>>>>> occasionally return empty metadata information for a change-log
>>>>>>>> stream. This results in an NPE later in the TaskStorageManager. The
>>>>>>>> issue is triggered when there is no lead Kafka broker for a given
>>>>>>>> change-log's topic/partition.
>>>>>>>>
>>>>>>>> That said, I don't *think* this should cause a failure in
>>>>>>>> TestStatefulTask, since TestStatefulTask.validateTopics is run before
>>>>>>>> the tests are run, and validateTopics checks to make sure that the
>>>>>>>> metadata is available and there is no error code.
>>>>>>>>
>>>>>>>> As for the testBasicMetadataCacheFunctionality, I haven't seen
>>>>>>>> that issue, and can't reproduce it. TJ, can you send:
>>>>>>>>
>>>>>>>> 1. The git checksum you're working off of.
>>>>>>>> 2. The command you're using to run the test.
>>>>>>>> 3. Does the failure happen every time, or just randomly?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 3/3/14 11:57 PM, "TJ Giuli" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey, guys,
>>>>>>>>>
>>>>>>>>> I’m also having build and test problems on both my Mac OS X (10.9.2)
>>>>>>>>> box and a relatively fresh Ubuntu 12.04 install.  On Ubuntu, I’m
>>>>>>>>> getting the error that Garry describes (http://pastebin.com/4w3qr11K).
>>>>>>>>> I was getting the same error on my Mac, but now I seem to have moved
>>>>>>>>> onto a failure in the testBasicMetadataCacheFunctionality test
>>>>>>>>> (http://pastebin.com/YNxrNC7q).
>>>>>>>>> —T
>>>>>>>>>
>>>>>>>>> On Mar 3, 2014, at 4:25 PM, Garry Turkington
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Jakob,
>>>>>>>>>>
>>>>>>>>>> Yep, here's the output:
>>>>>>>>>>
>>>>>>>>>> devel@vm17:~/samza$ git bisect bad
>>>>>>>>>> f50f022c7d0fbe648412c26c9d6dc677e7758006 is the first bad commit
>>>>>>>>>> commit f50f022c7d0fbe648412c26c9d6dc677e7758006
>>>>>>>>>> Author: Chris Riccomini <[email protected]>
>>>>>>>>>> Date:   Fri Feb 28 09:26:54 2014 -0800
>>>>>>>>>>
>>>>>>>>>> SAMZA-142; changelog stores should restore from beginning of stream,
>>>>>>>>>> not the end
>>>>>>>>>>
>>>>>>>>>> Garry
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Jakob Homan [mailto:[email protected]]
>>>>>>>>>> Sent: 03 March 2014 23:25
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Subject: Re: TestStatefulTask failures
>>>>>>>>>>
>>>>>>>>>> Garry, can you run git bisect against the commits for the past
>>>>>>>>>> few days on the wheezy box?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Monday, March 3, 2014 at 3:11 PM, Garry Turkington wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>
>>>>>>>>>>> Posted the test log at:
>>>>>>>>>>>
>>>>>>>>>>> http://pastebin.com/LFEdfQqX
>>>>>>>>>>>
>>>>>>>>>>> Highlight is that it is timing out, and indeed line 325 of the test
>>>>>>>>>>> is task.awaitMessage. Which seems slightly odd, as if there was
>>>>>>>>>>> something badly broken with the instantiation of Kafka and sending
>>>>>>>>>>> messages to/from it, wouldn't we expect failures in the samza-kafka
>>>>>>>>>>> tests?
>>>>>>>>>>>
>>>>>>>>>>> On the Wheezy box this is failing every time.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Garry
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>>>>>>>> Sent: 03 March 2014 22:55
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Subject: Re: TestStatefulTask failures
>>>>>>>>>>>
>>>>>>>>>>> Hey Garry,
>>>>>>>>>>>
>>>>>>>>>>> Master successfully tested on my Mac OSX box with:
>>>>>>>>>>>
>>>>>>>>>>> $ ./gradlew clean test
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> On 3/3/14 2:49 PM, "Chris Riccomini" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Garry,
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm. This is alarming.
>>>>>>>>>>>>
>>>>>>>>>>>> This test is really more of an integration test than a unit test,
>>>>>>>>>>>> which makes it a bit trickier to tell why it's failed. It is,
>>>>>>>>>>>> however, extraordinarily useful in catching a ton of obscure bugs
>>>>>>>>>>>> that sneak through most of the other tests.
>>>>>>>>>>>>
>>>>>>>>>>>> Questions:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. What is the error you see in the resulting test logs?
>>>>>>>>>>>> 2. Does it ALWAYS fail on your Wheezy box, or just sometimes?
>>>>>>>>>>>>
>>>>>>>>>>>> I will try and re-run on my end. It's working fine on a branch of
>>>>>>>>>>>> mine that was rebased mid-last week, but perhaps something has
>>>>>>>>>>>> broken.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/3/14 2:44 PM, "Garry Turkington"
>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyone else having issues doing a clean build of master? I
>>>>>>>>>>>>> was happily doing rebuilds on a repo that I hadn't pulled
>>>>>>>>>>>>> from origin since mid-last week. Then I did a git pull today
>>>>>>>>>>>>> and I get the following on each build attempt:
>>>>>>>>>>>>>
>>>>>>>>>>>>> org.apache.samza.test.integration.TestStatefulTask > testShouldStartAndRestore FAILED
>>>>>>>>>>>>>     java.lang.AssertionError at TestStatefulTask.scala:325
>>>>>>>>>>>>>
>>>>>>>>>>>>> The slightly curious thing is that if I go do a clone of master on
>>>>>>>>>>>>> a different host (Centos 5.2 64-bit) it builds fine but on
>>>>>>>>>>>>> my usual development VM (Debian Wheezy 64-bit) the above happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This could be specific to my environment (not the first time!) but
>>>>>>>>>>>>> I also know there have been changes around state and that specific
>>>>>>>>>>>>> test recently so anyone else seeing odd behaviour?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Garry
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----
>>>>>>>>>>> No virus found in this message.
>>>>>>>>>>> Checked by AVG - www.avg.com
>>>>>>>>>>> Version: 2014.0.4259 / Virus Database: 3705/7144 - Release Date:
>>>>>>>>>>> 03/03/14
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
