Update on this thread: there has been progress, and we have a few fixes
being tested.

https://github.com/vinothchandar/incubator-hudi/tree/hudi-312-flaky-tests
https://github.com/apache/incubator-hudi/pull/989

It boiled down to remnants of the previous run hanging around and
causing invalid states. We also had a thread pool that wasn't closed
after such an unexpected error, which kept the JVM hanging around.
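
To illustrate the thread pool part: below is a minimal sketch, not the
actual Hudi code. The class and method names (PoolShutdownSketch,
runAndShutdown) are hypothetical. It assumes a pool created with
Executors.newFixedThreadPool, whose worker threads are non-daemon by
default, so idle workers can keep the JVM alive if the pool is never shut
down. Shutting the pool down in a finally block covers the error path too.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not taken from the Hudi code base.
class PoolShutdownSketch {

    // Runs the task on a fixed pool and always shuts the pool down,
    // even when the task throws. Returns true if the pool terminated.
    static boolean runAndShutdown(Runnable task) {
        // Worker threads are non-daemon by default, so a pool that is
        // never shut down can prevent the JVM from exiting.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(task).get();
        } catch (ExecutionException e) {
            // The task failed; report it and fall through to cleanup.
            System.err.println("task failed: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Cleanup runs on both the success and the error path.
            pool.shutdownNow();
        }
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean terminated = runAndShutdown(() -> {
            throw new IllegalStateException("simulated unexpected error");
        });
        System.out.println("pool terminated: " + terminated);
    }
}
```

Without the finally block, the simulated failure would leave the two
worker threads parked and the process would not exit on its own.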

@Balaji Varadarajan <vbal...@apache.org> I think it's best to rebuild and
publish new images that use local storage for HDFS. wdyt?

Also filed a few follow-ups: HUDI-322, HUDI-323



On Sat, Oct 26, 2019 at 9:36 AM Vinoth Chandar <vin...@apache.org> wrote:

> Disabling UI is not doing the trick. I think it gets stuck while starting
> up (and not while exiting like I assumed incorrectly before).
>
> On Fri, Oct 25, 2019 at 9:00 AM Vinoth Chandar <vin...@apache.org> wrote:
>
>> Could we disable the UI and try again? It's either the jetty threads or
>> the two HDFS threads that are hanging on.
>> Cannot understand why the JVM wouldn't exit otherwise.
>>
>> On Fri, Oct 25, 2019 at 5:27 AM Bhavani Sudha <bhavanisud...@gmail.com>
>> wrote:
>>
>>> https://gist.github.com/bhasudha/5aac43d93a942f68bcab413a26229292
>>>  Took a thread dump. Seems like jetty threads are not shutting down? Don't
>>> see any hudi/spark related activity that is pending. The only threads in
>>> RUNNABLE state are the jetty ones.
>>>
>>> On Fri, Oct 25, 2019 at 1:54 AM Pratyaksh Sharma <pratyaks...@gmail.com>
>>> wrote:
>>>
>>> > Hi Vinoth,
>>> >
>>> > > can you try
>>> > - Do : docker ps -a and make sure there are no lingering containers.
>>> > - if so, run : cd docker; ./stop_demo.sh
>>> > - cd ..
>>> > - mvn clean verify -DskipUTs=true -B
>>> >
>>> > I ran the above 3 times. Twice it was successful but once it incurred the
>>> > same errors I listed in previous mail.
>>> >
>>> > On Fri, Oct 25, 2019 at 8:26 AM Vinoth Chandar <
>>> > mail.vinoth.chan...@gmail.com> wrote:
>>> >
>>> > > Got the integ test to hang once, at the same spot as Pratyaksh mentioned.
>>> > > So it would be a good candidate to drill into.
>>> > >
>>> > > @nishith in this state, the containers are all open. So you could just hop
>>> > > in and take a stack trace to see what's going on.
>>> > >
>>> > >
>>> > > On Thu, Oct 24, 2019 at 9:14 AM Nishith <n3.nas...@gmail.com> wrote:
>>> > >
>>> > > > I’m going to look into the flaky tests on Travis sometime today.
>>> > > >
>>> > > > -Nishith
>>> > > >
>>> > > > Sent from my iPhone
>>> > > >
>>> > > > > On Oct 23, 2019, at 10:23 PM, Vinoth Chandar <vin...@apache.org> wrote:
>>> > > > >
>>> > > > > Just to make sure we are on the same page,
>>> > > > >
>>> > > > > can you try
>>> > > > > - Do : docker ps -a and make sure there are no lingering containers.
>>> > > > > - if so, run : cd docker; ./stop_demo.sh
>>> > > > > - cd ..
>>> > > > > - mvn clean verify -DskipUTs=true -B
>>> > > > >
>>> > > > > and this always gets stuck? The failures on CI seem to be random timeouts,
>>> > > > > not very related to this.
>>> > > > >
>>> > > > > FWIW I ran the above 3 times, without glitches so far. So if you can
>>> > > > > confirm, then it'll help.
>>> > > > >
>>> > > > >> On Wed, Oct 23, 2019 at 7:04 AM Vinoth Chandar <vin...@apache.org> wrote:
>>> > > > >>
>>> > > > >> I saw someone else share the same experience. Can't think of anything that
>>> > > > >> could have caused this to become flaky recently.
>>> > > > >> I already created https://issues.apache.org/jira/browse/HUDI-312 to
>>> > > > >> look into some flakiness on travis.
>>> > > > >>
>>> > > > >> any volunteers to drive this? (I am in the middle of fleshing out an RFC)
>>> > > > >>
>>> > > > >> On Wed, Oct 23, 2019 at 6:43 AM Pratyaksh Sharma <pratyaks...@gmail.com>
>>> > > > >> wrote:
>>> > > > >>
>>> > > > >>> It gets stuck forever while running the following -
>>> > > > >>>
>>> > > > >>> Container : /adhoc-1, Running command :spark-submit --class
>>> > > > >>> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
>>> > > > >>> /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
>>> > > > >>> --storage-type MERGE_ON_READ  --source-class
>>> > > > >>> org.apache.hudi.utilities.sources.JsonDFSSource --source-ordering-field ts
>>> > > > >>> --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table
>>> > > > >>> stock_ticks_mor --props /var/demo/config/dfs-source.properties
>>> > > > >>> --schemaprovider-class
>>> > > > >>> org.apache.hudi.utilities.schema.FilebasedSchemaProvider
>>> > > > >>> --disable-compaction  --enable-hive-sync  --hoodie-conf
>>> > > > >>> hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000
>>> > > > >>> --hoodie-conf hoodie.datasource.hive_sync.username=hive --hoodie-conf
>>> > > > >>> hoodie.datasource.hive_sync.password=hive  --hoodie-conf
>>> > > > >>> hoodie.datasource.hive_sync.partition_fields=dt  --hoodie-conf
>>> > > > >>> hoodie.datasource.hive_sync.database=default  --hoodie-conf
>>> > > > >>> hoodie.datasource.hive_sync.table=stock_ticks_mor
>>> > > > >>>
>>> > > > >>> On Wed, Oct 23, 2019 at 7:02 PM Pratyaksh Sharma <pratyaks...@gmail.com>
>>> > > > >>> wrote:
>>> > > > >>>
>>> > > > >>>> Hi,
>>> > > > >>>>
>>> > > > >>>> I am facing errors when trying to run integration tests using the script
>>> > > > >>>> travis_run_tests.sh, and it also takes a lot of time or rather gets stuck.
>>> > > > >>>> If I run them like normal junit tests, they work fine.
>>> > > > >>>>
>>> > > > >>>> Sometimes random transient errors also come up, but these are the most
>>> > > > >>>> frequent ones -
>>> > > > >>>>
>>> > > > >>>> [ERROR] Tests run: 3, Failures: 3, Errors: 0, Skipped: 0, Time elapsed:
>>> > > > >>>> 345.207 s <<< FAILURE! - in org.apache.hudi.integ.ITTestHoodieSanity
>>> > > > >>>> [ERROR]
>>> > > > >>>> testRunHoodieJavaAppOnSinglePartitionKeyCOWTable(org.apache.hudi.integ.ITTestHoodieSanity)
>>> > > > >>>> Time elapsed: 129.227 s  <<< FAILURE!
>>> > > > >>>> java.lang.AssertionError: Expecting 100 rows to be present in the new
>>> > > > >>>> table expected:<100> but was:<200>
>>> > > > >>>> at
>>> > > > >>>> org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
>>> > > > >>>> at
>>> > > > >>>> org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnSinglePartitionKeyCOWTable(ITTestHoodieSanity.java:42)
>>> > > > >>>>
>>> > > > >>>> [ERROR]
>>> > > > >>>> testRunHoodieJavaAppOnMultiPartitionKeysCOWTable(org.apache.hudi.integ.ITTestHoodieSanity)
>>> > > > >>>> Time elapsed: 108.146 s  <<< FAILURE!
>>> > > > >>>> java.lang.AssertionError: Expecting 100 rows to be present in the new
>>> > > > >>>> table expected:<100> but was:<200>
>>> > > > >>>> at
>>> > > > >>>> org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
>>> > > > >>>> at
>>> > > > >>>> org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnMultiPartitionKeysCOWTable(ITTestHoodieSanity.java:54)
>>> > > > >>>>
>>> > > > >>>> [ERROR]
>>> > > > >>>> testRunHoodieJavaAppOnNonPartitionedCOWTable(org.apache.hudi.integ.ITTestHoodieSanity)
>>> > > > >>>> Time elapsed: 107.63 s  <<< FAILURE!
>>> > > > >>>> java.lang.AssertionError: Expecting 100 rows to be present in the new
>>> > > > >>>> table expected:<100> but was:<200>
>>> > > > >>>> at
>>> > > > >>>> org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
>>> > > > >>>> at
>>> > > > >>>> org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnNonPartitionedCOWTable(ITTestHoodieSanity.java:66)
>>> > > > >>>>
>>> > > > >>>> Has anybody else faced similar issues?
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>>
>>
