One thing that I would caution though is that this is likely a heisenbug. The more logging I added earlier made it less likely to occur. It seems more likely to occur on Travis than locally and I made it happen by repeatedly running mvn install on Metron-solr (after a mvn install of the whole project). On Tue, Jan 24, 2017 at 07:59 Casey Stella <ceste...@gmail.com> wrote:
> Agreed to both counts. I was able to reproduce it locally, but not in an > IDE by the way. > On Tue, Jan 24, 2017 at 07:57 Justin Leet <justinjl...@gmail.com> wrote: > > I definitely agree that this isn't a fluke. > > Do we have a Jira for this? If not, I can create one and I would like to > propose that part of that ticket is adding logging. Right now, I'm > concerned we don't have enough info from the Travis builds to be able to > (easily) debug failure or reproduce locally. > > Justin > > On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella <ceste...@gmail.com> wrote: > > > One more thing, just for posterity here, it always freezes at 6 records > > written to HDFS. That's the reason I thought it was a flushing issue. > > > > On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella <ceste...@gmail.com> > wrote: > > > > > Ok, so now I'm concerned that this isn't a fluke. Here's an excerpt > from > > > the failing logs on travis for my PR with substantially longer > timeouts ( > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt) > > > > > > Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest > > > 0 vs 10 vs 0 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0- > > 1485200689038.json > > > 10 vs 10 vs 6 > > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 317.056 > > sec <<< FAILURE! > > > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest) > > Time elapsed: 316.949 sec <<< ERROR! > > > java.lang.RuntimeException: Took too long to complete: 300783 > 300000 > > > at org.apache.metron.integration.ComponentRunner.process( > > ComponentRunner.java:131) > > > at org.apache.metron.indexing.integration. > > IndexingIntegrationTest.test(IndexingIntegrationTest.java:173) > > > > > > > > > I'm getting the impression that this isn't the timeout and we have a > > mystery on our hands. Each of those lines "10 vs 10 vs 6" happen 15 > > seconds apart. That line means that it read 10 entries from kafka, 10 > > entries from the indexed data and 6 entries from HDFS. It's that 6 > entries > > that is the problem. Also of note, this does not seem to happen to me > > locally AND it's not consistent on Travis. Given all that I'd say that > > it's a problem with the HDFS Writer not getting flushed, but I verified > > that it is indeed flushed per message. > > > > > > > > > Anyway, tl;dr we have a mystery unit test bug that isn't deterministic > > wrt the unit tests and may or may not manifest itself outside of the unit > > tests. So, yeah, I'll be looking at it, but would appreciate others > taking > > a gander too. > > > > > > > > > Casey > > > > > > > > > On Mon, Jan 23, 2017 at 2:09 PM, Casey Stella <ceste...@gmail.com> > > wrote: > > > > > >> Yeah, I adjusted the timeout on the indexing integration tests as part > > of > > >> https://github.com/apache/incubator-metron/pull/420 which I'll merge > in > > >> today. > > >> > > >> On Mon, Jan 23, 2017 at 2:01 PM, zeo...@gmail.com <zeo...@gmail.com> > > >> wrote: > > >> > > >>> Okay, now we have back to back failures, and it looks like it may > have > > >>> been > > >>> a timeout issue? > > >>> `test(org.apache.metron.solr.integration. > > SolrIndexingIntegrationTest): > > >>> Took too long to complete: 150582 > 150000`, more details below: > > >>> > > >>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > > >>> 166.167 sec <<< FAILURE! > > >>> > > >>> test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest) > > >>> Time elapsed: 166.071 sec <<< ERROR! > > >>> > > >>> java.lang.RuntimeException: Took too long to complete: 150582 > > 150000 > > >>> > > >>> at org.apache.metron.integration. > > ComponentRunner.process(Compon > > >>> entRunner.java:131) > > >>> > > >>> at org.apache.metron.indexing.integration. > > IndexingIntegrationTe > > >>> st.test(IndexingIntegrationTest.java:173) > > >>> > > >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > >>> > > >>> at sun.reflect.NativeMethodAccessorImpl. > > invoke(NativeMethodAcce > > >>> ssorImpl.java:62) > > >>> > > >>> at sun.reflect.DelegatingMethodAccessorImpl. > > invoke(DelegatingMe > > >>> thodAccessorImpl.java:43) > > >>> > > >>> at java.lang.reflect.Method.invoke(Method.java:483) > > >>> > > >>> at org.junit.runners.model.FrameworkMethod$1. > > runReflectiveCall( > > >>> FrameworkMethod.java:50) > > >>> > > >>> at org.junit.internal.runners.model.ReflectiveCallable.run( > > Refl > > >>> ectiveCallable.java:12) > > >>> > > >>> at org.junit.runners.model.FrameworkMethod. > > invokeExplosively(Fr > > >>> ameworkMethod.java:47) > > >>> > > >>> at org.junit.internal.runners.statements.InvokeMethod. > > evaluate( > > >>> InvokeMethod.java:17) > > >>> > > >>> at org.junit.runners.ParentRunner.runLeaf( > > ParentRunner.java:325) > > >>> > > >>> at org.junit.runners.BlockJUnit4ClassRunner. > > runChild(BlockJUnit > > >>> 4ClassRunner.java:78) > > >>> > > >>> at org.junit.runners.BlockJUnit4ClassRunner. > > runChild(BlockJUnit > > >>> 4ClassRunner.java:57) > > >>> > > >>> at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > > >>> > > >>> at org.junit.runners.ParentRunner$1.schedule( > > ParentRunner.java: > > >>> 71) > > >>> > > >>> at org.junit.runners.ParentRunner.runChildren( > > ParentRunner.java > > >>> :288) > > >>> > > >>> at org.junit.runners.ParentRunner.access$000( > > ParentRunner.java: > > >>> 58) > > >>> > > >>> at org.junit.runners.ParentRunner$2.evaluate( > > ParentRunner.java: > > >>> 268) > > >>> > > >>> at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > > >>> > > >>> at org.apache.maven.surefire.junit4.JUnit4Provider.execute( > > JUni > > >>> t4Provider.java:252) > > >>> > > >>> at org.apache.maven.surefire.junit4.JUnit4Provider. > > executeTestS > > >>> et(JUnit4Provider.java:141) > > >>> > > >>> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke( > > JUnit > > >>> 4Provider.java:112) > > >>> > > >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > >>> > > >>> at sun.reflect.NativeMethodAccessorImpl. > > invoke(NativeMethodAcce > > >>> ssorImpl.java:62) > > >>> > > >>> at sun.reflect.DelegatingMethodAccessorImpl. > > invoke(DelegatingMe > > >>> thodAccessorImpl.java:43) > > >>> > > >>> at java.lang.reflect.Method.invoke(Method.java:483) > > >>> > > >>> at org.apache.maven.surefire.util.ReflectionUtils. > > invokeMethodW > > >>> ithArray(ReflectionUtils.java:189) > > >>> > > >>> at org.apache.maven.surefire.booter.ProviderFactory$ > > ProviderPro > > >>> xy.invoke(ProviderFactory.java:165) > > >>> > > >>> at org.apache.maven.surefire.booter.ProviderFactory. > > invokeProvi > > >>> der(ProviderFactory.java:85) > > >>> > > >>> at org.apache.maven.surefire.booter.ForkedBooter. > > runSuitesInPro > > >>> cess(ForkedBooter.java:115) > > >>> > > >>> at org.apache.maven.surefire.booter.ForkedBooter.main( > > ForkedBoo > > >>> ter.java:75) > > >>> > > >>> > > >>> Jon > > >>> > > >>> On Thu, Jan 19, 2017 at 2:49 PM zeo...@gmail.com <zeo...@gmail.com> > > >>> wrote: > > >>> > > >>> > The build has been showing as failing > > >>> > <https://github.com/apache/incubator-metron> for a little while > now. > > >>> I > > >>> > know we recently updated the language around Merge Requirements > > >>> > <https://cwiki.apache.org/confluence/pages/viewpage.action?p > > >>> ageId=61332235>, > > >>> > but if I recall properly our current issue is simply a Travis CI > > >>> overload > > >>> > issue. Is there a way we can update the wiki doc to account for > > >>> situations > > >>> > like this? > > >>> > > > >>> > Jon > > >>> > -- > > >>> > > > >>> > Jon > > >>> > > > >>> > Sent from my mobile device > > >>> > > > >>> -- > > >>> > > >>> Jon > > >>> > > >>> Sent from my mobile device > > >>> > > >> > > >> > > > > > > >