Re: Flume failing travis builds

2022-01-30 Thread Ralph Goers
I moved the CI build to GitHub Actions and was able to get the build for both
macOS and Linux to complete successfully.

Now I will take a look at what other dependencies need upgrading.

Ralph

> On Jan 27, 2022, at 10:24 PM, Ralph Goers  wrote:
> 
> I am going to admit I am becoming very annoyed with these Travis builds.
> 
> For one, I have looked at the build history and as far as I can tell none of
> them have ever worked. Several of them have check marks on them, but when you
> look at the job log you will see they failed.
> 
> Next, it appears that the build is running using the command
> 
> ./mvnw test --quiet --fail-fast --threads 2.0C
> 
> When I run that locally (without the --quiet) the build also fails, but
> differently than how Travis does. I see the output below. You will notice that
> stuff isn’t running in the order listed by the reactor. I suspect that this
> may be caused by running the build in parallel. In fact, when the command
> started I saw
> 
> [WARNING] The following plugins are not marked @threadSafe in Flume NG SDK:
> [INFO] --- maven-remote-resources-plugin:1.7.0:process (process-resource-bundles) @ flume-ng-clients ---
> [WARNING] com.thoughtworks.paranamer:paranamer-maven-plugin:2.8
> [WARNING] Enable debug to see more precisely which goals are not marked @threadSafe.
> [WARNING] *
> 
> I then reran it locally without the --threads option and it completed
> successfully, so I am starting to think some of the weirdness is due to
> running the build in parallel. I have committed a change to remove that option
> and enabled debug logging to the travis log so I can have more information
> if/when it fails.
> 
> 
> [ERROR] Failures: 
> [ERROR]   TestKafkaSink.testStaticTopic:214->checkMessageArrived:195 No message matches static-topic-test
> [INFO] 
> [ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Build Support .. SUCCESS [  0.555 s]
> [INFO] Apache Flume 1.10.0-SNAPSHOT ... SUCCESS [  0.751 s]
> [INFO] Flume NG SDK ... SUCCESS [07:24 min]
> [INFO] Flume NG Hadoop Credential Store Config Filter . SUCCESS [  0.058 s]
> [INFO] Flume NG Config Filters API  SUCCESS [  0.302 s]
> [INFO] Flume NG Configuration . SUCCESS [  5.684 s]
> [INFO] Flume Auth . SUCCESS [  6.836 s]
> [INFO] Flume NG Core .. SUCCESS [13:39 min]
> [INFO] Flume NG Sinks . SUCCESS [  0.036 s]
> [INFO] Flume NG HDFS Sink . SKIPPED
> [INFO] Flume NG IRC Sink .. SUCCESS [ 12.079 s]
> [INFO] Flume NG Channels .. SUCCESS [  0.060 s]
> [INFO] Flume NG JDBC channel .. SUCCESS [ 53.409 s]
> [INFO] Flume NG file-based channel  SKIPPED
> [INFO] Flume NG Spillable Memory channel .. SKIPPED
> [INFO] Flume NG Node .. SKIPPED
> [INFO] Flume NG Embedded Agent  SKIPPED
> [INFO] Flume NG HBase Sink  SKIPPED
> [INFO] Flume NG HBase2 Sink ... SUCCESS [01:48 min]
> [INFO] Flume NG ElasticSearch Sink  SUCCESS [01:28 min]
> [INFO] Flume NG Morphline Solr Sink ... SUCCESS [ 43.347 s]
> [INFO] Flume Shared Utils . SUCCESS [  0.056 s]
> [INFO] Flume Shared Kafka . SUCCESS [  1.046 s]
> [INFO] Flume Shared Kafka Test Utils .. SUCCESS [ 11.672 s]
> [INFO] Flume Kafka Sink ... FAILURE [02:11 min]
> [INFO] Flume HTTP/S Sink .. SUCCESS [ 31.688 s]
> [INFO] Flume NG Hive Sink . SUCCESS [01:13 min]
> [INFO] Flume Sources .. SUCCESS [  0.035 s]
> [INFO] Flume Scribe Source  SUCCESS [ 18.953 s]
> [INFO] Flume JMS Source ... SUCCESS [01:06 min]
> [INFO] Flume Twitter Source ... SUCCESS [ 11.377 s]
> [INFO] Flume Kafka Source . SKIPPED
> [INFO] Flume Taildir Source ... SUCCESS [ 29.437 s]
> [INFO] flume-kafka-channel  SKIPPED
> [INFO] Flume legacy Sources ... SUCCESS [  0.061 s]
> [INFO] Flume legacy Avro source .

Re: Flume failing travis builds

2022-01-27 Thread Ralph Goers
I am going to admit I am becoming very annoyed with these Travis builds.

For one, I have looked at the build history and as far as I can tell none of
them have ever worked. Several of them have check marks on them, but when you
look at the job log you will see they failed.

Next, it appears that the build is running using the command

./mvnw test --quiet --fail-fast --threads 2.0C

When I run that locally (without the --quiet) the build also fails, but
differently than how Travis does. I see the output below. You will notice that
stuff isn’t running in the order listed by the reactor. I suspect that this may
be caused by running the build in parallel. In fact, when the command started
I saw

[WARNING] The following plugins are not marked @threadSafe in Flume NG SDK:
[INFO] --- maven-remote-resources-plugin:1.7.0:process (process-resource-bundles) @ flume-ng-clients ---
[WARNING] com.thoughtworks.paranamer:paranamer-maven-plugin:2.8
[WARNING] Enable debug to see more precisely which goals are not marked @threadSafe.
[WARNING] *

I then reran it locally without the --threads option and it completed
successfully, so I am starting to think some of the weirdness is due to running
the build in parallel. I have committed a change to remove that option and
enabled debug logging to the travis log so I can have more information if/when
it fails.


[ERROR] Failures: 
[ERROR]   TestKafkaSink.testStaticTopic:214->checkMessageArrived:195 No message matches static-topic-test
[INFO] 
[ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Build Support .. SUCCESS [  0.555 s]
[INFO] Apache Flume 1.10.0-SNAPSHOT ... SUCCESS [  0.751 s]
[INFO] Flume NG SDK ... SUCCESS [07:24 min]
[INFO] Flume NG Hadoop Credential Store Config Filter . SUCCESS [  0.058 s]
[INFO] Flume NG Config Filters API  SUCCESS [  0.302 s]
[INFO] Flume NG Configuration . SUCCESS [  5.684 s]
[INFO] Flume Auth . SUCCESS [  6.836 s]
[INFO] Flume NG Core .. SUCCESS [13:39 min]
[INFO] Flume NG Sinks . SUCCESS [  0.036 s]
[INFO] Flume NG HDFS Sink . SKIPPED
[INFO] Flume NG IRC Sink .. SUCCESS [ 12.079 s]
[INFO] Flume NG Channels .. SUCCESS [  0.060 s]
[INFO] Flume NG JDBC channel .. SUCCESS [ 53.409 s]
[INFO] Flume NG file-based channel  SKIPPED
[INFO] Flume NG Spillable Memory channel .. SKIPPED
[INFO] Flume NG Node .. SKIPPED
[INFO] Flume NG Embedded Agent  SKIPPED
[INFO] Flume NG HBase Sink  SKIPPED
[INFO] Flume NG HBase2 Sink ... SUCCESS [01:48 min]
[INFO] Flume NG ElasticSearch Sink  SUCCESS [01:28 min]
[INFO] Flume NG Morphline Solr Sink ... SUCCESS [ 43.347 s]
[INFO] Flume Shared Utils . SUCCESS [  0.056 s]
[INFO] Flume Shared Kafka . SUCCESS [  1.046 s]
[INFO] Flume Shared Kafka Test Utils .. SUCCESS [ 11.672 s]
[INFO] Flume Kafka Sink ... FAILURE [02:11 min]
[INFO] Flume HTTP/S Sink .. SUCCESS [ 31.688 s]
[INFO] Flume NG Hive Sink . SUCCESS [01:13 min]
[INFO] Flume Sources .. SUCCESS [  0.035 s]
[INFO] Flume Scribe Source  SUCCESS [ 18.953 s]
[INFO] Flume JMS Source ... SUCCESS [01:06 min]
[INFO] Flume Twitter Source ... SUCCESS [ 11.377 s]
[INFO] Flume Kafka Source . SKIPPED
[INFO] Flume Taildir Source ... SUCCESS [ 29.437 s]
[INFO] flume-kafka-channel  SKIPPED
[INFO] Flume legacy Sources ... SUCCESS [  0.061 s]
[INFO] Flume legacy Avro source ... SUCCESS [ 18.226 s]
[INFO] Flume legacy Thrift Source . SUCCESS [ 13.142 s]
[INFO] Flume NG Environment Variable Config Filter  SUCCESS [  1.492 s]
[INFO] flume-ng-hadoop-credential-store-config-filter . SUCCESS [  3.651 s]
[INFO] Flume NG External Process Config Filter  SUCCESS [  1.507 s]
[INFO] Flume NG Clients ... SUCCESS [  0.058 s]
[INFO] Flume NG Log4j Appender  SKIP

Re: Flume failing travis builds

2022-01-26 Thread Apache
I just noticed that the Travis builds do not rebuild the whole project. That is 
definitely the reason for one of the build failures and probably both. As it 
stands these builds are worthless.

Ralph

> On Jan 26, 2022, at 1:24 AM, Ralph Goers  wrote:
> 
> The last change to ExecSource was in June 2018. 
> 
> I am not sure that what Travis is picking up is valid. The problem I am
> having with TestKafkaSink is that I have modified the validation method such
> that every assertion error should generate a custom message. Yet none do. And,
> in fact, in the code path it should be following it should never perform an
> assertEquals where expected and actual should have a value of 10.
> 
> The second set of test failures are all because the test expects the local
> hostname to be either “localhost” or “127.0.0.1”. Instead, it is getting a
> value of ip6-localhost. This is despite my having configured the surefire
> plugin with java.net.preferIPv4Stack=true.
> 
> There are two more errors in TestReliableSpoolingEventReader. For this it
> looks like events somehow occur in a different sequence on the Travis server
> than they do on my Mac and my Linux VM. Note that this is also a class that
> was not touched.
> 
> Ralph
> 
>> On Jan 25, 2022, at 3:03 PM, Tristan Stevens  wrote:
>> 
>> Thanks Ralph. Seems the unit tests are picking up valid problems, which is
>> reassuring. Curious about execsource although I've got a feeling that did
>> change since the last release?
>> 
>> Tristan
>> 
>> 
>> 
>> 
>> 
>>> On Tue, 25 Jan 2022, 19:03 Ralph Goers,  wrote:
>>> 
>>> There seem to be two builds running and both fail but fail in different
>>> places.
>>> 
>>> The first build seems to be failing in a way it shouldn’t. The test is for
>>> not specifying any Kafka partitions. The behavior of how Kafka handles this
>>> changed in version 2.4, so the test should only be checking to see if it
>>> received all the events, but it appears it is somehow in the logic to check
>>> that all the partitions have an equal number of events. I’ve added more info
>>> into the assert message to help diagnose this.
>>> 
>>> The second build is failing in changes I just made to upgrade Netty & Avro.
>>> It appears to be failing checking the local host name. I will have to add
>>> some info to the error to determine what it is getting for a hostname.
>>> 
>>> I then ran the build in an Ubuntu VM on my MacBook and it got an error in
>>> TestExecSource (which hasn’t
>>> been changed). It seems it is calling process.waitFor() and getting a
>>> returned value of 1. I changed the
>>> test to call waitFor before calling destroy and it passed. It then failed
>>> in TestFileChannelRestart giving me
>>> IOExceptions saying the checkpoint hadn’t completed and the checkpoint
>>> interval should be increased.
>>> I added logic to retry in this situation but there is a unit test that
>>> tries to force that error so I had to have
>>> it bypass the fix in that case.
>>> 
>>> I committed those changes and will look at the results of the next Travis
>>> build to see what additional info
>>> it can provide.
>>> 
>>> Ralph
>>> 
>>> 
>>>> On Jan 24, 2022, at 12:18 AM, Tristan Stevens wrote:
>>>> 
>>>> Hi all,
>>>> It seems that for some reason the Travis builds are failing again. One
>>>> of them has been failing since the Log4j and SLF4J bump (odd) and the
>>>> other since the Kafka upgrade.
>>>> 
>>>> Anybody got some cycles to investigate whether these are just flaky
>>>> tests and/or whether there’s something more sinister in there?
>>>> 
>>>> Thanks
>>>> Tristan
>>>> 
>>> 
>>> 
> 



Re: Flume failing travis builds

2022-01-26 Thread Ralph Goers
The last change to ExecSource was in June 2018. 

I am not sure that what Travis is picking up is valid. The problem I am having
with TestKafkaSink is that I have modified the validation method such that
every assertion error should generate a custom message. Yet none do. And, in
fact, in the code path it should be following it should never perform an
assertEquals where expected and actual should have a value of 10.

The second set of test failures are all because the test expects the local
hostname to be either “localhost” or “127.0.0.1”. Instead, it is getting a
value of ip6-localhost. This is despite my having configured the surefire
plugin with java.net.preferIPv4Stack=true.
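
A minimal sketch of why a name-based check rejects ip6-localhost, assuming the
test compares the reported name against a fixed set of strings. The class and
method names here are hypothetical and are not Flume's test code. The second
method shows an address-based check as one possible alternative; it is an
illustration, not the change that was committed. Why the preferIPv4Stack
property does not help on Travis is not established in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

// Hypothetical illustration; not taken from the Flume code base.
class LoopbackNameCheck {

    // The two names the failing test is described as accepting.
    static final Set<String> EXPECTED_NAMES = Set.of("localhost", "127.0.0.1");

    /** Name-based check: true only for the exact strings in EXPECTED_NAMES. */
    static boolean matchesExpectedName(String hostName) {
        return EXPECTED_NAMES.contains(hostName);
    }

    /**
     * Address-based check: true when the name or literal resolves to a
     * loopback address. Names that cannot be resolved count as non-loopback.
     */
    static boolean resolvesToLoopback(String hostOrAddress) {
        try {
            return InetAddress.getByName(hostOrAddress).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The alias reported on Travis is not one of the expected strings.
        System.out.println(matchesExpectedName("ip6-localhost")); // false
        // A loopback literal passes the address-based check.
        System.out.println(resolvesToLoopback("127.0.0.1"));      // true
    }
}
```

Whether ip6-localhost itself resolves depends on the hosts file of the machine
running the check, so the sketch does not assume a result for that name.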

There are two more errors in TestReliableSpoolingEventReader. For this it looks
like events somehow occur in a different sequence on the Travis server than
they do on my Mac and my Linux VM. Note that this is also a class that was not
touched.

Ralph

> On Jan 25, 2022, at 3:03 PM, Tristan Stevens  wrote:
> 
> Thanks Ralph. Seems the unit tests are picking up valid problems, which is
> reassuring. Curious about execsource although I've got a feeling that did
> change since the last release?
> 
> Tristan
> 
> 
> 
> 
> 
> On Tue, 25 Jan 2022, 19:03 Ralph Goers,  wrote:
> 
>> There seem to be two builds running and both fail but fail in different
>> places.
>> 
>> The first build seems to be failing in a way it shouldn’t. The test is for
>> not specifying any Kafka partitions. The behavior of how Kafka handles this
>> changed in version 2.4, so the test should only be checking to see if it
>> received all the events, but it appears it is somehow in the logic to check
>> that all the partitions have an equal number of events. I’ve added more info
>> into the assert message to help diagnose this.
>> 
>> The second build is failing in changes I just made to upgrade Netty & Avro.
>> It appears to be failing checking the local host name. I will have to add
>> some info to the error to determine what it is getting for a hostname.
>> 
>> I then ran the build in an Ubuntu VM on my MacBook and it got an error in
>> TestExecSource (which hasn’t
>> been changed). It seems it is calling process.waitFor() and getting a
>> returned value of 1. I changed the
>> test to call waitFor before calling destroy and it passed. It then failed
>> in TestFileChannelRestart giving me
>> IOExceptions saying the checkpoint hadn’t completed and the checkpoint
>> interval should be increased.
>> I added logic to retry in this situation but there is a unit test that
>> tries to force that error so I had to have
>> it bypass the fix in that case.
>> 
>> I committed those changes and will look at the results of the next Travis
>> build to see what additional info
>> it can provide.
>> 
>> Ralph
>> 
>> 
>>> On Jan 24, 2022, at 12:18 AM, Tristan Stevens wrote:
>>> 
>>> Hi all,
>>> It seems that for some reason the Travis builds are failing again. One
>>> of them has been failing since the Log4j and SLF4J bump (odd) and the
>>> other since the Kafka upgrade.
>>> 
>>> Anybody got some cycles to investigate whether these are just flaky
>>> tests and/or whether there’s something more sinister in there?
>>> 
>>> Thanks
>>> Tristan
>>> 
>> 
>> 



Re: Flume failing travis builds

2022-01-25 Thread Tristan Stevens
Thanks Ralph. Seems the unit tests are picking up valid problems, which is
reassuring. Curious about execsource although I've got a feeling that did
change since the last release?

Tristan





On Tue, 25 Jan 2022, 19:03 Ralph Goers,  wrote:

> There seem to be two builds running and both fail but fail in different
> places.
>
> The first build seems to be failing in a way it shouldn’t. The test is for
> not specifying any Kafka partitions. The behavior of how Kafka handles this
> changed in version 2.4, so the test should only be checking to see if it
> received all the events, but it appears it is somehow in the logic to check
> that all the partitions have an equal number of events. I’ve added more info
> into the assert message to help diagnose this.
>
> The second build is failing in changes I just made to upgrade Netty & Avro.
> It appears to be failing checking the local host name. I will have to add
> some info to the error to determine what it is getting for a hostname.
>
> I then ran the build in an Ubuntu VM on my MacBook and it got an error in
> TestExecSource (which hasn’t
> been changed). It seems it is calling process.waitFor() and getting a
> returned value of 1. I changed the
> test to call waitFor before calling destroy and it passed. It then failed
> in TestFileChannelRestart giving me
> IOExceptions saying the checkpoint hadn’t completed and the checkpoint
> interval should be increased.
> I added logic to retry in this situation but there is a unit test that
> tries to force that error so I had to have
>  it bypass the fix in that case.
>
> I committed those changes and will look at the results of the next Travis
> build to see what additional info
> it can provide.
>
> Ralph
>
>
> > On Jan 24, 2022, at 12:18 AM, Tristan Stevens wrote:
> >
> > Hi all,
> > It seems that for some reason the Travis builds are failing again. One
> > of them has been failing since the Log4j and SLF4J bump (odd) and the
> > other since the Kafka upgrade.
> >
> > Anybody got some cycles to investigate whether these are just flaky
> > tests and/or whether there’s something more sinister in there?
> >
> > Thanks
> > Tristan
> >
>
>


Re: Flume failing travis builds

2022-01-25 Thread Ralph Goers
There seem to be two builds running and both fail but fail in different places. 

The first build seems to be failing in a way it shouldn’t. The test is for not
specifying any Kafka partitions. The behavior of how Kafka handles this changed
in version 2.4, so the test should only be checking to see if it received all
the events, but it appears it is somehow in the logic to check that all the
partitions have an equal number of events. I’ve added more info into the assert
message to help diagnose this.
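
The difference between the two checks can be sketched as follows. This is a
minimal illustration under stated assumptions, not the code in TestKafkaSink:
the class name, the method names, and the per-partition count map are all
hypothetical. It contrasts a check that only asks whether every event arrived
with a stricter check that also requires the events to be spread evenly.

```java
import java.util.Map;

// Hypothetical helper; names and data layout are assumptions for illustration.
class PartitionCountCheck {

    /** Sums the events seen across all partitions. */
    static int totalEvents(Map<Integer, Integer> eventsPerPartition) {
        int total = 0;
        for (int count : eventsPerPartition.values()) {
            total += count;
        }
        return total;
    }

    /** True when every sent event arrived, regardless of which partition received it. */
    static boolean allEventsArrived(Map<Integer, Integer> eventsPerPartition, int expectedTotal) {
        return totalEvents(eventsPerPartition) == expectedTotal;
    }

    /** True only when every partition holds the same number of events (the stricter check). */
    static boolean evenlyDistributed(Map<Integer, Integer> eventsPerPartition) {
        return eventsPerPartition.values().stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // Ten events that all landed on partition 0 of three partitions.
        Map<Integer, Integer> counts = Map.of(0, 10, 1, 0, 2, 0);
        System.out.println(allEventsArrived(counts, 10)); // true
        System.out.println(evenlyDistributed(counts));    // false
    }
}
```

With a skewed distribution like the one in main, the first check passes and
the second fails, which is the kind of mismatch described above.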

The second build is failing in changes I just made to upgrade Netty & Avro. It
appears to be failing checking the local host name. I will have to add some
info to the error to determine what it is getting for a hostname.

I then ran the build in an Ubuntu VM on my MacBook and it got an error in
TestExecSource (which hasn’t been changed). It seems it is calling
process.waitFor() and getting a returned value of 1. I changed the test to call
waitFor before calling destroy and it passed. It then failed in
TestFileChannelRestart giving me IOExceptions saying the checkpoint hadn’t
completed and the checkpoint interval should be increased. I added logic to
retry in this situation but there is a unit test that tries to force that
error so I had to have it bypass the fix in that case.
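
The retry-with-bypass idea can be sketched roughly like this. It is a generic
illustration, assuming the retried operation throws IOException; the class,
interface, and parameter names are hypothetical and this is not the change
that was committed to Flume. The retryEnabled flag stands in for the bypass
needed by the unit test that deliberately forces the error.

```java
import java.io.IOException;

// Hypothetical sketch of a bounded retry with an opt-out; not Flume code.
class CheckpointRetry {

    /** An operation that may fail with an IOException. */
    interface IoAction<T> {
        T run() throws IOException;
    }

    /**
     * Runs the action, retrying on IOException up to maxAttempts times.
     * When retryEnabled is false the action runs exactly once, so a test
     * that forces the failure still sees the exception immediately.
     */
    static <T> T callWithRetry(IoAction<T> action, int maxAttempts, boolean retryEnabled)
            throws IOException {
        int attempts = retryEnabled ? Math.max(1, maxAttempts) : 1;
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return action.run();
            } catch (IOException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("checkpoint has not completed");
            }
            return "ok";
        }, 5, true);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real fix would probably also wait between attempts; the sketch leaves that
out to stay short.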

I committed those changes and will look at the results of the next Travis build
to see what additional info it can provide.

Ralph


> On Jan 24, 2022, at 12:18 AM, Tristan Stevens  wrote:
> 
> Hi all,
> It seems that for some reason the Travis builds are failing again. One of
> them has been failing since the Log4j and SLF4J bump (odd) and the other
> since the Kafka upgrade.
> 
> Anybody got some cycles to investigate whether these are just flaky tests
> and/or whether there’s something more sinister in there?
> 
> Thanks
> Tristan
> 



Re: Flume failing travis builds

2022-01-24 Thread Ralph Goers
I will take a look at them since I am actively working on Flume. However, I can 
say that I have low confidence in those builds. Log4j has been having problems 
with its builds. Frequently it seems the problem is that the build is 
successful for a module but the build fails because the tooling can’t read the 
files produced by surefire.

Ralph

> On Jan 24, 2022, at 12:18 AM, Tristan Stevens  wrote:
> 
> Hi all,
> It seems that for some reason the Travis builds are failing again. One of
> them has been failing since the Log4j and SLF4J bump (odd) and the other
> since the Kafka upgrade.
> 
> Anybody got some cycles to investigate whether these are just flaky tests
> and/or whether there’s something more sinister in there?
> 
> Thanks
> Tristan
> 



Flume failing travis builds

2022-01-23 Thread Tristan Stevens
Hi all,
It seems that for some reason the Travis builds are failing again. One of them
has been failing since the Log4j and SLF4J bump (odd) and the other since the
Kafka upgrade.

Anybody got some cycles to investigate whether these are just flaky tests
and/or whether there’s something more sinister in there?

Thanks
Tristan