Re: Guidance for Integration Tests with External Technologies

2021-05-19 Thread Yun Gao
Hi Rion,

Do you mean that for "multiple tests run in sequence" you are running the tests 
directly in an IDE like IntelliJ IDEA?
If a test succeeds when run separately but fails when run in sequence, then 
it seems the other tests are still affecting the failing one.

For the consume failure, are there errors, or does Flink simply fetch no data? 
From the description I reckon 
the failure might not be related to the Flink job structure. If we have 
suspicions on this point, could we
change it to a simpler job first to see whether the fetched Kafka records match 
expectations?
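
For example, a check as simple as the following could show whether the records 
themselves arrive. This is only a sketch; the topic and broker names are 
placeholders, not from your setup:

    import java.util.Properties;
    import java.util.UUID;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaSanityCheck {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.setProperty("group.id", "sanity-" + UUID.randomUUID()); // fresh group each run

            FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props);
            consumer.setStartFromEarliest(); // read everything produced so far

            // If nothing prints, the problem is in the Kafka setup or delivery,
            // not in the structure of the real job.
            env.addSource(consumer).print();
            env.execute("kafka-sanity-check");
        }
    }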

Best,
Yun



-- Original Mail --
Sender: Rion Williams 
Send Date: Wed May 19 07:14:05 2021
Recipients: user 
Subject: Guidance for Integration Tests with External Technologies
Hey all,

I’ve been taking a very TDD-oriented approach to developing many of the Flink 
apps I’ve worked on, but recently I’ve encountered a problem that has me 
scratching my head.

A majority of my integration tests leverage a few external technologies such as 
Kafka and typically a relational database like Postgres. I’ve found 
in-memory/embedded versions of these that have worked well in the past, 
allowing me to:

- send messages into a Kafka topic
- run my exact Flink job asynchronously
- verify my results / assertions in Postgres via Awaitility (the overall 
pattern is sketched just below)
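
Roughly, a test following this pattern looks like the sketch below; 
produceRecords, sampleEvents, runJob, and countRows are hypothetical stand-ins 
for project-specific helpers, not my actual code:

    import static org.assertj.core.api.Assertions.assertThat;

    import java.time.Duration;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import org.awaitility.Awaitility;
    import org.junit.jupiter.api.Test;

    class PipelineIT {

        @Test
        void jobWritesExpectedRowsToPostgres() {
            // 1. Seed the input topic before the job runs.
            produceRecords("input-topic", sampleEvents());

            // 2. Run the Flink job on another thread; execute() blocks for an
            //    unbounded streaming job, so the test thread must stay free.
            CompletableFuture<Void> job = CompletableFuture.runAsync(() -> runJob());

            // 3. Poll Postgres until the expected rows appear, or time out.
            Awaitility.await()
                .atMost(Duration.ofSeconds(30))
                .untilAsserted(() -> assertThat(countRows("results")).isEqualTo(3));
        }

        // Hypothetical helpers; in a real project these wrap the embedded
        // Kafka producer, the job's entry point, and a JDBC count query.
        private void produceRecords(String topic, List<String> events) { /* ... */ }
        private List<String> sampleEvents() { return List.of(); }
        private void runJob() { /* ... */ }
        private long countRows(String table) { return 0; }
    }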

Recently, I had a use case for broadcast state in a job and found that my 
tests would run successfully when executed individually, but when multiple 
tests run in sequence (in the same file), Flink seems to fail to consume from 
the topics and the assertion eventually fails.
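
For context, the failing jobs have roughly the shape below. This is an 
illustrative, self-contained reduction (fromElements standing in for the two 
Kafka consumers), not the actual job:

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class BroadcastShape {
        // Descriptor shared by the broadcast side and the process function.
        static final MapStateDescriptor<String, String> RULES =
            new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // In the real job these are Kafka consumers on two separate topics.
            DataStream<String> events = env.fromElements("event-1", "event-2");
            BroadcastStream<String> rules =
                env.fromElements("rule-1").broadcast(RULES);

            events.connect(rules)
                  .process(new BroadcastProcessFunction<String, String, String>() {
                      @Override
                      public void processElement(String event, ReadOnlyContext ctx,
                                                 Collector<String> out) throws Exception {
                          // Look up broadcast state for each incoming event.
                          String rule = ctx.getBroadcastState(RULES).get("latest");
                          out.collect(event + " / " + rule);
                      }

                      @Override
                      public void processBroadcastElement(String rule, Context ctx,
                                                          Collector<String> out) throws Exception {
                          // Update broadcast state as records arrive on topic 2.
                          ctx.getBroadcastState(RULES).put("latest", rule);
                      }
                  })
                  .print(); // the real job writes to Postgres instead
            env.execute("broadcast-shape");
        }
    }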

I’ve tried several approaches, including:
- ensuring that each Flink job is passed a unique consumer.id / group.id / 
application.id
- ensuring each test has brand-new Kafka topics specific to it
- spinning up a new Flink cluster / Kafka cluster / Postgres instance per test 
(the isolation is sketched after this list)
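
Concretely, the per-test isolation from the first two bullets is done along 
these lines; all names here are illustrative, not my project’s actual ones:

    import java.util.Properties;
    import java.util.UUID;

    class TestIsolation {
        // Each test gets its own topics and consumer group, so no offsets or
        // state should be able to leak between runs.
        static Properties freshConsumerProps() {
            String runId = UUID.randomUUID().toString();
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // embedded broker
            props.setProperty("group.id", "it-" + runId);
            // Pick up records produced before the job starts consuming.
            props.setProperty("auto.offset.reset", "earliest");
            return props;
        }

        static String freshTopic(String base) {
            return base + "-" + UUID.randomUUID(); // e.g. "events-3f9c..."
        }
    }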

I’m not entirely sure what could be causing the problem, but it only occurs 
for Flink jobs that read from two topics and leverage broadcast state. All 
other integration tests that use Kafka/Flink/Postgres still pass and can be 
run in sequence.

Any advice / examples / recommendations would be helpful. I’d be happy to 
elaborate and provide code whenever possible as well.

Thanks,

Rion
