Re: Problems with E2E test

2021-01-20 Thread Brian Hulette
Sorry, this was my mistake when I reviewed the PR last week.

I suggested renaming this new E2E test to *IT since it looks like an
integration test. This means that the test will run as part of the "Java
Examples Dataflow" PreCommit. However, the test also uses a local fake of
Pub/Sub, which won't work when running on a distributed runner like
Dataflow.

We could just keep this as a *Test and make sure it runs with the
DirectRunner in a Jenkins job.
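
To make that concrete, here's a rough sketch of what I have in mind
(untested; the class and test names are just illustrative):

import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.TestPipeline;
import org.junit.Rule;
import org.junit.Test;

public class KafkaToPubsubE2ETest { // hypothetical name, keeping the *Test suffix
  private static PipelineOptions directRunnerOptions() {
    PipelineOptions options = PipelineOptionsFactory.create();
    // Pin to the DirectRunner so the test never tries to run on a
    // distributed runner like Dataflow.
    options.setRunner(DirectRunner.class);
    return options;
  }

  @Rule
  public final transient TestPipeline pipeline =
      TestPipeline.fromOptions(directRunnerOptions());

  @Test
  public void runsAgainstLocalFakes() {
    // ... build the KafkaToPubsub pipeline against the local Kafka and
    // Pub/Sub fakes ...
    pipeline.run().waitUntilFinish();
  }
}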

Brian



Re: Problems with E2E test

2021-01-18 Thread Boyuan Zhang
It does seem like Dataflow performs some validation on the Pub/Sub
parameters before actually creating the pipeline. That's fair for Dataflow,
because Dataflow swaps the Beam PubsubIO implementation for its own native
one.

I think if you really want to run your virtual Pub/Sub with Dataflow, you
need to try out --experiments=enable_custom_pubsub_sink to force Dataflow
not to apply the override.
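
Programmatically, that would look something like this (a sketch, not
verified against your pipeline; the helper class name is made up):

import java.util.Arrays;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CustomPubsubSinkOptions { // hypothetical helper
  public static ExperimentalOptions create() {
    ExperimentalOptions options = PipelineOptionsFactory.as(ExperimentalOptions.class);
    // Equivalent to passing --experiments=enable_custom_pubsub_sink on the
    // command line.
    options.setExperiments(Arrays.asList("enable_custom_pubsub_sink"));
    return options;
  }
}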

Would you like to share your job ID so we can verify the failure? Also, I'm
not sure about the motivation for testing against Dataflow. Would you like
to elaborate on that?




Problems with E2E test

2021-01-18 Thread Ramazan Yapparov
Hi Beam!
We've been writing an E2E test for the KafkaToPubsub example pipeline.
Instead of depending on real Cloud Pub/Sub and Kafka instances, we decided
to use Testcontainers. We launch Kafka and Pub/Sub emulator containers, and
after that we pass the container URLs into the pipeline options and run the
pipeline.
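
Our container setup looks roughly like this (a simplified sketch of our
test; the helper class name and image tags are just illustrative):

import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.containers.PubSubEmulatorContainer;
import org.testcontainers.utility.DockerImageName;

public class LocalBackends { // hypothetical helper, simplified from our test
  public static void main(String[] args) {
    KafkaContainer kafka =
        new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.0.1"));
    PubSubEmulatorContainer pubsub = new PubSubEmulatorContainer(
        DockerImageName.parse("gcr.io/google.com/cloudsdktool/cloud-sdk:317.0.0-emulators"));
    kafka.start();
    pubsub.start();

    // These are the URLs we pass into the pipeline options:
    String bootstrapServers = kafka.getBootstrapServers(); // e.g. PLAINTEXT://localhost:49170
    String pubsubEndpoint = pubsub.getEmulatorEndpoint();  // e.g. localhost:49169
    System.out.println(bootstrapServers + " / " + pubsubEndpoint);
  }
}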
During PR review we received a request to turn this test into an IT so that
it would run on the Dataflow runner instead of the Direct runner.
Trying to do so, we ran into some trouble:
1. While running the test, all Docker containers start on the machine where
the test is running,
   so for this test to work properly the Dataflow job must be able to reach
the test-runner machine at a public IP.
   I certainly can't do that on my local machine, and I'm not sure how it
will behave in a CI environment.
2. When we pass our fake Pub/Sub URL into the Dataflow job, we receive the
following error:

{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "(f214233f9dbe6968): The workflow could not be created. Causes: (f214233f9dbe6719): http://localhost:49169 is not a valid Pub/Sub URL.",
    "reason" : "badRequest"
  } ],
  "message" : "(f214233f9dbe6968): The workflow could not be created. Causes: (f214233f9dbe6719): http://localhost:49169 is not a valid Pub/Sub URL.",
  "status" : "INVALID_ARGUMENT"
}

Not sure how this can be avoided; it looks like the job will only accept a
real Cloud Pub/Sub URL.
It would be great if you could share some thoughts or suggestions on how
this can be solved!
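
For reference, this is roughly how we point the pipeline at the emulator (a
simplified sketch; the helper class name is made up, but setPubsubRootUrl is
the stock hook on Beam's PubsubOptions):

import org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class EmulatorPubsubOptions { // hypothetical helper, simplified from our test
  public static PubsubOptions forEmulator(String emulatorEndpoint) {
    PubsubOptions options = PipelineOptionsFactory.as(PubsubOptions.class);
    // Point the Beam Pub/Sub client at the local emulator instead of
    // pubsub.googleapis.com; this is the URL that Dataflow rejects above.
    options.setPubsubRootUrl("http://" + emulatorEndpoint); // e.g. http://localhost:49169
    return options;
  }
}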