Re: Querying Dataflow job status via Java SDK

2020-10-12 Thread Peter Littig
Thanks for the link, Steve - very helpful!

On Mon, Oct 12, 2020 at 11:31 AM Steve Niemitz  wrote:

> This is what I was referencing:
> https://github.com/googleapis/google-api-java-client-services/tree/master/clients/google-api-services-dataflow/v1b3
>
> On Mon, Oct 12, 2020 at 2:23 PM Peter Littig 
> wrote:
>
>> Thanks for the replies, Lukasz and Steve!
>>
>> Steve: do you have a link to the Google API client wrappers? (I'm not sure
>> I know what they are.)
>>
>> Thank you!
>>
>> On Mon, Oct 12, 2020 at 11:04 AM Steve Niemitz 
>> wrote:
>>
>>> We use the Dataflow API [1] directly, via the google api client wrappers
>>> (both python and java), pretty extensively.  It works well and doesn't
>>> require a dependency on beam.
>>>
>>> [1] https://cloud.google.com/dataflow/docs/reference/rest
>>>
>>> On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik  wrote:
>>>
 This is the best way to do it right now, and it hasn't changed in a
 while (region was added alongside project and job IDs within the past 6 years).

 On Mon, Oct 12, 2020 at 10:53 AM Peter Littig 
 wrote:

> Thanks for the reply, Kyle.
>
> The DataflowClient::getJob method uses a Dataflow instance that's
> provided at construction time (via
> DataflowPipelineOptions::getDataflowClient). If that Dataflow instance can
> be obtained from a minimal instance of the options (i.e., containing only
> the project ID and region) then it looks like everything should work.
>
> I suppose a secondary question here is whether or not this approach is
> the recommended way to solve my problem (but I don't know of any
> alternatives).
>
> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver 
> wrote:
>
>> > I think the answer is to use a DataflowClient in the second service,
>> but creating one requires DataflowPipelineOptions. Are these options
>> supposed to be exactly the same as those used by the first service? Or do
>> only some of the fields have to be the same?
>>
>> Most options are not necessary for retrieving a job. In general,
>> Dataflow jobs can always be uniquely identified by the project, region and
>> job ID.
>> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>>
>> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig 
>> wrote:
>>
>>> Hello, Beam users!
>>>
>>> Suppose I want to build two (Java) services, one that launches
>>> (long-running) dataflow jobs, and the other that monitors the status of
>>> dataflow jobs. Within a single service, I could simply track a
>>> PipelineResult for each dataflow run and periodically call getState. How
>>> can I monitor job status like this from a second, independent service?
>>>
>>> I think the answer is to use a DataflowClient in the second service,
>>> but creating one requires DataflowPipelineOptions. Are these options
>>> supposed to be exactly the same as those used by the first service? Or do
>>> only some of the fields have to be the same?
>>>
>>> Or maybe there's a better alternative than DataflowClient?
>>>
>>> Thanks in advance!
>>>
>>> Peter
>>>
>>


Re: Querying Dataflow job status via Java SDK

2020-10-12 Thread Steve Niemitz
This is what I was referencing:
https://github.com/googleapis/google-api-java-client-services/tree/master/clients/google-api-services-dataflow/v1b3
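
Roughly, fetching a job's state with that generated client looks something like
the sketch below (untested; it assumes application-default credentials, and the
project, region, and job ID values are placeholders):

  import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
  import com.google.api.client.json.gson.GsonFactory;
  import com.google.api.services.dataflow.Dataflow;
  import com.google.api.services.dataflow.model.Job;
  import com.google.auth.http.HttpCredentialsAdapter;
  import com.google.auth.oauth2.GoogleCredentials;

  public class DataflowJobStatus {
    public static void main(String[] args) throws Exception {
      // Application-default credentials with the cloud-platform scope.
      GoogleCredentials credentials = GoogleCredentials.getApplicationDefault()
          .createScoped("https://www.googleapis.com/auth/cloud-platform");

      Dataflow dataflow = new Dataflow.Builder(
              GoogleNetHttpTransport.newTrustedTransport(),
              GsonFactory.getDefaultInstance(),
              new HttpCredentialsAdapter(credentials))
          .setApplicationName("dataflow-status-checker")
          .build();

      // Project, region, and job ID are all that's needed to look up a job.
      Job job = dataflow.projects().locations().jobs()
          .get("my-project", "us-central1", "my-job-id")
          .execute();
      System.out.println(job.getCurrentState());  // e.g. JOB_STATE_RUNNING
    }
  }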




On Mon, Oct 12, 2020 at 2:23 PM Peter Littig 
wrote:

> Thanks for the replies, Lukasz and Steve!
>
> Steve: do you have a link to the Google API client wrappers? (I'm not sure
> I know what they are.)
>
> Thank you!
>
> On Mon, Oct 12, 2020 at 11:04 AM Steve Niemitz 
> wrote:
>
>> We use the Dataflow API [1] directly, via the google api client wrappers
>> (both python and java), pretty extensively.  It works well and doesn't
>> require a dependency on beam.
>>
>> [1] https://cloud.google.com/dataflow/docs/reference/rest
>>
>> On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik  wrote:
>>
>>> This is the best way to do it right now, and it hasn't changed in a
>>> while (region was added alongside project and job IDs within the past 6 years).
>>>
>>> On Mon, Oct 12, 2020 at 10:53 AM Peter Littig 
>>> wrote:
>>>
 Thanks for the reply, Kyle.

 The DataflowClient::getJob method uses a Dataflow instance that's
 provided at construction time (via
 DataflowPipelineOptions::getDataflowClient). If that Dataflow instance can
 be obtained from a minimal instance of the options (i.e., containing only
 the project ID and region) then it looks like everything should work.

 I suppose a secondary question here is whether or not this approach is
 the recommended way to solve my problem (but I don't know of any
 alternatives).

 On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver 
 wrote:

> > I think the answer is to use a DataflowClient in the second service,
> but creating one requires DataflowPipelineOptions. Are these options
> supposed to be exactly the same as those used by the first service? Or do
> only some of the fields have to be the same?
>
> Most options are not necessary for retrieving a job. In general,
> Dataflow jobs can always be uniquely identified by the project, region and
> job ID.
> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>
> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig 
> wrote:
>
>> Hello, Beam users!
>>
>> Suppose I want to build two (Java) services, one that launches
>> (long-running) dataflow jobs, and the other that monitors the status of
>> dataflow jobs. Within a single service, I could simply track a
>> PipelineResult for each dataflow run and periodically call getState. How
>> can I monitor job status like this from a second, independent service?
>>
>> I think the answer is to use a DataflowClient in the second service,
>> but creating one requires DataflowPipelineOptions. Are these options
>> supposed to be exactly the same as those used by the first service? Or do
>> only some of the fields have to be the same?
>>
>> Or maybe there's a better alternative than DataflowClient?
>>
>> Thanks in advance!
>>
>> Peter
>>
>


Re: Querying Dataflow job status via Java SDK

2020-10-12 Thread Peter Littig
Thanks for the replies, Lukasz and Steve!

Steve: do you have a link to the Google API client wrappers? (I'm not sure
I know what they are.)

Thank you!

On Mon, Oct 12, 2020 at 11:04 AM Steve Niemitz  wrote:

> We use the Dataflow API [1] directly, via the google api client wrappers
> (both python and java), pretty extensively.  It works well and doesn't
> require a dependency on beam.
>
> [1] https://cloud.google.com/dataflow/docs/reference/rest
>
> On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik  wrote:
>
>> This is the best way to do it right now, and it hasn't changed in a
>> while (region was added alongside project and job IDs within the past 6 years).
>>
>> On Mon, Oct 12, 2020 at 10:53 AM Peter Littig 
>> wrote:
>>
>>> Thanks for the reply, Kyle.
>>>
>>> The DataflowClient::getJob method uses a Dataflow instance that's
>>> provided at construction time (via
>>> DataflowPipelineOptions::getDataflowClient). If that Dataflow instance can
>>> be obtained from a minimal instance of the options (i.e., containing only
>>> the project ID and region) then it looks like everything should work.
>>>
>>> I suppose a secondary question here is whether or not this approach is
>>> the recommended way to solve my problem (but I don't know of any
>>> alternatives).
>>>
>>> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver  wrote:
>>>
 > I think the answer is to use a DataflowClient in the second service,
 but creating one requires DataflowPipelineOptions. Are these options
 supposed to be exactly the same as those used by the first service? Or do
 only some of the fields have to be the same?

 Most options are not necessary for retrieving a job. In general,
 Dataflow jobs can always be uniquely identified by the project, region and
 job ID.
 https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100

 On Mon, Oct 12, 2020 at 9:31 AM Peter Littig 
 wrote:

> Hello, Beam users!
>
> Suppose I want to build two (Java) services, one that launches
> (long-running) dataflow jobs, and the other that monitors the status of
> dataflow jobs. Within a single service, I could simply track a
> PipelineResult for each dataflow run and periodically call getState. How
> can I monitor job status like this from a second, independent service?
>
> I think the answer is to use a DataflowClient in the second service,
> but creating one requires DataflowPipelineOptions. Are these options
> supposed to be exactly the same as those used by the first service? Or do
> only some of the fields have to be the same?
>
> Or maybe there's a better alternative than DataflowClient?
>
> Thanks in advance!
>
> Peter
>



Re: Querying Dataflow job status via Java SDK

2020-10-12 Thread Steve Niemitz
We use the Dataflow API [1] directly, via the google api client wrappers
(both python and java), pretty extensively.  It works well and doesn't
require a dependency on beam.

[1] https://cloud.google.com/dataflow/docs/reference/rest

On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik  wrote:

> This is the best way to do it right now, and it hasn't changed in a
> while (region was added alongside project and job IDs within the past 6 years).
>
> On Mon, Oct 12, 2020 at 10:53 AM Peter Littig 
> wrote:
>
>> Thanks for the reply, Kyle.
>>
>> The DataflowClient::getJob method uses a Dataflow instance that's
>> provided at construction time (via
>> DataflowPipelineOptions::getDataflowClient). If that Dataflow instance can
>> be obtained from a minimal instance of the options (i.e., containing only
>> the project ID and region) then it looks like everything should work.
>>
>> I suppose a secondary question here is whether or not this approach is
>> the recommended way to solve my problem (but I don't know of any
>> alternatives).
>>
>> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver  wrote:
>>
>>> > I think the answer is to use a DataflowClient in the second service,
>>> but creating one requires DataflowPipelineOptions. Are these options
>>> supposed to be exactly the same as those used by the first service? Or do
>>> only some of the fields have to be the same?
>>>
>>> Most options are not necessary for retrieving a job. In general,
>>> Dataflow jobs can always be uniquely identified by the project, region and
>>> job ID.
>>> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>>>
>>> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig 
>>> wrote:
>>>
 Hello, Beam users!

 Suppose I want to build two (Java) services, one that launches
 (long-running) dataflow jobs, and the other that monitors the status of
 dataflow jobs. Within a single service, I could simply track a
 PipelineResult for each dataflow run and periodically call getState. How
 can I monitor job status like this from a second, independent service?

 I think the answer is to use a DataflowClient in the second service,
 but creating one requires DataflowPipelineOptions. Are these options
 supposed to be exactly the same as those used by the first service? Or do
 only some of the fields have to be the same?

 Or maybe there's a better alternative than DataflowClient?

 Thanks in advance!

 Peter

>>>


Re: Querying Dataflow job status via Java SDK

2020-10-12 Thread Luke Cwik
This is the best way to do it right now, and it hasn't changed in a while
(region was added alongside project and job IDs within the past 6 years).

On Mon, Oct 12, 2020 at 10:53 AM Peter Littig 
wrote:

> Thanks for the reply, Kyle.
>
> The DataflowClient::getJob method uses a Dataflow instance that's provided
> at construction time (via DataflowPipelineOptions::getDataflowClient). If
> that Dataflow instance can be obtained from a minimal instance of the
> options (i.e., containing only the project ID and region) then it looks
> like everything should work.
>
> I suppose a secondary question here is whether or not this approach is the
> recommended way to solve my problem (but I don't know of any alternatives).
>
> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver  wrote:
>
>> > I think the answer is to use a DataflowClient in the second service,
>> but creating one requires DataflowPipelineOptions. Are these options
>> supposed to be exactly the same as those used by the first service? Or do
>> only some of the fields have to be the same?
>>
>> Most options are not necessary for retrieving a job. In general, Dataflow
>> jobs can always be uniquely identified by the project, region and job ID.
>> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>>
>> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig 
>> wrote:
>>
>>> Hello, Beam users!
>>>
>>> Suppose I want to build two (Java) services, one that launches
>>> (long-running) dataflow jobs, and the other that monitors the status of
>>> dataflow jobs. Within a single service, I could simply track a
>>> PipelineResult for each dataflow run and periodically call getState. How
>>> can I monitor job status like this from a second, independent service?
>>>
>>> I think the answer is to use a DataflowClient in the second service, but
>>> creating one requires DataflowPipelineOptions. Are these options supposed
>>> to be exactly the same as those used by the first service? Or do only some
>>> of the fields have to be the same?
>>>
>>> Or maybe there's a better alternative than DataflowClient?
>>>
>>> Thanks in advance!
>>>
>>> Peter
>>>
>>


Re: Querying Dataflow job status via Java SDK

2020-10-12 Thread Peter Littig
Thanks for the reply, Kyle.

The DataflowClient::getJob method uses a Dataflow instance that's provided
at construction time (via DataflowPipelineOptions::getDataflowClient). If
that Dataflow instance can be obtained from a minimal instance of the
options (i.e., containing only the project ID and region) then it looks
like everything should work.

I suppose a secondary question here is whether or not this approach is the
recommended way to solve my problem (but I don't know of any alternatives).
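
In case it helps anyone else, this is the kind of minimal setup I'm picturing,
sketched but not tested (the project, region, and job ID are placeholders, and
credentials are assumed to come from the application default):

  import com.google.api.services.dataflow.model.Job;
  import java.io.IOException;
  import org.apache.beam.runners.dataflow.DataflowClient;
  import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;

  public class JobMonitor {
    public static Job fetchJob(String project, String region, String jobId)
        throws IOException {
      // Minimal options: only the fields DataflowClient needs to locate the job.
      DataflowPipelineOptions options =
          PipelineOptionsFactory.as(DataflowPipelineOptions.class);
      options.setProject(project);
      options.setRegion(region);
      return DataflowClient.fromOptions(options).getJob(jobId);
    }

    public static void main(String[] args) throws Exception {
      Job job = fetchJob("my-project", "us-central1", "my-job-id");
      System.out.println(job.getCurrentState());  // e.g. JOB_STATE_DONE
    }
  }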

On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver  wrote:

> > I think the answer is to use a DataflowClient in the second service, but
> creating one requires DataflowPipelineOptions. Are these options supposed
> to be exactly the same as those used by the first service? Or do only some
> of the fields have to be the same?
>
> Most options are not necessary for retrieving a job. In general, Dataflow
> jobs can always be uniquely identified by the project, region and job ID.
> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>
> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig 
> wrote:
>
>> Hello, Beam users!
>>
>> Suppose I want to build two (Java) services, one that launches
>> (long-running) dataflow jobs, and the other that monitors the status of
>> dataflow jobs. Within a single service, I could simply track a
>> PipelineResult for each dataflow run and periodically call getState. How
>> can I monitor job status like this from a second, independent service?
>>
>> I think the answer is to use a DataflowClient in the second service, but
>> creating one requires DataflowPipelineOptions. Are these options supposed
>> to be exactly the same as those used by the first service? Or do only some
>> of the fields have to be the same?
>>
>> Or maybe there's a better alternative than DataflowClient?
>>
>> Thanks in advance!
>>
>> Peter
>>
>


Re: Querying Dataflow job status via Java SDK

2020-10-12 Thread Kyle Weaver
> I think the answer is to use a DataflowClient in the second service, but
creating one requires DataflowPipelineOptions. Are these options supposed
to be exactly the same as those used by the first service? Or do only some
of the fields have to be the same?

Most options are not necessary for retrieving a job. In general, Dataflow
jobs can always be uniquely identified by the project, region and job ID.
https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
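
On the launching side, the job ID can be read back from the PipelineResult when
running on Dataflow; a rough sketch (the Launcher class is purely illustrative):

  import org.apache.beam.runners.dataflow.DataflowPipelineJob;
  import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
  import org.apache.beam.sdk.Pipeline;

  public class Launcher {
    // Runs the pipeline and returns the identifiers the monitoring service needs.
    public static String[] launch(Pipeline pipeline, DataflowPipelineOptions options) {
      // With the Dataflow runner, pipeline.run() returns a DataflowPipelineJob.
      DataflowPipelineJob job = (DataflowPipelineJob) pipeline.run();
      return new String[] {options.getProject(), options.getRegion(), job.getJobId()};
    }
  }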

On Mon, Oct 12, 2020 at 9:31 AM Peter Littig 
wrote:

> Hello, Beam users!
>
> Suppose I want to build two (Java) services, one that launches
> (long-running) dataflow jobs, and the other that monitors the status of
> dataflow jobs. Within a single service, I could simply track a
> PipelineResult for each dataflow run and periodically call getState. How
> can I monitor job status like this from a second, independent service?
>
> I think the answer is to use a DataflowClient in the second service, but
> creating one requires DataflowPipelineOptions. Are these options supposed
> to be exactly the same as those used by the first service? Or do only some
> of the fields have to be the same?
>
> Or maybe there's a better alternative than DataflowClient?
>
> Thanks in advance!
>
> Peter
>