Re: Pipeline Processing Time

2020-06-09 Thread Talat Uyarer
Thank you, Luke and Reuven, for helping me. Now I can see my pipeline's
processing time for each record.

On Wed, Jun 3, 2020 at 9:25 AM Reuven Lax  wrote:

> Note: you need to annotate the timestamp parameter of your
> @ProcessElement method with the @Timestamp annotation.
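> For example, a minimal sketch of such a method (element type and names
> are illustrative):
>
>   @ProcessElement
>   public void process(@Element String element,
>                       @Timestamp org.joda.time.Instant timestamp,
>                       OutputReceiver<String> out) {
>     // timestamp is only injected because of the @Timestamp annotation.
>     long elapsedMs =
>         org.joda.time.Instant.now().getMillis() - timestamp.getMillis();
>     // elapsedMs is the time since the element's timestamp was assigned
>     // (e.g. by KafkaIO); report it to a log or metric as needed.
>     out.output(element);
>   }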
>
> On Mon, Jun 1, 2020 at 3:31 PM Luke Cwik  wrote:
>
>> You can configure KafkaIO to use some data from the record as the
>> element's timestamp. See the KafkaIO javadoc around TimestampPolicy[1];
>> the default is the current processing time.
>> You can access the timestamp of the element by adding
>> "org.joda.time.Instant timestamp" as a parameter to your @ProcessElement
>> method; see this javadoc for additional details[2]. You could then
>> compute now() - timestamp to calculate the processing time.
>>
>> 1:
>> https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/io/kafka/TimestampPolicy.html
>> 
>> 2:
>> https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html
>> 
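>> For instance, a sketch of wiring up a timestamp policy (the broker, topic,
>> and key/value types here are made up):
>>
>>   import org.apache.beam.sdk.io.kafka.KafkaIO;
>>   import org.apache.beam.sdk.io.kafka.TimestampPolicyFactory;
>>   import org.apache.kafka.common.serialization.LongDeserializer;
>>   import org.apache.kafka.common.serialization.StringDeserializer;
>>
>>   KafkaIO.Read<Long, String> read =
>>       KafkaIO.<Long, String>read()
>>           .withBootstrapServers("broker:9092")
>>           .withTopic("my-topic")
>>           .withKeyDeserializer(LongDeserializer.class)
>>           .withValueDeserializer(StringDeserializer.class)
>>           // Use Kafka log-append time as the element timestamp instead of
>>           // the default processing-time policy; withCreateTime(...) is
>>           // another option when the producer sets record timestamps.
>>           .withTimestampPolicyFactory(TimestampPolicyFactory.withLogAppendTime());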
>>
>> On Mon, Jun 1, 2020 at 2:00 PM Talat Uyarer 
>> wrote:
>>
>>> Sorry for the late response. Where does Beam set that timestamp field
>>> on the element? Is it set whenever KafkaIO reads the element?
>>> I also have a windowing function in my pipeline. Does the timestamp
>>> field change for any kind of operation? In my pipeline I have the
>>> following steps: KafkaIO -> Format Conversion ParDo -> SQL Filter ->
>>> Windowing Step -> Custom Sink. If the timestamp is set in KafkaIO, can
>>> I see the processing time by computing now() - timestamp in the Custom
>>> Sink?
>>>
>>> Thanks
>>>
>>> On Thu, May 28, 2020 at 2:07 PM Luke Cwik  wrote:
>>>
 Dataflow provides msec counters for each transform that executes. You
 should be able to get them from Stackdriver and see them in the Dataflow
 UI.

 You need to keep track of the element's timestamp as it flows through
 the system, as part of data that travels alongside the element. You can
 use the element's timestamp[1] if that makes sense (it might not if you
 intend to use a timestamp from the Kafka record itself and the record's
 timestamp isn't the same as the ingestion timestamp). Unless you are
 writing your own sink, the sink won't track the processing time at all,
 so you'll need to add a ParDo right before it that writes the timing
 information to wherever you want (a counter, your own metrics database,
 logs, ...).

 1:
 https://github.com/apache/beam/blob/018e889829e300ab9f321da7e0010ff0011a73b1/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L257
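 A sketch of what such a ParDo could look like, placed right before the
 sink (metric names and the element type are illustrative; it assumes the
 element's timestamp was assigned at ingestion):

   import org.apache.beam.sdk.metrics.Distribution;
   import org.apache.beam.sdk.metrics.Metrics;
   import org.apache.beam.sdk.transforms.DoFn;
   import org.joda.time.Instant;

   class RecordProcessingTimeFn extends DoFn<String, String> {
     // Latency distribution; on Dataflow these show up as custom metrics
     // for the step that contains this DoFn.
     private final Distribution processingTimeMs =
         Metrics.distribution("pipeline", "processing-time-ms");

     @ProcessElement
     public void process(@Element String element,
                         @Timestamp Instant timestamp,
                         OutputReceiver<String> out) {
       // Wall-clock now minus the timestamp the element carried through
       // the pipeline.
       processingTimeMs.update(Instant.now().getMillis() - timestamp.getMillis());
       out.output(element);
     }
   }

 It would then be applied as, e.g.,
 elements.apply("MeasureProcessingTime", ParDo.of(new RecordProcessingTimeFn()))
 immediately before the sink transform.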
 


 On Thu, May 28, 2020 at 1:12 PM Talat Uyarer <
 tuya...@paloaltonetworks.com> wrote:

> Yes, I am trying to track how long it takes for a single element to be
> ingested into the pipeline until it is output somewhere.
>
> My pipeline is unbounded. I am using KafkaIO. I did not think about CPU
> time. If there is a way to track that too, it would be useful for
> improving my metrics.
>
> On Thu, May 28, 2020 at 12:52 PM Luke Cwik  wrote:
>
>> What do you mean by processing time?
>>
>> Are you trying to track how long it takes for a single element to be
>> ingested into the pipeline until it is output somewhere?
>> Do you have a bounded pipeline and want to know how long all the
>> processing takes?
>> Do you care about how much CPU time is being consumed in aggregate
>> for all the processing that your pipeline is doing?
>>
>>
>> On Thu, May 28, 2020 at 11:01 AM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> I am using the Dataflow runner. The pipeline reads from KafkaIO and
>>> sends HTTP requests. I could not find any metadata field on the
>>> element to record the first read time.
>>>
>>> On Thu, May 28, 2020 at 10:44 AM Kyle Weaver 
>>> wrote:
>>>
 Which runner are you using?

Re: Pipeline Processing Time

2020-05-28 Thread Kyle Weaver
Which runner are you using?

On Thu, May 28, 2020 at 1:43 PM Talat Uyarer 
wrote:

> Hi,
>
> I have a pipeline which has 5 steps. What is the best way to measure
> processing time for my pipeline?
>
> Thanks
>