Re: [ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Reza Rokni
+1 on the pattern Tim!

Please raise a Jira with the label pipeline-patterns; details are here:
https://beam.apache.org/documentation/patterns/overview/#contributing-a-pattern



On Sat, 8 Jun 2019 at 05:04, Tim Robertson 
wrote:

> This is great. Thanks Pablo and all
>
> I've seen several folks struggle with writing Avro to dynamic locations,
> which I think might be a good addition. If you agree, I'll offer a PR unless
> someone gets there first - I have an example here:
>
> https://github.com/gbif/pipelines/blob/master/pipelines/export-gbif-hbase/src/main/java/org/gbif/pipelines/hbase/beam/ExportHBase.java#L81
>
>
> On Fri, Jun 7, 2019 at 10:52 PM Pablo Estrada  wrote:
>
>> Hello everyone,
>> A group of community members has been working on gathering and providing
>> common pipeline patterns for Beam pipelines. These are examples of how
>> to perform certain operations, and useful ways of using Beam in your
>> pipelines. Some of them relate to processing of files, use of side inputs,
>> state/timers, etc. Check them out [1].
>>
>> These initial patterns have been chosen based on evidence gathered from
>> StackOverflow, and from talking to users of Beam.
>>
>> It would be great if this section could grow, and be useful to many Beam
>> users. For that reason, we invite anyone to share patterns, and pipeline
>> examples that they have used in the past. If you are interested in
>> contributing, please submit a pull request, or get in touch with Cyrus
>> Maden, Reza Rokni, Melissa Pashniak or myself.
>>
>> Thanks!
>> Best
>> -P.
>>
>> [1] https://beam.apache.org/documentation/patterns/overview/
>>
>

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.


Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-07 Thread Chad Dombrova
I saw this and was particularly excited about the new support for
"external" transforms in portable runners (i.e. the ability to use the Java
KafkaIO transforms from Python, with presumably more to come in the
future). While the release notes are useful, I will say that it takes a
lot of time and effort to sift through them to find relevant issues.
They're not grouped by SDK/component, and, for example, not all of
the Python issues include the word "python" in their title. It would be
great to have a blurb on the Beam blog explaining the highlights. An
example of a project that I think does this very well is mypy:
http://mypy-lang.blogspot.com/

thanks!
chad
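
For readers curious what the cross-language Kafka support Chad mentions looks like from Python: later SDK releases expose a wrapper around the Java KafkaIO transform. A rough sketch follows — the `ReadFromKafka` wrapper, the broker address, and the topic name here reflect more recent Beam releases and are placeholders, not a 2.13.0 recipe:

```python
def decode_kafka_record(kv):
    # kv is a (key bytes, value bytes) pair as produced by the Kafka read.
    key, value = kv
    return value.decode('utf-8')

def build_kafka_pipeline():
    # Sketch only: needs a recent apache_beam with cross-language Kafka
    # support plus a running Java expansion service; broker and topic
    # names are hypothetical.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka

    with beam.Pipeline() as p:
        (p
         | ReadFromKafka(
               consumer_config={'bootstrap.servers': 'localhost:9092'},
               topics=['my-topic'])
         | beam.Map(decode_kafka_record)
         | beam.Map(print))
```

The pure `decode_kafka_record` helper is split out so the record-handling logic can be unit tested without a broker.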
On Fri, Jun 7, 2019 at 2:58 PM Kyle Weaver  wrote:

> Awesome! Thanks for leading the release Ankur.
>
> On Fri, Jun 7, 2019 at 2:57 PM Ankur Goenka  wrote:
>
>> The Apache Beam team is pleased to announce the release of version 2.13.0!
>>
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org
>>
>> You can download the release here:
>>
>> https://beam.apache.org/get-started/downloads/
>>
>> This release includes bugfixes, features, and improvements detailed on
>> the Beam blog: https://beam.apache.org/blog/2019/05/22/beam-2.13.0.html
>>
>> Thanks to everyone who contributed to this release, and we hope you enjoy
>> using Beam 2.13.0.
>>
>> -- Ankur Goenka, on behalf of The Apache Beam team
>>
> --
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> | +1650203
>


Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-07 Thread Kyle Weaver
Awesome! Thanks for leading the release Ankur.

On Fri, Jun 7, 2019 at 2:57 PM Ankur Goenka  wrote:

> The Apache Beam team is pleased to announce the release of version 2.13.0!
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
>
> https://beam.apache.org/get-started/downloads/
>
> This release includes bugfixes, features, and improvements detailed on
> the Beam blog: https://beam.apache.org/blog/2019/05/22/beam-2.13.0.html
>
> Thanks to everyone who contributed to this release, and we hope you enjoy
> using Beam 2.13.0.
>
> -- Ankur Goenka, on behalf of The Apache Beam team
>
-- 
Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com |
+1650203


[ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-07 Thread Ankur Goenka
The Apache Beam team is pleased to announce the release of version 2.13.0!

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bugfixes, features, and improvements detailed on
the Beam blog: https://beam.apache.org/blog/2019/05/22/beam-2.13.0.html

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.13.0.

-- Ankur Goenka, on behalf of The Apache Beam team


Re: How Can I access the key in subclass of CombinerFn when combining a PCollection of KV pairs?

2019-06-07 Thread Massy Bourennani
I was looking for a built-in feature, as suggested in the programming
guide. However, the answer does the job.

Massy

Le ven. 7 juin 2019 à 22:13, Kenneth Knowles  a écrit :

> The answer on StackOverflow looks good to me.
>
> Kenn
>
> On Fri, Jun 7, 2019 at 4:11 AM Massy Bourennani 
> wrote:
>
>> Hi,
>>
>> here is the SO post:
>>
>> https://stackoverflow.com/questions/56451796/how-can-i-access-the-key-in-subclass-of-combinerfn-when-combining-a-pcollection
>>
>> Many thanks
>>
>


Re: [ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Tim Robertson
This is great. Thanks Pablo and all

I've seen several folks struggle with writing Avro to dynamic locations,
which I think might be a good addition. If you agree, I'll offer a PR unless
someone gets there first - I have an example here:

https://github.com/gbif/pipelines/blob/master/pipelines/export-gbif-hbase/src/main/java/org/gbif/pipelines/hbase/beam/ExportHBase.java#L81
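
Tim's Java example routes each record to a destination derived from the record itself. In the Python SDK, a similar shape can be sketched with `fileio.WriteToFiles` and a `destination` callable (available in later Beam releases). The field name `datasetKey` is a placeholder, and a JSON text sink stands in for an Avro sink, which would follow the same shape with a custom `fileio.FileSink`:

```python
import json

def destination_for(record):
    # Route each record by one of its fields; 'datasetKey' is a placeholder.
    return record.get('datasetKey', 'unknown')

def build_pipeline():
    # Sketch only; requires a Beam release that ships apache_beam.io.fileio.
    import apache_beam as beam
    from apache_beam.io import fileio

    class JsonSink(fileio.FileSink):
        # Minimal sink writing one JSON object per line.
        def open(self, fh):
            self._fh = fh
        def write(self, record):
            self._fh.write((json.dumps(record) + '\n').encode('utf-8'))
        def flush(self):
            pass

    records = [{'datasetKey': 'ds1', 'value': 1},
               {'datasetKey': 'ds2', 'value': 2}]
    with beam.Pipeline() as p:
        (p
         | beam.Create(records)
         | fileio.WriteToFiles(
               path='/tmp/output',                    # hypothetical base path
               destination=destination_for,           # one file group per key
               sink=lambda dest: JsonSink(),
               file_naming=fileio.destination_prefix_naming()))
```

Records sharing a `datasetKey` end up grouped under the same destination prefix, which is the effect the Java `FileIO.writeDynamic` example achieves.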


On Fri, Jun 7, 2019 at 10:52 PM Pablo Estrada  wrote:

> Hello everyone,
> A group of community members has been working on gathering and providing
> common pipeline patterns for Beam pipelines. These are examples of how
> to perform certain operations, and useful ways of using Beam in your
> pipelines. Some of them relate to processing of files, use of side inputs,
> state/timers, etc. Check them out [1].
>
> These initial patterns have been chosen based on evidence gathered from
> StackOverflow, and from talking to users of Beam.
>
> It would be great if this section could grow, and be useful to many Beam
> users. For that reason, we invite anyone to share patterns, and pipeline
> examples that they have used in the past. If you are interested in
> contributing, please submit a pull request, or get in touch with Cyrus
> Maden, Reza Rokni, Melissa Pashniak or myself.
>
> Thanks!
> Best
> -P.
>
> [1] https://beam.apache.org/documentation/patterns/overview/
>


[ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Pablo Estrada
Hello everyone,
A group of community members has been working on gathering and providing
common pipeline patterns for Beam pipelines. These are examples of how
to perform certain operations, and useful ways of using Beam in your
pipelines. Some of them relate to processing of files, use of side inputs,
state/timers, etc. Check them out [1].

These initial patterns have been chosen based on evidence gathered from
StackOverflow, and from talking to users of Beam.

It would be great if this section could grow, and be useful to many Beam
users. For that reason, we invite anyone to share patterns, and pipeline
examples that they have used in the past. If you are interested in
contributing, please submit a pull request, or get in touch with Cyrus
Maden, Reza Rokni, Melissa Pashniak or myself.

Thanks!
Best
-P.

[1] https://beam.apache.org/documentation/patterns/overview/


Re: How Can I access the key in subclass of CombinerFn when combining a PCollection of KV pairs?

2019-06-07 Thread Kenneth Knowles
The answer on StackOverflow looks good to me.

Kenn

On Fri, Jun 7, 2019 at 4:11 AM Massy Bourennani 
wrote:

> Hi,
>
> here is the SO post:
>
> https://stackoverflow.com/questions/56451796/how-can-i-access-the-key-in-subclass-of-combinerfn-when-combining-a-pcollection
>
> Many thanks
>


RE: [Sender Auth Failure] Re: How to build a beam python pipeline which does GET/POST request to API's

2019-06-07 Thread Anjana Pydi
Hi Soliman,

Thanks for providing the example! I tried it and it worked.

Regards,
Anjana

From: Soliman ElSaber [soli...@mindvalley.com]
Sent: Wednesday, June 05, 2019 8:56 PM
To: user@beam.apache.org
Cc: Anjana Pydi
Subject: Re: [Sender Auth Failure] Re: How to build a beam python pipeline 
which does GET/POST request to API's

Hi Anjana,

I used this code before to get some data from an API call and store it in
BigQuery using Apache Beam:



def get_api_data(data):
    data_every_sec = requests.get(
        "https://min-api.cryptocompare.com/data/price?fsym=ETH&tsyms=BTC,USD,EUR"
    ).json()
    # expected: {u'USD': 210.76, u'BTC': 0.03273, u'EUR': 184.02}
    return [data_every_sec]

def parse_btc(btc_item):
    usd, btc, eur = btc_item['USD'], btc_item['BTC'], btc_item['EUR']
    return [(btc, usd, eur)]

dayData = (p
           | 'get data' >> beam.ParDo(get_api_data)
           | 'parse btc' >> beam.ParDo(parse_btc)
           | 'Write' >> beam.io.WriteToBigQuery(...)
           )

Hope it will help you...



On Tue, Jun 4, 2019 at 8:01 PM Anjana Pydi  wrote:
Hi Ankur,

Thanks for the suggestion.

Could you please provide me any examples if you know which are close to this 
use case.

Regards,
Anjana

From: Ankur Goenka [goe...@google.com]
Sent: Monday, June 03, 2019 4:27 PM
To: user@beam.apache.org
Subject: [Sender Auth Failure] Re: How to build a beam python pipeline which 
does GET/POST request to API's

By looking at your use case, the whole processing logic seems to be very custom.
I would recommend using ParDos to express it. If the processing for an
individual dictionary is expensive, then you can potentially use a Reshuffle
operation to distribute the updating of dictionaries over multiple workers.

Note: As you are going to make the write API calls yourself, in case of worker
failure your transform can be executed multiple times.
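
Ankur's advice above — a ParDo per step, with a Reshuffle to spread the per-dictionary work — can be sketched as follows. The endpoint URLs and field names are placeholders, and the pipeline-building function assumes `apache_beam` and `requests` are installed:

```python
def transform_record(d):
    # Pure add/remove-keys logic, kept separate so it is easy to unit test.
    # 'processed' and 'internal_id' are placeholder field names.
    out = dict(d)
    out['processed'] = True
    out.pop('internal_id', None)
    return out

def build_pipeline():
    # Sketch only; requires apache_beam and requests; URLs are hypothetical.
    import apache_beam as beam
    import requests

    def fetch(_):
        # Returns a list of dicts; FlatMap emits one element per dict.
        return requests.get('https://example.com/api/items').json()

    def post(d):
        requests.post('https://example.com/api/out', json=d)

    with beam.Pipeline() as p:
        (p
         | beam.Create([None])              # single seed element
         | 'Fetch' >> beam.FlatMap(fetch)   # fan out to one element per dict
         | beam.Reshuffle()                 # redistribute across workers
         | 'Transform' >> beam.Map(transform_record)
         | 'Post' >> beam.Map(post))
```

The Reshuffle after the fetch is what lets the 1000 transformed-and-posted dictionaries be processed in parallel rather than on the single worker that made the initial GET.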

On Mon, Jun 3, 2019 at 11:41 AM Anjana Pydi  wrote:
Hi Ankur,

Thanks for reply. Please find responses updated in below mail.

Thanks,
Anjana

From: Ankur Goenka [goe...@google.com]
Sent: Monday, June 03, 2019 11:01 AM
To: user@beam.apache.org
Subject: Re: How to build a beam python pipeline which does GET/POST request to 
API's

Thanks for providing more information.

Some follow up questions/comments
1. Call an API which would provide a dictionary as response.
Question: Do you need to make multiple of these API calls? If yes, what
distinguishes API call 1 from call 2? If it's the input to the API, then can
you provide the inputs in a file, etc.? What I am trying to identify is an
input source to the pipeline so that Beam can distribute the work.
Answer: When an API call is made, it can provide a list of dictionaries as a
response; we have to go through every dictionary, do the same transformations
for each, and send it.
2. Transform dictionary to add / remove few keys.
3. Send transformed dictionary as JSON to an API which prints this JSON as 
output.
Question: Are these write operations idempotent? As you are doing your own API
calls, it's possible that after a failure the calls are made again for the same
input. If the write calls are not idempotent, then there can be duplicate data.
Answer: Suppose I receive a list of 1000 dictionaries as the response when I
call the API in point 1; I should then do exactly 1000 write operations, one
per input. If there is a failure for any input, only that one should not be
posted, and the remaining should be posted successfully.
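
One common way to cope with the at-least-once caveat Ankur raises is to attach a deterministic idempotency key to each write, so a retried POST can be deduplicated on the receiving side. A sketch — the header name and the key scheme are illustrative, not part of any particular API:

```python
import hashlib
import json

def idempotency_key(record):
    # Same input dict -> same key, regardless of dict ordering or retries,
    # because the JSON serialization is canonicalized with sort_keys.
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()

# A retried worker would then send, for example:
#   requests.post(url, json=record,
#                 headers={'Idempotency-Key': idempotency_key(record)})
# and the server can drop any key it has already seen.
```

With this, re-executing the transform after a worker failure re-sends the same keys rather than creating duplicate rows.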

On Sat, Jun 1, 2019 at 8:13 PM Anjana Pydi  wrote:
Hi Ankur,

Thanks for the reply! Below is more details of the usecase:

1. Call an API which would provide a dictionary as response.
2. Transform dictionary to add / remove few keys.
3. Send transformed dictionary as JSON to an API which prints this JSON as 
output.

Please let me know in case of any clarifications.

Thanks,
Anjana

From: Ankur Goenka [goe...@google.com]
Sent: Saturday, June 01, 2019 6:47 PM
To: user@beam.apache.org
Subject: Re: How to build a beam python pipeline which does GET/POST request to 
API's

Hi Anjana,

You can write your API logic in a ParDo and subsequently pass the elements to
other ParDos to transform them and eventually make an API call to another
endpoint.

However, this might not be a good fit for Beam: the input is not well defined,
and hence scaling and exactly-once processing of elements will not be possible.

It would be better to elaborate a bit more on the use case so we can give better suggestions.

Thanks,
Ankur

On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi  wrote:
Hi,

RE: How to convert python dictionary / JSON to a pcollection

2019-06-07 Thread Anjana Pydi
Hi Robert,

Thanks for the suggestion! I will try this out. I made a list of dictionaries,
converted it to a PCollection, and it worked.

Regards,
Anjana

From: Robert Bradshaw [rober...@google.com]
Sent: Wednesday, June 05, 2019 12:27 AM
To: user
Subject: Re: How to convert python dictionary / JSON to a pcollection

I assume you have the dictionary in memory? If so, you can do

pcoll = p | beam.Create(my_dict.items())
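
Spelled out a little more (assuming the dict fits in memory; the values below are just examples), each (key, value) pair becomes one element of the PCollection:

```python
my_dict = {'USD': 210.76, 'BTC': 0.03273, 'EUR': 184.02}

# Each (key, value) pair becomes one element of the resulting PCollection.
elements = list(my_dict.items())

def build_pipeline():
    # Sketch only; requires apache_beam.
    import apache_beam as beam
    with beam.Pipeline() as p:
        pcoll = p | beam.Create(my_dict.items())
        pcoll | beam.Map(print)
```

If you instead have a single JSON document, `json.loads` it first and pass the resulting dict's `.items()` (or wrap the whole dict in a one-element list) to `beam.Create`.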

On Wed, Jun 5, 2019 at 8:10 AM Anjana Pydi  wrote:
>
> Hi,
>
> Can someone please let me know how to convert a dictionary / JSON to a
> PCollection?
>
> Thanks,
> Anjana
> ---
>  The information contained in this communication is intended solely for the 
> use of the individual or entity to whom it is addressed and others authorized 
> to receive it. It may contain confidential or legally privileged information. 
> If you are not the intended recipient you are hereby notified that any 
> disclosure, copying, distribution or taking any action in reliance on the 
> contents of this information is strictly prohibited and may be unlawful. If 
> you are not the intended recipient, please notify us immediately by 
> responding to this email and then delete it from your system. Bahwan Cybertek 
> is neither liable for the proper and complete transmission of the information 
> contained in this communication nor for any delay in its receipt.


How Can I access the key in subclass of CombinerFn when combining a PCollection of KV pairs?

2019-06-07 Thread Massy Bourennani
Hi,

here is the SO post:
https://stackoverflow.com/questions/56451796/how-can-i-access-the-key-in-subclass-of-combinerfn-when-combining-a-pcollection

Many thanks
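
(For readers landing here from the archive: a `CombineFn` does not see the key it is combining under, so a common workaround, in the spirit of the StackOverflow answer referenced above, is either to fold the key into the value before combining, or to use `GroupByKey` followed by key-aware logic. A rough sketch of the latter, with placeholder aggregation logic:)

```python
def summarize(key, values):
    # Key-aware aggregation; placeholder logic that pairs the key with the
    # sum of its values. Real code would branch on the key as needed.
    return (key, sum(values))

def build_pipeline():
    # Sketch only; requires apache_beam.
    import apache_beam as beam
    with beam.Pipeline() as p:
        (p
         | beam.Create([('a', 1), ('a', 2), ('b', 5)])
         | beam.GroupByKey()                        # -> (key, iterable of values)
         | 'KeyAwareCombine' >> beam.MapTuple(summarize)
         | beam.Map(print))
```

Note that unlike a true `CombineFn`, this forgoes the combiner-lifting optimization, since all values for a key are materialized before `summarize` runs.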