[ANNOUNCE] Apache Beam 2.4.0 released

2018-03-22 Thread Robert Bradshaw
We are pleased to announce the release of Apache Beam 2.4.0. Thanks go to
the many people who made this possible.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

 https://beam.apache.org/get-started/downloads/

In addition to many bug fixes, notable changes in this release include:
- A new Python Direct runner, up to 15x faster than the old one.
- Kinesis support for reading and writing in Java.
- Several refactorings to enable portability (Go/Python on Flink/Spark).

Full release notes can be found at

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342682&projectId=12319527

Enjoy!


Re: [ANNOUNCE] Apache Beam 2.4.0 released

2018-03-22 Thread Etienne Chauchot
Great!

Re: [ANNOUNCE] Apache Beam 2.4.0 released

2018-03-22 Thread Romain Manni-Bucau
Congrats, guys!


Romain Manni-Bucau




Re: [ANNOUNCE] Apache Beam 2.4.0 released

2018-03-22 Thread Alexey Romanenko
Great news! Congrats!

WBR,
Alexey




Re: Apache beam DataFlow runner throwing setup error

2018-03-22 Thread Ahmet Altay
Hi Rajesh,

Have you looked at the worker-startup logs [1]? You should be able to see
the setup error there. It is possible that something in your requirements
file is failing to install on the workers. If that is the case, see
Managing Python Pipeline Dependencies [2] for alternative options. You
could also reach out to Google Cloud Dataflow support for additional
help [3].

Thank you,
Ahmet

[1] https://cloud.google.com/dataflow/pipelines/logging#monitoring-pipeline-logs
[2] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
[3] https://cloud.google.com/dataflow/support
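
One way to narrow down a failing requirements file before resubmitting the job is to try installing it into a clean environment locally. The following is a minimal sketch of that idea, not something from the thread: the function names are hypothetical, and a local install only approximates the worker environment, so a clean result here does not guarantee the workers will succeed.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def pip_install_command(env_dir, requirements_file):
    """Build the pip command that installs the requirements into the venv at env_dir."""
    # Virtual environments keep their interpreter in "Scripts" on Windows, "bin" elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python = Path(env_dir) / bin_dir / "python"
    return [str(python), "-m", "pip", "install", "-r", str(requirements_file)]


def check_requirements(requirements_file):
    """Install the requirements into a throwaway venv; return (ok, pip output)."""
    with tempfile.TemporaryDirectory() as env_dir:
        # A fresh environment avoids being fooled by packages already installed locally.
        venv.create(env_dir, with_pip=True)
        result = subprocess.run(
            pip_install_command(env_dir, requirements_file),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        return result.returncode == 0, result.stdout
```

Calling check_requirements("requirements.txt") returns a success flag together with pip's combined output, which usually names the package that failed to build or resolve.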

On Thu, Mar 22, 2018 at 10:08 PM, Rajesh Hegde wrote:

> Hi,
> We are building a data pipeline using the Beam Python SDK and trying to run
> it on Dataflow, but we get the error below:
>
> A setup error was detected in
> beamapp--0322102737-03220329-8a74-harness-lm6v. Please refer to the
> worker-startup log for detailed information.
>
> However, we could not find the detailed worker-startup logs.
>
> We tried increasing the memory size, worker count, etc., but we still get
> the same error.
>
> Here is the command we use:
>
> python run.py \
>   --project=xyz \
>   --runner=DataflowRunner \
>   --staging_location=gs://xyz/staging \
>   --temp_location=gs://xyz/temp \
>   --requirements_file=requirements.txt \
>   --worker_machine_type n1-standard-8 \
>   --num_workers 2
>
> Pipeline snippet:
>
> data = pipeline | "load data" >> beam.io.Read(
>     beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")
> )
>
> data | "filter data" >> beam.Filter(lambda x: x.get('column_name') == value)
>
>
> The pipeline above just loads data from BigQuery and filters it on a
> column value. It works like a charm with the DirectRunner but fails on
> Dataflow.
>
> Are we making an obvious setup mistake? Is anyone else getting the same
> error? We could use some help resolving the issue.
>
>
> --
>
> Rajesh Hegde | Lead Product Developer | Datalicious