Re: Avro Parquet/Flink/Beam

2016-12-13 Thread Jean-Baptiste Onofré

Hi Billy,

No, ParquetIO is still at an early stage and won't be included in
0.4.0-incubating (which I will prepare pretty soon).


I will push the branch to my GitHub (I haven't had time yet, sorry about
that).


Regards
JB

On 12/13/2016 05:08 PM, Newport, Billy wrote:

Is your ParquetIO going to be accepted into 0.4?

Also, do you have a link to your GitHub?


Thanks

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Monday, December 12, 2016 11:49 AM
To: user@flink.apache.org
Subject: Re: Avro Parquet/Flink/Beam

Hi Billy,

I will push my branch with ParquetIO to my GitHub.

Yes, the Beam IO is independent of the runner.

Regards
JB

On 12/12/2016 05:29 PM, Newport, Billy wrote:

I don't mind writing one. Is there a fork for the ParquetIO work that's
already been done, or is it in trunk?

Is the ParquetIO independent of the runner being used?

Thanks

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Monday, December 12, 2016 11:25 AM
To: user@flink.apache.org
Subject: Re: Avro Parquet/Flink/Beam

Hi,

Beam provides an AvroCoder and AvroIO that you can use, but there is no
ParquetIO yet (I created a Jira issue for that and have started working
on it).

Until ParquetIO is available, you can use the Avro reader to populate
the PCollection and then use a custom DoFn to write the Parquet output.

Regards
JB

On 12/12/2016 05:19 PM, Newport, Billy wrote:

Are there any examples showing the use of Beam with Avro/Parquet and the
Flink runner? I see an Avro reader for Beam. Is it a matter of writing
another one for avro-parquet, or does this need to use the Flink
HadoopOutputFormat, for example?



Thanks

Billy









--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Avro Parquet/Flink/Beam

2016-12-12 Thread Jean-Baptiste Onofré

Hi Billy,

I will push my branch with ParquetIO to my GitHub.

Yes, the Beam IO is independent of the runner.

Regards
JB

On 12/12/2016 05:29 PM, Newport, Billy wrote:

I don't mind writing one. Is there a fork for the ParquetIO work that's
already been done, or is it in trunk?

Is the ParquetIO independent of the runner being used?

Thanks

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Monday, December 12, 2016 11:25 AM
To: user@flink.apache.org
Subject: Re: Avro Parquet/Flink/Beam

Hi,

Beam provides an AvroCoder and AvroIO that you can use, but there is no
ParquetIO yet (I created a Jira issue for that and have started working
on it).

Until ParquetIO is available, you can use the Avro reader to populate
the PCollection and then use a custom DoFn to write the Parquet output.

Regards
JB

On 12/12/2016 05:19 PM, Newport, Billy wrote:

Are there any examples showing the use of Beam with Avro/Parquet and the
Flink runner? I see an Avro reader for Beam. Is it a matter of writing
another one for avro-parquet, or does this need to use the Flink
HadoopOutputFormat, for example?



Thanks

Billy









Re: Avro Parquet/Flink/Beam

2016-12-12 Thread Jean-Baptiste Onofré

Hi,

Beam provides an AvroCoder and AvroIO that you can use, but there is no
ParquetIO yet (I created a Jira issue for that and have started working
on it).


Until ParquetIO is available, you can use the Avro reader to populate
the PCollection and then use a custom DoFn to write the Parquet output.


Regards
JB

On 12/12/2016 05:19 PM, Newport, Billy wrote:

Are there any examples showing the use of Beam with Avro/Parquet and the
Flink runner? I see an Avro reader for Beam. Is it a matter of writing
another one for avro-parquet, or does this need to use the Flink
HadoopOutputFormat, for example?



Thanks

Billy




