Re: Avro Parquet/Flink/Beam
Hi Billy,

No, ParquetIO is at an early stage and won't be included in 0.4.0-incubating (which I will prepare pretty soon).

I will push the branch to my GitHub (I haven't had time yet, sorry about that).

Regards
JB

On 12/13/2016 05:08 PM, Newport, Billy wrote:
> Is your ParquetIO going to be accepted into 0.4? Also, do you have a link to your GitHub?
>
> Thanks
>
> -----Original Message-----
> From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
> Sent: Monday, December 12, 2016 11:49 AM
> To: user@flink.apache.org
> Subject: Re: Avro Parquet/Flink/Beam
>
> Hi Billy,
>
> I will push my branch with ParquetIO to my GitHub.
>
> Yes, the Beam IO is independent of the runner.
>
> Regards
> JB
>
> On 12/12/2016 05:29 PM, Newport, Billy wrote:
>> I don't mind writing one. Is there a fork for the ParquetIO work that's already been done, or is it in trunk?
>>
>> The ParquetIO is independent of the runner being used? Is that right?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
>> Sent: Monday, December 12, 2016 11:25 AM
>> To: user@flink.apache.org
>> Subject: Re: Avro Parquet/Flink/Beam
>>
>> Hi,
>>
>> Beam provides an AvroCoder/AvroIO that you can use, but not yet a ParquetIO (I created a Jira about that and started to work on it).
>>
>> You can use the Avro reader to populate the PCollection and then use a custom DoFn to create the Parquet files (until ParquetIO is available).
>>
>> Regards
>> JB
>>
>> On 12/12/2016 05:19 PM, Newport, Billy wrote:
>>> Are there any examples showing the use of Beam with Avro/Parquet and a Flink runner? I see an Avro reader for Beam; is it a matter of writing another one for avro-parquet, or does this need to use the Flink HadoopOutputFormat, for example?
>>>
>>> Thanks
>>> Billy

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Avro Parquet/Flink/Beam
Hi Billy,

I will push my branch with ParquetIO to my GitHub.

Yes, the Beam IO is independent of the runner.

Regards
JB

On 12/12/2016 05:29 PM, Newport, Billy wrote:
> I don't mind writing one. Is there a fork for the ParquetIO work that's already been done, or is it in trunk?
>
> The ParquetIO is independent of the runner being used? Is that right?
>
> Thanks
>
> -----Original Message-----
> From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
> Sent: Monday, December 12, 2016 11:25 AM
> To: user@flink.apache.org
> Subject: Re: Avro Parquet/Flink/Beam
>
> Hi,
>
> Beam provides an AvroCoder/AvroIO that you can use, but not yet a ParquetIO (I created a Jira about that and started to work on it).
>
> You can use the Avro reader to populate the PCollection and then use a custom DoFn to create the Parquet files (until ParquetIO is available).
>
> Regards
> JB
>
> On 12/12/2016 05:19 PM, Newport, Billy wrote:
>> Are there any examples showing the use of Beam with Avro/Parquet and a Flink runner? I see an Avro reader for Beam; is it a matter of writing another one for avro-parquet, or does this need to use the Flink HadoopOutputFormat, for example?
>>
>> Thanks
>> Billy

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Avro Parquet/Flink/Beam
Hi,

Beam provides an AvroCoder/AvroIO that you can use, but not yet a ParquetIO (I created a Jira about that and started to work on it).

You can use the Avro reader to populate the PCollection and then use a custom DoFn to create the Parquet files (until ParquetIO is available).

Regards
JB

On 12/12/2016 05:19 PM, Newport, Billy wrote:
> Are there any examples showing the use of Beam with Avro/Parquet and a Flink runner? I see an Avro reader for Beam; is it a matter of writing another one for avro-parquet, or does this need to use the Flink HadoopOutputFormat, for example?
>
> Thanks
> Billy

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
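
Note on the workaround suggested in this thread (read Avro into a PCollection, then write the output from a custom DoFn): the rough shape of such a function can be sketched as below. This is only an illustration under stated assumptions, not Beam code and not JB's ParquetIO. The class BufferingFileWriterFn, the encode_as_csv helper, the dict used as a sink, and the run_bundles driver are all hypothetical names made up for this sketch. The method names start_bundle / process / finish_bundle are chosen to mirror the per-bundle lifecycle hooks of a Beam DoFn, and CSV encoding stands in for a real Parquet encoder so that the example runs with the Python standard library alone.

```python
import csv
import io


def encode_as_csv(records):
    """Placeholder encoder (assumption): a real pipeline would emit Parquet bytes here."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=sorted(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue().encode("utf-8")


class BufferingFileWriterFn:
    """Hypothetical stand-in for a custom DoFn: buffer records, write one file per bundle."""

    def __init__(self, encode_file, sink):
        self._encode_file = encode_file  # callable: list of records -> bytes
        self._sink = sink                # dict of file name -> bytes, standing in for a filesystem
        self._buffer = None
        self._bundle_index = 0

    def start_bundle(self):
        # A fresh buffer per bundle keeps records from different bundles apart.
        self._buffer = []

    def process(self, record):
        # Only accumulate here; the file is written once the bundle is complete.
        self._buffer.append(record)

    def finish_bundle(self):
        # Flush the buffered records as a single file, skipping empty bundles.
        if self._buffer:
            name = f"part-{self._bundle_index:05d}"
            self._sink[name] = self._encode_file(self._buffer)
            self._bundle_index += 1
        self._buffer = None


def run_bundles(fn, bundles):
    """Tiny driver (assumption) that imitates how a runner calls the lifecycle hooks."""
    for bundle in bundles:
        fn.start_bundle()
        for record in bundle:
            fn.process(record)
        fn.finish_bundle()


if __name__ == "__main__":
    files = {}
    fn = BufferingFileWriterFn(encode_as_csv, files)
    run_bundles(fn, [[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], [{"id": 3, "name": "c"}]])
    for name in sorted(files):
        print(name, len(files[name]), "bytes")
```

Buffering per bundle and writing in finish_bundle is one possible design, chosen here because columnar formats such as Parquet are written as whole files rather than record by record. In a real pipeline the records would come from the Avro read step, and the sink would be a filesystem or object store with unique file names per worker.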