Hi Davor,

thanks for your reply. And no worries; we knew in advance that our
solution would only be temporary.

So for the time being, we will go with the HDFSFileSource/Sink approach and look
forward to the final solution once HDFSFileSystem is ready.

Thanks again for your help!
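For reference, once AvroIO can read hdfs:// paths as described below, the pipeline might look roughly like this. This is a sketch only: the class name, paths, and schema are placeholder assumptions, and it presumes the current AvroIO.Read.from / withSchema API carries over unchanged to HDFS paths.

```java
// A minimal sketch, assuming the pre-2.0 AvroIO.Read/AvroIO.Write API and that
// AvroIO will accept hdfs:// paths once HDFSFileSystem is in place.
// The paths and schema below are placeholders, not taken from this thread.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class AvroOnHdfsSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Placeholder schema; in practice this would be the real record schema.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Example\",\"fields\":"
            + "[{\"name\":\"value\",\"type\":\"string\"}]}");

    // Read GenericRecords directly from HDFS, just as with gs:// paths today.
    PCollection<GenericRecord> records = p.apply(
        AvroIO.Read.from("hdfs://namenode/data/input/*.avro").withSchema(schema));

    // Write them back out; the sink shards output files under the given prefix.
    records.apply(
        AvroIO.Write.to("hdfs://namenode/data/output/part").withSchema(schema));

    p.run();
  }
}
```

This would replace the separate HDFSFileSource/HadoopIO pair with a single file-system-agnostic transform, which is the direction described in the reply below.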

On Tue, Mar 7, 2017 at 4:39 AM, Davor Bonaci <da...@apache.org> wrote:

> Hi Michael,
> Sorry about the inconvenience here; AvroWrapperCoder was indeed recently
> removed from the Hadoop/HDFS IO.
>
> I think the best approach would be to use HDFSFileSource; this is the only
> approach I can recommend today.
>
> Going forward, we are working on being able to read Avro files via AvroIO,
> regardless of which file system the files may be stored on. So, you'd do
> something like AvroIO.Read.from("hdfs://..."), just as you can today do
> AvroIO.Read.from("gs://...").
>
> Hope this helps!
>
> Davor
>
> On Tue, Feb 28, 2017 at 4:24 PM, Michael Luckey <adude3...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> we are currently using Beam on Spark, reading and writing Avro files to
>> HDFS.
>>
>> Until now we have used HDFSFileSource for reading and HadoopIO for writing,
>> essentially reading and writing PCollection<AvroKey<GenericRecord>>.
>>
>> With the changes introduced by
>> https://issues.apache.org/jira/browse/BEAM-1497, this no longer seems to be
>> directly supported by Beam, as the required AvroWrapperCoder has been deleted.
>>
>> So, as we have to change our code anyway, we are wondering what the
>> recommended approach would be to read/write Avro files from/to HDFS with Beam
>> on Spark:
>>
>> - use the new implementation of HDFSFileSource/HDFSFileSink
>> - use the Spark-provided HadoopIO (and probably reimplement AvroWrapperCoder
>> ourselves?)
>>
>> What are the trade-offs here, possibly also considering already planned
>> changes to IO? Do we have an advantage using the Spark HadoopIO, since our
>> underlying engine is currently Spark, or will this eventually be deprecated
>> and exist only for ‘historical’ reasons?
>>
>> Any thoughts and advice here?
>>
>> Regards,
>>
>> michel
>>
>
>
