Thanks for the update, Eila! Much appreciated.
Regards
JB

On 03/23/2018 12:57 PM, OrielResearch Eila Arich-Landkof wrote:
> Hi All,
>
> Cham and I were trying to initiate HDF5 support with the HDF5 team. It seems
> that their forum might be able to provide the required support. I have created
> a ticket on their system (https://forum.hdfgroup.org/) and will follow up to
> make sure that this is not forgotten.
> Please let me know if you have any comments.
>
> Best,
> Eila
>
>
> On Fri, Mar 23, 2018 at 3:07 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>
> > Hi all,
> >
> > Sorry for the delay, but I had issues with my e-mail provider (I was not able
> > to send e-mails :( ).
> >
> > Last week during Beam Summit, I had the chance to participate in the IO
> > brainstorming session.
> >
> > Here are the minutes:
> >
> > 1. IOs set
> > We now have a decent number of IOs in Beam, and new ones are coming
> > (ParquetIO, RabbitMQIO). Users mentioned a new file format we could support:
> > HDF5. It would be a Python IO.
> > I will create the Jira about HDF5.
> > Other IOs are also in preparation, coming along with SDF support.
> >
> > 2. IOs and SDKs
> > This point was related to the portability layer: how can I use a Java IO in
> > Python, or the opposite? Today, most of the IOs are tied to the Java SDK,
> > which is a bit frustrating for Python SDK users. Users are looking forward to
> > the portability layer; however, they also raised some questions about its
> > Docker requirements. I think we should prepare a clear answer on this point.
> >
> > 3. PCollection Headers
> > Users want more "dynamic" IOs: an IO's behavior could change depending on the
> > element it is processing in the PCollection. I introduced what we use in
> > Apache Camel: Message Headers. Camel component endpoints (the equivalent of
> > Beam IOs) can use the headers; for instance, the camel-http component can use
> > a Camel.HTTP_URL header. We already discussed PCollection
> > headers/hints/annotations/metadata (whatever name we give it), and I still
> > think it would be a great feature for both IOs and even the runners.
> > I'm proposing to create a Jira about that; I will be more than happy to work
> > on this one.
> >
> > 4. Schema
> > As you might know, we are working on adding schema support to PCollection.
> > This feature can be leveraged by IOs. In particular, I think it would reduce
> > the "wrapping" done by IOs (like KafkaRecord, JmsRecord, ...) and make data
> > conversion easier.
> >
> > 5. Error Handling
> > Users would like generic error handling in the IOs. Today, error handling is
> > managed by each IO. I introduced the error handler we use in Apache Camel
> > (sorry again ;)) and especially the default error handler features like:
> > redelivery policy, recoverable/irrecoverable error handling, onWhen,
> > onException, whileTrue, ...
> > The error handler is not at the component level but at the routing engine
> > level. We could imagine something similar at the pipeline level.
> > Thoughts?
> >
> > I hope I didn't forget anything ;)
> >
> > To summarize:
> > - I will create new Jiras for HDF5 and other new IOs
> > - We have to work on documentation/explanation about the portability layer
> >   & IOs
> > - I will start a separate thread for the error handling discussion
> > - Nothing to do about schema: it has already started.
> >
> > Regards
> > JB
> > --
> > Jean-Baptiste Onofré
> > [email protected]
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
>
> --
> Eila
> www.orielresearch.org
> https://www.meetup.com/Deep-Learning-In-Production/

--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
