Dear Martijn,

Yes, this is very promising. Thank you for bringing this to my attention.
Michaël

2017-12-05 21:34 GMT+01:00 Martijn Jasperse <[email protected]>:

> Dear Michaël,
>
> Have you tried using the core driver with a file image? Seems to me that
> this is what you want to do, see H5Pset_file_image
> <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFileImage>.
> This enables you to "open" the file data in memory and then retrieve it
> again after you've finished operations, using H5Fget_file_image.
>
> We have previously used this for networked HDF5-based data transfer;
> admittedly with small data instead of big data, but the disk access
> overhead was unacceptable in that case too.
>
> Cheers,
> Martijn
>
> On 6 December 2017 at 03:43, Michaël Melchiore <[email protected]> wrote:
>
>> Dear Andrey,
>>
>> While Apache Spark does aim at working in memory when possible, my need
>> is not related to Spark. There are many alternatives to Spark which can be
>> used to perform in memory processing (Apache Storm, Apache Flink, Google
>> Dataflow...)
>> I have registered for more information regarding the Spark Connector but
>> I am not sure it is what I am looking for.
>>
>> Kind regards,
>>
>> Michaël
>>
>> 2017-12-05 15:11 GMT+01:00 Андрей Парамонов <[email protected]>:
>>
>>> Hello Michaël!
>>>
>>> 04.12.2017 21:23, Michaël Melchiore wrote:
>>>
>>>> I build an application which operates on NetCDF data using Big Data
>>>> technologies.
>>>>
>>>> My design aims at avoiding unnecessarily writing data to disk. Instead,
>>>> I want to operate as much as possible in memory. The challenge is data
>>>> (de)serialization for distributed communications between computing nodes.
>>>>
>>>> Since NetCDF4 and HDF5 already provide a portable data format, a simple
>>>> and efficient design would simply access and then exchange the raw binary
>>>> data over the network.
>>>>
>>>> Currently, I fail to access this buffer without creating files. I am
>>>> investigating the use of the Apache Common VFS Ram file system to trick
>>>> NetCDF into working in memory.
>>>>
>>>> But, a suggestion on the NetCDF Java mailing list (see ticket
>>>> MQO-415619) was to build an alternative to the core driver. I feel this is
>>>> the more desirable course of actions as it is about improving the existing
>>>> solutions instead of working around their limitations.
>>>>
>>>> Do you think this approach is feasible ? Any starting pointers would be
>>>> appreciated !
>>>>
>>>
>>> I am probably not a distinguished expert in HDF5, but I take courage to
>>> suggest you to check
>>> https://www.hdfgroup.org/downloads/spark-connector/
>>> It would be superb if you could share your experience and whether Spark
>>> connector helped you to implement in-memory processing.
>>>
>>> Best wishes,
>>> Andrey Paramonov
>>>
>>> --
>>> This message has been scanned for viruses and
>>> dangerous content by MailScanner, and is
>>> believed to be clean.
>>>
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> [email protected]
>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>> Twitter: https://twitter.com/hdf5
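
For anyone finding this thread later: the core-driver-plus-file-image approach Martijn describes above could look roughly like the sketch below. It is written with h5py rather than the C API or NetCDF Java purely to keep it short, and that choice is an assumption on my part, not something used in this thread. The helper names `to_image` and `from_image`, the dataset name, and the placeholder file name are hypothetical. `get_file_image` and `set_file_image` are h5py's wrappers around H5Fget_file_image and H5Pset_file_image.

```python
import h5py
import numpy as np

# Placeholder name only; with backing_store=False nothing is written to disk.
PLACEHOLDER_NAME = "in-memory.h5"


def to_image(dataset_name, array):
    """Build an HDF5 file in memory and return its raw bytes."""
    with h5py.File(PLACEHOLDER_NAME, "w", driver="core", backing_store=False) as f:
        f.create_dataset(dataset_name, data=array)
        f.flush()  # make sure the image reflects everything written so far
        return f.id.get_file_image()  # wraps H5Fget_file_image


def from_image(image, dataset_name):
    """Open raw HDF5 bytes with the core driver and read one dataset back."""
    fapl = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
    fapl.set_fapl_core(backing_store=False)
    fapl.set_file_image(image)  # wraps H5Pset_file_image
    fid = h5py.h5f.open(PLACEHOLDER_NAME.encode(), h5py.h5f.ACC_RDONLY, fapl=fapl)
    with h5py.File(fid) as f:
        return f[dataset_name][()]  # copy the data out before the file closes


if __name__ == "__main__":
    data = np.arange(12, dtype="f8").reshape(3, 4)
    image = to_image("temperature", data)  # bytes, ready to send over the network
    restored = from_image(image, "temperature")
    print(len(image), np.array_equal(restored, data))
```

The bytes returned by `to_image` are a complete HDF5 file, so in principle they can be shipped between nodes as-is and opened on the receiving side with `from_image`. Whether NetCDF Java can be pointed at such a buffer is a separate question that this sketch does not answer.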
