Dear Michaël,

Have you tried using the core driver with a file image? It seems to me that this is what you want to do; see H5Pset_file_image <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFileImage>. This lets you "open" file data that already sits in memory, and then retrieve the data again with H5Fget_file_image once you have finished your operations.
We have previously used this for networked HDF5-based data transfer; admittedly with small data rather than big data, but the disk access overhead was unacceptable in that case too.

Cheers,
Martijn

On 6 December 2017 at 03:43, Michaël Melchiore <[email protected]> wrote:

> Dear Andrey,
>
> While Apache Spark does aim at working in memory when possible, my need is
> not related to Spark. There are many alternatives to Spark which can be
> used to perform in-memory processing (Apache Storm, Apache Flink, Google
> Dataflow...)
> I have registered for more information regarding the Spark Connector but I
> am not sure it is what I am looking for.
>
> Kind regards,
>
> Michaël
>
> 2017-12-05 15:11 GMT+01:00 Андрей Парамонов <[email protected]>:
>
>> Hello Michaël!
>>
>> 04.12.2017 21:23, Michaël Melchiore wrote:
>>
>>> I am building an application which operates on NetCDF data using Big Data
>>> technologies.
>>>
>>> My design aims at avoiding unnecessarily writing data to disk. Instead,
>>> I want to operate as much as possible in memory. The challenge is data
>>> (de)serialization for distributed communications between computing nodes.
>>>
>>> Since NetCDF4 and HDF5 already provide a portable data format, a simple
>>> and efficient design would simply access and then exchange the raw binary
>>> data over the network.
>>>
>>> Currently, I fail to access this buffer without creating files. I am
>>> investigating the use of the Apache Commons VFS RAM file system to trick
>>> NetCDF into working in memory.
>>>
>>> But a suggestion on the NetCDF Java mailing list (see ticket
>>> MQO-415619) was to build an alternative to the core driver. I feel this is
>>> the more desirable course of action, as it is about improving the existing
>>> solutions instead of working around their limitations.
>>>
>>> Do you think this approach is feasible? Any starting pointers would be
>>> appreciated!
>>>
>>
>> I am probably not a distinguished expert in HDF5, but I will venture to
>> suggest that you check
>> https://www.hdfgroup.org/downloads/spark-connector/
>> It would be superb if you could share your experience and whether the Spark
>> connector helped you to implement in-memory processing.
>>
>> Best wishes,
>> Andrey Paramonov
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5
