Re: [Hdf-forum] Efficient serialization of HDF5 data

Martijn Jasperse Tue, 05 Dec 2017 12:36:54 -0800

Dear Michaël,
Have you tried using the core driver with a file image? It seems to me that
this is what you want; see H5Pset_file_image
<https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFileImage>.
This lets you "open" file data that is already in memory, and then retrieve
the image again with H5Fget_file_image once you have finished your operations.
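
A minimal sketch of that round trip, written with h5py rather than the C
API. Using h5py and NumPy is my assumption, not something from this thread;
the h5py calls get_file_image and set_file_image wrap H5Fget_file_image and
H5Pset_file_image. The file names and the dataset name "data" are
placeholders, and nothing is written to disk because the core driver is used
without a backing store.

```python
import h5py
import numpy as np


def serialize(arr):
    """Build an HDF5 file in memory and return its raw bytes."""
    # Core driver with backing_store=False: the file lives only in memory.
    with h5py.File("in_memory.h5", "w", driver="core", backing_store=False) as f:
        f.create_dataset("data", data=arr)  # "data" is a placeholder name
        f.flush()  # make sure the image is complete before copying it out
        return f.id.get_file_image()  # wraps H5Fget_file_image


def deserialize(image):
    """Open a received byte buffer as an HDF5 file and read the dataset."""
    fapl = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
    fapl.set_fapl_core(backing_store=False)
    fapl.set_file_image(image)  # wraps H5Pset_file_image
    # The name is only a label here; no file with this name is read from disk.
    fid = h5py.h5f.open(b"received.h5", flags=h5py.h5f.ACC_RDONLY, fapl=fapl)
    with h5py.File(fid, "r") as f:
        return f["data"][...]


if __name__ == "__main__":
    original = np.arange(12, dtype="int32").reshape(3, 4)
    payload = serialize(original)   # bytes that could be sent over the network
    restored = deserialize(payload)
    print(len(payload), np.array_equal(original, restored))
```

The bytes returned by serialize() are a complete HDF5 file, so the receiving
side could equally write them to disk and open them with any HDF5 tool.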


We have previously used this for networked HDF5-based data transfer;
admittedly with small data instead of big data, but the disk access
overhead was unacceptable in that case too.

Cheers,
Martijn

On 6 December 2017 at 03:43, Michaël Melchiore <[email protected]> wrote:

> Dear Andrey,
>
> While Apache Spark does aim at working in memory when possible, my need is
> not related to Spark. There are many alternatives to Spark that can be
> used to perform in-memory processing (Apache Storm, Apache Flink, Google
> Dataflow...).
> I have registered for more information regarding the Spark Connector, but I
> am not sure it is what I am looking for.
>
> Kind regards,
>
> Michaël
>
> 2017-12-05 15:11 GMT+01:00 Андрей Парамонов <[email protected]>:
>
>> Hello Michaël!
>>
>> 04.12.2017 21:23, Michaël Melchiore wrote:
>>
>>> I am building an application which operates on NetCDF data using Big Data
>>> technologies.
>>>
>>> My design aims at avoiding unnecessarily writing data to disk. Instead,
>>> I want to operate as much as possible in memory. The challenge is data
>>> (de)serialization for distributed communications between computing nodes.
>>>
>>> Since NetCDF4 and HDF5 already provide a portable data format, a simple
>>> and efficient design would simply access and then exchange the raw binary
>>> data over the network.
>>>
>>> Currently, I cannot access this buffer without creating files. I am
>>> investigating the use of the Apache Commons VFS RAM file system to trick
>>> NetCDF into working in memory.
>>>
>>> However, a suggestion on the NetCDF Java mailing list (see ticket
>>> MQO-415619) was to build an alternative to the core driver. I feel this is
>>> the more desirable course of action, as it improves the existing
>>> solutions instead of working around their limitations.
>>>
>>> Do you think this approach is feasible? Any starting pointers would be
>>> appreciated!
>>>
>>
>> I am no HDF5 expert, but I would suggest that you check
>> https://www.hdfgroup.org/downloads/spark-connector/
>> It would be great if you could share your experience and whether the Spark
>> connector helped you to implement in-memory processing.
>>
>> Best wishes,
>> Andrey Paramonov
>>
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5

