Jim, do you need bare-bones RDDs, or one of the more structured types (Spark 
DataFrame, Dataset)?
How about loading the data via HDF5/JDBC?
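For concreteness, here is the kind of thing I have in mind; every name below
(the rows, server, and table) is a placeholder rather than anything from your
setup:

    # Sketch of the two structured routes, assuming Spark 2.x (PySpark).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("structured-options").getOrCreate()

    # 1) Build a DataFrame from rows already pulled out of HDF5 elsewhere:
    rows = [("sensor-1", 0.13), ("sensor-2", 0.42)]  # stand-in data
    df = spark.createDataFrame(rows, schema=["sensor", "value"])

    # 2) Or, if the data can be staged in a database, read it over JDBC
    #    (the appropriate JDBC driver jar must be on the classpath):
    jdbc_df = (spark.read.format("jdbc")
               .option("url", "jdbc:sqlserver://example.database.windows.net")
               .option("dbtable", "measurements")
               .load())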

G.

From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Rowe, Jim
Sent: Monday, January 30, 2017 9:23 AM
To: HDF Users Discussion List <[email protected]>
Cc: Smith, Jacob <[email protected]>
Subject: [Hdf-forum] Azure, DataLake, Spark, Hadoop suggestions....

Hello HDF Gurus,
We are doing some machine learning work against HDF5 data (several hundred 
files, 5-50 GB each).

We are looking for others who may have blazed, or may be blazing, this trail.  
We are in Azure, using Microsoft Data Lake storage, and are working on reading 
the data into RDDs for use in Spark.

We have been working with h5py, but we are running into issues where we cannot 
access files that MS exposes via the "adl://" URI. Our assumption is that, 
however that scheme is implemented, it does not translate to a filesystem the 
underlying HDF5 library can read. Our best option so far is to copy the files 
locally, which introduces an extra step and delay in the process; a rough 
sketch of that workaround is below.
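
For concreteness, this is roughly what the copy-then-read step looks like on 
our end. The paths, the dataset name, and the use of "hdfs dfs -copyToLocal" 
(which assumes the ADL connector is configured) are placeholders, not our 
exact code:

    import os
    import subprocess
    import tempfile

    import h5py
    from pyspark import SparkContext

    sc = SparkContext(appName="hdf5-adl-workaround")

    # Placeholder paths, standing in for our real Data Lake layout.
    adl_paths = [
        "adl://ourlake.azuredatalakestore.net/data/file001.h5",
        "adl://ourlake.azuredatalakestore.net/data/file002.h5",
    ]

    def read_via_local_copy(adl_path):
        # The HDF5 C library needs a real POSIX file, so pull the file
        # down to executor-local disk first (assumes "hdfs dfs" has been
        # configured to understand adl:// URIs).
        local_path = os.path.join(tempfile.mkdtemp(),
                                  os.path.basename(adl_path))
        subprocess.check_call(
            ["hdfs", "dfs", "-copyToLocal", adl_path, local_path])
        try:
            with h5py.File(local_path, "r") as f:
                return f["/measurements"][:]  # placeholder dataset name
        finally:
            os.remove(local_path)

    # One partition per file; each record is one file's dataset as an array.
    rdd = (sc.parallelize(adl_paths, numSlices=len(adl_paths))
             .map(read_via_local_copy))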

If anyone has suggestions or insights on how to architect a cloud solution as 
roughly described, we would love to talk to you.  We are also potentially 
looking for some paid consulting help in this area if anyone is interested.


Warm regards,
--Jim