On Wednesday, April 07, 2004 1:26 AM Tom Lane wrote:

> 
> But to get back to the point of this discussion: to allow PG 
> to use raw devices instead of filesystems, we'd first have to do a ton of
> portability work 
...

[The following is said in a low, tentative voice :) ]

I wonder whether writing the postgresql data structures as HDF5 data structures 
(http://hdf.ncsa.uiuc.edu/whatishdf5.html) within a single HDF5 file (perhaps the WAL 
files would still reside elsewhere) would be a better solution than relying on 
filesystem features: it might improve performance while letting HDF5 handle 
portability and provide other useful features.

HDF5 actually provides an added portability advantage that postgresql does not 
currently enjoy:
"a completely portable file format, so that a file can be written on any system and 
read on any other"
(See http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf).
The HDF5 "distribution" includes tools for dumping data structures, etc. so if you're 
hooked on filesystem level operations, you have the ability to inspect postgresql data 
structures within the HDF5 file, i.e., "outside postgresql".
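
To make the portability point concrete, here is a minimal Python sketch of what a 
"written on any system, read on any other" layout means in practice: fixed-width 
fields and an explicit byte order. The record layout and the names RECORD, encode and 
decode are made up for illustration; this is not HDF5's actual file format, nor how 
postgresql lays out its pages.

```python
import struct

# Hypothetical record layout, for illustration only: this is neither
# HDF5's nor postgresql's on-disk format.
# "<" fixes little-endian byte order and turns off native alignment padding,
# so the packed bytes are the same on every platform.
RECORD = struct.Struct("<iqd")  # int32 id, int64 count, float64 value


def encode(rec_id, count, value):
    """Pack one record into a fixed, architecture-independent layout."""
    return RECORD.pack(rec_id, count, value)


def decode(buf):
    """Unpack a record produced by encode(), regardless of the writing host."""
    return RECORD.unpack(buf)


raw = encode(7, 2**40, 1.5)
print(len(raw), decode(raw))
```

A format string without the "<" prefix would use the host's native byte order and 
alignment instead, which is roughly the situation a portable file format is meant to 
avoid.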

HDF5 is also designed for clustered/grid computing systems:
"The HDF5 format and library provide a powerful means of organizing and accessing data 
in a manner that allows scientists to share, process, and manipulate data in today's 
heterogeneous and quickly-evolving high-performance computational environment, 
including the emerging computational GRIDs." 
(http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf, p. 3).
So, the main purpose of this post is to suggest that HDF5's design moves a postgresql 
version built on an HDF5 datastore that much closer to being ready for 
cluster-computing environments, with respect to the datastore (there's still the 
shared memory, etc., that needs to be addressed, but ...).

We're playing with HDF5 from Python (see the pytables project) for our "analytics" 
work, but that requires moving data out of postgresql. I suspect that an SQL interface 
to HDF5 data structures using postgresql would be a lot more convenient, and that 
postgresql would gain multiple benefits from having all its data structures in a 
single HDF5 file. OTOH, maybe us analytics types are better off with Python over HDF5 
and "postgresql on HDF5" is not a net win for postgresql. Still, there seems to be a 
great advantage to having rich data structures to operate on rather than just "files", 
and allowing the HDF5 library to deal with portability, I/O efficiency, and clustering.

Hope my $0.02 worth was.

Cheers,
        Murthy
