Paul Sheer wrote:
> Hadoop backend for PostgreSQL....

Resurrecting an old thread: it seems some guys at Yale implemented
something very similar to what this thread was discussing.
http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html

> >It's an open source stack that includes PostgreSQL, Hadoop, and Hive, along
> >with some glue between PostgreSQL and Hadoop, a catalog, a data loader, and
> >an interface that accepts queries in MapReduce or SQL and generates query
> >plans that are processed partly in Hadoop and partly in different PostgreSQL
> >instances spread across many nodes in a shared-nothing cluster of machines.

Their detailed paper is here: http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf

According to the paper, it scales very well.

> A problem that my client has, and one that I come across often,
> is that a database seems to always be associated with a particular
> physical machine, a physical machine that has to be upgraded,
> replaced, or otherwise maintained.
>
> Even if the database is replicated, it just means there are two or
> more machines. Replication is also a difficult thing to properly
> manage.
>
> With a distributed data store, the data would become a logical
> object - no adding or removal of machines would affect the data.
> This is an ideal that would remove a tremendous maintenance
> burden from many sites -- well, at least the ones I have worked
> at, as far as I can see.
>
> Does anyone know of plans to implement PostgreSQL over Hadoop?
>
> Yahoo seems to be doing this:
>
> http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html
>
> But they store tables column-wise for their performance situation.
> If one is doing a lot of inserts, I don't think this is the most
> efficient layout - ?
>
> Has Yahoo put the source code for their work online?
>
> Many thanks for any pointers.
>
> -paul