Re: [HACKERS] Hadoop backend?
Paul Sheer wrote:
> Hadoop backend for PostgreSQL

Resurrecting an old thread: it seems some guys at Yale implemented something very similar to what this thread was discussing:

http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html

It's an open-source stack that includes PostgreSQL, Hadoop, and Hive, along with some glue between PostgreSQL and Hadoop, a catalog, a data loader, and an interface that accepts queries in MapReduce or SQL and generates query plans that are processed partly in Hadoop and partly in different PostgreSQL instances spread across many nodes in a shared-nothing cluster of machines. Their detailed paper is here:

http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf

According to the paper, it scales very well.

> A problem that my client has, and one that I come across often, is that a database seems to always be associated with a particular physical machine, a physical machine that has to be upgraded, replaced, or otherwise maintained. Even if the database is replicated, it just means there are two or more machines. Replication is also a difficult thing to properly manage.
>
> With a distributed data store, the data would become a logical object - no adding or removal of machines would affect the data. This is an ideal that would remove a tremendous maintenance burden from many sites; well, at least the ones I have worked at, as far as I can see.
>
> Does anyone know of plans to implement PostgreSQL over Hadoop? Yahoo seems to be doing this:
> http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html
> But they store tables column-wise for their performance situation. If one is doing a lot of inserts, I don't think this is the most efficient approach, is it?
>
> Has Yahoo put the source code for their work online?
>
> Many thanks for any pointers.
>
> -paul
Re: [HACKERS] Hadoop backend?
> As far as I can tell, the PG storage manager API is at the wrong level of abstraction for pretty much everything. These days, everything we do is atop the Unix filesystem API, and anything that smgr might have been

Is there a complete list of filesystem API calls somewhere that I can get my head around? At least this will give me an idea of the scope of the effort.

> should be easier to build postgres data feeder in Hadoop (which might even already exist).

Do you know of a URL?

-paul
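For scope: the storage manager interface is the f_smgr function-pointer table in src/backend/storage/smgr/smgr.c, and that file is the authoritative list. A from-memory paraphrase (parameter lists abbreviated to comments, names possibly inexact) looks roughly like this, with everything expressed in terms of relations, forks, and fixed-size blocks:

    /* From-memory paraphrase of PostgreSQL's storage manager interface; the
     * authoritative version is the f_smgr struct in
     * src/backend/storage/smgr/smgr.c.  Signatures abbreviated and possibly
     * inexact. */
    #include <stdbool.h>

    typedef unsigned int BlockNumber;       /* really uint32 in PostgreSQL */

    typedef struct f_smgr_sketch
    {
        void        (*smgr_init) ();        /* startup / shutdown hooks       */
        void        (*smgr_shutdown) ();
        void        (*smgr_create) ();      /* (reln, fork, isRedo)           */
        bool        (*smgr_exists) ();      /* (reln, fork)                   */
        void        (*smgr_unlink) ();      /* (rnode, fork, isRedo)          */
        void        (*smgr_extend) ();      /* (reln, fork, blocknum, buffer) */
        void        (*smgr_read) ();        /* (reln, fork, blocknum, buffer) */
        void        (*smgr_write) ();       /* (reln, fork, blocknum, buffer) */
        BlockNumber (*smgr_nblocks) ();     /* (reln, fork)                   */
        void        (*smgr_truncate) ();    /* (reln, fork, nblocks)          */
        void        (*smgr_immedsync) ();   /* (reln, fork)                   */
        void        (*smgr_pre_ckpt) ();    /* checkpoint hooks               */
        void        (*smgr_sync) ();
        void        (*smgr_post_ckpt) ();
    } f_smgr_sketch;

md.c implements these through the fd.c virtual file descriptor layer with ordinary open/lseek/read/write/ftruncate/unlink/fsync calls, which is essentially the Unix filesystem API referred to in the quote above.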
Re: [HACKERS] Hadoop backend?
Why not just stream it in via set-returning functions, and make sure that we can mark a set-returning function as STREAMABLE or so (to prevent joins, whatever)? It is the easiest way to get it right, and it helps in many other cases. I think that the storage manager is definitely the wrong place to do this. It is also easy to use more than just one backend then, if you get the interface code right.

regards,
hans

On Feb 24, 2009, at 12:03 AM, Jonah H. Harris wrote:

> On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas robertmh...@gmail.com wrote:
>> In theory, I think you could make postgres work on any type of underlying storage you like by writing a second smgr implementation that would exist alongside md.c. The fly in the ointment is that you'd need a more sophisticated implementation of this line of code, from smgropen:
>>
>>     reln->smgr_which = 0;    /* we only have md.c at present */
>
> I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so. It may even be easier/preferred to write a Hadoop-specific access method, depending on what you're looking for from Hadoop.
>
> --
> Jonah H. Harris, Senior DBA
> myYearbook.com

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de
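As a concrete illustration of the streaming set-returning-function idea above, here is a minimal sketch using PostgreSQL's standard value-per-call SRF interface from funcapi.h. The Hadoop-facing calls (hadoop_open_scan, hadoop_fetch_next) are hypothetical placeholders, and the proposed STREAMABLE marker does not exist; as written, the executor is still free to materialize the result set:

    /* Minimal sketch of a set-returning function that could stream rows from
     * an external store.  The external-scan calls named in comments are
     * hypothetical; only the funcapi.h machinery is real. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(hadoop_stream);

    Datum
    hadoop_stream(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;

        if (SRF_IS_FIRSTCALL())
        {
            MemoryContext oldcontext;

            funcctx = SRF_FIRSTCALL_INIT();
            oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

            /* A real implementation would open the external scan here and
             * stash its handle, e.g. hadoop_open_scan(...) -- hypothetical. */
            funcctx->user_fctx = NULL;

            MemoryContextSwitchTo(oldcontext);
        }

        funcctx = SRF_PERCALL_SETUP();

        {
            /* A real implementation would fetch the next row here, e.g.
             * hadoop_fetch_next(funcctx->user_fctx) -- hypothetical.
             * NULL means "no more rows". */
            text *row = NULL;

            if (row != NULL)
                SRF_RETURN_NEXT(funcctx, PointerGetDatum(row));
            else
                SRF_RETURN_DONE(funcctx);
        }
    }

Such a function would be exposed with something like CREATE FUNCTION hadoop_stream() RETURNS SETOF text AS 'MODULE_PATHNAME', 'hadoop_stream' LANGUAGE C; the open question in this subthread is how to tell the planner that the result cannot be cheaply rescanned or joined.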
Re: [HACKERS] Hadoop backend?
Tom Lane wrote:
> It's interesting to speculate about where we could draw an abstraction boundary that would be more useful. I don't think the MySQL guys got it right either...

The supposed smgr abstraction of PostgreSQL, which tells more or less how to get a byte to the disk, is quite far away from what MySQL calls a storage engine, which has things like open table, scan table, drop table on a logical level (meaning open the table, not open the heap). To my judgement, neither of these approaches is terribly useful from a current, practical point of view.

In any case, in order to settle the "where to abstract" question, you'd probably want to have one or two other storage APIs that you seriously want to integrate, and then you can analyze how to unify them.
Re: [HACKERS] Hadoop backend?
> With a distributed data store, the data would become a logical object - no adding or removal of machines would affect the data. This is an ideal that would remove a tremendous maintenance burden from many sites; well, at least the ones I have worked at, as far as I can see.

Two things:

1) Hadoop is the wrong technology. It's not designed to support transactional operations.

2) Transactional operations are, in general, your Big Obstacle for doing anything in the way of a distributed storage manager.

It's possible you could make both of the above go away if you were planning for a DW (data warehouse) platform in which transactions weren't important. However, that would have to become an incompatible fork of PostgreSQL.

AFAIK, the Yahoo platform does not involve Hadoop at all.

--Josh
Re: [HACKERS] Hadoop backend?
> It would only be possible to have the actual PostgreSQL backends running on a single node anyway, because they use shared memory to

This is not a problem: performance is a secondary consideration (at least as far as the problem I was referring to). The primary usefulness is to have the data be a logical entity rather than a physical entity, so that one can maintain physical machines without having to worry too much about where-is-the-data.

At the moment, most databases suffer from the problem of occasionally having to move the data from one place to another. This is a major nightmare that happens once every few years for most DBAs. It happens because a system needs a soft/hard upgrade, or a disk enlarged, or because a piece of hardware fails.

I have also found it's no use having RAID or ZFS. Each of these ties the data to an OS installation. If the OS needs to be reinstalled, all the data has to be manually moved in a way that is, well... dangerous.

If there is only one machine running postgres, that is fine too: I can have a second identical machine on standby in case of a hardware failure. That means a short amount of downtime - most people can live with that.

I read somewhere that replication is one of the goals of postgres's coming development efforts. Personally I think Hadoop might be a better solution - *shrug*.

Thoughts/comments?

-paul
Re: [HACKERS] Hadoop backend?
On Mon, Feb 23, 2009 at 9:08 AM, Paul Sheer paulsh...@gmail.com wrote:
>> It would only be possible to have the actual PostgreSQL backends running on a single node anyway, because they use shared memory to
>
> This is not a problem: performance is a secondary consideration (at least as far as the problem I was referring to). The primary usefulness is to have the data be a logical entity rather than a physical entity, so that one can maintain physical machines without having to worry too much about where-is-the-data.
>
> At the moment, most databases suffer from the problem of occasionally having to move the data from one place to another. This is a major nightmare that happens once every few years for most DBAs. It happens because a system needs a soft/hard upgrade, or a disk enlarged, or because a piece of hardware fails.
>
> I have also found it's no use having RAID or ZFS. Each of these ties the data to an OS installation. If the OS needs to be reinstalled, all the data has to be manually moved in a way that is, well... dangerous.
>
> If there is only one machine running postgres, that is fine too: I can have a second identical machine on standby in case of a hardware failure. That means a short amount of downtime - most people can live with that.

I think the performance aspect of things has to be considered. If the system introduces too much overhead it won't be useful in practice. But apart from that I agree with all of this.

> I read somewhere that replication is one of the goals of postgres's coming development efforts. Personally I think Hadoop might be a better solution - *shrug*. Thoughts/comments?

I think the two are solving different problems.

...Robert
Re: [HACKERS] Hadoop backend?
Paul Sheer wrote:
> I have also found it's no use having RAID or ZFS. Each of these ties the data to an OS installation. If the OS needs to be reinstalled, all the data has to be manually moved in a way that is, well... dangerous.

How about network storage, fiber-attached? If you move the db, you only need to redirect the LUN(s) to a new WWN.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Re: [HACKERS] Hadoop backend?
Hi,

Paul Sheer wrote:
> This is not a problem: performance is a secondary consideration (at least as far as the problem I was referring to).

Well, if you don't mind your database running... ehm... creeping several orders of magnitude slower, you might also be interested in Single-System Image clustering systems [1], like Beowulf, Kerrighed [2], OpenSSI [3], etc. Besides distributed filesystems, those also provide transparent shared memory across nodes.

> The primary usefulness is to have the data be a logical entity rather than a physical entity, so that one can maintain physical machines without having to worry too much about where-is-the-data.

There are lots of solutions offering that already. In what way should Hadoop be better than any of those existing ones? For a slightly different example, you can get equivalent functionality at the block-device layer with DRBD [4], which is in successful use for Postgres as well. The main challenge with distributed filesystems remains reliable failure detection and ensuring that exactly one node is alive at any time.

> At the moment, most databases suffer from the problem of occasionally having to move the data from one place to another. This is a major nightmare that happens once every few years for most DBAs. It happens because a system needs a soft/hard upgrade, or a disk enlarged, or because a piece of hardware fails.

You are comparing to standalone nodes here, which doesn't make much sense, IMO.

> I have also found it's no use having RAID or ZFS. Each of these ties the data to an OS installation. If the OS needs to be reinstalled, all the data has to be manually moved in a way that is, well... dangerous.

I'm thinking more of Lustre, GFS, OCFS, AFS or some such. Compare with those!

> If there is only one machine running postgres, that is fine too: I can have a second identical machine on standby in case of a hardware failure. That means a short amount of downtime - most people can live with that.

What most people have trouble with is a master that revives and suddenly confuses the new master (the old slave).

> I read somewhere that replication is one of the goals of postgres's coming development efforts. Personally I think Hadoop might be a better solution - *shrug*.

I'm not convinced at all. The trouble is not the replication of the data on disk; it's rather the data in shared memory which poses the hard problems (locks, caches, etc.). The former is solved already; the latter is a tad harder to solve. See [5] for my approach (showing my bias).

What I do find interesting about Hadoop is the MapReduce approach, but lots more than writing another storage backend is required if you want to make use of that for Postgres. Greenplum claims to have implemented MapReduce for their database [6]; however, to me it looks like that works a couple of layers above the filesystem.

Regards

Markus Wanner

[1]: Wikipedia: Single-System Image Clustering, http://en.wikipedia.org/wiki/Single-system_image
[2]: http://www.kerrighed.org/
[3]: http://www.openssi.org/
[4]: http://www.drbd.org/
[5]: Postgres-R: http://www.postgres-r.org/
[6]: Greenplum MapReduce: http://www.greenplum.com/resources/mapreduce/
Re: [HACKERS] Hadoop backend?
On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas robertmh...@gmail.com wrote:
> In theory, I think you could make postgres work on any type of underlying storage you like by writing a second smgr implementation that would exist alongside md.c. The fly in the ointment is that you'd need a more sophisticated implementation of this line of code, from smgropen:
>
>     reln->smgr_which = 0;    /* we only have md.c at present */

I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so. It may even be easier/preferred to write a Hadoop-specific access method, depending on what you're looking for from Hadoop.

--
Jonah H. Harris, Senior DBA
myYearbook.com
Re: [HACKERS] Hadoop backend?
Jonah H. Harris jonah.har...@gmail.com writes:
> I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so.

As far as I can tell, the PG storage manager API is at the wrong level of abstraction for pretty much everything. These days, everything we do is atop the Unix filesystem API, and anything that smgr might have been able to do for us is getting handled in kernel filesystem code or device drivers. (Back in the eighties, when it was more plausible for PG to do direct device access, maybe smgr was good for something; but no more.)

It's interesting to speculate about where we could draw an abstraction boundary that would be more useful. I don't think the MySQL guys got it right either...

			regards, tom lane
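To illustrate the point with a toy (plain C, not PostgreSQL source; the real mdread() goes through the fd.c virtual file descriptor layer, but the effect is the same): reading one 8 kB block through md.c boils down to a seek plus a read on an ordinary file, with the kernel doing the actual work:

    /* Toy illustration: an md.c-style block read is essentially lseek+read
     * on an ordinary file.  The segment file name below is invented. */
    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdio.h>

    #define TOY_BLCKSZ 8192

    static ssize_t
    read_block(int fd, unsigned blkno, char *buf)
    {
        if (lseek(fd, (off_t) blkno * TOY_BLCKSZ, SEEK_SET) < 0)
            return -1;
        return read(fd, buf, TOY_BLCKSZ);   /* the kernel does the rest */
    }

    int
    main(void)
    {
        char buf[TOY_BLCKSZ];
        int  fd = open("/tmp/relation_segment_toy", O_RDONLY);

        if (fd >= 0)
        {
            printf("read %zd bytes of block 0\n", read_block(fd, 0, buf));
            close(fd);
        }
        return 0;
    }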
Re: [HACKERS] Hadoop backend?
> I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so. It may even be easier/preferred to write a Hadoop-specific access method, depending on what you're looking for from Hadoop.

I think you're very right. What Postgres needs is an access method abstraction. One should be able to plug in an access method for SSDs or network filesystems where appropriate.

I won't talk about the MapReduce part of Hadoop because I think that's a different story. What you need for MapReduce is 1) a data store which feeds you data, and then 2) MapReduce does the query processing. This has nothing in common with the Postgres query processor. If you just want data from Postgres, then it should be easier to build a Postgres data feeder in Hadoop (which might even already exist).

Pi Song

On Tue, Feb 24, 2009 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Jonah H. Harris jonah.har...@gmail.com writes:
>> I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so.
>
> As far as I can tell, the PG storage manager API is at the wrong level of abstraction for pretty much everything. These days, everything we do is atop the Unix filesystem API, and anything that smgr might have been able to do for us is getting handled in kernel filesystem code or device drivers. (Back in the eighties, when it was more plausible for PG to do direct device access, maybe smgr was good for something; but no more.)
>
> It's interesting to speculate about where we could draw an abstraction boundary that would be more useful. I don't think the MySQL guys got it right either...
>
> 			regards, tom lane
Re: [HACKERS] Hadoop backend?
Hi...

I think the easiest way to do this is to simply add a mechanism to functions which allows a function to stream data through. It would basically mean losing join support, as you cannot read the data again in a way which is good enough for joining with the function providing the data from Hadoop. Hannu (I think) brought up a similar concept some time ago. I think a straightforward implementation would not be too hard.

best regards,
hans

On Feb 22, 2009, at 3:37 AM, pi song wrote:

> 1) The Hadoop file system is very optimized for mostly-read operation.
> 2) As of a few months ago, HDFS doesn't support file appending.
>
> There might be a bit of impedance to make them go together. However, I think it should be a very good initiative to come up with ideas to be able to run Postgres on a distributed filesystem (doesn't have to be Hadoop specifically).
>
> Pi Song
>
> On Sun, Feb 22, 2009 at 7:17 AM, Paul Sheer paulsh...@gmail.com wrote:
>> Hadoop backend for PostgreSQL
>>
>> A problem that my client has, and one that I come across often, is that a database seems to always be associated with a particular physical machine, a physical machine that has to be upgraded, replaced, or otherwise maintained. Even if the database is replicated, it just means there are two or more machines. Replication is also a difficult thing to properly manage.
>>
>> With a distributed data store, the data would become a logical object - no adding or removal of machines would affect the data. This is an ideal that would remove a tremendous maintenance burden from many sites; well, at least the ones I have worked at, as far as I can see.
>>
>> Does anyone know of plans to implement PostgreSQL over Hadoop? Yahoo seems to be doing this:
>> http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html
>> But they store tables column-wise for their performance situation. If one is doing a lot of inserts, I don't think this is the most efficient approach, is it?
>>
>> Has Yahoo put the source code for their work online?
>>
>> Many thanks for any pointers.
>>
>> -paul

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de
Re: [HACKERS] Hadoop backend?
On Sat, Feb 21, 2009 at 9:37 PM, pi song pi.so...@gmail.com wrote:
> 1) The Hadoop file system is very optimized for mostly-read operation.
> 2) As of a few months ago, HDFS doesn't support file appending.
> There might be a bit of impedance to make them go together. However, I think it should be a very good initiative to come up with ideas to be able to run Postgres on a distributed filesystem (doesn't have to be Hadoop specifically).

In theory, I think you could make postgres work on any type of underlying storage you like by writing a second smgr implementation that would exist alongside md.c. The fly in the ointment is that you'd need a more sophisticated implementation of this line of code, from smgropen:

    reln->smgr_which = 0;    /* we only have md.c at present */

Logically, it seems like the choice of smgr should track with the notion of a tablespace. IOW, you might want to have one tablespace that is stored on a magnetic disk (md.c) and another that is stored on your hypothetical distributed filesystem (hypodfs.c). I'm not sure how hard this would be to implement, but I don't think smgropen() is in a position to do syscache lookups, so probably not that easy.

...Robert
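To make the "more sophisticated implementation" concrete, here is a small self-contained toy, not PostgreSQL source: every name below, including tablespace OID 42 and "hypodfs", is invented. It shows the general shape of an smgrsw[]-style dispatch table selected per tablespace instead of hard-coding entry 0 (md.c):

    /* Toy per-tablespace storage manager dispatch.  Illustrative only. */
    #include <stdio.h>

    typedef struct
    {
        const char *name;
        void      (*read_block) (unsigned blkno, char *buf);
    } f_smgr_toy;

    static void
    md_read(unsigned blkno, char *buf)
    {
        (void) buf;
        printf("md: read block %u from local magnetic disk\n", blkno);
    }

    static void
    dfs_read(unsigned blkno, char *buf)
    {
        (void) buf;
        printf("hypodfs: read block %u from a distributed filesystem\n", blkno);
    }

    static const f_smgr_toy smgrsw_toy[] = {
        {"md", md_read},            /* entry 0: local disk, as today          */
        {"hypodfs", dfs_read},      /* entry 1: hypothetical distributed FS   */
    };

    /* Pretend that tablespace OID 42 lives on the distributed filesystem. */
    static int
    smgr_for_tablespace(unsigned spcOid)
    {
        return (spcOid == 42) ? 1 : 0;
    }

    int
    main(void)
    {
        char buf[8192];

        smgrsw_toy[smgr_for_tablespace(1663)].read_block(7, buf);  /* default tablespace */
        smgrsw_toy[smgr_for_tablespace(42)].read_block(7, buf);    /* "DFS" tablespace   */
        return 0;
    }

The hard part Robert points out remains: mapping a tablespace OID to a dispatch-table index inside smgropen(), which is not in a position to do syscache lookups.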
Re: [HACKERS] Hadoop backend?
One more problem is that data placement on HDFS is handled internally, meaning you have no explicit control over it. Thus, you cannot place two sets of data which are likely to be joined together on the same node, which means uncontrollable latency during query processing.

Pi Song

On Mon, Feb 23, 2009 at 7:47 AM, Robert Haas robertmh...@gmail.com wrote:
> On Sat, Feb 21, 2009 at 9:37 PM, pi song pi.so...@gmail.com wrote:
>> 1) The Hadoop file system is very optimized for mostly-read operation.
>> 2) As of a few months ago, HDFS doesn't support file appending.
>> There might be a bit of impedance to make them go together. However, I think it should be a very good initiative to come up with ideas to be able to run Postgres on a distributed filesystem (doesn't have to be Hadoop specifically).
>
> In theory, I think you could make postgres work on any type of underlying storage you like by writing a second smgr implementation that would exist alongside md.c. The fly in the ointment is that you'd need a more sophisticated implementation of this line of code, from smgropen:
>
>     reln->smgr_which = 0;    /* we only have md.c at present */
>
> Logically, it seems like the choice of smgr should track with the notion of a tablespace. IOW, you might want to have one tablespace that is stored on a magnetic disk (md.c) and another that is stored on your hypothetical distributed filesystem (hypodfs.c). I'm not sure how hard this would be to implement, but I don't think smgropen() is in a position to do syscache lookups, so probably not that easy.
>
> ...Robert
Re: [HACKERS] Hadoop backend?
On Sun, Feb 22, 2009 at 5:18 PM, pi song pi.so...@gmail.com wrote:
> One more problem is that data placement on HDFS is handled internally, meaning you have no explicit control over it. Thus, you cannot place two sets of data which are likely to be joined together on the same node, which means uncontrollable latency during query processing.
>
> Pi Song

It would only be possible to have the actual PostgreSQL backends running on a single node anyway, because they use shared memory to hold lock tables and things. The advantage of a distributed filesystem would be that you could access more storage (and more system buffer cache) than would be possible on a single system (or perhaps the same amount but at less cost). Assuming some sort of per-tablespace control over the storage manager, you could put your most frequently accessed data locally and the less frequently accessed data into the DFS. But you'd still have to pull all the data back to the master node to do anything with it.

Being able to actually distribute the computation would be a much harder problem. Currently, we don't even have the ability to bring multiple CPUs to bear on (for example) a large sequential scan (even though all the data is on a single node).

...Robert
Re: [HACKERS] Hadoop backend?
On Mon, Feb 23, 2009 at 3:56 PM, pi song pi.so...@gmail.com wrote:

I think the point that you can access more system cache is right, but that doesn't mean it will be more efficient than accessing your local disk. Take Hadoop, for example: your request for file content first has to go to the Namenode (the file-chunk indexing service), and only then do you ask the data node, which provides the data. Assuming that you're working on a large dataset, the probability of the data chunk you need staying in system cache is very low, so most of the time you end up reading from a remote disk.

I've got a better idea. How about we make the buffer pool multilevel? The first level is the current one. The second level represents memory from remote machines. Things that are used less often should stay on the second level. Has anyone ever thought about something like this before?

Pi Song

On Mon, Feb 23, 2009 at 1:09 PM, Robert Haas robertmh...@gmail.com wrote:
> On Sun, Feb 22, 2009 at 5:18 PM, pi song pi.so...@gmail.com wrote:
>> One more problem is that data placement on HDFS is handled internally, meaning you have no explicit control over it. Thus, you cannot place two sets of data which are likely to be joined together on the same node, which means uncontrollable latency during query processing.
>>
>> Pi Song
>
> It would only be possible to have the actual PostgreSQL backends running on a single node anyway, because they use shared memory to hold lock tables and things. The advantage of a distributed filesystem would be that you could access more storage (and more system buffer cache) than would be possible on a single system (or perhaps the same amount but at less cost). Assuming some sort of per-tablespace control over the storage manager, you could put your most frequently accessed data locally and the less frequently accessed data into the DFS. But you'd still have to pull all the data back to the master node to do anything with it.
>
> Being able to actually distribute the computation would be a much harder problem. Currently, we don't even have the ability to bring multiple CPUs to bear on (for example) a large sequential scan (even though all the data is on a single node).
>
> ...Robert
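For what it's worth, the lookup order implied by that multilevel idea would be something like the following self-contained toy. It is purely illustrative: the two cache levels are stubs that always miss, and nothing like this exists in PostgreSQL today.

    /* Toy two-level buffer lookup: local pool, then remote-memory pool,
     * then (possibly remote) storage.  All names invented. */
    #include <stdio.h>
    #include <string.h>

    #define TOY_BLCKSZ 8192

    /* stub lookups: always miss */
    static int  l1_lookup(unsigned blkno, char *buf) { (void) blkno; (void) buf; return 0; }
    static int  l2_lookup(unsigned blkno, char *buf) { (void) blkno; (void) buf; return 0; }

    static void
    storage_read(unsigned blkno, char *buf)
    {
        memset(buf, 0, TOY_BLCKSZ);
        printf("fetched block %u from (possibly remote) storage\n", blkno);
    }

    static void
    buffer_read(unsigned blkno, char *buf)
    {
        if (l1_lookup(blkno, buf))      /* hot: local buffer pool              */
            return;
        if (l2_lookup(blkno, buf))      /* warm: memory on remote machines     */
            return;
        storage_read(blkno, buf);       /* cold: local disk or distributed FS  */
        /* a real implementation would now decide which level should cache it */
    }

    int
    main(void)
    {
        char buf[TOY_BLCKSZ];

        buffer_read(123, buf);
        return 0;
    }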
[HACKERS] Hadoop backend?
Hadoop backend for PostgreSQL

A problem that my client has, and one that I come across often, is that a database seems to always be associated with a particular physical machine, a physical machine that has to be upgraded, replaced, or otherwise maintained. Even if the database is replicated, it just means there are two or more machines. Replication is also a difficult thing to properly manage.

With a distributed data store, the data would become a logical object - no adding or removal of machines would affect the data. This is an ideal that would remove a tremendous maintenance burden from many sites; well, at least the ones I have worked at, as far as I can see.

Does anyone know of plans to implement PostgreSQL over Hadoop? Yahoo seems to be doing this:

http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html

But they store tables column-wise for their performance situation. If one is doing a lot of inserts, I don't think this is the most efficient approach, is it?

Has Yahoo put the source code for their work online?

Many thanks for any pointers.

-paul
Re: [HACKERS] Hadoop backend?
1) The Hadoop file system is very optimized for mostly-read operation.
2) As of a few months ago, HDFS doesn't support file appending.

There might be a bit of impedance to make them go together. However, I think it should be a very good initiative to come up with ideas to be able to run Postgres on a distributed filesystem (doesn't have to be Hadoop specifically).

Pi Song

On Sun, Feb 22, 2009 at 7:17 AM, Paul Sheer paulsh...@gmail.com wrote:
> Hadoop backend for PostgreSQL
>
> A problem that my client has, and one that I come across often, is that a database seems to always be associated with a particular physical machine, a physical machine that has to be upgraded, replaced, or otherwise maintained. Even if the database is replicated, it just means there are two or more machines. Replication is also a difficult thing to properly manage.
>
> With a distributed data store, the data would become a logical object - no adding or removal of machines would affect the data. This is an ideal that would remove a tremendous maintenance burden from many sites; well, at least the ones I have worked at, as far as I can see.
>
> Does anyone know of plans to implement PostgreSQL over Hadoop? Yahoo seems to be doing this:
> http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html
> But they store tables column-wise for their performance situation. If one is doing a lot of inserts, I don't think this is the most efficient approach, is it?
>
> Has Yahoo put the source code for their work online?
>
> Many thanks for any pointers.
>
> -paul