[ https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901627#comment-15901627 ]
Kyle R Dunn commented on HAWQ-1270:
-----------------------------------

From what I can tell, [this|https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c] IS the interface. When you look at the {{pg_filesystem}} table, it lists the exact functions required for a new backend:

{code}
SELECT * from pg_filesystem ;
-[ RECORD 1 ]------+--------------------------
fsysname           | hdfs
fsysconnfn         | gpfs_hdfs_connect
fsysdisconnfn      | gpfs_hdfs_disconnect
fsysopenfn         | gpfs_hdfs_openfile
fsysclosefn        | gpfs_hdfs_closefile
fsysseekfn         | gpfs_hdfs_seek
fsystellfn         | gpfs_hdfs_tell
fsysreadfn         | gpfs_hdfs_read
fsyswritefn        | gpfs_hdfs_write
fsysflushfn        | gpfs_hdfs_sync
fsysdeletefn       | gpfs_hdfs_delete
fsyschmodfn        | gpfs_hdfs_chmod
fsysmkdirfn        | gpfs_hdfs_createdirectory
fsystruncatefn     | gpfs_hdfs_truncate
fsysgetpathinfofn  | gpfs_hdfs_getpathinfo
fsysfreefileinfofn | gpfs_hdfs_freefileinfo
fsyslibfile        | $libdir/gpfshdfs.so
fsysowner          | 10
fsystrusted        | f
fsysacl            |
{code}

> Plugged storage back-ends for HAWQ
> ----------------------------------
>
>                 Key: HAWQ-1270
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1270
>             Project: Apache HAWQ
>          Issue Type: Improvement
>            Reporter: Dmitry Buzolin
>            Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is
> already supported, but there is also Ceph, a distributed storage system that
> offers a standard POSIX-compliant file system, object storage, and block
> storage. Ceph is also data-location aware, is written in C++, and is a more
> sophisticated storage backend compared to Hadoop at this time. It provides
> replicated and erasure-coded storage pools. Other great features of Ceph are
> snapshots and an algorithmic approach to mapping data to the nodes rather than
> having centrally managed namenodes. I don't think HDFS offers any of these features.
> In terms of performance, Ceph should be faster than HDFS since it is written
> in C++ and because it doesn't have scalability limitations when mapping data
> to storage pools, compared to Hadoop, where the name node is such a point of
> contention.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)