[jira] [Commented] (HAWQ-1270) Plugged storage back-ends for HAWQ

Alastair "Bell" Turner (JIRA) Fri, 27 Jan 2017 09:12:52 -0800

    [ 
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843135#comment-15843135
 ]


Alastair "Bell" Turner commented on HAWQ-1270:
----------------------------------------------

There are two levels that could be pluggable: physical storage and logical 
formats. Ceph fits into the physical storage category. The contents of the 
files that HAWQ reads and writes are exactly the same, but the files live in 
a store that is not HDFS.

There is a standard approach to pluggable physical storage: the Hadoop 
Compatible File System (HCFS). A quick search returns a few HCFS adapters for 
Ceph, but more investigation would be required to check whether they are 
complete.

The HCFS standard does not include a native interface, only a Java interface. 
To me it seems that the best route to achieve HAWQ on Ceph (and ViPR, Orange, 
Gluster, ...) is to work with the Hadoop community to get a native HCFS 
interface standardised and use that for access from HAWQ.
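
To make the split between physical storage and logical format concrete, here 
is a minimal sketch. It is loosely modelled on how Hadoop selects a file 
system implementation by URI scheme, but it is not HAWQ code and not the HCFS 
API. Every name in it (StorageBackend, InMemoryBackend, register_backend, 
backend_for, write_rows, read_rows) is hypothetical, and the in-memory 
backends only stand in for real stores such as HDFS or Ceph.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Physical storage layer: moves bytes, knows nothing about file formats."""

    @abstractmethod
    def write(self, path, data):
        ...

    @abstractmethod
    def read(self, path):
        ...


class InMemoryBackend(StorageBackend):
    """Hypothetical stand-in for a real store; keeps files in a dict."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]


# Hypothetical registry mapping a URI scheme to a backend instance.
_BACKENDS = {}


def register_backend(scheme, backend):
    _BACKENDS[scheme] = backend


def backend_for(uri):
    """Pick the physical store from the URI scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in _BACKENDS:
        raise ValueError(f"no storage backend registered for scheme {scheme!r}")
    return _BACKENDS[scheme]


def write_rows(uri, rows):
    """Logical format layer: encodes rows identically for every backend."""
    payload = "\n".join(",".join(row) for row in rows).encode("utf-8")
    backend_for(uri).write(urlparse(uri).path, payload)


def read_rows(uri):
    """Logical format layer: decodes whatever bytes the backend returns."""
    payload = backend_for(uri).read(urlparse(uri).path)
    return [line.split(",") for line in payload.decode("utf-8").splitlines()]


# Two separate stores registered under different schemes (both simulated).
register_backend("hdfs", InMemoryBackend())
register_backend("ceph", InMemoryBackend())
```

Writing the same rows to an hdfs:// URI and a ceph:// URI stores identical 
bytes in two different backends. That is the sense in which the file contents 
stay the same while only the store changes.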

> Plugged storage back-ends for HAWQ
> ----------------------------------
>
>                 Key: HAWQ-1270
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1270
>             Project: Apache HAWQ
>          Issue Type: Improvement
>            Reporter: Dmitry Buzolin
>            Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is 
> already supported, but there is also Ceph, a distributed storage system that 
> offers a standard POSIX-compliant file system as well as object and block 
> storage. Ceph is also data-location aware, is written in C++, and is a more 
> sophisticated storage backend than Hadoop at this time. It provides 
> replicated and erasure-coded storage pools. Other great features of Ceph 
> are snapshots and an algorithmic approach to mapping data to nodes rather 
> than centrally managed namenodes. I don't think HDFS offers any of these 
> features. In terms of performance, Ceph should be faster than HDFS since it 
> is written in C++ and because it doesn't have scalability limitations when 
> mapping data to storage pools, compared to Hadoop, where the namenode is a 
> point of contention.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
