[ https://issues.apache.org/jira/browse/GEODE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630822#comment-14630822 ]
ASF subversion and git services commented on GEODE-10:
------------------------------------------------------

Commit 3772869d02148eec8b5ce97fbf1af9415bccd98c in incubator-geode's branch refs/heads/develop from Ashvin Agrawal
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=3772869 ]

GEODE-10: Refactor HdfsStore api to match spec

* Currently HdfsStore's configuration object is nested, and a user needs to create multiple sub-objects to manage the store instance. This is less usable and can be confusing, and it exposes the user to a lot of internal details. Replacing the nested configuration with a flat structure will be better.
* Rename members

> HDFS Integration
> ----------------
>
>                 Key: GEODE-10
>                 URL: https://issues.apache.org/jira/browse/GEODE-10
>             Project: Geode
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Dan Smith
>            Assignee: Ashvin
>         Attachments: GEODE-HDFSPersistence-Draft-060715-2109-21516.pdf
>
> The ability to persist data on HDFS has been under development for GemFire. It was part of the latest code drop, GEODE-8. As part of this feature we are proposing some changes to the HdfsStore management API (see the attached doc for details).
> # The current API has nested configuration for compaction and the async queue. This nested structure forces the user to execute multiple steps to manage a store. It also does not seem consistent with other management APIs.
> # Some member names in the current API are confusing.
> HDFS Integration: Geode acts as a transactional layer that microbatches data out to Hadoop. This capability makes Geode a NoSQL store that can sit on top of Hadoop and parallelize the process of moving data from the in-memory tier into Hadoop, making it very useful for capturing and processing fast data while making it available for Hadoop jobs relatively quickly.
> The key requirements being met here are:
> # Ingest data into HDFS in parallel
> # Cache bloom filters and allow fast lookups of individual elements
> # Have programmable policies for deciding what stays in memory
> # Roll files in HDFS
> # Index data that is in memory
> # Have expiration policies that allow the transactional set to decay out older data
> # The solution needs to support replicated and partitioned regions

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
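The flat-configuration change described in the commit above can be illustrated with a minimal sketch. The class and member names below are hypothetical stand-ins, not the actual Geode HdfsStore API: the point is only the shape of the change, where tunables that previously lived on nested sub-objects (compaction config, async queue config) become plain attributes on a single flat config with chained setters.

```java
// Hypothetical sketch of a flat store configuration; names are
// illustrative, not the real Geode API.
class HdfsStoreConfig {
    String name;
    String homeDir;
    // Formerly on a nested async-queue sub-object (assumed defaults):
    int batchSizeMb = 32;
    int batchIntervalMillis = 60000;
    // Formerly on a nested compaction sub-object (assumed default):
    boolean minorCompaction = true;

    HdfsStoreConfig setName(String n) { this.name = n; return this; }
    HdfsStoreConfig setHomeDir(String d) { this.homeDir = d; return this; }
    HdfsStoreConfig setBatchSizeMb(int mb) { this.batchSizeMb = mb; return this; }
    HdfsStoreConfig setMinorCompaction(boolean on) { this.minorCompaction = on; return this; }
}

public class FlatConfigDemo {
    public static void main(String[] args) {
        // One object, one chain of setters: no sub-objects to create,
        // no internal structure exposed to the user.
        HdfsStoreConfig store = new HdfsStoreConfig()
                .setName("orders-store")
                .setHomeDir("/geode/orders")
                .setBatchSizeMb(64)
                .setMinorCompaction(false);
        System.out.println(store.name + " batchSizeMb=" + store.batchSizeMb
                + " minorCompaction=" + store.minorCompaction);
    }
}
```

In the nested style the user would first build and attach the compaction and queue sub-objects before the store could be configured; the flat style reduces that to a single step, which is the usability argument the commit makes.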