[ https://issues.apache.org/jira/browse/GEODE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630822#comment-14630822 ]
ASF subversion and git services commented on GEODE-10:
------------------------------------------------------

Commit 3772869d02148eec8b5ce97fbf1af9415bccd98c in incubator-geode's branch refs/heads/develop from Ashvin Agrawal
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=3772869 ]

GEODE-10: Refactor HdfsStore api to match spec

* Currently HdfsStore's configuration object is nested, and a user needs to create multiple sub-objects to manage the store instance. This is less usable and can be confusing, and it exposes the user to a lot of internal details. Replacing the nested configuration with a flat structure will be better.
* Rename members

> HDFS Integration
> ----------------
>
>                 Key: GEODE-10
>                 URL: https://issues.apache.org/jira/browse/GEODE-10
>             Project: Geode
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Dan Smith
>            Assignee: Ashvin
>         Attachments: GEODE-HDFSPersistence-Draft-060715-2109-21516.pdf
>
> The ability to persist data on HDFS has been under development for GemFire. It was part of the latest code drop, GEODE-8. As part of this feature we are proposing some changes to the HdfsStore management API (see the attached doc for details).
> # The current API has nested configuration for compaction and the async queue. This nested structure forces the user to execute multiple steps to manage a store. It also does not seem consistent with other management APIs.
> # Some member names in the current API are confusing.
> HDFS Integration: Geode acts as a transactional layer that microbatches data out to Hadoop. This capability makes Geode a NoSQL store that can sit on top of Hadoop and parallelize the process of moving data from the in-memory tier into Hadoop, making it very useful for capturing and processing fast data while making it available for Hadoop jobs relatively quickly.
> The key requirements being met here are:
> # Ingest data into HDFS in parallel
> # Cache bloom filters and allow fast lookups of individual elements
> # Have programmable policies for deciding what stays in memory
> # Roll files in HDFS
> # Index data that is in memory
> # Have expiration policies that allow the transactional set to decay out older data
> # The solution needs to support replicated and partitioned regions

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
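The flat-configuration change described in the commit above can be illustrated with a minimal sketch. The class and member names below are hypothetical stand-ins, not the actual Geode HdfsStore API: the point is only the shape of the change, where tunables that previously lived on nested sub-objects (compaction config, async queue config) become plain attributes on a single flat config with chained setters.

```java
// Hypothetical sketch of a flat store configuration; names are
// illustrative, not the real Geode API.
class HdfsStoreConfig {
    String name;
    String homeDir;
    // Formerly on a nested async-queue sub-object (assumed defaults):
    int batchSizeMb = 32;
    int batchIntervalMillis = 60000;
    // Formerly on a nested compaction sub-object (assumed default):
    boolean minorCompaction = true;

    HdfsStoreConfig setName(String n) { this.name = n; return this; }
    HdfsStoreConfig setHomeDir(String d) { this.homeDir = d; return this; }
    HdfsStoreConfig setBatchSizeMb(int mb) { this.batchSizeMb = mb; return this; }
    HdfsStoreConfig setMinorCompaction(boolean on) { this.minorCompaction = on; return this; }
}

public class FlatConfigDemo {
    public static void main(String[] args) {
        // One object, one chain of setters: no sub-objects to create,
        // no internal structure exposed to the user.
        HdfsStoreConfig store = new HdfsStoreConfig()
                .setName("orders-store")
                .setHomeDir("/geode/orders")
                .setBatchSizeMb(64)
                .setMinorCompaction(false);
        System.out.println(store.name + " batchSizeMb=" + store.batchSizeMb
                + " minorCompaction=" + store.minorCompaction);
    }
}
```

In the nested style the user would first build and attach the compaction and queue sub-objects before the store could be configured; the flat style reduces that to a single step, which is the usability argument the commit makes.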