[ https://issues.apache.org/jira/browse/USERGRID-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996958#comment-13996958 ]
Todd Nine commented on USERGRID-107:
------------------------------------

This is close. There is a race condition that needs to be resolved between edge-write and edge-delete post processing.

> Implement commit logging and sharding on graph edges
> ----------------------------------------------------
>
>                 Key: USERGRID-107
>                 URL: https://issues.apache.org/jira/browse/USERGRID-107
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: Todd Nine
>            Assignee: Todd Nine
>            Priority: Blocker
>             Fix For: 1.1
>
>
> Currently, we're limited to 2 billion graph edges per type from a single
> source or target node. In highly connected graphs, this makes it impossible
> to construct the entire graph. To alleviate this, we should use a commit log
> + time-series post processing. I envision this working in the following way.
> # Have a commit log set of CFs for all edges. The gc_grace period should be
> set very low, around 1 minute. Re-writing existing edges (due to phantom
> deletes) will not be an issue; all algorithms should be idempotent.
> # Always write to the commit log, fire the async processing as usual, and
> immediately return.
> # In post processing, use variable-size time series (algorithm TBD) to copy
> the edge from the commit log CF into the correct new CF, then remove the
> entry from the commit log.
> # When seeking values, read from the correct shards (via time) on disk as
> well as the commit log. These values are already time ordered, so an
> in-memory merge can easily take place for a final result set.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
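Steps 2 and 3 above (write to the commit log, fire async processing, copy into the sharded CF, remove the commit-log entry) could be sketched as follows. This is a minimal illustration only: the in-memory maps stand in for the Cassandra column families, and all names (`EdgeCommitLog`, `writeEdge`, `postProcess`) are hypothetical, not Usergrid APIs.

```java
import java.util.NavigableMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch of the commit-log write path. In-memory maps stand in for the
// commit log CF and the time-sharded edge CF; keys are edge timestamps.
public class EdgeCommitLog {

    // commit log "CF": timestamp -> edge (time ordered)
    final NavigableMap<Long, String> commitLog = new ConcurrentSkipListMap<>();

    // permanent time-sharded "CF": timestamp -> edge
    final NavigableMap<Long, String> shardedEdges = new ConcurrentSkipListMap<>();

    /** Step 2: always write to the commit log, fire the async
     *  post processing, and return immediately. */
    public CompletableFuture<Void> writeEdge(long timestamp, String edge) {
        commitLog.put(timestamp, edge);
        return CompletableFuture.runAsync(() -> postProcess(timestamp, edge));
    }

    /** Step 3: copy the edge into the correct shard, then remove the
     *  commit-log entry. Both operations are idempotent, so re-writing
     *  an edge resurrected by a phantom delete is harmless. */
    void postProcess(long timestamp, String edge) {
        shardedEdges.put(timestamp, edge);   // idempotent copy
        commitLog.remove(timestamp, edge);   // idempotent removal
    }
}
```

Because both post-processing operations are idempotent, replaying them after a phantom delete leaves the same final state, which is the property the low gc_grace period relies on.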
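The read side (step 4) merges already time-ordered streams from the on-disk shards and the commit log. A k-way merge with a priority queue is one standard way to do that; the sketch below assumes each source is a sorted list of timestamps and is illustrative only, not the Usergrid implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the in-memory merge of time-ordered result streams
// (on-disk shards plus the commit log), as described in step 4.
public class TimeSeriesMerge {

    /** K-way merge of already-sorted sources into one sorted result. */
    public static List<Long> merge(List<List<Long>> sortedSources) {
        // heap entries: {value, sourceIndex, offsetInSource}
        PriorityQueue<long[]> heap =
            new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        for (int i = 0; i < sortedSources.size(); i++) {
            if (!sortedSources.get(i).isEmpty()) {
                heap.add(new long[]{sortedSources.get(i).get(0), i, 0});
            }
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            long[] e = heap.poll();
            out.add(e[0]);
            int src = (int) e[1];
            int next = (int) e[2] + 1;
            List<Long> source = sortedSources.get(src);
            if (next < source.size()) {
                heap.add(new long[]{source.get(next), src, next});
            }
        }
        return out;
    }
}
```

Since every source is already sorted by time, the merge is O(n log k) for n total edges across k sources, and no full re-sort of the result set is needed.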