[ https://issues.apache.org/jira/browse/USERGRID-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996958#comment-13996958 ]

Todd Nine commented on USERGRID-107:
------------------------------------

This is close.  There is a race condition that needs to be resolved between edge-write 
and edge-delete post processing.  

> Implement commit logging and sharding on graph edges
> ----------------------------------------------------
>
>                 Key: USERGRID-107
>                 URL: https://issues.apache.org/jira/browse/USERGRID-107
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: Todd Nine
>            Assignee: Todd Nine
>            Priority: Blocker
>             Fix For: 1.1
>
>
> Currently, we're limited to 2 billion graph edges per type from a single source or 
> target node.  In highly connected graphs, this makes it impossible to 
> construct the entire graph.  To alleviate this, we should use a commit log + 
> time series post processing.  I envision this working in the following way.
> # Have a commit log set of CF's for all edges.  The gc_grace period should be 
> set very low, around 1 minute.  Re-writing existing edges 
> (due to phantom deletes) will not be an issue, since all algorithms should be 
> idempotent.
> # Always write to the commit log, fire the async processing as usual, and 
> return immediately.
> # In post processing use variable size time series (algorithm TBD) to copy 
> the edge from the commit log CF into the new correct CF.  Remove the entry 
> from the commit log.
> # When seeking values, seek values from the correct shards (via time) on disk 
> as well as the commit log.  These values are already time ordered, so an 
> in memory merge can easily take place for a final result set.
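The read path in step 4 above can be sketched as a k-way merge: each shard, plus the commit log, yields edges already ordered newest-first, and a priority queue produces one time-ordered result while collapsing duplicates (the same edge present in both the commit log and a shard, which step 1 says must be tolerated idempotently). This is a minimal sketch; the `Edge` class and `merge` method names are illustrative, not Usergrid's actual API.

```java
import java.util.*;

public class EdgeMerge {
    static final class Edge {
        final String id;
        final long timestamp;
        Edge(String id, long timestamp) { this.id = id; this.timestamp = timestamp; }
    }

    // k-way merge of per-source lists, each already sorted newest-first.
    // Duplicate edge ids (e.g. an edge in both a shard and the commit log
    // because post processing has not yet removed it) are emitted once.
    static List<Edge> merge(List<List<Edge>> sources) {
        // heap entries are [sourceIndex, offset]; comparator orders by the
        // referenced edge's timestamp, descending (newest first)
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Long.compare(
                sources.get(b[0]).get(b[1]).timestamp,
                sources.get(a[0]).get(a[1]).timestamp));
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<Edge> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            Edge e = sources.get(top[0]).get(top[1]);
            if (seen.add(e.id)) out.add(e);           // skip phantom duplicates
            if (top[1] + 1 < sources.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});   // advance that source
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Edge> shard = Arrays.asList(new Edge("e3", 30), new Edge("e1", 10));
        List<Edge> commitLog = Arrays.asList(new Edge("e4", 40), new Edge("e3", 30));
        for (Edge e : merge(Arrays.asList(shard, commitLog))) {
            System.out.println(e.id + "@" + e.timestamp);   // e4@40, e3@30, e1@10
        }
    }
}
```

Because every source is already time ordered, the merge is O(n log k) for k sources and never needs to buffer more than one cursor per source, which keeps the read path cheap even with many shards.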



--
This message was sent by Atlassian JIRA
(v6.2#6252)
