What is your actual write load?
How big was your batch size? Currently, for 2.0, 1000 elements per batch is
sensible. It will change back to 30-50k for Neo4j 2.1.
#0 Use parameters:
> MERGE (user:User { name: {user_name} })
> MERGE (tweet:Tweet { tweet_id: {tweet_id} })
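You can also MERGE the relationship itself and only bump a weight when it
already exists. Just a sketch; the REPLIED_TO type and the weight property are
assumptions based on your description:

  MERGE (a:User { name: {user_a} })
  MERGE (b:User { name: {user_b} })
  MERGE (a)-[r:REPLIED_TO]->(b)
    ON CREATE SET r.weight = 1
    ON MATCH SET r.weight = r.weight + 1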
#1 Can you share your server config / memory / disk, etc.? (best to share your
data/graph.db/messages.log)
#2 Make sure your driver uses the new transactional endpoint and streams data
back and forth
Usually you can insert 5-10k nodes per second in 2.0 with MERGE and parameters
in batched transactions (tx size of ~1k).
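Roughly like this, a Python sketch posting batches of parameterized MERGE
statements to the transactional endpoint with the requests library; the URL,
batch size, and helper names are assumptions to adapt to your setup:

# Sketch: send batches of parameterized MERGE statements, one transaction per
# POST, via Neo4j 2.0's transactional HTTP endpoint (/db/data/transaction/commit).
import json
import requests

TX_URL = "http://localhost:7474/db/data/transaction/commit"  # adjust to your server
BATCH_SIZE = 1000  # ~1k statements per transaction, as noted above

MERGE_USER = "MERGE (u:User { name: {user_name} })"

def flush(statements):
    # One POST == one transaction containing all statements in the batch.
    if not statements:
        return
    resp = requests.post(
        TX_URL,
        data=json.dumps({"statements": statements}),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        stream=True,  # stream the response body instead of buffering it all
    )
    resp.raise_for_status()
    errors = resp.json().get("errors", [])
    if errors:
        raise RuntimeError(errors)

def import_users(names):
    batch = []
    for name in names:
        batch.append({"statement": MERGE_USER, "parameters": {"user_name": name}})
        if len(batch) >= BATCH_SIZE:
            flush(batch)
            batch = []
    flush(batch)  # remaining partial batch

import_users(["tom", "anna", "bob"])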
On 01.02.2014, at 17:51, Yun Wang <[email protected]> wrote:
> Question background
> We are building a graph (database) for Twitter users and tweets (batched
> updates for new data).
> We store each user and each tweet as a graph node.
> We store as graph edges: tweet-tweet relationships and user-user
> relationships (derived from users who retweet or reply to others).
>
> Problem: Updating the graph is very slow / not scalable
>
> Goal: Scalable / efficient updates of the existing Neo4j graph as new tweets
> come in (each tweet translates into nodes and edges). Constraint: if a node
> (e.g., a user) already exists, we do not want to duplicate it. Similarly, if
> an edge (user-user relationship) exists, we only want to update the edge
> weight.
>
> What we have tried:
> Option 1: We tried using Cypher's 'MERGE' clause to insert uniquely. We also
> executed Cypher queries in batches in order to reduce REST latency.
>
> Sample Cypher query used to update database:
> MERGE (user:User { name: 'tom' })
> MERGE (tweet:Tweet { tweet_id: '101' })
>
> We created an index on node properties such as 'name' of the User label and
> 'tweet_id' of the Tweet label.
> We increased the 'open file descriptors' limit in Linux to gain better
> performance.
>
> Problems with Option 1:
> The performance of checking uniqueness with the 'MERGE' clause dropped
> dramatically with scale / over time. For example, it took 2.7 seconds to
> insert 100 records when the database was empty, but 62 seconds to insert the
> same amount of data with 100,000 existing records.
>
> Option 2: The other option we have tried is to check uniqueness externally.
> That is, we take all nodes and edges and build a hash table outside Neo4j
> (e.g., in Python or Java) to check uniqueness. This stays faster than the
> 'MERGE' approach over time. However, it does not seem elegant to have to
> extract the existing nodes before each batch update: it requires a read plus
> a write against the Neo4j database, instead of only a write.
>
> We are wondering whether there is an elegant solution for large-scale data
> updates in Neo4j. We feel this may be a common question for many users, and
> someone may have encountered it before and/or developed a robust solution.
>