[ https://issues.apache.org/jira/browse/CASSANDRA-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-8682:
-----------------------------------------
    Component/s: Tools

> BulkRecordWriter ends up streaming with non-unique session IDs on large hadoop cluster
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8682
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8682
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Erik Forsberg
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: cassandra-1.2-bulkrecordwriter-sessionid.patch
>
>
> We use BulkOutputFormat extensively to load data from Hadoop into Cassandra. We 
> are currently running Cassandra 1.2.18, but are planning to upgrade 
> Cassandra to 2.0.x, possibly 2.1.x.
> With Cassandra 1.2 we have problems with streaming session IDs getting 
> duplicated when multiple (20+) Java processes start streaming at the 
> same time. On the receiving Cassandra node, having the same session ID 
> correspond to different sending processes confuses things a lot and 
> leads to aborted connections.
> This does not happen for every process, but it happens often enough to be a 
> problem in a production environment, which also made it tricky to test.
> I suspect this has to do with how UUIDs are generated on the sending 
> (Hadoop) side. With 20+ processes started concurrently, the 
> clockSeqAndNode part of the version 1 (time-based) UUID probably ended up 
> exactly the same in all 20 processes.
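A short Java sketch of the suspected collision, assuming (as the pre-patch line in the diff below suggests) that the clock sequence is drawn with `new Random(System.currentTimeMillis()).nextLong()`. `java.util.Random` is fully determined by its seed, and `System.currentTimeMillis()` has at most millisecond resolution, so two JVMs that compute the clock sequence within the same millisecond draw the same 14 clock-sequence bits. If their node bits also match, the whole clockSeqAndNode value is identical. `SeedCollisionDemo`, `clockFromSeed` and `clockSeq` are hypothetical names used only for illustration.

```java
import java.util.Random;

public class SeedCollisionDemo {
    // Same draw as the pre-patch line in the diff:
    // new Random(System.currentTimeMillis()).nextLong()
    static long clockFromSeed(long seedMillis) {
        return new Random(seedMillis).nextLong();
    }

    // Keep only the 14 bits that end up in the UUID's clock sequence field.
    static long clockSeq(long clock) {
        return clock & 0x3FFFL;
    }

    public static void main(String[] args) {
        // Simulate two JVMs whose clock sequence is computed in the same millisecond.
        long sharedMillis = System.currentTimeMillis();
        long seqA = clockSeq(clockFromSeed(sharedMillis));
        long seqB = clockSeq(clockFromSeed(sharedMillis));
        // Random is fully determined by its seed, so this always prints true.
        System.out.println("identical clock sequence: " + (seqA == seqB));

        // One millisecond apart the seeds differ, and the 14-bit values usually differ too.
        long seqC = clockSeq(clockFromSeed(sharedMillis + 1));
        System.out.println("next-millisecond clock sequence differs: " + (seqA != seqC));
    }
}
```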
> I wrote a patch which I unfortunately never submitted at the time; it is 
> attached to this issue. The patch constructs a UUID from the map or reduce 
> task ID, which is guaranteed to be unique within a Hadoop cluster.
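One possible way to derive a per-task session ID, sketched under assumptions; this is not necessarily what the attached patch does. It hashes a task identifier string with `java.util.UUID.nameUUIDFromBytes`, which produces a version 3 (MD5, name-based) UUID, so distinct task IDs map to distinct UUIDs unless MD5 collides. The helper `sessionIdFromTask` and the example ID strings are hypothetical; in a real job the string would come from the Hadoop task context.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class TaskSessionId {
    // Hypothetical helper: derive a deterministic session UUID from a Hadoop task
    // identifier string. nameUUIDFromBytes builds a version 3 (MD5, name-based) UUID.
    static UUID sessionIdFromTask(String taskId) {
        return UUID.nameUUIDFromBytes(taskId.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Illustrative IDs in Hadoop's attempt_<cluster>_<job>_<m|r>_<task>_<attempt> format.
        UUID a = sessionIdFromTask("attempt_201501201234_0001_m_000003_0");
        UUID b = sessionIdFromTask("attempt_201501201234_0001_m_000004_0");
        System.out.println(a + " (version " + a.version() + ")");
        System.out.println(b + " (version " + b.version() + ")");
        System.out.println("distinct: " + !a.equals(b));
    }
}
```

Hashing the task attempt ID rather than the task ID would also give a retried task a fresh session ID, since Hadoop assigns each attempt its own ID.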
> I suspect we're going to face the same issue on Cassandra 2.0 and 2.1, even 
> after the rewrite of the streaming subsystem. Please correct me if I'm wrong, 
> i.e. if there's something in the new code that will make this a non-issue.
> Now the question is how to address this problem. Possible options that I see 
> after some code reading:
> 1. Update the patch to apply to 2.0 and 2.1, using the same method (generating 
> the UUID from the Hadoop task ID).
> 2. Modify the UUIDGen code to use the Java process pid as clockSeq instead of a 
> random number. However, getting the pid in Java seems less than simple (and 
> remember that this code runs on the Hadoop side of things, not inside the 
> Cassandra daemon).
> 3. This patch might help:
> {noformat}
> diff --git a/src/java/org/apache/cassandra/utils/UUIDGen.java b/src/java/org/apache/cassandra/utils/UUIDGen.java
> index f385744..ae253ab 100644
> --- a/src/java/org/apache/cassandra/utils/UUIDGen.java
> +++ b/src/java/org/apache/cassandra/utils/UUIDGen.java
> @@ -234,7 +234,7 @@ public class UUIDGen
>  
>      private static long makeClockSeqAndNode()
>      {
> -        long clock = new Random(System.currentTimeMillis()).nextLong();
> +        long clock = new Random().nextLong();
>  
>          long lsb = 0;
>          lsb |= 0x8000000000000000L;                 // variant (2 bits)
> {noformat}
> ...but I don't know why System.currentTimeMillis() is used as the seed.
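A sketch of how options 2 and 3 could feed the same bit layout, assuming the RFC 4122 arrangement of the low 64 bits (2 variant bits, 14 clock-sequence bits, 48 node bits) that the `variant (2 bits)` line in the diff starts to build. The class and method names and the node value are hypothetical. The pid lookup relies on `RuntimeMXBean.getName()` commonly returning `pid@hostname` on HotSpot JVMs, which is not a documented guarantee, so the sketch falls back to an unseeded `Random`.

```java
import java.lang.management.ManagementFactory;
import java.util.Random;

public class ClockSeqAndNodeSketch {
    // Pack variant, 14-bit clock sequence and 48-bit node into the low 64 bits
    // of a time-based UUID, following the RFC 4122 layout.
    static long clockSeqAndNode(long clock, long node) {
        long lsb = 0;
        lsb |= 0x8000000000000000L;                 // variant '10' (2 bits)
        lsb |= (clock & 0x0000000000003FFFL) << 48; // clock sequence (14 bits)
        lsb |= node & 0x0000FFFFFFFFFFFFL;          // node (48 bits)
        return lsb;
    }

    // Option 3: unseeded Random, so concurrent JVMs do not share a millisecond seed.
    static long clockFromUnseededRandom() {
        return new Random().nextLong();
    }

    // Option 2: use the pid, parsed from "pid@hostname" when the name has that shape.
    static long clockFromPid() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        int at = name.indexOf('@');
        if (at > 0) {
            try {
                return Long.parseLong(name.substring(0, at));
            } catch (NumberFormatException ignored) {
                // Unexpected format: fall through to the random fallback.
            }
        }
        return clockFromUnseededRandom();
    }

    public static void main(String[] args) {
        long node = 0x0000123456789ABCL; // hypothetical node value for illustration
        System.out.printf("random clock: %016x%n", clockSeqAndNode(clockFromUnseededRandom(), node));
        System.out.printf("pid clock:    %016x%n", clockSeqAndNode(clockFromPid(), node));
    }
}
```

Only the low 14 bits of the pid survive the mask, so on hosts where pids exceed 16383 two processes can still share a clock sequence, and a pid is only unique per host, so option 2 also relies on the node bits differing between hosts. An unseeded `Random` avoids both limits but makes collisions unlikely rather than impossible. On Java 9 and later, `ProcessHandle.current().pid()` returns the pid directly.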
> Opinions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
