[ https://issues.apache.org/jira/browse/CASSANDRA-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Yeschenko updated CASSANDRA-8682:
-----------------------------------------
    Component/s: Tools

> BulkRecordWriter ends up streaming with non-unique session IDs on large hadoop cluster
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8682
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8682
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Erik Forsberg
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: cassandra-1.2-bulkrecordwriter-sessionid.patch
>
>
> We use BulkOutputFormat extensively to load data from hadoop to Cassandra. We
> are currently running Cassandra 1.2.18, but are planning an upgrade of
> Cassandra to 2.0.X, possibly 2.1.X.
>
> With Cassandra 1.2 we have problems with the streaming session IDs getting
> duplicated when multiple (20+) java processes start streaming at the same
> time. On the receiving Cassandra node, having the same session ID actually
> correspond to different sending processes would confuse things a lot, leading
> to aborted connections.
>
> This did not happen for every process, which made it a bit tricky to test,
> but it happened often enough to be a problem in our production environment.
>
> I suspect this has to do with how UUIDs are generated on the sending (hadoop)
> side. With 20+ processes being started concurrently, the clockSeqAndNode part
> of the version 1 UUID probably ended up being exactly the same in all 20
> processes.
>
> I wrote a patch which I unfortunately never submitted at the time, but it's
> attached to this issue. The patch constructs a UUID from the map or reduce
> task ID, which is guaranteed to be unique per hadoop cluster.
>
> I suspect we're going to face the same issue on Cassandra 2.0 and 2.1, even
> after the rewrite of the streaming subsystem. Please correct me if I'm wrong,
> i.e. if there's something in the new code that will make this a non-issue.
>
> Now the question is how to address this problem.
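A minimal sketch of the suspected failure and of the task-ID approach. It assumes, as the UUIDGen code quoted in option 3 of this issue shows, that the clock sequence comes from a java.util.Random seeded with System.currentTimeMillis(); because Random is deterministic for a given seed, two JVMs that initialize within the same millisecond get the same value. The class SessionIdSketch and the method sessionIdFromTaskId are hypothetical, and UUID.nameUUIDFromBytes (a name-based, version 3 UUID) is a swapped-in technique for illustration, not necessarily what the attached patch does. The task ID strings are made-up examples.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.UUID;

/**
 * Hypothetical sketch: the suspected clock-sequence collision, plus a
 * session UUID derived from a (hypothetical) hadoop task ID string.
 * Not the attached patch and not Cassandra's UUIDGen API.
 */
public class SessionIdSketch {

    // Same expression as the UUIDGen code quoted in option 3:
    // a Random seeded with a millisecond timestamp.
    static long clockFromMillis(long millis) {
        return new Random(millis).nextLong();
    }

    // Hypothetical: a name-based (version 3, MD5) UUID computed from a task ID.
    // Distinct task IDs map to distinct UUIDs unless MD5 collides.
    static UUID sessionIdFromTaskId(String taskId) {
        return UUID.nameUUIDFromBytes(taskId.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Simulate two JVMs whose UUIDGen initialized in the same millisecond.
        long processA = clockFromMillis(now);
        long processB = clockFromMillis(now);
        System.out.println("clock values equal: " + (processA == processB)); // prints "clock values equal: true"

        // Made-up task IDs for two map tasks of the same job.
        UUID a = sessionIdFromTaskId("task_201501221132_0001_m_000003");
        UUID b = sessionIdFromTaskId("task_201501221132_0001_m_000004");
        System.out.println("session ids equal: " + a.equals(b));
    }
}
```

Because a name-based UUID is a pure function of its input, this sketch is only as unique as the task IDs fed into it; a retry that reuses the same task ID would reuse the same session UUID.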
>
> Possible options that I see after some code reading:
>
> 1. Update the patch to apply on 2.0 and 2.1, using the same method (generating
> the UUID from the hadoop task ID).
> 2. Modify the UUIDGen code to use the java process pid as clockSeq instead of a
> random number. However, getting the pid in java seems less than simple (and
> remember that this code runs on the hadoop side of things, not inside the
> Cassandra daemon).
> 3. This patch might help:
> {noformat}
> diff --git a/src/java/org/apache/cassandra/utils/UUIDGen.java b/src/java/org/apache/cassandra/utils/UUIDGen.java
> index f385744..ae253ab 100644
> --- a/src/java/org/apache/cassandra/utils/UUIDGen.java
> +++ b/src/java/org/apache/cassandra/utils/UUIDGen.java
> @@ -234,7 +234,7 @@ public class UUIDGen
>
>      private static long makeClockSeqAndNode()
>      {
> -        long clock = new Random(System.currentTimeMillis()).nextLong();
> +        long clock = new Random().nextLong();
>
>          long lsb = 0;
>          lsb |= 0x8000000000000000L; // variant (2 bits)
> {noformat}
> ...but I don't know why System.currentTimeMillis() is being used as the seed.
>
> Opinions?

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
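For options 2 and 3, a minimal sketch of how a clock sequence chosen without the wall clock could be packed into the low 64 bits of a version 1 UUID. The bit layout (2 variant bits, a 14-bit clock sequence, a 48-bit node) follows RFC 4122; ClockSeqSketch, makeLsb and the node value are hypothetical and are not Cassandra's UUIDGen API. ProcessHandle.current().pid() requires Java 9 or later, which postdates this issue.

```java
import java.util.Random;
import java.util.UUID;

/**
 * Hypothetical sketch of options 2 and 3: build the clock-sequence-and-node
 * half of a version 1 UUID from a clock sequence that does not come from a
 * millisecond-seeded Random. Layout per RFC 4122; not Cassandra's API.
 */
public class ClockSeqSketch {

    // Packs variant (2 bits), clock sequence (14 bits) and node (48 bits)
    // into the least significant 64 bits of a UUID.
    static long makeLsb(long clockSeq, long node) {
        long lsb = 0;
        lsb |= 0x8000000000000000L;          // variant bits "10"
        lsb |= (clockSeq & 0x3FFFL) << 48;   // clock sequence, low 14 bits only
        lsb |= node & 0x0000FFFFFFFFFFFFL;   // node, 48 bits
        return lsb;
    }

    // Option 3: let Random choose its own seed instead of the current millisecond.
    static long clockSeqFromRandom() {
        return new Random().nextLong();
    }

    // Option 2: use the process id (ProcessHandle needs Java 9+).
    static long clockSeqFromPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        long node = 0x0000123456789ABCL; // hypothetical node value
        long msb = 0x0000000000001000L;  // only the version nibble (1) is set here
        UUID fromPid = new UUID(msb, makeLsb(clockSeqFromPid(), node));
        UUID fromRandom = new UUID(msb, makeLsb(clockSeqFromRandom(), node));
        System.out.println(fromPid + " clockSeq=" + fromPid.clockSequence());
        System.out.println(fromRandom + " clockSeq=" + fromRandom.clockSequence());
    }
}
```

Only the low 14 bits of the pid survive, so two processes whose pids differ by a multiple of 16384 would still collide. For the unseeded Random, the Javadoc only promises a seed "very likely to be distinct from any other invocation of this constructor".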