[jira] [Created] (CASSANDRA-4223) Non Unique Streaming session ID's

2012-05-06 Thread Aaron Morton (JIRA)
Aaron Morton created CASSANDRA-4223:
---

 Summary: Non Unique Streaming session ID's
 Key: CASSANDRA-4223
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4223
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: Ubuntu 10.04.2 LTS

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

"Bare metal" servers from https://www.stormondemand.com/servers/baremetal.html 
The servers run on a custom hypervisor.
 
Reporter: Aaron Morton
Assignee: Aaron Morton


I have observed repair processes failing due to duplicate streaming session 
IDs. In this installation it is preventing rebalance from completing. I 
believe it has also prevented repair from completing in the past. 

The attached streaming-logs.txt file contains log messages and an explanation 
of what was happening during a repair operation. It has the evidence for 
duplicate session IDs.

The duplicate session IDs were generated on the repairing node and sent to the 
streaming node. The streaming source replaced the first session with the 
second, which resulted in both sessions failing when the first FILE_FINISHED 
message was received. 
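
The replacement described above can be illustrated with a simplified model. 
This is only a sketch: SessionRegistryDemo and its methods are hypothetical 
names and are not Cassandra's classes. It assumes sessions are held in a map 
keyed by session ID, so an unconditional put with a duplicate ID silently 
discards the earlier session, while putIfAbsent would expose the collision.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of a session registry; not Cassandra code.
final class SessionRegistryDemo {
    // Sessions keyed by session ID; the String stands in for a session object.
    private final ConcurrentMap<Long, String> sessions =
            new ConcurrentHashMap<Long, String>();

    /** Unconditional put: returns the session that was replaced, or null. */
    String register(long sessionId, String description) {
        return sessions.put(sessionId, description);
    }

    /** Registers only if the ID is unused; returns false on a collision. */
    boolean registerIfUnused(long sessionId, String description) {
        return sessions.putIfAbsent(sessionId, description) == null;
    }

    String lookup(long sessionId) {
        return sessions.get(sessionId);
    }

    public static void main(String[] args) {
        SessionRegistryDemo registry = new SessionRegistryDemo();
        long duplicateId = 26132848816442266L; // the ID seen in the logs

        registry.register(duplicateId, "FMM_Studio/PartsData");
        String replaced = registry.register(duplicateId, "OpsCenter/rollups7200");

        // The first session is gone; replies for its file no longer match.
        System.out.println("replaced: " + replaced);
        System.out.println("now registered: " + registry.lookup(duplicateId));
    }
}
```

With a conditional insert the second registration fails instead of replacing 
the first, which would at least turn the problem into an immediate error.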

The errors were:

{code:java}
DEBUG [MiscStage:1] 2012-05-03 21:40:33,997 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db', action=FILE_FINISHED)
ERROR [MiscStage:1] 2012-05-03 21:40:34,027 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db but is null
    at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
    at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
{code}

and

{code:java}
DEBUG [MiscStage:2] 2012-05-03 21:40:36,497 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db', action=FILE_FINISHED)
ERROR [MiscStage:2] 2012-05-03 21:40:36,497 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:2,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db but is null
    at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
    at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
{code}


I think this is because System.nanoTime() is used for the session ID when 
creating the StreamInSession objects (driven from 
StorageService.requestRanges()).

From the documentation 
(http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#nanoTime()):

{quote}
This method provides nanosecond precision, but not necessarily nanosecond 
accuracy. No guarantees are made about how frequently values change. 
{quote}

Also some info here on clocks and timers 
https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks

The hypervisor may be at fault here. But it seems like we cannot rely on 
successive calls to nanoTime() to return different values. 
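
One way to check whether a given host is affected is to count how often 
back-to-back nanoTime() readings repeat. The probe below is only a diagnostic 
sketch; NanoTimeProbe and countRepeats are hypothetical names, and the result 
depends entirely on the JVM, OS, and hypervisor it runs on.

```java
// Hypothetical diagnostic; not part of Cassandra.
final class NanoTimeProbe {
    /**
     * Takes `samples` back-to-back nanoTime() readings and counts how many
     * of them are equal to the reading immediately before.
     */
    static int countRepeats(int samples) {
        int repeats = 0;
        long previous = System.nanoTime();
        for (int i = 1; i < samples; i++) {
            long current = System.nanoTime();
            if (current == previous) {
                repeats++;
            }
            previous = current;
        }
        return repeats;
    }

    public static void main(String[] args) {
        int samples = 1000000;
        System.out.println(countRepeats(samples) + " of " + samples
                + " readings repeated the previous value");
    }
}
```

Any non-zero count means two IDs taken from consecutive nanoTime() calls could 
be identical on that machine.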

To avoid message/interface changes on the StreamHeader it would be good to keep 
the session ID as a long. The simplest approach may be to make successive calls 
to nanoTime() until the result changes. We could fail if a certain number of 
milliseconds have passed. 
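
A minimal sketch of that approach follows. It is not the actual patch: 
SpinningSessionId, next(), and the timeout parameter are hypothetical, and it 
assumes all session IDs on a node are issued through this one method so that 
the last issued value can be remembered under a lock.

```java
// Hypothetical sketch of "call nanoTime() until the result changes".
final class SpinningSessionId {
    private static boolean issuedAny = false; // guarded by the class lock
    private static long lastIssued;           // guarded by the class lock

    /**
     * Returns a nanoTime() reading that differs from the previously issued
     * ID, spinning until the clock advances. Fails if the reading has not
     * changed after roughly timeoutMillis milliseconds.
     */
    static synchronized long next(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long candidate = System.nanoTime();
        while (issuedAny && candidate == lastIssued) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException(
                        "nanoTime() did not change within " + timeoutMillis + " ms");
            }
            candidate = System.nanoTime();
        }
        issuedAny = true;
        lastIssued = candidate;
        return candidate;
    }

    public static void main(String[] args) {
        long first = next(100);
        long second = next(100);
        System.out.println("distinct: " + (first != second));
    }
}
```

The loop only compares against the most recently issued ID, so it relies on 
nanoTime() not returning to an older value within the same JVM. An 
AtomicLong counter would avoid the dependence on the clock, but that is a 
different technique from the one proposed here.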

Hashing the file names and ranges is also a possibility, but more involved. 
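
For comparison, a hashing variant could look roughly like the sketch below. 
Everything here is an assumption for illustration: HashedSessionId and 
fromParts are hypothetical names, ranges are represented as plain strings, 
and SHA-256 truncated to 64 bits is just one possible choice of hash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the string form of files and ranges is an assumption.
final class HashedSessionId {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Derives a long ID from the first 8 bytes of a SHA-256 digest. */
    static long fromParts(List<String> fileNames, List<String> ranges) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Prefix each list with its size so the two lists cannot blur.
            digest.update(intBytes(fileNames.size()));
            for (String name : fileNames) {
                update(digest, name);
            }
            digest.update(intBytes(ranges.size()));
            for (String range : ranges) {
                update(digest, range);
            }
            return ByteBuffer.wrap(digest.digest()).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Length-prefix each part so ["ab", "c"] hashes differently from ["a", "bc"].
    private static void update(MessageDigest digest, String part) {
        byte[] bytes = part.getBytes(UTF8);
        digest.update(intBytes(bytes.length));
        digest.update(bytes);
    }

    private static byte[] intBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        long id = fromParts(
                Arrays.asList("PartsData-hc-1-Data.db"),
                Arrays.asList("(0,100]"));
        System.out.println("session id: " + id);
    }
}
```

Note that such an ID is deterministic: two sessions that request exactly the 
same files and ranges would still receive the same ID, so the inputs would 
need something session-specific mixed in.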

(We may also want to drop latency times that are 0 nanoseconds.)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-4223) Non Unique Streaming session ID's

2012-05-06 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-4223:


Attachment: fmm streaming bug.txt



[jira] [Commented] (CASSANDRA-4196) While loading data using BulkOutPutFormat gettting an exception "java.lang.ClassCastException: org.apache.cassandra.utils.Murmur3BloomFilter cannot be cast to org.a

2012-05-06 Thread Samarth Gahire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269381#comment-13269381
 ] 

Samarth Gahire commented on CASSANDRA-4196:
---

So is this issue fixed for cassandra-1.1 rc1? Do I need to apply a patch to 
resolve this, or will it be fixed only for cassandra-1.2?

> While loading data using BulkOutPutFormat gettting an exception 
> "java.lang.ClassCastException: org.apache.cassandra.utils.Murmur3BloomFilter 
> cannot be cast to org.apache.cassandra.utils.Murmur2BloomFilter"
> -
>
> Key: CASSANDRA-4196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop, Tools
>Affects Versions: 1.2
>Reporter: Samarth Gahire
>Assignee: Dave Brosius
>Priority: Minor
>  Labels: bulkloader, cassandra, hadoop, hash
> Fix For: 1.2
>
> Attachments: 4196_create_correct_bf_type.diff
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> We are using cassandra-1.1 rc1 for our production setup and are getting the 
> following error while bulk loading data using BulkOutPutFormat.
> {code}
> WARN 09:04:52,384 Failed closing IndexWriter(/cassandra/production/Data_daily/production-Data_daily-tmp-hc-2692)
> java.lang.ClassCastException: org.apache.cassandra.utils.Murmur3BloomFilter cannot be cast to org.apache.cassandra.utils.Murmur2BloomFilter
>     at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:50)
>     at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:410)
>     at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:94)
>     at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:255)
>     at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:154)
>     at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:92)
>     at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:178)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
> WARN 09:04:52,393 Failed closing IndexWriter(/cassandra/production/Data_daily/production-Data_daily-tmp-hc-2693)
> java.lang.ClassCastException: org.apache.cassandra.utils.Murmur3BloomFilter cannot be cast to org.apache.cassandra.utils.Murmur2BloomFilter
>     at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:50)
>     at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:410)
>     at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:94)
>     at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:255)
>     at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:154)
>     at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:92)
>     at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:178)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
> WARN 09:04:52,544 Failed closing IndexWriter(/cassandra/production/Data_daily/production-Data_daily-tmp-hc-2698)
> java.lang.ClassCastException: org.apache.cassandra.utils.Murmur3BloomFilter cannot be cast to org.apache.cassandra.utils.Murmur2BloomFilter
>     at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:50)
>     at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:410)
>     at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:94)
>     at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:255)
>     at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:154)
>     at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:92)
>     at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:178)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
> ERROR 09:04:52,544 Exception in thread Thread[Thread-39,5,main]
> java.lang.IndexOutOfBoundsException
> at java.nio.Buffer.checkIndex(Buffer.java:520)
> at java.nio.HeapByteBuffer.getShort(HeapByteBuffer.java:289)
> at org.apache.cassandra.db.CounterColumn.create(Counter