[jira] [Updated] (CASSANDRA-4223) Non Unique Streaming session ID's
[ https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-4223:

    Attachment: 4223_counter_session_id-V2.diff

4223_counter_session_id-V2.diff

Uses stream source flag as discussed. Added the flags to StreamHeader so they were together.

> Non Unique Streaming session ID's
> ---------------------------------
>
>                 Key: CASSANDRA-4223
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4223
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 10.04.2 LTS
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> "Bare metal" servers from https://www.stormondemand.com/servers/baremetal.html
> The servers run on a custom hypervisor.
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>              Labels: datastax_qa
>             Fix For: 1.0.11, 1.1.1
>
>         Attachments: 4223_counter_session_id-V2.diff, 4223_counter_session_id.diff, NanoTest.java, fmm streaming bug.txt
>
>
> I have observed repair processes failing due to duplicate streaming session IDs. In this installation it is preventing rebalance from completing. I believe it has also prevented repair from completing in the past.
> The attached streaming-logs.txt file contains log messages and an explanation of what was happening during a repair operation. It has the evidence for duplicate session IDs.
> The duplicate session IDs were generated on the repairing node and sent to the streaming node. The streaming source replaced the first session with the second, which resulted in both sessions failing when the first FILE_COMPLETE message was received.
> The errors were:
> {code:java}
> DEBUG [MiscStage:1] 2012-05-03 21:40:33,997 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db', action=FILE_FINISHED)
> ERROR [MiscStage:1] 2012-05-03 21:40:34,027 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> and
> {code:java}
> DEBUG [MiscStage:2] 2012-05-03 21:40:36,497 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db', action=FILE_FINISHED)
> ERROR [MiscStage:2] 2012-05-03 21:40:36,497 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:2,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> I think this is because System.nanoTime() is used for the session ID when creating the StreamInSession objects (driven from StorageService.requestRanges()).
> From the documentation (http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#nanoTime())
> {quote}
> This method provides nanosecond precision, but not necessarily nanosecond accuracy. No guarantees are made about how frequently values change.
> {quote}
> Also some info here on clocks and timers: https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
> The hypervisor may be at fault here. But it seems like we cannot rely on successive calls to nanoTime() to return different values.
> To avoid message/interface changes on the StreamHeader it would be good to keep the session ID a long. The simplest approach may be to make successive calls to nanoTime until the result changes. We could fail if a certain number of milliseconds have passed.
> Hashing the file names and ranges is also a possibility, but more involved.
> (We may also want to drop latency times that are 0 nano seconds.)
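
The workaround floated in the issue description (call nanoTime() again until the value changes, and give up after some number of milliseconds) might look roughly like the sketch below. The class name `SpinningSessionId`, the method `next`, and the timeout parameter are hypothetical and are not taken from Cassandra or from the attached patches. The deadline is measured with currentTimeMillis() on the assumption that nanoTime() itself may be the clock that is stuck.

```java
/**
 * Hypothetical helper, not part of Cassandra: hands out nanoTime() readings
 * that are strictly later than the previous one handed out, so no two
 * callers in this JVM receive the same value.
 */
final class SpinningSessionId
{
    // Last value handed out; guarded by the class lock.
    private static long last = System.nanoTime();

    /**
     * Spins until nanoTime() has advanced past the last value handed out.
     * Throws IllegalStateException if that does not happen within timeoutMillis.
     */
    static synchronized long next(long timeoutMillis)
    {
        // Use the wall clock for the deadline, since nanoTime() is the suspect.
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long now = System.nanoTime();
        // nanoTime() values must be compared by difference, not directly.
        while (now - last <= 0)
        {
            if (System.currentTimeMillis() > deadline)
                throw new IllegalStateException("nanoTime() did not advance within " + timeoutMillis + " ms");
            now = System.nanoTime();
        }
        last = now;
        return now;
    }
}
```

This keeps the session ID a long, as the description asks, at the cost of a lock and a possible busy-wait on every new session.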
[jira] [Updated] (CASSANDRA-4223) Non Unique Streaming session ID's
[ https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-4223:

    Attachment: 4223_counter_session_id.diff

Use an AtomicLong in StreamInSession and one in StreamOutSession for the session id. Sessions are always accessed using , and in and out session are in their own collections.
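
As an illustration of the counter-based approach in this update, the sketch below keeps one AtomicLong for inbound sessions and one for outbound sessions. It is not the content of 4223_counter_session_id.diff; the class and method names are hypothetical, and the real patch places the counters inside StreamInSession and StreamOutSession. It assumes, as the update states, that in and out sessions live in separate collections, so the two counters may produce overlapping values without colliding.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stand-in for the counters the patch adds to
 * StreamInSession and StreamOutSession.
 */
final class CounterSessionIds
{
    // One counter per direction; each starts at zero when the JVM starts.
    private static final AtomicLong inSessionId = new AtomicLong(0);
    private static final AtomicLong outSessionId = new AtomicLong(0);

    /** Returns the next inbound session ID; atomic, so safe across threads. */
    static long nextInSessionId()
    {
        return inSessionId.incrementAndGet();
    }

    /** Returns the next outbound session ID; independent of the inbound counter. */
    static long nextOutSessionId()
    {
        return outSessionId.incrementAndGet();
    }
}
```

Unlike nanoTime(), incrementAndGet() never returns the same value twice within one process, whatever the clock resolution. The counters do restart after a process restart, so uniqueness here is only per JVM lifetime.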
[jira] [Updated] (CASSANDRA-4223) Non Unique Streaming session ID's
[ https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4223:

             Reviewer: yukim
    Affects Version/s:     (was: 1.0.9)
        Fix Version/s: 1.1.1
                       1.0.11
[jira] [Updated] (CASSANDRA-4223) Non Unique Streaming session ID's
[ https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-4223:

    Attachment: NanoTest.java

Test for unique nanoTime() results.
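
The attached NanoTest.java is not reproduced in this thread. A check of the kind it describes could be sketched as below: sample System.nanoTime() in a tight loop and count how often a reading equals the one before it. The class name, method name, and sample count are assumptions made for illustration only.

```java
/**
 * Hypothetical check, not the attached NanoTest.java: counts how many
 * back-to-back System.nanoTime() readings are identical.
 */
final class NanoTimeDuplicates
{
    /** Takes 'samples' readings and counts those equal to the preceding reading. */
    static int countConsecutiveDuplicates(int samples)
    {
        int duplicates = 0;
        long previous = System.nanoTime();
        for (int i = 1; i < samples; i++)
        {
            long current = System.nanoTime();
            if (current == previous)
                duplicates++;
            previous = current;
        }
        return duplicates;
    }

    public static void main(String[] args)
    {
        int samples = 1000000;
        int duplicates = countConsecutiveDuplicates(samples);
        System.out.println(duplicates + " of " + samples + " consecutive readings were identical");
    }
}
```

Any non-zero count on a given machine would show that nanoTime() alone cannot be relied on for unique IDs there. The result depends on the hardware, the operating system, and any hypervisor in between.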
[jira] [Updated] (CASSANDRA-4223) Non Unique Streaming session ID's
[ https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-4223:

    Attachment: fmm streaming bug.txt