[jira] [Created] (CASSANDRA-8310) Assertion error in 2.1.1: SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:988)

2014-11-13 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8310:
---

 Summary: Assertion error in 2.1.1: 
SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:988)
 Key: CASSANDRA-8310
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8310
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Donald Smith


Using C* 2.1.1 on Linux CentOS 6.4, we're getting this AssertionError on 5 
nodes in a 12-node cluster. Also, compactions are lagging on all nodes.
{noformat}
ERROR [IndexSummaryManager:1] 2014-11-13 09:15:16,221 CassandraDaemon.java 
(line 153) Exception in thread Thread[IndexSummaryManager:1,1,main]
java.lang.AssertionError: null
at 
org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:988)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager.adjustSamplingLevels(IndexSummaryManager.java:420)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:298)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:238)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:139)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:77)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[na:1.7.0_60]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
[na:1.7.0_60]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
 [na:1.7.0_60]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 [na:1.7.0_60]
{noformat} 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8311) C* 2.1.1: AssertionError in AbstractCompactionTask "not correctly marked compacting"

2014-11-13 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8311:
---

 Summary: C* 2.1.1: AssertionError in AbstractCompactionTask 
"not correctly marked compacting"
 Key: CASSANDRA-8311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8311
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith


Using 2.1.1 on CentOS 6.4, we see this AssertionError on 3 out of 12 nodes in 
one DC.
{noformat}
ERROR [CompactionExecutor:7] 2014-11-12 10:15:13,980 CassandraDaemon.java (line 
153) Exception in thread Thread[CompactionExecutor:7,1,RMI Runtime]
java.lang.AssertionError: 
/data/data/KEYSPACE_NAME/TABLE_NAME/KEYSPACE_NAME-TABLE_NAME-jb-308572-Data.db 
is not correctly marked compacting
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.<init>(AbstractCompactionTask.java:49)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.CompactionTask.<init>(CompactionTask.java:62)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.<init>(LeveledCompactionTask.java:33)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getCompactionTask(LeveledCompactionStrategy.java:170)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8314) C* 2.1.1: AssertionError: "stream can only read forward"

2014-11-13 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8314:
---

 Summary: C* 2.1.1: AssertionError:  "stream can only read forward"
 Key: CASSANDRA-8314
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8314
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Donald Smith


I see this on multiple nodes of a 2.1.1 cluster running on CentOS 6.4:
{noformat}
ERROR [STREAM-IN-/10.6.1.104] 2014-11-13 14:13:16,565 StreamSession.java (line 
470) [Stream #45bdfe30-6b81-11e4-a7ca-b150b4554347] Streaming error occurred
java.io.IOException: Too many retries for Header (cfId: 
aaefa7d7-9d72-3d18-b5f0-02b30cee5bd7, #29, version: jb, estimated keys: 12672, 
transfer size: 130005779, compressed?: true, repairedAt: 0)
at 
org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:594) 
[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:53)
 [apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38)
 [apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
 [apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
 [apache-cassandra-2.1.1.jar:2.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
Caused by: java.lang.AssertionError: stream can only read forward.
at 
org.apache.cassandra.streaming.compress.CompressedInputStream.position(CompressedInputStream.java:107)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:85)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48)
 [apache-cassandra-2.1.1.jar:2.1.1]
... 4 common frames omitted
{noformat}

We couldn't upgrade SSTables due to exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8314) C* 2.1.1: AssertionError: "stream can only read forward"

2014-11-13 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211469#comment-14211469
 ] 

Donald Smith commented on CASSANDRA-8314:
-

BTW, earlier we got this exception when running "nodetool upgradesstables":
{noformat}
java.lang.NullPointerException
at 
org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewStart(SSTableReader.java:951)
at 
org.apache.cassandra.io.sstable.SSTableRewriter.moveStarts(SSTableRewriter.java:238)
at 
org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:180)
at 
org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:109)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:183)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:75)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$4.execute(CompactionManager.java:340)
at 
org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

> C* 2.1.1: AssertionError:  "stream can only read forward"
> -
>
> Key: CASSANDRA-8314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8314
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Donald Smith
>
> I see this on multiple nodes of a 2.1.1 cluster running on CentOS 6.4:
> {noformat}
> ERROR [STREAM-IN-/10.6.1.104] 2014-11-13 14:13:16,565 StreamSession.java 
> (line 470) [Stream #45bdfe30-6b81-11e4-a7ca-b150b4554347] Streaming error 
> occurred
> java.io.IOException: Too many retries for Header (cfId: 
> aaefa7d7-9d72-3d18-b5f0-02b30cee5bd7, #29, version: jb, estimated keys: 
> 12672, transfer size: 130005779, compressed?: true, repairedAt: 0)
> at 
> org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:594) 
> [apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:53)
>  [apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38)
>  [apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
>  [apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>  [apache-cassandra-2.1.1.jar:2.1.1]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
> Caused by: java.lang.AssertionError: stream can only read forward.
> at 
> org.apache.cassandra.streaming.compress.CompressedInputStream.position(CompressedInputStream.java:107)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:85)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48)
>  [apache-cassandra-2.1.1.jar:2.1.1]
> ... 4 common frames omitted
> {noformat}
> We couldn't upgrade SSTables due to exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215093#comment-14215093
 ] 

Donald Smith commented on CASSANDRA-7830:
-

Yes, I'm seeing this with 2.0.11:
{noformat}
Exception in thread "main" java.lang.UnsupportedOperationException: data is 
currently moving to this node; unable to leave the ring
at 
org.apache.cassandra.service.StorageService.decommission(StorageService.java:2912)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
...
{noformat}
And *nodetool netstats* shows:
{noformat}
dc1-cassandra13.dc01 ~> nodetool netstats
Mode: NORMAL
Restore replica count d7efb410-6c58-11e4-896c-a1382b792927
Read Repair Statistics:
Attempted: 1123
Mismatch (Blocking): 0
Mismatch (Background): 540
Pool Name                    Active   Pending      Completed
Commands                        n/a         0     1494743209
Responses                       n/a         1     1651558975
{noformat}


> Decommissioning fails on a live node
> 
>
> Key: CASSANDRA-7830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ananthkumar K S
>
> Exception in thread "main" java.lang.UnsupportedOperationException: data is 
> currently moving to this node; unable to leave the ring at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>  at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>  at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
>  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
> sun.rmi.transport.Transport$1.run(Transport.java:177) at 
> sun.rmi.transport.Transport$1.run(Transport.java:174) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
>  at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722)
> I got the following exception when i was trying to decommission a live node. 
> There is no reference in the manual saying that i need to stop the data 
> coming into this node. Even then, decommissioning is specified for live nodes.
> Can anyone let me know if am doing something wrong or if this is a bug on 
> cassandra part?
> Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215101#comment-14215101
 ] 

Donald Smith commented on CASSANDRA-7830:
-

Stopping and restarting the cassandra process did not help.

> Decommissioning fails on a live node
> 
>
> Key: CASSANDRA-7830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ananthkumar K S
>
> {code}Exception in thread "main" java.lang.UnsupportedOperationException: 
> data is currently moving to this node; unable to leave the ring at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>  at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>  at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
>  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
> sun.rmi.transport.Transport$1.run(Transport.java:177) at 
> sun.rmi.transport.Transport$1.run(Transport.java:174) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
>  at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722){code}
> I got the following exception when i was trying to decommission a live node. 
> There is no reference in the manual saying that i need to stop the data 
> coming into this node. Even then, decommissioning is specified for live nodes.
> Can anyone let me know if am doing something wrong or if this is a bug on 
> cassandra part?
> Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215101#comment-14215101
 ] 

Donald Smith edited comment on CASSANDRA-7830 at 11/17/14 8:29 PM:
---

Stopping and restarting the cassandra process did not help.

Also, I tried it on two other nodes and it didn't work there either, even when 
I first stopped the process.


was (Author: thinkerfeeler):
Stopping and restarting the cassandra process did not help.

> Decommissioning fails on a live node
> 
>
> Key: CASSANDRA-7830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ananthkumar K S
>
> {code}Exception in thread "main" java.lang.UnsupportedOperationException: 
> data is currently moving to this node; unable to leave the ring at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>  at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>  at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
>  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
> sun.rmi.transport.Transport$1.run(Transport.java:177) at 
> sun.rmi.transport.Transport$1.run(Transport.java:174) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
>  at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722){code}
> I got the following exception when i was trying to decommission a live node. 
> There is no reference in the manual saying that i need to stop the data 
> coming into this node. Even then, decommissioning is specified for live nodes.
> Can anyone let me know if am doing something wrong or if this is a bug on 
> cassandra part?
> Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215101#comment-14215101
 ] 

Donald Smith edited comment on CASSANDRA-7830 at 11/17/14 8:30 PM:
---

Stopping and restarting the cassandra process did not help.

Also, I tried it on two other nodes and it didn't work there either, even when 
I first stopped and restarted the process.


was (Author: thinkerfeeler):
Stopping and restarting the cassandra process did not help.

Also, I tried it on two other nodes and it didn't work there either, even when 
I first stopped the process.

> Decommissioning fails on a live node
> 
>
> Key: CASSANDRA-7830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ananthkumar K S
>
> {code}Exception in thread "main" java.lang.UnsupportedOperationException: 
> data is currently moving to this node; unable to leave the ring at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>  at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>  at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
>  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
> sun.rmi.transport.Transport$1.run(Transport.java:177) at 
> sun.rmi.transport.Transport$1.run(Transport.java:174) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
>  at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722){code}
> I got the following exception when i was trying to decommission a live node. 
> There is no reference in the manual saying that i need to stop the data 
> coming into this node. Even then, decommissioning is specified for live nodes.
> Can anyone let me know if am doing something wrong or if this is a bug on 
> cassandra part?
> Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215136#comment-14215136
 ] 

Donald Smith commented on CASSANDRA-7830:
-

Following the advice in 
http://comments.gmane.org/gmane.comp.db.cassandra.user/5554, I stopped all 
nodes and restarted. Now the decommission is working. So this is a workaround.
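
Roughly the sequence, as a sketch (not our exact commands; host names are 
placeholders, and it assumes ssh access plus the stock init script):
{noformat}
# Stop Cassandra on every node in the cluster (placeholder host names)
for h in dc1-cassandra01 dc1-cassandra02 dc1-cassandra03; do
    ssh "$h" 'sudo /etc/init.d/cassandra stop'
done

# Bring them all back up, then wait until every node shows UN (Up/Normal)
for h in dc1-cassandra01 dc1-cassandra02 dc1-cassandra03; do
    ssh "$h" 'sudo /etc/init.d/cassandra start'
done
nodetool status

# Finally, retry the decommission on the node being removed
nodetool decommission
{noformat}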

> Decommissioning fails on a live node
> 
>
> Key: CASSANDRA-7830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ananthkumar K S
>
> {code}Exception in thread "main" java.lang.UnsupportedOperationException: 
> data is currently moving to this node; unable to leave the ring at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>  at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>  at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
>  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
> sun.rmi.transport.Transport$1.run(Transport.java:177) at 
> sun.rmi.transport.Transport$1.run(Transport.java:174) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
>  at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722){code}
> I got the following exception when i was trying to decommission a live node. 
> There is no reference in the manual saying that i need to stop the data 
> coming into this node. Even then, decommissioning is specified for live nodes.
> Can anyone let me know if am doing something wrong or if this is a bug on 
> cassandra part?
> Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-18 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216530#comment-14216530
 ] 

Donald Smith commented on CASSANDRA-7830:
-

(BTW, the decommission failed after an hour with a RuntimeException: "Stream 
failed." I tried again and it failed again with the same exception, after about 
an hour and 15 minutes. There was no load on the cluster at all. The nodes in 
the DC were all up. I gave up, stopped the process, and am running "nodetool 
removenode" from another node.)

> Decommissioning fails on a live node
> 
>
> Key: CASSANDRA-7830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ananthkumar K S
>
> {code}Exception in thread "main" java.lang.UnsupportedOperationException: 
> data is currently moving to this node; unable to leave the ring at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>  at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>  at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
>  at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
>  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601) at 
> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
> sun.rmi.transport.Transport$1.run(Transport.java:177) at 
> sun.rmi.transport.Transport$1.run(Transport.java:174) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
>  at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722){code}
> I got the following exception when i was trying to decommission a live node. 
> There is no reference in the manual saying that i need to stop the data 
> coming into this node. Even then, decommissioning is specified for live nodes.
> Can anyone let me know if am doing something wrong or if this is a bug on 
> cassandra part?
> Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8738) "/etc/init.d/cassandra stop" prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8738:
---

 Summary: "/etc/init.d/cassandra stop" prints OK even when it 
doesn't work
 Key: CASSANDRA-8738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith


Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *"stopped"* ]]
! then
!echo "OK"
! else
!echo "ERROR: could not stop the process: $THE_STATUS"
! fi
  ;;
--- 69,71 
  sleep 5
! echo "OK"
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
"ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8738) "/etc/init.d/cassandra stop" prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305548#comment-14305548
 ] 

Donald Smith commented on CASSANDRA-8738:
-

Here's the change in context:
{noformat}
   stop)
# Cassandra shutdown
echo -n "Shutdown Cassandra: "
su $CASSANDRA_OWNR -c "kill `cat $pid_file`"
for t in `seq 40`; do $0 status > /dev/null 2>&1 && sleep 0.5 || break; done
# Adding a sleep here to give jmx time to wind down (CASSANDRA-4483). Not ideal...
# Adam Holmberg suggests this, but that would break if the jmx port is changed
# for t in `seq 40`; do netstat -tnlp | grep "0.0.0.0:7199" > /dev/null 2>&1 && sleep 0.1 || break; done
sleep 5
THE_STATUS=`$0 status`
if [[ $THE_STATUS == *"stopped"* ]]
then
   echo "OK"
else
   echo "ERROR: could not stop the process: $THE_STATUS"
fi
;;
{noformat}

> "/etc/init.d/cassandra stop" prints OK even when it doesn't work
> 
>
> Key: CASSANDRA-8738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
> server is still running.  (This happens, for example, if it's busy doing 
> GCs.)  The current init script prints out OK after sleeping but without 
> checking if the process really stopped. I suggest changing it to:
> {noformat}
> pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
> *** cassandra   2015-02-04 09:15:58.088209988 -0800
> --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
> ***
> *** 69,77 
>   sleep 5
> ! THE_STATUS=`$0 status`
> ! if [[ $THE_STATUS == *"stopped"* ]]
> ! then
> !echo "OK"
> ! else
> !echo "ERROR: could not stop the process: $THE_STATUS"
> ! fi
>   ;;
> --- 69,71 
>   sleep 5
> ! echo "OK"
>   ;;
> {noformat}
> Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
> message like
> {quote}
> "ERROR: could not stop the process: cassandra (pid  10764) is running...
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8738) "/etc/init.d/cassandra stop" prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8738:

Description: 
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *"stopped"* ]]
! then
!echo "OK"
! else
!echo "ERROR: could not stop the process: $THE_STATUS"
! fi
  ;;
--- 69,71 
  sleep 5
! echo "OK"
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}

  was:
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *"stopped"* ]]
! then
!echo "OK"
! else
!echo "ERROR: could not stop the process: $THE_STATUS"
! fi
  ;;
--- 69,71 
  sleep 5
! echo "OK"
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
"ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}


> "/etc/init.d/cassandra stop" prints OK even when it doesn't work
> 
>
> Key: CASSANDRA-8738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
> server is still running.  (This happens, for example, if it's busy doing 
> GCs.)  The current init script prints out OK after sleeping but without 
> checking if the process really stopped. I suggest changing it to:
> {noformat}
> pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
> *** cassandra   2015-02-04 09:15:58.088209988 -0800
> --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
> ***
> *** 69,77 
>   sleep 5
> ! THE_STATUS=`$0 status`
> ! if [[ $THE_STATUS == *"stopped"* ]]
> ! then
> !echo "OK"
> ! else
> !echo "ERROR: could not stop the process: $THE_STATUS"
> ! fi
>   ;;
> --- 69,71 
>   sleep 5
> ! echo "OK"
>   ;;
> {noformat}
> Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
> message like
> {quote}
> ERROR: could not stop the process: cassandra (pid  10764) is running...
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8738) "/etc/init.d/cassandra stop" prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305548#comment-14305548
 ] 

Donald Smith edited comment on CASSANDRA-8738 at 2/4/15 5:36 PM:
-

Here's the change in context:
{noformat}
   stop)
# Cassandra shutdown
echo -n "Shutdown Cassandra: "
su $CASSANDRA_OWNR -c "kill `cat $pid_file`"
for t in `seq 40`; do $0 status > /dev/null 2>&1 && sleep 0.5 || break; done
# Adding a sleep here to give jmx time to wind down (CASSANDRA-4483). Not ideal...
# Adam Holmberg suggests this, but that would break if the jmx port is changed
# for t in `seq 40`; do netstat -tnlp | grep "0.0.0.0:7199" > /dev/null 2>&1 && sleep 0.1 || break; done
sleep 5
THE_STATUS=`$0 status`
if [[ $THE_STATUS == *"stopped"* ]]
then
   echo "OK"
else
   echo "ERROR: could not stop the process: $THE_STATUS"
   exit 1
fi
;;
{noformat}


was (Author: thinkerfeeler):
Here's the change in context:
{noformat}
   stop)
# Cassandra shutdown
echo -n "Shutdown Cassandra: "
su $CASSANDRA_OWNR -c "kill `cat $pid_file`"
for t in `seq 40`; do $0 status > /dev/null 2>&1 && sleep 0.5 || break; done
# Adding a sleep here to give jmx time to wind down (CASSANDRA-4483). Not ideal...
# Adam Holmberg suggests this, but that would break if the jmx port is changed
# for t in `seq 40`; do netstat -tnlp | grep "0.0.0.0:7199" > /dev/null 2>&1 && sleep 0.1 || break; done
sleep 5
THE_STATUS=`$0 status`
if [[ $THE_STATUS == *"stopped"* ]]
then
   echo "OK"
else
   echo "ERROR: could not stop the process: $THE_STATUS"
fi
;;
{noformat}

> "/etc/init.d/cassandra stop" prints OK even when it doesn't work
> 
>
> Key: CASSANDRA-8738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
> server is still running.  (This happens, for example, if it's busy doing 
> GCs.)  The current init script prints out OK after sleeping but without 
> checking if the process really stopped. I suggest changing it to:
> {noformat}
> pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
> *** cassandra   2015-02-04 09:15:58.088209988 -0800
> --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
> ***
> *** 69,77 
>   sleep 5
> ! THE_STATUS=`$0 status`
> ! if [[ $THE_STATUS == *"stopped"* ]]
> ! then
> !echo "OK"
> ! else
> !echo "ERROR: could not stop the process: $THE_STATUS"
> !exit 1
> ! fi
>   ;;
> --- 69,71 
>   sleep 5
> ! echo "OK"
>   ;;
> {noformat}
> Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
> message like
> {quote}
> ERROR: could not stop the process: cassandra (pid  10764) is running...
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8738) "/etc/init.d/cassandra stop" prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8738:

Description: 
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *"stopped"* ]]
! then
!echo "OK"
! else
!echo "ERROR: could not stop the process: $THE_STATUS"
!exit 1
! fi
  ;;
--- 69,71 
  sleep 5
! echo "OK"
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}

  was:
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *"stopped"* ]]
! then
!echo "OK"
! else
!echo "ERROR: could not stop the process: $THE_STATUS"
! fi
  ;;
--- 69,71 
  sleep 5
! echo "OK"
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}


> "/etc/init.d/cassandra stop" prints OK even when it doesn't work
> 
>
> Key: CASSANDRA-8738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
> server is still running.  (This happens, for example, if it's busy doing 
> GCs.)  The current init script prints out OK after sleeping but without 
> checking if the process really stopped. I suggest changing it to:
> {noformat}
> pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
> *** cassandra   2015-02-04 09:15:58.088209988 -0800
> --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
> ***
> *** 69,77 
>   sleep 5
> ! THE_STATUS=`$0 status`
> ! if [[ $THE_STATUS == *"stopped"* ]]
> ! then
> !echo "OK"
> ! else
> !echo "ERROR: could not stop the process: $THE_STATUS"
> !exit 1
> ! fi
>   ;;
> --- 69,71 
>   sleep 5
> ! echo "OK"
>   ;;
> {noformat}
> Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
> message like
> {quote}
> ERROR: could not stop the process: cassandra (pid  10764) is running...
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8738) "/etc/init.d/cassandra stop" prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305578#comment-14305578
 ] 

Donald Smith commented on CASSANDRA-8738:
-

I'm using 2.0.11. But I see that 2.1.1 has the same problem. It looks like 
3.0's version at https://github.com/apache/cassandra/blob/trunk/debian/init 
fixes it:
{noformat}
do_stop()
{
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
RET=$?
rm -f "$PIDFILE"
return $RET
}
{noformat}
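
A stop) case that keys off that return value might look like this (just a 
sketch, not the actual debian script):
{noformat}
stop)
    echo -n "Shutdown Cassandra: "
    do_stop
    case "$?" in
        0|1) echo "OK" ;;                                      # stopped now, or already stopped
        2)   echo "ERROR: could not stop the process"; exit 1 ;;
        *)   echo "ERROR: failure while stopping"; exit 1 ;;
    esac
    ;;
{noformat}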

> "/etc/init.d/cassandra stop" prints OK even when it doesn't work
> 
>
> Key: CASSANDRA-8738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
> server is still running.  (This happens, for example, if it's busy doing 
> GCs.)  The current init script prints out OK after sleeping but without 
> checking if the process really stopped. I suggest changing it to:
> {noformat}
> pd0-cassandra16 ~> diff -C 1 cassandra cassandra-original
> *** cassandra   2015-02-04 09:15:58.088209988 -0800
> --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
> ***
> *** 69,77 
>   sleep 5
> ! THE_STATUS=`$0 status`
> ! if [[ $THE_STATUS == *"stopped"* ]]
> ! then
> !echo "OK"
> ! else
> !echo "ERROR: could not stop the process: $THE_STATUS"
> !exit 1
> ! fi
>   ;;
> --- 69,71 
>   sleep 5
> ! echo "OK"
>   ;;
> {noformat}
> Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
> message like
> {quote}
> ERROR: could not stop the process: cassandra (pid  10764) is running...
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6666) Avoid accumulating tombstones after partial hint replay

2014-09-19 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141239#comment-14141239
 ] 

Donald Smith commented on CASSANDRA-6666:
-

Perhaps this question is inappropriate here. But can tombstones be completely 
omitted for system.hints, given that they're not replicated and given that only 
internal code modifies them in normal operation? If a hint is delivered 
successfully, why does it need a tombstone at all? If it times out, then 
Cassandra is going to give up on delivering it. So, again, why does it need a 
tombstone? On the Cassandra IRC channel, several people speculated that the 
Cassandra developers didn't want to make a *special case* for system.hints. 
Also, system.hints has *gc_grace_seconds=0*, so the tombstones presumably won't 
survive a compaction. I realize that in C* 3.0 hints will be moved out of 
tables, but I am still perplexed why tombstones are needed at all for hints. 
My apologies if this is a dumb question.
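
(For what it's worth, the gc_grace_seconds=0 setting can be confirmed from 
cqlsh; a sketch that assumes the 2.0/2.1 schema tables:)
{noformat}
echo "SELECT gc_grace_seconds FROM system.schema_columnfamilies WHERE keyspace_name='system' AND columnfamily_name='hints';" | cqlsh
{noformat}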

> Avoid accumulating tombstones after partial hint replay
> ---
>
> Key: CASSANDRA-6666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
>  Labels: hintedhandoff
> Fix For: 2.0.11
>
> Attachments: .txt, cassandra_system.log.debug.gz
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6666) Avoid accumulating tombstones after partial hint replay

2014-09-28 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151127#comment-14151127
 ] 

Donald Smith commented on CASSANDRA-6666:
-

I know this is moot because of the redesign of hints, but I want to understand 
this. OK, if a hint was successfully delivered, then I can see how a tombstone 
would be useful for causing deletion of *older* instances in other sstables. 
But if a hint timed out (tombstone), then any older instance will also have 
timed out (presumably). So, could tombstones be deleted in that case (timeout)? 
Perhaps a timed-out cell IS a tombstone, but my point is: I don't see why they 
need to take up space.

> Avoid accumulating tombstones after partial hint replay
> ---
>
> Key: CASSANDRA-6666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
>  Labels: hintedhandoff
> Attachments: .txt, cassandra_system.log.debug.gz
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8060) Geography-aware replication

2014-10-05 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8060:
---

 Summary: Geography-aware replication
 Key: CASSANDRA-8060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
 Project: Cassandra
  Issue Type: Wish
Reporter: Donald Smith


We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

It would have major ramifications for latency architecture but might be 
appropriate for some scenarios.
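
(For reference, the topology file I mean is the PropertyFileSnitch's 
cassandra-topology.properties; a minimal sketch with placeholder IPs and the 
data center names from above:)
{noformat}
# conf/cassandra-topology.properties -- placeholder IPs
# node IP = data center : rack
10.10.1.1=CA:RAC1
10.10.1.2=CA:RAC1
10.20.1.1=TX:RAC1
10.30.1.1=NJ:RAC1
# nodes not listed above fall back to this
default=CA:RAC1
{noformat}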



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, daisy-chaining replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Summary: Geography-aware, daisy-chaining replication  (was: Geography-aware 
replication)

> Geography-aware, daisy-chaining replication
> ---
>
> Key: CASSANDRA-8060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Donald Smith
>
> We have three data centers in the US (CA in California, TX in Texas, and NJ 
> in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
> our writing to CA.  That represents a bottleneck, since the coordinator nodes 
> in CA are responsible for all the replication to every data center.
> Far better if we had the option of setting things up so that CA replicated to 
> TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
> for replicating to UK, which should replicate to DE.  Etc, etc.
> This could be controlled by the topology file.
> It would have major ramifications for latency architecture but might be 
> appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, daisy-chaining replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Description: 
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

It would require architectural changes and would have major ramifications for 
latency but might be appropriate for some scenarios.

  was:
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

It would have major ramifications for latency architecture but might be 
appropriate for some scenarios.


> Geography-aware, daisy-chaining replication
> ---
>
> Key: CASSANDRA-8060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Donald Smith
>
> We have three data centers in the US (CA in California, TX in Texas, and NJ 
> in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
> our writing to CA.  That represents a bottleneck, since the coordinator nodes 
> in CA are responsible for all the replication to every data center.
> Far better if we had the option of setting things up so that CA replicated to 
> TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
> for replicating to UK, which should replicate to DE.  Etc, etc.
> This could be controlled by the topology file.
> It would require architectural changes and would have major ramifications for 
> latency but might be appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, distributed replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Summary: Geography-aware, distributed replication  (was: Geography-aware, 
daisy-chaining replication)

> Geography-aware, distributed replication
> 
>
> Key: CASSANDRA-8060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Donald Smith
>
> We have three data centers in the US (CA in California, TX in Texas, and NJ 
> in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
> our writing to CA.  That represents a bottleneck, since the coordinator nodes 
> in CA are responsible for all the replication to every data center.
> Far better if we had the option of setting things up so that CA replicated to 
> TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
> for replicating to UK, which should replicate to DE.  Etc, etc.
> This could be controlled by the topology file.
> The replication could be organized in a tree-like structure instead of a 
> daisy-chain.
> It would require architectural changes and would have major ramifications for 
> latency but might be appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, daisy-chaining replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Description: 
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

The replication could be organized in a tree-like structure instead of a 
daisy-chain.

It would require architectural changes and would have major ramifications for 
latency but might be appropriate for some scenarios.

  was:
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

It would require architectural changes and would have major ramifications for 
latency but might be appropriate for some scenarios.


> Geography-aware, daisy-chaining replication
> ---
>
> Key: CASSANDRA-8060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Donald Smith
>
> We have three data centers in the US (CA in California, TX in Texas, and NJ 
> in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
> our writing to CA.  That represents a bottleneck, since the coordinator nodes 
> in CA are responsible for all the replication to every data center.
> Far better if we had the option of setting things up so that CA replicated to 
> TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
> for replicating to UK, which should replicate to DE.  Etc, etc.
> This could be controlled by the topology file.
> The replication could be organized in a tree-like structure instead of a 
> daisy-chain.
> It would require architectural changes and would have major ramifications for 
> latency but might be appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8433) Add jmx control to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8433:
---

 Summary: Add jmx control to reset lifetime metrics to zero
 Key: CASSANDRA-8433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor


Often I change some parameter in cassandra, in the OS, or in an external 
component and want to see the effect on cassandra performance.  Because some of 
the jmx metrics are for the lifetime of the process, it's hard to see the 
effect of changes.  It's inconvenient to restart all the nodes. And if you 
restart only some nodes (as I often do) then only those metrics reset to zero.

The jmx interface should provide a way to reset all lifetime metrics to zero.
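
Until such a control exists, one workaround is to difference the lifetime counters client-side instead of resetting them. A minimal sketch, assuming JMX is reachable on the default port 7199 and that the node exposes the ClientRequest write-latency MBean with a Count attribute (the bean and attribute names are assumptions and may differ by version):
{noformat}
// Workaround sketch, not a Cassandra feature: sample a lifetime counter over
// JMX and report the delta between successive samples, which behaves like a
// counter that is "reset" every sampling interval.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MetricDelta {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // Assumed metric name; pick whichever lifetime counter you care about.
            ObjectName writeLatency = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency");
            long previous = ((Number) mbs.getAttribute(writeLatency, "Count")).longValue();
            for (int i = 0; i < 6; i++) {
                Thread.sleep(10000);
                long current = ((Number) mbs.getAttribute(writeLatency, "Count")).longValue();
                System.out.println("writes in the last 10s: " + (current - previous));
                previous = current;
            }
        } finally {
            jmxc.close();
        }
    }
}
{noformat}
Sampling deltas this way gives the effect of a reset without touching any server-side state.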



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8433) Add jmx and mdoetool controls to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8433:

Summary: Add jmx and mdoetool controls to reset lifetime metrics to zero  
(was: Add jmx control to reset lifetime metrics to zero)

> Add jmx and mdoetool controls to reset lifetime metrics to zero
> ---
>
> Key: CASSANDRA-8433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often I change some parameter in cassandra, in the OS, or in an external 
> component and want to see the effect on cassandra performance.  Because some of 
> the jmx metrics are for the lifetime of the process, it's hard to see the 
> effect of changes.  It's inconvenient to restart all the nodes. And if you 
> restart only some nodes (as I often do) then only those metrics reset to zero.
> The jmx interface should provide a way to reset all lifetime metrics to zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8433) Add jmx and mdoetool controls to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8433:

Description: 
Often I change some parameter in cassandra, in the OS, or in an external 
component and want to see the effect on cassandra performance.  Because some of 
the jmx metrics are for the lifetime of the process, it's hard to see the 
effect of changes.  It's inconvenient to restart all the nodes. And if you 
restart only some nodes (as I often do) then only those metrics reset to zero.

The jmx interface should provide a way to reset all lifetime metrics to zero.  
And *nodetool* should invoke that to allow resetting metrics from the command 
line.


  was:
Often I change some parameter in cassandra, in the OS, or in an external 
component and want to see the effect on cassandra performance.  Because some of 
the jmx metrics are for the lifetime of the process, it's hard to see the 
effect of changes.  It's inconvenient to restart all the nodes. And if you 
restart only some nodes (as I often do) then only those metrics reset to zero.

The jmx interface should provide a way to reset all lifetime metrics to zero.


> Add jmx and mdoetool controls to reset lifetime metrics to zero
> ---
>
> Key: CASSANDRA-8433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often I change some parameter in cassandra, in the OS, or in an external 
> component and want to see the effect on cassandra performance.  Because some of 
> the jmx metrics are for the lifetime of the process, it's hard to see the 
> effect of changes.  It's inconvenient to restart all the nodes. And if you 
> restart only some nodes (as I often do) then only those metrics reset to zero.
> The jmx interface should provide a way to reset all lifetime metrics to zero. 
>  And *nodetool* should invoke that to allow resetting metrics from the 
> command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8433) Add jmx and nodetool controls to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8433:

Summary: Add jmx and nodetool controls to reset lifetime metrics to zero  
(was: Add jmx and mdoetool controls to reset lifetime metrics to zero)

> Add jmx and nodetool controls to reset lifetime metrics to zero
> ---
>
> Key: CASSANDRA-8433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often I change some parameter in cassandra, in the OS, or in an external 
> component and want to see the effect on cassandra performance.  Because some of 
> the jmx metrics are for the lifetime of the process, it's hard to see the 
> effect of changes.  It's inconvenient to restart all the nodes. And if you 
> restart only some nodes (as I often do) then only those metrics reset to zero.
> The jmx interface should provide a way to reset all lifetime metrics to zero. 
>  And *nodetool* should invoke that to allow resetting metrics from the 
> command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8433) Add jmx and nodetool controls to reset lifetime metrics to zero

2014-12-08 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238029#comment-14238029
 ] 

Donald Smith commented on CASSANDRA-8433:
-

Maybe I could use the 'recent' metrics if I knew which ones were 'lifetime' 
and which ones were 'recent'.   Also, how often do the 'recent' metrics reset?  
It doesn't seem to say here: http://wiki.apache.org/cassandra/Metrics .

> Add jmx and nodetool controls to reset lifetime metrics to zero
> ---
>
> Key: CASSANDRA-8433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often I change some parameter in cassandra, in the OS, or in an external 
> component and want to see the effect on cassandra performance.  Because some of 
> the jmx metrics are for the lifetime of the process, it's hard to see the 
> effect of changes.  It's inconvenient to restart all the nodes. And if you 
> restart only some nodes (as I often do) then only those metrics reset to zero.
> The jmx interface should provide a way to reset all lifetime metrics to zero. 
>  And *nodetool* should invoke that to allow resetting metrics from the 
> command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8433) Add jmx and nodetool controls to reset lifetime metrics to zero

2014-12-08 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238060#comment-14238060
 ] 

Donald Smith commented on CASSANDRA-8433:
-

Ideally, the output from jmx and nodetool would better document what the fields 
mean and what time period they cover. It's unclear whether some of the latencies 
refer to coordinator node latency for client requests or local disk latency. 

I get the impression that "Mean" latency is lifetime.  But how do I know? Look 
at the source code? That's something I'd like to reset to zero.

> Add jmx and nodetool controls to reset lifetime metrics to zero
> ---
>
> Key: CASSANDRA-8433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often I change some parameter in cassandra, in the OS, or in an external 
> component and want to see the effect on cassandra performance.  Because some of 
> the jmx metrics are for the lifetime of the process, it's hard to see the 
> effect of changes.  It's inconvenient to restart all the nodes. And if you 
> restart only some nodes (as I often do) then only those metrics reset to zero.
> The jmx interface should provide a way to reset all lifetime metrics to zero. 
>  And *nodetool* should invoke that to allow resetting metrics from the 
> command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration

2014-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238
 ] 

Donald Smith commented on CASSANDRA-8245:
-

We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on 
all nodes. But all nodes on one DC are down now.
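
A minimal monitoring sketch for catching this early, assuming the node exposes the GossipStage thread-pool MBean (org.apache.cassandra.internal:type=GossipStage) with a PendingTasks attribute and JMX on the default port 7199 (both assumptions that may differ by version):
{noformat}
// Monitoring sketch only: read the GossipStage pending-task count so a node can
// be flagged long before the backlog reaches the millions and the heap fills.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GossipBacklogWatch {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName gossipStage =
                    new ObjectName("org.apache.cassandra.internal:type=GossipStage");
            long pending = ((Number) mbs.getAttribute(gossipStage, "PendingTasks")).longValue();
            System.out.println(host + " GossipStage pending tasks: " + pending);
            if (pending > 1000) {
                // Arbitrary threshold; the quoted logs show backlogs growing into
                // the millions before the node hits OutOfMemory.
                System.err.println("WARNING: gossip backlog is growing on " + host);
            }
        } finally {
            jmxc.close();
        }
    }
}
{noformat}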

> Cassandra nodes periodically die in 2-DC configuration
> --
>
> Key: CASSANDRA-8245
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Scientific Linux release 6.5
> java version "1.7.0_51"
> Cassandra 2.0.9
>Reporter: Oleg Poleshuk
>Assignee: Brandon Williams
>Priority: Minor
> Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, 
> stack5.txt
>
>
> We have 2 DCs with 3 nodes in each.
> Second DC periodically has 1-2 nodes down.
> Looks like it loses connectivity with other nodes and then Gossiper starts 
> to accumulate tasks until Cassandra dies with OOM.
> WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting 
> live ratio to maximum of 64.0 instead of Infinity
>  WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip 
> stage has 1 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip 
> stage has 4 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip 
> stage has 8 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip 
> stage has 11 pending tasks; skipping status check (no nodes will be marked 
> down)
> ...
> WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip 
> stage has 1014764 pending tasks; skipping status check (no nodes will be 
> marked down)
>  WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) 
> Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> Also those lines but not sure it is relevant:
> DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) 
> Ignoring interval time of 2085963047



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration

2014-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238
 ] 

Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 5:20 PM:
---

We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on 
all nodes. But all nodes on one DC are down now.

What's odd is that the cassandra process continues running despite the 
OutOfMemory exception.  You'd expect it to exit.
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip 
stage has 2695 pending tasks; skipping status check (no nodes will be marked 
down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space

ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}


was (Author: thinkerfeeler):
We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on 
all nodes. But all nodes on one DC are down now.

> Cassandra nodes periodically die in 2-DC configuration
> --
>
> Key: CASSANDRA-8245
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Scientific Linux release 6.5
> java version "1.7.0_51"
> Cassandra 2.0.9
>Reporter: Oleg Poleshuk
>Assignee: Brandon Williams
>Priority: Minor
> Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, 
> stack5.txt
>
>
> We have 2 DCs with 3 nodes in each.
> Second DC periodically has 1-2 nodes down.
> Looks like it loses connectivity with other nodes and then Gossiper starts 
> to accumulate tasks until Cassandra dies with OOM.
> WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting 
> live ratio to maximum of 64.0 instead of Infinity
>  WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip 
> stage has 1 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip 
> stage has 4 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip 
> stage has 8 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip 
> stage has 11 pending tasks; skipping status check (no nodes will be marked 
> down)
> ...
> WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip 
> stage has 1014764 pending tasks; skipping status check (no nodes will be 
> marked down)
>  WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) 
> Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> Also those lines but not sure it is relevant:
> DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) 
> Ignoring interval time of 2085963047



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration

2014-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238
 ] 

Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 7:36 PM:
---

We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on 
all nodes. But all nodes on one DC are down now.

What's odd is that the cassandra process continues running despite the 
OutOfMemory exception.  You'd expect it to exit.

Prior to getting OutOfMemory, I notice that such nodes are slow in responding 
to commands and queries (e.g., jmx).
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip 
stage has 2695 pending tasks; skipping status check (no nodes will be marked 
down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space

ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}


was (Author: thinkerfeeler):
We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on 
all nodes. But all nodes on one DC are down now.

What's odd is that the cassandra process continues running despite the 
OutOfMemory exception.  You'd expect it to exit.
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip 
stage has 2695 pending tasks; skipping status check (no nodes will be marked 
down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space

ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}

> Cassandra nodes periodically die in 2-DC configuration
> --
>
> Key: CASSANDRA-8245
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Scientific Linux release 6.5
> java version "1.7.0_51"
> Cassandra 2.0.9
>Reporter: Oleg Poleshuk
>Assignee: Brandon Williams
>Priority: Minor
> Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, 
> stack5.txt
>
>
> We have 2 DCs with 3 nodes in each.
> Second DC periodically has 1-2 nodes down.
> Looks like it loses connectivity with other nodes and then Gossiper starts 
> to accumulate tasks until Cassandra dies with OOM.
> WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting 
> live ratio to maximum of 64.0 instead of Infinity
>  WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip 
> stage has 1 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip 
> stage has 4 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip 
> stage has 8 pending tasks; skipping status check (no nodes will be marked 
> down)
>  WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip 
> stage has 11 pending tasks; skipping status check (no nodes will be marked 
> down)
> ...
> WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip 
> stage has 1014764 pending tasks; skipping status check (no nodes will be 
> marked down)
>  WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) 
> Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> Also those lines but not sure it is relevant:
> DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) 
> Ignoring interval time of 2085963047



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8591:
---

 Summary: Tunable bootstrapping
 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith


Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail flat.
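
For illustration, a minimal sketch of the policy being proposed, with a hypothetical cap of 100 missing ranges and stand-in strings for ranges and sources (nothing like this exists in Cassandra today):
{noformat}
// Illustrative only: instead of aborting the whole bootstrap when no source can
// be found for some ranges, skip those ranges up to a configured cap and print
// a report of what was skipped.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class TunableBootstrapSketch {
    static final int MAX_MISSING_RANGES = 100;   // hypothetical tuning knob

    /** Returns the ranges that had no source, or throws once the cap is exceeded. */
    static List<String> planStreaming(Map<String, Optional<String>> sourceByRange) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Optional<String>> e : sourceByRange.entrySet()) {
            if (!e.getValue().isPresent()) {
                missing.add(e.getKey());
                if (missing.size() > MAX_MISSING_RANGES) {
                    throw new IllegalStateException("unable to find sufficient sources for "
                            + missing.size() + " ranges; giving up");
                }
            }
            // otherwise: add the (range, source) pair to the stream plan as usual
        }
        if (!missing.isEmpty()) {
            System.out.println("Bootstrap report: no source found for ranges " + missing);
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Optional<String>> demo = new LinkedHashMap<>();
        demo.put("(0,100]", Optional.of("10.0.0.1"));
        demo.put("(100,200]", Optional.empty());   // no live source for this range
        planStreaming(demo);
    }
}
{noformat}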



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tuneable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Summary: Tuneable bootstrapping  (was: Tunable bootstrapping)

> Tuneable bootstrapping
> --
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters, and should print out a report about what ranges 
> were missing.  For many apps, it's far better to bootstrap what's available 
> than to fail flat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Summary: Tunable bootstrapping  (was: Tuneable bootstrapping)

> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tuneable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters, and should print out a report about what ranges 
> were missing.  For many apps, it's far better to bootstrap what's available 
> than to fail flat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tuneable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail flat.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.
> Same with rebuilds.
> We were doing maintenance on some disks and when we started back up, some 
> nodes ran out of disk space, due to operator miscalculation. Thereafter, 
> we've been unable to bootstrap new nodes.  But bootstrapping with partial 
> success would be far better than being unable to bootstrap at all, and 
> cheaper than a repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to "unable to find sufficient sources for 
streaming range."  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair. Our 
consistency requirements are low but not zero.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to "unable to find sufficient sources for 
streaming range."  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.
> Same with rebuilds.
> We were doing maintenance on some disks and when we started back up, some 
> nodes ran out of disk space, due to operator miscalculation. Thereafter, 
> we've been unable to bootstrap new nodes, due to "unable to find sufficient 
> sources for streaming range."  But bootstrapping with partial success would 
> be far better than being unable to bootstrap at all, and cheaper than a 
> repair. Our consistency requirements are low but not zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to "unable to find sufficient sources for 
streaming range."  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.
> Same with rebuilds.
> We were doing maintenance on some disks and when we started back up, some 
> nodes ran out of disk space, due to operator miscalculation. Thereafter, 
> we've been unable to bootstrap new nodes, due to "unable to find sufficient 
> sources for streaming range."  But bootstrapping with partial success would 
> be far better than being unable to bootstrap at all, and cheaper than a 
> repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tuneable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tuneable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail flat.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail flat.


> Tuneable bootstrapping
> --
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tuneable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters, and should print out a report about what ranges 
> were missing.  For many apps, it's far better to bootstrap what's available 
> than to fail flat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Priority: Minor  (was: Major)

> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.
> Same with rebuilds.
> We were doing maintenance on some disks and when we started back up, some 
> nodes ran out of disk space, due to operator miscaluculation. Thereafter, 
> we've been unable to bootstrap new nodes, due to "unable to find sufficient 
> sources for streaming range."  But bootstrapping with partial success would 
> be far better than being unable to bootstrap at all, and cheaper than a 
> repair. Our consistency requirements are low but not zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to "unable to find sufficient sources for 
streaming range."  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair. Our 
consistency requirements are low but not zero.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.
> Same with rebuilds.
> We were doing maintenance on some disks and when we started cassandra back 
> up, some nodes ran out of disk space, due to operator miscaluculation. 
> Thereafter, we've been unable to bootstrap new nodes, due to "unable to find 
> sufficient sources for streaming range."  But bootstrapping with partial 
> success would be far better than being unable to bootstrap at all, and 
> cheaper than a repair. Our consistency requirements are low but not zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscaluculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.
> Same with rebuilds.
> We were doing maintenance on some disks, and when we started cassandra back 
> up, some nodes ran out of disk space, due to operator miscalculation. 
> Thereafter, we've been unable to bootstrap new nodes, due to "unable to find 
> sufficient sources for streaming range."  But bootstrapping with partial 
> success would be far better than being unable to bootstrap at all, and 
> cheaper than a repair. Our consistency requirements are low but not zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under control by parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail flat.
> Same with rebuilds.
> We were doing maintenance on some disks, and when we started cassandra back 
> up, some nodes ran out of disk space, due to operator miscalculation. 
> Thereafter, we've been unable to bootstrap new nodes, due to "unable to find 
> sufficient sources for streaming range."  But bootstrapping with partial 
> success would be far better than being unable to bootstrap at all, and 
> cheaper than a repair. Our consistency requirements aren't high but we prefer 
> as much consistency as cassandra can give us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-09 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271721#comment-14271721
 ] 

Donald Smith commented on CASSANDRA-8494:
-

Tunable consistency is related: don't fail if a range is missing. Be fault 
tolerant and bootstrap as much as possible.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0
>
>
> Current bootstrapping involves (to my knowledge) picking tokens and streaming 
> data before the node is available for requests.  This can be problematic with 
> "fat nodes", since it may require 20TB of data to be streamed over before the 
> machine can be useful.  This can result in a massive window of time before 
> the machine can do anything useful.
> As a potential approach to mitigate the huge window of time before a node is 
> available, I suggest modifying the bootstrap process to only acquire a single 
> initial token before being marked UP.  This would likely be a configuration 
> parameter "incremental_bootstrap" or something similar.
> After the node is bootstrapped with this one token, it could go into UP 
> state, and could then acquire additional tokens (one or a handful at a time), 
> which would be streamed over while the node is active and serving requests.  
> The benefit here is that with the default 256 tokens a node could become an 
> active part of the cluster with less than 1% of its final data streamed over.
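
A rough sketch of the flow described above, purely illustrative (the incremental_bootstrap option is only a proposal, and the token-acquisition step here is a stand-in for picking a token and streaming its data from the current owners):
{noformat}
// Illustrative flow only: join with a single token, get marked UP, then keep
// acquiring the remaining vnode tokens one at a time while serving requests.
import java.util.ArrayList;
import java.util.List;

public class IncrementalBootstrapSketch {
    static final int NUM_TOKENS = 256;   // default num_tokens

    public static void main(String[] args) throws InterruptedException {
        List<Long> owned = new ArrayList<>();
        owned.add(acquireAndStreamToken());   // stream just one token first
        System.out.println("node marked UP with " + owned.size() + "/" + NUM_TOKENS + " tokens");
        while (owned.size() < NUM_TOKENS) {   // continue in the background while UP
            owned.add(acquireAndStreamToken());
        }
        System.out.println("incremental bootstrap complete");
    }

    /** Stand-in for choosing a token and streaming its data from the current owners. */
    static long acquireAndStreamToken() throws InterruptedException {
        Thread.sleep(1);                      // placeholder for streaming time
        return (long) (Math.random() * Long.MAX_VALUE);
    }
}
{noformat}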



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.  Faults happen.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.  Faults happen.
> If it can't find some sources, it should allow bootstrapping to continue, 
> under the control of parameters (up to 100 failures, for example), and should 
> print out a report about what ranges were missing.  For many apps, it's far 
> better to bootstrap what's available than to fail outright.
> Same with rebuilds.
> We were doing maintenance on some disks, and when we started cassandra back 
> up, some nodes ran out of disk space, due to operator miscalculation. 
> Thereafter, we've been unable to bootstrap new nodes, due to "unable to find 
> sufficient sources for streaming range."  But bootstrapping with partial 
> success would be far better than being unable to bootstrap at all, and 
> cheaper than a repair. Our consistency requirements aren't high but we prefer 
> as much consistency as cassandra can give us.
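
A sketch of what such a knob might look like in cassandra.yaml; both option names 
below are hypothetical illustrations of the "up to 100 failures" idea, not existing 
settings:
{noformat}
# Hypothetical options illustrating the proposal above -- not present in current Cassandra.
# Continue bootstrapping even when sources cannot be found for some ranges,
# and log a report of the ranges that were skipped.
tolerate_unavailable_ranges_on_bootstrap: true
max_unavailable_ranges_on_bootstrap: 100
{noformat}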



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.  

If it can't find sources for some ranges, it should allow bootstrapping to 
continue and should print out a report about what ranges were missing.  Allow 
the bootstrap to be tunable, under the control of parameters ("allow up to 100 
failures", for example).

For many apps, it's far better to bootstrap what's available than to fail outright.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.

  was:
Often bootstrapping fails due to errors like "unable to find sufficient sources 
for streaming range". But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.  Faults happen.

If it can't find some sources, it should allow bootstrapping to continue, under 
the control of parameters (up to 100 failures, for example), and should print out 
a report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail outright.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to "unable to find sufficient 
sources for streaming range."  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.


> Tunable bootstrapping
> -
>
> Key: CASSANDRA-8591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> Often bootstrapping fails due to errors like "unable to find sufficient 
> sources for streaming range". But cassandra is supposed to be fault tolerant, 
> and it's supposed to have tunable consistency.  
> If it can't find sources for some ranges, it should allow bootstrapping to 
> continue and should print out a report about what ranges were missing.  
> Allow the bootstrap to be tunable, under the control of parameters ("allow up 
> to 100 failures", for example).
> For many apps, it's far better to bootstrap what's available than to fail 
> outright.
> Same with rebuilds.
> We were doing maintenance on some disks, and when we started cassandra back 
> up, some nodes ran out of disk space, due to operator miscalculation. 
> Thereafter, we've been unable to bootstrap new nodes, due to "unable to find 
> sufficient sources for streaming range."  But bootstrapping with partial 
> success would be far better than being unable to bootstrap at all, and 
> cheaper than a repair. Our consistency requirements aren't high but we prefer 
> as much consistency as cassandra can give us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8990) Allow clients to override the DCs the data gets sent to, per write request, overriding keyspace settings

2015-03-18 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8990:
---

 Summary: Allow clients to override the DCs the data gets sent to, 
per write request, overriding keyspace settings
 Key: CASSANDRA-8990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8990
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith


Currently each keyspace specifies how many replicas to write to each data 
center. In CQL one specifies:
{noformat}
 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
}
{noformat}
But in some use cases there's no need to write certain rows to a certain 
datacenter.  Requiring the user to create two keyspaces is burdensome and 
complicates code and queries.  

For example, we have global replication of our data to multiple continents. But 
we want the option to send only certain rows globally with certain values for 
certain columns -- e.g., only for users that visited that country.

Cassandra and CQL should support the ability of client code to specify, on a 
per request basis, that a write should go only to specified data centers. 
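
For reference, a minimal CQL sketch of the two-keyspace workaround the description 
calls burdensome (keyspace names here are placeholders): rows that must not leave 
DC1 go into tables in the locally replicated keyspace, everything else into the 
globally replicated one.
{noformat}
CREATE KEYSPACE global_data WITH replication = {
  'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'};

CREATE KEYSPACE dc1_only_data WITH replication = {
  'class': 'NetworkTopologyStrategy', 'DC1': '3'};
{noformat}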




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8990) Allow clients to override the DCs the data gets sent to, per write request, overriding keyspace settings

2015-03-18 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8990:

Description: 
Currently each keyspace specifies how many replicas to write to each data 
center. In CQL one specifies:
{noformat}
 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
}
{noformat}
But in some use cases there's no need to write certain rows to a certain 
datacenter.  Requiring the user to create two keyspaces is burdensome and 
complicates code and queries.  

For example, we have global replication of our data to multiple continents. But 
we want the option to send only certain rows globally with certain values for 
certain columns -- e.g., only for users that visited that country.

Cassandra and CQL should support the ability of client code to specify, on a 
per request basis, that a write should go only to specified data centers 
(probably restricted to being a subset of the DCs specified in the keyspace).


  was:
Currently each keyspace specifies how many replicas to write to each data 
center. In CQL one specifies:
{noformat}
 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
}
{noformat}
But in some use cases there's no need to write certain rows to a certain 
datacenter.  Requiring the user to create two keyspaces is burdensome and 
complicates code and queries.  

For example, we have global replication of our data to multiple continents. But 
we want the option to send only certain rows globally with certain values for 
certain columns -- e.g., only for users that visited that country.

Cassandra and CQL should support the ability of client code to specify, on a 
per request basis, that a write should go only to specified data centers. 



> Allow clients to override the DCs the data gets sent to, per write request, 
> overriding keyspace settings
> 
>
> Key: CASSANDRA-8990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8990
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Donald Smith
>
> Currently each keyspace specifies how many replicas to write to each data 
> center. In CQL one specifies:
> {noformat}
>  WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'DC1': '3',
>   'DC2': '3'
> }
> {noformat}
> But in some use cases there's no need to write certain rows to a certain 
> datacenter.  Requiring the user to create two keyspaces is burdensome and 
> complicates code and queries.  
> For example, we have global replication of our data to multiple continents. 
> But we want the option to send only certain rows globally with certain values 
> for certain columns -- e.g., only for users that visited that country.
> Cassandra and CQL should support the ability of client code to specify, on a 
> per request basis, that a write should go only to specified data centers 
> (probably restricted to being a subset of the DCs specified in the keyspace).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9274) Changing memtable_flush_writers per recommendations in cassandra.yaml causes memtable_cleanup_threshold to be too small

2015-04-30 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-9274:
---

 Summary: Changing memtable_flush_writers per recommendations in 
cassandra.yaml causes memtable_cleanup_threshold to be too small
 Key: CASSANDRA-9274
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9274
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor


It says in cassandra.yaml:
{noformat}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
{noformat}
so we raised it to 24.

Much later we noticed a warning in the logs:
{noformat}
WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
memtable_cleanup_threshold is set very low, which may cause performance 
degradation
{noformat}
Looking at cassandra.yaml again I see:
{noformat}
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11
#memtable_cleanup_threshold: 0.11
{noformat}
So, I uncommented that last line (figuring that 0.11 is a reasonable value).

Cassandra.yaml should give better guidance or the code should *prevent* the 
value from going outside a reasonable range.
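
For reference, the arithmetic behind the warning, using the default formula quoted 
above:
{noformat}
memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
    8 writers  ->  1 / 9   =  ~0.11   (matches the commented-out default)
    24 writers ->  1 / 25  =  0.04    (the value that triggered the warning)
{noformat}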



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9274) Changing memtable_flush_writers per recommendations in cassandra.yaml causes memtable_cleanup_threshold to be too small

2015-04-30 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-9274:

Description: 
It says in cassandra.yaml:
{noformat}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
{noformat}
so we raised it to 24.

Much later we noticed a warning in the logs:
{noformat}
WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
memtable_cleanup_threshold is set very low, which may cause performance 
degradation
{noformat}
Looking at cassandra.yaml again I see:
{noformat}
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11
#memtable_cleanup_threshold: 0.11
{noformat}
So, I uncommented that last line (figuring that 0.11 is a reasonable value).

Cassandra.yaml should give better guidance or the code should *prevent* the 
value from going outside a reasonable range.


  was:
It says in cassandra.yaml:
{noformat}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
{noformat}
so we raised it to 24.

Much later we noticed a warning in the logs:
{noformat}
WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
memtable_cleanup_threshold is set very low, which may cause performance 
degradation
{noformat}
Looking at cassandra.yaml again I see:
{noformat}
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11
#memtable_cleanup_threshold: 0.11
{noformat}
So, I uncommented that last line (figuring that 0.11 is a reasonable value).

Cassandra.yaml should give better guidance or the code should *prevent* the 
value from going outside a reasonable range.
{noformat}


> Changing memtable_flush_writers per recommendations in cassandra.yaml causes 
> memtable_cleanup_threshold to be too small
> ---
>
> Key: CASSANDRA-9274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9274
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Donald Smith
>Priority: Minor
>
> It says in cassandra.yaml:
> {noformat}
> # If your data directories are backed by SSD, you should increase this
> # to the number of cores.
> #memtable_flush_writers: 8
> {noformat}
> so we raised it to 24.
> Much later we noticed a warning in the logs:
> {noformat}
> WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
> memtable_cleanup_threshold is set very low, which may cause performance 
> degradation
> {noformat}
> Looking at cassandra.yaml again I see:
> {noformat}
> # memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
> # memtable_cleanup_threshold: 0.11
> #memtable_cleanup_threshold: 0.11
> {noformat}
> So, I uncommented that last line (figuring that 0.11 is a reasonable value).
> Cassandra.yaml should give better guidance or the code should *prevent* the 
> value from going outside a reasonable range.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-06 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6152:
---

 Summary: Assertion error in 2.0.1 at 
db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
With default set up on single node.
I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith



{noformat}
ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
at 
org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:722)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-06 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787772#comment-13787772
 ] 

Donald Smith commented on CASSANDRA-6152:
-

The exception seems to happen first during a delete.  Let me know if you need 
more info.

> Assertion error in 2.0.1 at 
> db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> ---
>
> Key: CASSANDRA-6152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS release 6.2 (Final)
> With default set up on single node.
> I also saw this exception in 2.0.0 on a three node cluster.
>Reporter: Donald Smith
>
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
> 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError
> at 
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
> at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
> at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
> at 
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
> at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-06 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787808#comment-13787808
 ] 

Donald Smith commented on CASSANDRA-6152:
-

I was running a functional test suite, which populates some tables after 
deleting the old rows for the same keys. 

I ran it with a command like:
{noformat}
repeat 10 ./run-test.sh 
{noformat}
So, it was deleting and writing rows in quick succession.  

If you want to see more detail than that, I'll see what I can provide.


> Assertion error in 2.0.1 at 
> db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> ---
>
> Key: CASSANDRA-6152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS release 6.2 (Final)
> With default set up on single node.
> I also saw this exception in 2.0.0 on a three node cluster.
>Reporter: Donald Smith
>
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
> 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError
> at 
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
> at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
> at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
> at 
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
> at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-09 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790717#comment-13790717
 ] 

Donald Smith commented on CASSANDRA-6152:
-

The test just runs our test suite repeatedly. After a few runs it produces the 
following output, with TRACE-level logging enabled. I'll document more later.
{noformat}
DEBUG [Native-Transport-Requests:11] 2013-10-09 11:40:49,459 Message.java (line 
302) Received: PREPARE INSERT INTO 
as_reports.data_report_details(report_id,item_name,item_value) VALUES 
(7bc2a570-a42b-4632-b245-f0db9255ccc3,?,?);, v=1
TRACE [Native-Transport-Requests:11] 2013-10-09 11:40:49,460 
QueryProcessor.java (line 208) Stored prepared statement 
ef4c655e042ffab2c1f0eef1e53a573e with 2 bind markers
DEBUG [Native-Transport-Requests:11] 2013-10-09 11:40:49,460 Tracing.java (line 
157) request complete
DEBUG [Native-Transport-Requests:11] 2013-10-09 11:40:49,460 Message.java (line 
309) Responding: RESULT PREPARED ef4c655e042ffab2c1f0eef1e53a573e 
[item_name(as_reports, data_report_details), 
org.apache.cassandra.db.marshal.UTF8Type][item_value(as_reports, 
data_report_details), org.apache.cassandra.db.marshal.UTF8Type] 
(resultMetadata=[0 columns]), v=1
DEBUG [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 Message.java (line 
302) Received: EXECUTE ef4c655e042ffab2c1f0eef1e53a573e with 2 values at 
consistency ONE, v=1
TRACE [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 
QueryProcessor.java (line 232) [1] 'java.nio.HeapByteBuffer[pos=0 lim=0 cap=0]'
TRACE [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 
QueryProcessor.java (line 232) [2] 'java.nio.HeapByteBuffer[pos=36 lim=41 
cap=43]'
TRACE [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 
QueryProcessor.java (line 97) Process 
org.apache.cassandra.cql3.statements.UpdateStatement@321baa4a @CL.ONE
ERROR [COMMIT-LOG-WRITER] 2013-10-09 11:40:49,465 CassandraDaemon.java (line 
185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
at 
org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:722)
DEBUG [Native-Transport-Requests:13] 2013-10-09 11:40:49,466 Tracing.java (line 
157) request complete
DEBUG [Native-Transport-Requests:13] 2013-10-09 11:40:49,466 Message.java (line 
309) Responding: EMPTY RESULT, v=1
{noformat}

> Assertion error in 2.0.1 at 
> db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> ---
>
> Key: CASSANDRA-6152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS release 6.2 (Final)
> With default set up on single node.
> I also saw this exception in 2.0.0 on a three node cluster.
>Reporter: Donald Smith
>
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
> 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError
> at 
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
> at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
> at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
> at 
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
> at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792877#comment-13792877
 ] 

Donald Smith commented on CASSANDRA-6152:
-

No, the test suite does *not* drop and create new tables (i.e., it does not 
call "DROP TABLE" and "CREATE TABLE").  It deletes tables and re-inserts. I'm 
working right now on submitting a focused example that reproduces the bug.

> Assertion error in 2.0.1 at 
> db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> ---
>
> Key: CASSANDRA-6152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS release 6.2 (Final)
> With default set up on single node.
> I also saw this exception in 2.0.0 on a three node cluster.
>Reporter: Donald Smith
>
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
> 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError
> at 
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
> at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
> at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
> at 
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
> at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792877#comment-13792877
 ] 

Donald Smith edited comment on CASSANDRA-6152 at 10/11/13 5:56 PM:
---

No, the test suite does *not* drop and create new tables (i.e., it does not 
call "DROP TABLE" and "CREATE TABLE").  It deletes rows from tables and 
re-inserts. I'm working right now on submitting a focused example that 
reproduces the bug.


was (Author: thinkerfeeler):
No, the test suite does *not* drop and create new tables (i.e., it does not 
call "DROP TABLE" and "CREATE TABLE").  It deletes tables and re-inserts. I'm 
working right now on submitting a focused example that reproduces the bug.

> Assertion error in 2.0.1 at 
> db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> ---
>
> Key: CASSANDRA-6152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS release 6.2 (Final)
> With default set up on single node.
> I also saw this exception in 2.0.0 on a three node cluster.
>Reporter: Donald Smith
>
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
> 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError
> at 
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
> at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
> at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
> at 
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
> at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793025#comment-13793025
 ] 

Donald Smith commented on CASSANDRA-6152:
-

I found a *simple* example of the bug.  

If I insert an empty string ("") into the table it causes the AssertionError. 
If I insert a non-empty string there's no AssertionError!

{noformat}
create keyspace if not exists bug with replication = {'class':'SimpleStrategy', 
'replication_factor':1};


-- compact storage; cells are ordered by item_name
create table if not exists bug.bug_table (
report_id   uuid,
item_name   text,
item_value  text,
primary key (report_id, item_name)) with compact storage;
{noformat}

BugMain.java:
{noformat}
package bug;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BugMain {
    private static String CASSANDRA_HOST =
            System.getProperty("cassandraServer", "172.17.1.169"); // "donalds01lx.uscorp.audsci.com"
    private static BugInterface dao = new BugImpl(CASSANDRA_HOST);

    public static void bug() throws IOException {
        List<BugItem> items = new ArrayList<BugItem>();
        // If you change the empty string "" to a non-empty string, the AssertionError goes away!
        items.add(new BugItem("", 1, 2, 3));
        items.add(new BugItem("twp", 2, 2, 3));
        items.add(new BugItem("three", 3, 2, 3));
        items.add(new BugItem("four", 4, 2, 3));
        dao.saveReport(items);
    }

    public static void main(String[] args) throws IOException {
        try {
            for (int i = 0; i < 1000; i++) {
                System.out.println("\ndas: iteration " + i + "\n");
                bug();
            }
        } finally {
            dao.shutdown();
        }
    }
}
{noformat}

BugItem.java:
{noformat}
package bug;

public class BugItem {
public String name;
public long long1; 
public long long2;
public long long3; 
public BugItem(String string, long i, long j, long k) {
name=string;
long1 = i;
long2= j;
long3 = k;
}
public String toString() {return "Item with name = " + name + ", long1 = " 
+ long1 + ", long2 = " + long2 + ", long3 = " + long3;}
}
{noformat}

BugInterface.java:
{noformat}
package bug;

import java.util.List;


public interface BugInterface {
public static final String VALUE_DELIMITER = ":";
public static final String HIERARCHY_DELIMITER = " > ";
void saveReport(List item);

void connect();
void shutdown();
}
{noformat}

BugImpl.java:
{noformat}
package bug;

import java.text.NumberFormat;
import java.util.List;
import java.util.UUID;

import org.apache.log4j.Logger;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class BugImpl implements BugInterface {
private static final String CASSANDRA_NODE_PROPERTY="CASSANDRA_NODE";
private static final Logger L = Logger.getLogger(new Throwable()
.getStackTrace()[0].getClassName());
private static final String KEYSPACE_NAME = "bug";
private static final String REPORT_DATA_TABLE_NAME = "bug_table";
private static NumberFormat numberFormat = NumberFormat.getInstance();
private Cluster m_cluster;
private Session m_session;
private int m_writeBatchSize = 64;
private String m_cassandraNode = "";

static {
numberFormat.setMaximumFractionDigits(1);
}

public BugImpl() {
m_cassandraNode=System.getProperty(CASSANDRA_NODE_PROPERTY, 
m_cassandraNode); // Get from command line
}
public BugImpl(String cassandraNode) {
m_cassandraNode=cassandraNode;
}
@Override
public void shutdown() {
if (m_session!=null) {m_session.shutdown();}
if (m_cluster!=null) {m_cluster.shutdown();}
}
@Override
public void connect() {
 m_cluster = 
Cluster.builder().addContactPoint(m_cassandraNode).build();
 m_session = m_cluster.connect();
}
// -----------------------------------------------------------------
@Override
public void saveReport(List items) {
final long time1 = System.currentTimeMillis();
if (m_session==null) {
connect();
}
UUID reportId = UUID.randomUUID(); 
saveReportAux(items,reportId);
final long time2 = System.currentTimeMillis();
L.info("saveReport: t=" + 
numberFormat.format((double)(time2-time1) * 0.001) + " seconds");
}

public void 

[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793070#comment-13793070
 ] 

Donald Smith commented on CASSANDRA-6152:
-

I have a hunch that this bug bites when the column name is "" and the Memtable 
flushes to an SSTable.  I notice it happens at about the same iteration of the 
*for* loop in BugMain.java.

> Assertion error in 2.0.1 at 
> db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> ---
>
> Key: CASSANDRA-6152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS release 6.2 (Final)
> With default set up on single node.
> I also saw this exception in 2.0.0 on a three node cluster.
>Reporter: Donald Smith
>Assignee: Sylvain Lebresne
> Fix For: 2.0.2
>
>
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
> 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError
> at 
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
> at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
> at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
> at 
> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
> at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6586) Cassandra touches all columns on CQL3 select

2014-01-15 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872555#comment-13872555
 ] 

Donald Smith commented on CASSANDRA-6586:
-

To clarify (and correct me if I'm wrong), this ticket does *not* imply that all 
physical thrift columns (cells) of a physical thrift row (partition) are read 
when you do a CQL select on a CQL primary key. It just means that all columns 
mentioned in the CQL primary key are read. There's still a lot of confusion 
between thrift terminology and CQL terminology.

You can still have wide rows and you still can avoid reading all the (physical 
thrift) columns of that row.

People are still confused by terminology and this scares them unnecessarily.  
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows 
explains the terminology to use, but it's still unclear.
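
As an illustration of the distinction being drawn (the table and column names 
below are made up): with this CQL3 schema, one partition (a "thrift row") can hold 
many clustering rows (groups of "thrift columns"/cells), and a query that names the 
full primary key only reads the slice for that clustering key, not the whole 
partition.
{noformat}
CREATE TABLE user_events (
    user_id  text,
    event_ts timestamp,
    payload  text,
    PRIMARY KEY (user_id, event_ts)  -- user_id: partition key, event_ts: clustering column
);

-- Touches only the requested slice of the partition, however wide the partition is.
SELECT payload FROM user_events WHERE user_id = 'u1' AND event_ts = '2014-01-15 09:00:00';
{noformat}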

> Cassandra touches all columns on CQL3 select
> 
>
> Key: CASSANDRA-6586
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6586
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jan Chochol
>Priority: Minor
>
> It seems that Cassandra is checking (garbage collecting) all columns of all 
> returned rows, despite the fact that not all columns are requested.
> Example:
> * use following script to fill Cassandra with test data:
> {noformat}
> perl -e "print(\"DROP KEYSPACE t;\nCREATE KEYSPACE t WITH replication = 
> {'class': 'SimpleStrategy', 'replication_factor' : 1};\nuse t;\nCREATE TABLE 
> t (a varchar PRIMARY KEY, b varchar, c varchar, d varchar);\nCREATE INDEX t_b 
> ON t (b);\nCREATE INDEX t_c ON t (c);\nCREATE INDEX t_d ON t (d);\n\");\$max 
> = 200; for(\$i = 0; \$i < \$max; \$i++) { \$j = int(\$i * 10 / \$max); \$k = 
> int(\$i * 100 / \$max); print(\"INSERT INTO t (a, b, c, d) VALUES ('a\$i', 
> 'b\$j', 'c\$k', 'd\$i');\n\")}\n" | cqlsh
> {noformat}
> * turn on {{ALL}} logging for Cassandra
> * issue this query:
> {noformat}
> select a from t where c = 'c1';
> {noformat}
> This is result:
> {noformat}
> [root@jch3-devel:~/c4] cqlsh --no-color
> Connected to C4 Cluster Single at localhost:9160.
> [cqlsh 3.1.7 | Cassandra 1.2.11-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 
> 19.36.1]
> Use HELP for help.
> cqlsh> use t;
> cqlsh:t> select a from t where c = 'c1';
>  a
> 
>  a3
>  a2
> {noformat}
> From Cassandra log:
> {noformat}
> 2014-01-15 09:14:56.663+0100 [Thrift:1] [TRACE] QueryProcessor.java(125) 
> org.apache.cassandra.cql3.QueryProcessor: component=c4 Process 
> org.apache.cassandra.cql3.statements.SelectStatement@614b3189 @CL.ONE
> 2014-01-15 09:14:56.810+0100 [Thrift:1] [TRACE] ReadCallback.java(67) 
> org.apache.cassandra.service.ReadCallback: component=c4 Blockfor is 1; 
> setting up requests to /127.0.0.1
> 2014-01-15 09:14:56.816+0100 [ReadStage:2] [DEBUG] 
> CompositesSearcher.java(112) 
> org.apache.cassandra.db.index.composites.CompositesSearcher: component=c4 
> Most-selective indexed predicate is 't.c EQ c1'
> 2014-01-15 09:14:56.817+0100 [ReadStage:2] [TRACE] 
> ColumnFamilyStore.java(1493) org.apache.cassandra.db.ColumnFamilyStore: 
> component=c4 Filtering 
> org.apache.cassandra.db.index.composites.CompositesSearcher$1@e15911 for rows 
> matching 
> org.apache.cassandra.db.filter.ExtendedFilter$FilterWithCompositeClauses@4a9e6b8a
> 2014-01-15 09:14:56.817+0100 [ReadStage:2] [TRACE] 
> CompositesSearcher.java(237) 
> org.apache.cassandra.db.index.composites.CompositesSearcher: component=c4 
> Scanning index 't.c EQ c1' starting with 
> 2014-01-15 09:14:56.820+0100 [ReadStage:2] [TRACE] SSTableReader.java(776) 
> org.apache.cassandra.io.sstable.SSTableReader: component=c4 Adding cache 
> entry for KeyCacheKey(/mnt/ebs/cassandra/data/t/t/t-t.t_c-ic-1, 6331) -> 
> org.apache.cassandra.db.RowIndexEntry@66a6574b
> 2014-01-15 09:14:56.821+0100 [ReadStage:2] [TRACE] SliceQueryFilter.java(164) 
> org.apache.cassandra.db.filter.SliceQueryFilter: component=c4 collecting 0 of 
> 1: 6133:false:0@1389773577394000
> 2014-01-15 09:14:56.821+0100 [ReadStage:2] [TRACE] SliceQueryFilter.java(164) 
> org.apache.cassandra.db.filter.SliceQueryFilter: component=c4 collecting 1 of 
> 1: 6132:false:0@1389773577391000
> 2014-01-15 09:14:56.822+0100 [ReadStage:2] [TRACE] 
> CompositesSearcher.java(313) 
> org.apache.cassandra.db.index.composites.CompositesSearcher: component=c4 
> Adding index hit to current row for 6133
> 2014-01-15 09:14:56.825+0100 [ReadStage:2] [TRACE] SSTableReader.java(776) 
> org.apache.cassandra.io.sstable.SSTableReader: component=c4 Adding cache 
> entry for KeyCacheKey(/mnt/ebs/cassandra/data/t/t/t-t-ic-1, 6133) -> 
> org.apache.cassandra.db.RowIndexEntry@32ad3193
> 2014-01-15 09:14:56.826+0100 [ReadStage:2] [TRACE] SliceQueryFilter.java(164) 
> org.apache.cassandra.d

[jira] [Created] (CASSANDRA-6611) Allow for FINAL ttls and FINAL (immutable) inserts to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6611:
---

 Summary: Allow for FINAL ttls and FINAL (immutable) inserts to 
eliminate the need for tombstones
 Key: CASSANDRA-6611
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith


Suppose you're not allowed to update the TTL of a column (cell) -- either 
because CQL is extended to allow syntax like "USING *FINAL* TTL 86400" or 
because there were a table option saying that TTL is immutable.

If you never update the TTL of a column, then there should be no need for 
tombstones at all:  any replicas will have the same TTL.  So there’d be no risk 
of missed deletes.  You wouldn’t even need GCable tombstones.  The purpose of a 
tombstone is to cover the case where a different node was down and it didn’t 
notice the delete and it still had the column and tried to replicate it back; 
but that won’t happen if it too had the TTL.

So, if – and it’s a big if – a table disallowed updates to TTL, then you could 
really optimize deletion of TTLed columns: you could do away with tombstones 
entirely.   If a table allows updates to TTL then it’s possible a different 
node will have the row without the TTL and the tombstone would be needed.

Or am I missing something?

Disallowing updates to rows would seem to enable optimizations in general.   
Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
Probably other optimizations would be enabled too.
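
A sketch of what the requested syntax might look like; FINAL (in both forms below) 
is the hypothetical extension being proposed here and is not valid in current CQL:
{noformat}
-- Hypothetical per-write form: the TTL can never be changed afterwards.
INSERT INTO ks.events (id, payload) VALUES (1, 'x') USING FINAL TTL 86400;

-- Hypothetical table-level form: every write to the table is final/immutable.
CREATE TABLE ks.events (id int PRIMARY KEY, payload text) WITH final_writes = true;
{noformat}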






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6611) Allow for FINAL ttls and FINAL (immutable) inserts to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879340#comment-13879340
 ] 

Donald Smith commented on CASSANDRA-6611:
-

Would it be better, then, to enforce this at the schema level, in a CREATE 
TABLE statement?

> Allow for FINAL ttls and FINAL (immutable) inserts to eliminate the need for 
> tombstones
> ---
>
> Key: CASSANDRA-6611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Donald Smith
>
> Suppose you're not allowed to update the TTL of a column (cell) -- either 
> because CQL is extended to allow syntax like "USING *FINAL* TTL 86400" or 
> because there were a table option saying that TTL is immutable.
> If you never update the TTL of a column, then there should be no need for 
> tombstones at all:  any replicas will have the same TTL.  So there’d be no 
> risk of missed deletes.  You wouldn’t even need GCable tombstones.  The 
> purpose of a tombstone is to cover the case where a different node was down 
> and it didn’t notice the delete and it still had the column and tried to 
> replicate it back; but that won’t happen if it too had the TTL.
> So, if – and it’s a big if – a table disallowed updates to TTL, then you 
> could really optimize deletion of TTLed columns: you could do away with 
> tombstones entirely.   If a table allows updates to TTL then it’s possible a 
> different node will have the row without the TTL and the tombstone would be 
> needed.
> Or am I missing something?
> Disallowing updates to rows would seem to enable optimizations in general.   
> Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
> tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
> Probably other optimizations would be enabled too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6611) Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-6611:


Summary: Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the 
need for tombstones  (was: Allow for FINAL ttls and FINAL (immutable) inserts 
to eliminate the need for tombstones)

> Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for 
> tombstones
> -
>
> Key: CASSANDRA-6611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Donald Smith
>
> Suppose you're not allowed to update the TTL of a column (cell) -- either 
> because CQL is extended to allow syntax like "USING *FINAL* TTL 86400" or 
> because there were a table option saying that TTL is immutable.
> If you never update the TTL of a column, then there should be no need for 
> tombstones at all:  any replicas will have the same TTL.  So there’d be no 
> risk of missed deletes.  You wouldn’t even need GCable tombstones.  The 
> purpose of a tombstone is to cover the case where a different node was down 
> and it didn’t notice the delete and it still had the column and tried to 
> replicate it back; but that won’t happen if it too had the TTL.
> So, if – and it’s a big if – a table disallowed updates to TTL, then you 
> could really optimize deletion of TTLed columns: you could do away with 
> tombstones entirely.   If a table allows updates to TTL then it’s possible a 
> different node will have the row without the TTL and the tombstone would be 
> needed.
> Or am I missing something?
> Disallowing updates to rows would seem to enable optimizations in general.   
> Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
> tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
> Probably other optimizations would be enabled too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6611) Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879464#comment-13879464
 ] 

Donald Smith commented on CASSANDRA-6611:
-

I see. But setting gc_grace_seconds to zero will affect deletes other than TTL 
expirations, won't it? So, I want something in the TABLE declaration that 
states this more declaratively.
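
For reference, a minimal sketch of the existing table-level workaround being 
discussed (keyspace and table names are placeholders); as noted above, it applies 
to every tombstone on the table, not just expired TTLs:
{noformat}
ALTER TABLE ks.events WITH gc_grace_seconds = 0;
{noformat}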

> Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for 
> tombstones
> -
>
> Key: CASSANDRA-6611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Donald Smith
>
> Suppose you're not allowed to update the TTL of a column (cell) -- either 
> because CQL is extended to allow syntax like "USING *FINAL* TTL 86400" or 
> because there were a table option saying that TTL is immutable.
> If you never update the TTL of a column, then there should be no need for 
> tombstones at all:  any replicas will have the same TTL.  So there’d be no 
> risk of missed deletes.  You wouldn’t even need GCable tombstones.  The 
> purpose of a tombstone is to cover the case where a different node was down 
> and it didn’t notice the delete and it still had the column and tried to 
> replicate it back; but that won’t happen if it too had the TTL.
> So, if – and it’s a big if – a table disallowed updates to TTL, then you 
> could really optimize deletion of TTLed columns: you could do away with 
> tombstones entirely.   If a table allows updates to TTL then it’s possible a 
> different node will have the row without the TTL and the tombstone would be 
> needed.
> Or am I missing something?
> Disallowing updates to rows would seem to enable optimizations in general.   
> Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
> tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
> Probably other optimizations would be enabled too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (CASSANDRA-6929) Corrupted Index File: read 8599 but expected 8600 chunks.

2014-03-25 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6929:
---

 Summary: Corrupted Index File: read 8599 but expected 8600 chunks.
 Key: CASSANDRA-6929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6929
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith


I have a 3 node cassandra cluster running 2.0.6 (we started at 2.0.1). It has 
several terabytes of data. We've been seeing exceptions in system.log like 
"Corrupted Index File ... read 21478 but expected 21479 chunks."



Here's a stack trace from one server:
{noformat}
 INFO [CompactionExecutor:9109] 2014-03-24 06:55:28,148 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1299803435(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,148 Memtable.java (line 331) 
Writing Memtable-compactions_in_progress@1299803435(0/0 serialized/live bytes, 
1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,299 Memtable.java (line 371) 
Completed flushing 
/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db
 (42 bytes) for commitlog position ReplayPosition(segmentId=1395195644764, 
position=17842243)
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,299 CompactionTask.java 
(line 115) Compacting 
[SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12861-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12860-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12858-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db')]
ERROR [CompactionExecutor:9109] 2014-03-24 06:55:28,302 CassandraDaemon.java 
(line 196) Exception in thread Thread[CompactionExecutor:9109,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:152)
at 
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:330)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:204)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
... 16 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readLong(Unknown Source)
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:146)
... 15 more
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,739 CompactionTask.java 
(line 275) Compacted 4 sstables to 
[/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12863,].
  571 bytes to 42 (~7% of original) in 439ms = 0.91MB/s.  4 total 
partitions merged to 1.  Partition merge counts were {2:2, }
{noformat}
Here's another example:



{noformat}
 INFO [CompactionExecutor:9566] 2014-03-25 06:32:02,234 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1216289160(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:474] 2014-03-25 06:32:02,234 Memtable.java (line 331)

[jira] [Updated] (CASSANDRA-6929) Corrupted Index File: read 8599 but expected 8600 chunks.

2014-03-25 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-6929:


Description: 
I have a 3 node cassandra cluster running 2.0.6 (we started at 2.0.1). It has 
several terabytes of data. We've been seeing exceptions in system.log like 
"Corrupted Index File ... read 21478 but expected 21479 chunks."



Here's a stack trace from one server:
{noformat}
 INFO [CompactionExecutor:9109] 2014-03-24 06:55:28,148 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1299803435(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,148 Memtable.java (line 331) 
Writing Memtable-compactions_in_progress@1299803435(0/0 serialized/live bytes, 
1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,299 Memtable.java (line 371) 
Completed flushing 
/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db
 (42 bytes) for commitlog position ReplayPosition(segmentId=1395195644764, 
position=17842243)
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,299 CompactionTask.java 
(line 115) Compacting 
[SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12861-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12860-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12858-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db')]
ERROR [CompactionExecutor:9109] 2014-03-24 06:55:28,302 CassandraDaemon.java 
(line 196) Exception in thread Thread[CompactionExecutor:9109,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:152)
at 
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:330)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:204)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
... 16 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readLong(Unknown Source)
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:146)
... 15 more
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,739 CompactionTask.java 
(line 275) Compacted 4 sstables to 
[/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12863,].
  571 bytes to 42 (~7% of original) in 439ms = 0.91MB/s.  4 total 
partitions merged to 1.  Partition merge counts were {2:2, }
{noformat}
Here's another example:



{noformat}
 INFO [CompactionExecutor:9566] 2014-03-25 06:32:02,234 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1216289160(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:474] 2014-03-25 06:32:02,234 Memtable.java (line 331) 
Writing Memtable-compactions_in_progress@1216289160(0/0 serialized/live bytes, 
1 ops)
 INFO [FlushWriter:474] 2014-03-25 06:32:02,445 Mem

[jira] [Created] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-14 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-7034:
---

 Summary: commitlog files are 32MB in size, even with a 64bit  OS 
and jvm
 Key: CASSANDRA-7034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith


We did an RPM install of Cassandra 2.0.6 on CentOS 6.4, running:
{noformat}
> java -version
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
{noformat}
That is the version of Java that CassandraDaemon is using.

We used the default setting (None) in cassandra.yaml for 
commitlog_total_space_in_mb:
{noformat}
# Total space to use for commitlogs.  Since commitlog segments are
# mmapped, and hence use up address space, the default size is 32
# on 32-bit JVMs, and 1024 on 64-bit JVMs.
#
# If space gets above this value (it will round up to the next nearest
# segment multiple), Cassandra will flush every dirty CF in the oldest
# segment and remove it.  So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
# commitlog_total_space_in_mb: 4096
{noformat}
But our commitlog files are 32MB in size, not 1024MB.

OpsCenter confirms that commitlog_total_space_in_mb is None.

I don't think the problem is in cassandra-env.sh, because when I run it 
manually and echo the values of the version variables I get:
{noformat}
jvmver=1.7.0_40
JVM_VERSION=1.7.0
JVM_ARCH=64-Bit
{noformat}
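
For what it's worth, here's a quick standalone way to double-check the architecture of the JVM the daemon actually runs on, independent of cassandra-env.sh. This is a hypothetical check, not part of Cassandra, and sun.arch.data.model is HotSpot-specific:
{noformat}
// Hypothetical standalone check (not part of Cassandra): print the properties
// identifying the running JVM's version and word size.
public class JvmArchCheck {
    public static void main(String[] args) {
        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        // "64" on a 64-bit HotSpot JVM, "32" on a 32-bit one.
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
    }
}
{noformat}
On the 64-Bit Server VM shown above this should print java.version 1.7.0_40 and sun.arch.data.model 64, consistent with what cassandra-env.sh reports.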



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-15 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969886#comment-13969886
 ] 

Donald Smith commented on CASSANDRA-7034:
-

Benedict, I'm aware that *commitlog_total_space_in_mb* has that purpose. What 
I'm raising is that this comment in cassandra.yaml is now wrong: "the 
default size is 32 on 32-bit JVMs, and 1024 on 64-bit JVMs." That's no longer 
being enforced.

> commitlog files are 32MB in size, even with a 64bit  OS and jvm
> ---
>
> Key: CASSANDRA-7034
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Donald Smith
>
> We did an RPM install of Cassandra 2.0.6 on CentOS 6.4, running:
> {noformat}
> > java -version
> Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
> {noformat}
> That is the version of Java that CassandraDaemon is using.
> We used the default setting (None) in cassandra.yaml for 
> commitlog_total_space_in_mb:
> {noformat}
> # Total space to use for commitlogs.  Since commitlog segments are
> # mmapped, and hence use up address space, the default size is 32
> # on 32-bit JVMs, and 1024 on 64-bit JVMs.
> #
> # If space gets above this value (it will round up to the next nearest
> # segment multiple), Cassandra will flush every dirty CF in the oldest
> # segment and remove it.  So a small total commitlog space will tend
> # to cause more flush activity on less-active columnfamilies.
> # commitlog_total_space_in_mb: 4096
> {noformat}
> But our commitlog files are 32MB in size, not 1024MB.
> OpsCenter confirms that commitlog_total_space_in_mb is None.
> I don't think the problem is in cassandra-env.sh, because when I run it 
> manually and echo the values of the version variables I get:
> {noformat}
> jvmver=1.7.0_40
> JVM_VERSION=1.7.0
> JVM_ARCH=64-Bit
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-6215) Possible space leak in datastax.driver.core

2013-10-17 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6215:
---

 Summary: Possible space leak in datastax.driver.core
 Key: CASSANDRA-6215
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6215
 Project: Cassandra
  Issue Type: Bug
  Components: Drivers (now out of tree)
 Environment: CentOS 6.4
Reporter: Donald Smith


I wrote a Java benchmark app that uses the CQL driver cassandra-driver-core:1.0.3 and 
repeatedly saves to column families using code like:
{noformat}
   final Insert writeReportInfo = QueryBuilder.insertInto(KEYSPACE_NAME, 
REPORT_INFO_TABLE_NAME).value("type",report.type.toString()).value(...) ...

m_session.execute(writeReportInfo);
{noformat}
After running for about an hour with -Xmx2000m and writing about 20,000 
reports (each with about 1 rows), it hit: java.lang.OutOfMemoryError: Java 
heap space.

Using jmap and jhat I can see that the objects taking up space are 
{noformat}
 Instance Counts for All Classes (excluding platform)
1657280 instances of class com.datastax.driver.core.ColumnDefinitions$Definition
31628 instances of class com.datastax.driver.core.ColumnDefinitions
31628 instances of class 
[Lcom.datastax.driver.core.ColumnDefinitions$Definition;
31627 instances of class com.datastax.driver.core.PreparedStatement
31627 instances of class org.apache.cassandra.utils.MD5Digest 
...
{noformat}
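
For comparison, here's a minimal sketch of the same write loop that prepares the INSERT once and only binds values per report. The keyspace, table, and column names below are placeholders, and this is just an illustration of statement reuse against the 1.0.x Java driver API, not a confirmed fix for this ticket:
{noformat}
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class ReportWriter {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Prepare the INSERT once up front; every report reuses the same statement
        // (and its column metadata) instead of building a fresh Insert per write.
        PreparedStatement insert = session.prepare(
                "INSERT INTO my_keyspace.report_info (report_id, type) VALUES (?, ?)");

        for (int i = 0; i < 20000; i++) {
            BoundStatement bound = insert.bind(Integer.toString(i), "SUMMARY");
            session.execute(bound);
        }

        cluster.shutdown();
    }
}
{noformat}
If the heap stays flat with this shape, that would point at per-statement metadata accumulating in the original loop.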



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6215) Possible space leak in datastax.driver.core

2013-10-18 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799228#comment-13799228
 ] 

Donald Smith commented on CASSANDRA-6215:
-

Created https://datastax-oss.atlassian.net/browse/JAVA-201

> Possible space leak in datastax.driver.core
> ---
>
> Key: CASSANDRA-6215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6215
> Project: Cassandra
>  Issue Type: Bug
>  Components: Drivers (now out of tree)
> Environment: CentOS 6.4
>Reporter: Donald Smith
>
> I wrote a Java benchmark app that uses the CQL driver cassandra-driver-core:1.0.3 and 
> repeatedly saves to column families using code like:
> {noformat}
>final Insert writeReportInfo = QueryBuilder.insertInto(KEYSPACE_NAME, 
> REPORT_INFO_TABLE_NAME).value("type",report.type.toString()).value(...) ...
> m_session.execute(writeReportInfo);
> {noformat}
> After running for about an hour with -Xmx2000m and writing about 20,000 
> reports (each with about 1 rows), it hit: java.lang.OutOfMemoryError: 
> Java heap space.
> Using jmap and jhat I can see that the objects taking up space are 
> {noformat}
>  Instance Counts for All Classes (excluding platform)
> 1657280 instances of class 
> com.datastax.driver.core.ColumnDefinitions$Definition
> 31628 instances of class com.datastax.driver.core.ColumnDefinitions
> 31628 instances of class 
> [Lcom.datastax.driver.core.ColumnDefinitions$Definition;
> 31627 instances of class com.datastax.driver.core.PreparedStatement
> 31627 instances of class org.apache.cassandra.utils.MD5Digest 
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2013-12-23 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855774#comment-13855774
 ] 

Donald Smith commented on CASSANDRA-5220:
-

 We ran "nodetool repair" on a 3 node cassandra cluster with production-quality 
hardware, using version 2.0.3. Each node had about 1TB of data. This is still 
testing.  After 5 days the repair job still hasn't finished. I can see it's 
still running.

Here's the process:
{noformat}
root 30835 30774  0 Dec17 pts/0    00:03:53 /usr/bin/java -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar
 -Xmx32m -Dlog4j.configuration=log4j-tools.properties 
-Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 
repair -pr as_reports
{noformat}

The log output has just:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for 
keyspace as_reports
{noformat}

Here's the output of "nodetool tpstats":
{noformat}
cass3 /tmp> nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
PAGED_RANGE  0
BINARY   0
READ   884
MUTATION   1407711
_TRACE   0
REQUEST_RESPONSE 0
{noformat}
The cluster has some write traffic to it. We decided to test it under load.
This is the busiest column family, as reported by "nodetool cfstats":
{n

[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2013-12-23 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855778#comment-13855778
 ] 

Donald Smith commented on CASSANDRA-5351:
-

As reported in https://issues.apache.org/jira/browse/CASSANDRA-5220, we ran 
"nodetool repair" on a 3-node Cassandra 2.0.3 cluster using production 
hardware, on realistic test data. Each node has about 1TB of data. After over 
five days, the repair job is still running.

> Avoid repairing already-repaired data by default
> 
>
> Key: CASSANDRA-5351
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
> Project: Cassandra
>  Issue Type: Task
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Lyuben Todorov
>  Labels: repair
> Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, 
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been 
> successfully repaired, and only repairing sstables new since the last repair. 
>  (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired 
> data together with non-repaired.  So we should segregate unrepaired sstables 
> from the repaired ones.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5396) Repair process is a joke leading to a downward spiralling and eventually unusable cluster

2013-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858966#comment-13858966
 ] 

Donald Smith commented on CASSANDRA-5396:
-

We ran "nodetool repair -pr" on one node of a three node cluster running on 
production-quality hardware, each node with about 1TB of data. It was using 
cassandra version 2.0.3. After 5 days it was still running and had apparently 
frozen.  See https://issues.apache.org/jira/browse/CASSANDRA-5220 (Dec 23 
comment by Donald Smith) for more detail.  We tried running repair on our 
smallest column family (with 12G of data), and it took 31 hours to complete.
We're not yet in production but we plan on not running repair, since we do very 
few deletes or updates and since we don't trust it.

> Repair process is a joke leading to a downward spiralling and eventually 
> unusable cluster
> -
>
> Key: CASSANDRA-5396
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5396
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.2.3
> Environment: all
>Reporter: David Berkman
>Priority: Critical
>
> Let's review the repair process...
> 1) It's mandatory to run repair.
> 2) Repair has a high impact and can take hours.
> 3) Repair provides no estimation of completion time and no progress indicator.
> 4) Repair is extremely fragile, and can fail to complete, or become stuck 
> quite easily in real operating environments.
> 5) When repair fails it provides no feedback whatsoever of the problem or 
> possible resolution.
> 6) A failed repair operation saddles the affected nodes with a huge amount of 
> extra data (judging from node size).
> 7) There is no way to rid the node of the extra data associated with a failed 
> repair short of completely rebuilding the node.
> 8) The extra data from a failed repair makes any subsequent repair take 
> longer and increases the likelihood that it will simply become stuck or fail, 
> leading to yet more node corruption.
> 9) Eventually no repair operation will complete successfully, and node 
> operations will eventually become impacted leading to a failing cluster.
> Who would design such a system for a service meant to operate as a fault 
> tolerant clustered data store operating on a lot of commodity hardware?
> Solution...
> 1) Repair must be robust.
> 2) Repair must *never* become 'stuck'.
> 3) Failure to complete must result in reasonable feedback.
> 4) Failure to complete must not result in a node whose state is worse than 
> before the operation began.
> 5) Repair must provide some means of determining completion percentage.
> 6) It would be nice if repair could estimate its run time, even if it could 
> do so only based upon previous runs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (CASSANDRA-5396) Repair process is a joke leading to a downward spiralling and eventually unusable cluster

2013-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858966#comment-13858966
 ] 

Donald Smith edited comment on CASSANDRA-5396 at 12/30/13 6:14 PM:
---

We ran "nodetool repair -pr" on one node of a three node cluster running on 
production-quality hardware, each node with about 1TB of data. It was using 
cassandra version 2.0.3. After 5 days it was still running and had apparently 
frozen.  See https://issues.apache.org/jira/browse/CASSANDRA-5220 (Dec 23 
comment by Donald Smith) for more detail.  We tried running repair on our 
smallest column family (with 12G of data), and it took 31 hours to complete.
We're not yet in production but we plan on not running repair, since we do very 
few deletes or updates and since we don't trust it. Also, our data isn't 
critical.


was (Author: thinkerfeeler):
We ran "nodetool repair -pr" on one node of a three node cluster running on 
production-quality hardware, each node with about 1TB of data. It was using 
cassandra version 2.0.3. After 5 days it was still running and had apparently 
frozen.  See https://issues.apache.org/jira/browse/CASSANDRA-5220 (Dec 23 
comment by Donald Smith) for more detail.  We tried running repair on our 
smallest column family (with 12G of data), and it took 31 hours to complete.
We're not yet in production but we plan on not running repair, since we do very 
few deletes or updates and since we don't trust it.

> Repair process is a joke leading to a downward spiralling and eventually 
> unusable cluster
> -
>
> Key: CASSANDRA-5396
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5396
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.2.3
> Environment: all
>Reporter: David Berkman
>Priority: Critical
>
> Let's review the repair process...
> 1) It's mandatory to run repair.
> 2) Repair has a high impact and can take hours.
> 3) Repair provides no estimation of completion time and no progress indicator.
> 4) Repair is extremely fragile, and can fail to complete, or become stuck 
> quite easily in real operating environments.
> 5) When repair fails it provides no feedback whatsoever of the problem or 
> possible resolution.
> 6) A failed repair operation saddles the affected nodes with a huge amount of 
> extra data (judging from node size).
> 7) There is no way to rid the node of the extra data associated with a failed 
> repair short of completely rebuilding the node.
> 8) The extra data from a failed repair makes any subsequent repair take 
> longer and increases the likelihood that it will simply become stuck or fail, 
> leading to yet more node corruption.
> 9) Eventually no repair operation will complete successfully, and node 
> operations will eventually become impacted leading to a failing cluster.
> Who would design such a system for a service meant to operate as a fault 
> tolerant clustered data store operating on a lot of commodity hardware?
> Solution...
> 1) Repair must be robust.
> 2) Repair must *never* become 'stuck'.
> 3) Failure to complete must result in reasonable feedback.
> 4) Failure to complete must not result in a node whose state is worse than 
> before the operation began.
> 5) Repair must provide some means of determining completion percentage.
> 6) It would be nice if repair could estimate its run time, even if it could 
> do so only based upon previous runs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)