[jira] [Commented] (CASSANDRA-2116) Separate out filesystem errors from generic IOErrors

2011-05-17 Thread Chris Goffinet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035111#comment-13035111
 ] 

Chris Goffinet commented on CASSANDRA-2116:
---

Unfortunately, IOError is the best we can get from Java. For example, we use 
this patch to detect when our RAID array dies; the OS will tell Java to throw 
IOError. I think we should err on the side of: if data is corrupt, we should 
let the operator decide what mode he wants. For us, on any errors or any 
corruption of data, we want to take the node out right away.

We have been testing this in production for a while and it works really well 
when disks die. We also ran tests that involved removing drives from the 
system while it was serving traffic. 

The Read/Write error classes follow a similar idea to how the Hadoop code base 
handles this very issue.


> Separate out filesystem errors from generic IOErrors
> 
>
> Key: CASSANDRA-2116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2116
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Goffinet
>Priority: Minor
> Fix For: 1.0
>
> Attachments: 
> 0001-Separate-out-filesystem-errors-from-generic-IOErrors.patch
>
>
> We throw IOErrors everywhere in the codebase today. We should separate out 
> specific filesystem errors (reading, writing) into FSReadError and 
> FSWriteError. This makes it possible in the next ticket to allow certain 
> failure modes (kill the server if reads from or writes to disk fail).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2616) Add "DROP INDEX" command to CLI

2011-05-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035108#comment-13035108
 ] 

Pavel Yaskevich commented on CASSANDRA-2616:


Oh, I didn't know that Jackson created an issue about that (I was deleting 
those files because we ran into an issue of indexes not being dropped 
properly). I will simplify everything then, thanks for pointing it out!

> Add "DROP INDEX" command to CLI
> ---
>
> Key: CASSANDRA-2616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2616
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 0.8.1
>
> Attachments: CASSANDRA-2616.patch
>
>




[jira] [Commented] (CASSANDRA-2616) Add "DROP INDEX" command to CLI

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035107#comment-13035107
 ] 

Jonathan Ellis commented on CASSANDRA-2616:
---

If you just take the index definition out of the metadata, Cassandra will do 
the right thing (and mark those sstables deleted).  See CFS.reload / 
CFS.removeIndex (and CASSANDRA-2619 which fixed some bugs here).

> Add "DROP INDEX" command to CLI
> ---
>
> Key: CASSANDRA-2616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2616
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 0.8.1
>
> Attachments: CASSANDRA-2616.patch
>
>




[jira] [Updated] (CASSANDRA-2641) AbstractBounds.normalize should deal with overlapping ranges

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2641:
--

 Reviewer: slebresne
Fix Version/s: (was: 1.0)
   0.8.1

> AbstractBounds.normalize should deal with overlapping ranges
> 
>
> Key: CASSANDRA-2641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2641
> Project: Cassandra
>  Issue Type: Test
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>Priority: Minor
> Fix For: 0.8.1
>
> Attachments: 0001-Assert-non-overlapping-ranges-in-normalize.txt, 
> 0002-Don-t-use-overlapping-ranges-in-tests.txt
>
>
> Apparently no consumers have encountered it in production, but 
> AbstractBounds.normalize does not handle overlapping ranges. If given 
> overlapping ranges, the output will be sorted but still overlapping, in which 
> case SSTableReader.getPositionsForRanges may choose overlapping ranges within 
> an SSTable.
> We should add an assert, either in normalize() or in getPositionsForRanges(), 
> to ensure that this never bites us in production.
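The proposed guard can be sketched like this, using simple [left, right) long pairs as a stand-in for token ranges (illustrative only, not the attached patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative post-normalize overlap check; not the Cassandra implementation.
public class NormalizeSketch
{
    static List<long[]> normalize(List<long[]> ranges)
    {
        List<long[]> sorted = new ArrayList<long[]>(ranges);
        Collections.sort(sorted, new Comparator<long[]>()
        {
            public int compare(long[] a, long[] b)
            {
                return Long.compare(a[0], b[0]);
            }
        });
        // The assert being proposed: sorted output must also be
        // non-overlapping, otherwise getPositionsForRanges could select
        // overlapping spans of an sstable.
        for (int i = 1; i < sorted.size(); i++)
            if (sorted.get(i)[0] < sorted.get(i - 1)[1])
                throw new AssertionError("overlapping ranges after normalize");
        return sorted;
    }

    public static void main(String[] args)
    {
        List<long[]> ok = normalize(Arrays.asList(new long[]{ 5, 9 }, new long[]{ 0, 3 }));
        System.out.println(ok.get(0)[0]); // 0: sorted by left bound
    }
}
```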



[jira] [Commented] (CASSANDRA-2116) Separate out filesystem errors from generic IOErrors

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035102#comment-13035102
 ] 

Jonathan Ellis commented on CASSANDRA-2116:
---

I'm not sure having different classes for read/write errors is necessary (code 
that is in a position to catch-and-do-something-reasonable knows what kind of 
op it's attempting). On the other hand, if a write op does a read as part of 
its implementation (indexes cause this to happen) we might need to distinguish 
the two.

I think it's more useful to distinguish between recoverable errors and non-: "I 
got EOF earlier than I thought" usually means the file is corrupt, not that the 
disk is dead.  (I can't think of any read errors that absolutely mean 
disk-is-dead.)

It would be useful to get some use out of Java's misguided checked exceptions, 
by keeping recoverable errors checked (IOException) and unrecoverable ones 
unchecked (IOError).
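To make the checked/unchecked split concrete, here is a minimal Java sketch, assuming hypothetical FSReadError/FSWriteError classes along the lines of the attached patch (illustrative, not the committed code):

```java
import java.io.IOError;
import java.io.IOException;

// Hypothetical error classes: unrecoverable filesystem failures escape as
// unchecked Errors, while recoverable conditions stay checked.
class FSReadError extends IOError
{
    FSReadError(Throwable cause) { super(cause); }
}

class FSWriteError extends IOError
{
    FSWriteError(Throwable cause) { super(cause); }
}

public class FSErrorSketch
{
    // Unrecoverable: the device itself failed, so throw an unchecked Error
    // and let a top-level handler decide whether to take the node out.
    static void writeOrDie(boolean deviceFailed)
    {
        if (deviceFailed)
            throw new FSWriteError(new IOException("device I/O error"));
    }

    // Recoverable: early EOF usually means a corrupt file, not a dead disk,
    // so keep it a checked IOException the caller must deal with.
    static void readChecked(boolean truncated) throws IOException
    {
        if (truncated)
            throw new IOException("unexpected EOF: file appears corrupt");
    }

    public static void main(String[] args) throws IOException
    {
        readChecked(false); // fine: nothing corrupt
        try
        {
            writeOrDie(true);
        }
        catch (FSWriteError e)
        {
            System.out.println("unrecoverable: " + e.getCause().getMessage());
        }
    }
}
```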

> Separate out filesystem errors from generic IOErrors
> 
>
> Key: CASSANDRA-2116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2116
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Goffinet
>Priority: Minor
> Fix For: 1.0
>
> Attachments: 
> 0001-Separate-out-filesystem-errors-from-generic-IOErrors.patch
>
>
> We throw IOErrors everywhere in the codebase today. We should separate out 
> specific filesystem errors (reading, writing) into FSReadError and 
> FSWriteError. This makes it possible in the next ticket to allow certain 
> failure modes (kill the server if reads from or writes to disk fail).



[jira] [Commented] (CASSANDRA-2616) Add "DROP INDEX" command to CLI

2011-05-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035101#comment-13035101
 ] 

Pavel Yaskevich commented on CASSANDRA-2616:


Ok, should we delete the SSTables used by the index or let them be?

> Add "DROP INDEX" command to CLI
> ---
>
> Key: CASSANDRA-2616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2616
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 0.8.1
>
> Attachments: CASSANDRA-2616.patch
>
>




[jira] [Commented] (CASSANDRA-2616) Add "DROP INDEX" command to CLI

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035099#comment-13035099
 ] 

Jonathan Ellis commented on CASSANDRA-2616:
---

IMO we should do this at the client level (by creating appropriate metadata 
objects), not by adding a new thrift call.

> Add "DROP INDEX" command to CLI
> ---
>
> Key: CASSANDRA-2616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2616
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 0.8.1
>
> Attachments: CASSANDRA-2616.patch
>
>




[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035098#comment-13035098
 ] 

Jonathan Ellis commented on CASSANDRA-2644:
---

bq. But there are still cases that retries will recover from... flapping/down 
nodes

Fair enough, but increasing the timeout is still unwarranted.  Let's just make 
it wait for max(DEFAULT_TIMEOUT, BOOTSTRAP_TIMEOUT) with B_T equal to, say, 30s.

Committed patch 01 to 0.8.1 branch, btw.
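A sketch of the suggested wait, assuming a hypothetical BOOTSTRAP_TIMEOUT constant and a stand-in for the default callback timeout (1.1 * rpc_timeout); only the max() policy comes from the comment above:

```java
// Illustrative sketch of the suggested bootstrap wait; BOOTSTRAP_TIMEOUT and
// the surrounding class are hypothetical, only the max() policy is from the
// comment above.
public class BootstrapTimeoutSketch
{
    // Stand-in for MessagingService.getDefaultCallbackTimeout()
    // (1.1 * rpc_timeout).
    static long defaultCallbackTimeout(long rpcTimeoutMillis)
    {
        return (long) (1.1 * rpcTimeoutMillis);
    }

    static final long BOOTSTRAP_TIMEOUT = 30 * 1000L; // "say, 30s"

    static long bootstrapWait(long rpcTimeoutMillis)
    {
        // Wait for whichever is longer, so a 1s rpc_timeout no longer
        // expires the token request before the reply arrives.
        return Math.max(defaultCallbackTimeout(rpcTimeoutMillis), BOOTSTRAP_TIMEOUT);
    }

    public static void main(String[] args)
    {
        System.out.println(bootstrapWait(1000));  // 30000: bootstrap floor wins
        System.out.println(bootstrapWait(60000)); // 66000: default timeout wins
    }
}
```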

> Make bootstrap retry
> 
>
> Key: CASSANDRA-2644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2644
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.0 beta 2
>Reporter: Chris Goffinet
>Assignee: Chris Goffinet
> Fix For: 0.8.1
>
> Attachments: 
> 0001-Make-ExpiringMap-have-objects-with-specific-timeouts.patch, 
> 0002-Make-bootstrap-retry-and-increment-timeout-for-every.patch
>
>
> We ran into a situation where we had rpc_timeout set to 1 second, and the 
> node needing to compute the token took over a second (1.6 seconds). The 
> bootstrapping node hangs forever without getting a token because the expiring 
> map removes it before the reply comes back.



[jira] [Updated] (CASSANDRA-2481) C* .deb installs C* init.d scripts such that C* comes up before mdadm and related

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2481:
--

Affects Version/s: (was: 0.7.0)
Fix Version/s: 0.8.0

> C* .deb installs C* init.d scripts such that C* comes up before mdadm and 
> related
> -
>
> Key: CASSANDRA-2481
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2481
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Matthew F. Dennis
>Assignee: paul cannon
>Priority: Minor
> Fix For: 0.7.6, 0.8.0
>
> Attachments: 2481.txt
>
>
> the C* .deb packages install the init.d scripts at S20, which is before mdadm 
> and various other services.  This means that when a node reboots, C* is 
> started before the RAID sets are up and mounted, causing C* to think it has no 
> data and to attempt bootstrapping again.



svn commit: r1104598 - in /cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra: net/MessagingService.java utils/ExpiringMap.java

2011-05-17 Thread jbellis
Author: jbellis
Date: Tue May 17 22:14:35 2011
New Revision: 1104598

URL: http://svn.apache.org/viewvc?rev=1104598&view=rev
Log:
add per-callback timeouts to ExpiringMap

Modified:

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/net/MessagingService.java

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/utils/ExpiringMap.java

Modified: 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/net/MessagingService.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/net/MessagingService.java?rev=1104598&r1=1104597&r2=1104598&view=diff
==
--- 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/net/MessagingService.java
 (original)
+++ 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/net/MessagingService.java
 Tue May 17 22:14:35 2011
@@ -83,6 +83,7 @@ public final class MessagingService impl
 private final SimpleCondition listenGate;
private final Map<StorageService.Verb, AtomicInteger> droppedMessages = 
new EnumMap<StorageService.Verb, AtomicInteger>(StorageService.Verb.class);
private final List<ILatencySubscriber> subscribers = new 
ArrayList<ILatencySubscriber>();
+private static final long DEFAULT_CALLBACK_TIMEOUT = (long) (1.1 * 
DatabaseDescriptor.getRpcTimeout());
 
 {
 for (StorageService.Verb verb : StorageService.Verb.values())
@@ -121,7 +122,7 @@ public final class MessagingService impl
 return null;
 }
 };
-callbacks = new ExpiringMap<String, Pair<InetAddress, IMessageCallback>>((long) (1.1 * DatabaseDescriptor.getRpcTimeout()), timeoutReporter);
+callbacks = new ExpiringMap<String, Pair<InetAddress, IMessageCallback>>(DEFAULT_CALLBACK_TIMEOUT, timeoutReporter);
 
 MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
 try
@@ -256,7 +257,12 @@ public final class MessagingService impl
 
 private void addCallback(IMessageCallback cb, String messageId, 
InetAddress to)
 {
-Pair<InetAddress, IMessageCallback> previous = callbacks.put(messageId, new Pair<InetAddress, IMessageCallback>(to, cb));
+addCallback(cb, messageId, to, DEFAULT_CALLBACK_TIMEOUT);
+}
+
+private void addCallback(IMessageCallback cb, String messageId, 
InetAddress to, long timeout)
+{
+Pair<InetAddress, IMessageCallback> previous = callbacks.put(messageId, new Pair<InetAddress, IMessageCallback>(to, cb), timeout);
 assert previous == null;
 }
 
@@ -267,6 +273,14 @@ public final class MessagingService impl
 return Integer.toString(idGen.incrementAndGet());
 }
 
+/*
+ * @see #sendRR(Message message, InetAddress to, IMessageCallback cb, long 
timeout)
+ */
+public String sendRR(Message message, InetAddress to, IMessageCallback cb)
+{
+return sendRR(message, to, cb, DEFAULT_CALLBACK_TIMEOUT);
+}
+
 /**
  * Send a message to a given endpoint. This method specifies a callback
  * which is invoked with the actual response.
@@ -275,12 +289,13 @@ public final class MessagingService impl
  * @param cb callback interface which is used to pass the responses or
  *   suggest that a timeout occurred to the invoker of the send().
  *   suggest that a timeout occurred to the invoker of the send().
+ * @param timeout the timeout used for expiration
  * @return an reference to message id used to match with the result
  */
-public String sendRR(Message message, InetAddress to, IMessageCallback cb)
+public String sendRR(Message message, InetAddress to, IMessageCallback cb, 
long timeout)
 {
 String id = nextId();
-addCallback(cb, id, to);
+addCallback(cb, id, to, timeout);
 sendOneWay(message, id, to);
 return id;
 }
@@ -624,4 +639,9 @@ public final class MessagingService impl
 completedTasks.put(entry.getKey().getHostAddress(), 
entry.getValue().ackCon.getCompletedMesssages());
 return completedTasks;
 }
+
+public static long getDefaultCallbackTimeout()
+{
+return DEFAULT_CALLBACK_TIMEOUT;
+}
 }

Modified: 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/utils/ExpiringMap.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/utils/ExpiringMap.java?rev=1104598&r1=1104597&r2=1104598&view=diff
==
--- 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/utils/ExpiringMap.java
 (original)
+++ 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/utils/ExpiringMap.java
 Tue May 17 22:14:35 2011
@@ -32,11 +32,13 @@ public class ExpiringMap
 {
 private final T value;
 private final long age;
+private final long expiration;
 
-CacheableObject(T o)
+CacheableObject(T o, long e)
 {
 assert o != null;
 value = o;
+expiration = e;
 age = System.currentTimeMillis();
 }
 
@@ -45,26 +47,21 @@ public class ExpiringMap
 return value;
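The ExpiringMap hunk above is cut off by the archive; here is a self-contained sketch of the per-entry timeout idea from r1104598 (names loosely mirror the diff, but this is not the committed class):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: each cached value remembers its own expiration instead of
// the map applying one global TTL. Illustrative only.
public class ExpiringMapSketch<K, V>
{
    private static class CacheableObject<T>
    {
        final T value;
        final long expiration; // per-entry timeout in millis
        final long age;        // insertion time

        CacheableObject(T o, long e)
        {
            assert o != null;
            value = o;
            expiration = e;
            age = System.currentTimeMillis();
        }

        boolean isReadyToDieAt(long gcedAt)
        {
            return gcedAt - age > expiration;
        }
    }

    private final Map<K, CacheableObject<V>> cache = new ConcurrentHashMap<K, CacheableObject<V>>();

    public V put(K key, V value, long timeout)
    {
        CacheableObject<V> previous = cache.put(key, new CacheableObject<V>(value, timeout));
        return previous == null ? null : previous.value;
    }

    // Remove and return the value only if its own timeout has elapsed.
    public V removeExpired(K key, long now)
    {
        CacheableObject<V> co = cache.get(key);
        if (co != null && co.isReadyToDieAt(now))
        {
            cache.remove(key);
            return co.value;
        }
        return null;
    }
}
```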

svn commit: r1104597 - in /cassandra/branches/cassandra-0.8.1: ./ conf/ contrib/ debian/ doc/cql/ drivers/py/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/cli/

2011-05-17 Thread jbellis
Author: jbellis
Date: Tue May 17 22:12:59 2011
New Revision: 1104597

URL: http://svn.apache.org/viewvc?rev=1104597&view=rev
Log:
merge from 0.8

Removed:
cassandra/branches/cassandra-0.8.1/doc/cql/CQL.html
Modified:
cassandra/branches/cassandra-0.8.1/   (props changed)
cassandra/branches/cassandra-0.8.1/CHANGES.txt
cassandra/branches/cassandra-0.8.1/NEWS.txt
cassandra/branches/cassandra-0.8.1/build.xml
cassandra/branches/cassandra-0.8.1/conf/cassandra.yaml
cassandra/branches/cassandra-0.8.1/contrib/   (props changed)
cassandra/branches/cassandra-0.8.1/debian/changelog
cassandra/branches/cassandra-0.8.1/debian/init
cassandra/branches/cassandra-0.8.1/debian/rules
cassandra/branches/cassandra-0.8.1/doc/cql/CQL.textile
cassandra/branches/cassandra-0.8.1/drivers/py/cqlsh
cassandra/branches/cassandra-0.8.1/drivers/py/setup.py

cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/cli/Cli.g

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/cli/CliClient.java
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/cql/Cql.g

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/ColumnFamilyStore.java

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/CompactionManager.java

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/HintedHandOffManager.java

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/io/sstable/SSTable.java

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java

cassandra/branches/cassandra-0.8.1/src/resources/org/apache/cassandra/cli/CliHelp.yaml

cassandra/branches/cassandra-0.8.1/test/unit/org/apache/cassandra/cli/CliTest.java

Propchange: cassandra/branches/cassandra-0.8.1/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 22:12:59 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000
-/cassandra/branches/cassandra-0.7:1026516-1102046,1102337
+/cassandra/branches/cassandra-0.7:1026516-1104371
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090935-1102339,1102345
+/cassandra/branches/cassandra-0.8:1090935-1104595
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689
 /incubator/cassandra/branches/cassandra-0.3:774578-796573
 /incubator/cassandra/branches/cassandra-0.4:810145-834239,834349-834350

Modified: cassandra/branches/cassandra-0.8.1/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8.1/CHANGES.txt?rev=1104597&r1=1104596&r2=1104597&view=diff
==
--- cassandra/branches/cassandra-0.8.1/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8.1/CHANGES.txt Tue May 17 22:12:59 2011
@@ -10,6 +10,14 @@
(CASSANDRA-2583)
 
 
+0.8-?
+ * adjust hinted handoff page size to avoid OOM with large columns 
+   (CASSANDRA-2652)
+ * update CQL consistency levels (CASSANDRA-2566)
+ * mark BRAF buffer invalid post-flush so we don't re-flush partial
+   buffers again, especially on CL writes (CASSANDRA-2660)
+
+
 0.8.0-rc1
  * faster flushes and compaction from fixing excessively pessimistic 
rebuffering in BRAF (CASSANDRA-2581)
@@ -37,8 +45,10 @@
  * initialize local ep state prior to gossip startup if needed (CASSANDRA-2638)
  * fix counter increment lost after restart (CASSANDRA-2642)
  * add quote-escaping via backslash to CLI (CASSANDRA-2623)
- * fig pig example script (CASSANDRA-2487)
+ * fix pig example script (CASSANDRA-2487)
  * fix dynamic snitch race in adding latencies (CASSANDRA-2618)
+ * Start/stop cassandra after more important services such as mdadm in
+   debian packaging (CASSANDRA-2481)
 
 
 0.8.0-beta2

Modified: cassandra/branches/cassandra-0.8.1/NEWS.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8.1/NEWS.txt?rev=1104597&r1=1104596&r2=1104597&view=diff
==
--- cassandra/branches/cassandra-0.8.1/NEWS.txt (original)
+++ cassandra/branches/cassandra-0.8.1/NEWS.txt Tue May 17 22:12:59 2011
@@ -62,6 +62,15 @@ 

svn commit: r1104594 - in /cassandra/site: publish/download/index.html src/settings.py

2011-05-17 Thread eevans
Author: eevans
Date: Tue May 17 22:08:32 2011
New Revision: 1104594

URL: http://svn.apache.org/viewvc?rev=1104594&view=rev
Log:
update site versioning for 0.8.0 rc1 release

Modified:
cassandra/site/publish/download/index.html
cassandra/site/src/settings.py

Modified: cassandra/site/publish/download/index.html
URL: 
http://svn.apache.org/viewvc/cassandra/site/publish/download/index.html?rev=1104594&r1=1104593&r2=1104594&view=diff
==
--- cassandra/site/publish/download/index.html (original)
+++ cassandra/site/publish/download/index.html Tue May 17 22:08:32 2011
@@ -103,22 +103,22 @@
 
   
   
-  The latest development release is 0.8.0-beta2 (released on
-  2011-05-05).
+  The latest development release is 0.8.0-rc1 (released on
+  2011-05-17).
   
 
   
 
-http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-bin.tar.gz";>apache-cassandra-0.8.0-beta2-bin.tar.gz
-[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-bin.tar.gz.asc";>PGP]
-[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-bin.tar.gz.md5";>MD5]
-[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-bin.tar.gz.sha";>SHA1]
+http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-bin.tar.gz";>apache-cassandra-0.8.0-rc1-bin.tar.gz
+[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-bin.tar.gz.asc";>PGP]
+[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-bin.tar.gz.md5";>MD5]
+[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-bin.tar.gz.sha";>SHA1]
 
 
-http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-src.tar.gz";>apache-cassandra-0.8.0-beta2-src.tar.gz
-[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-src.tar.gz.asc";>PGP]
-[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-src.tar.gz.md5";>MD5]
-[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-beta2-src.tar.gz.sha";>SHA1]
+http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-src.tar.gz";>apache-cassandra-0.8.0-rc1-src.tar.gz
+[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-src.tar.gz.asc";>PGP]
+[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-src.tar.gz.md5";>MD5]
+[http://www.apache.org/dist/cassandra/0.8.0/apache-cassandra-0.8.0-rc1-src.tar.gz.sha";>SHA1]
 
   
   

Modified: cassandra/site/src/settings.py
URL: 
http://svn.apache.org/viewvc/cassandra/site/src/settings.py?rev=1104594&r1=1104593&r2=1104594&view=diff
==
--- cassandra/site/src/settings.py (original)
+++ cassandra/site/src/settings.py Tue May 17 22:08:32 2011
@@ -97,8 +97,8 @@ class CassandraDef(object):
 oldstable_exists = True
 stable_version = '0.7.5'
 stable_release_date = '2011-04-27'
-devel_version = '0.8.0-beta2'
-devel_release_date = '2011-05-05'
+devel_version = '0.8.0-rc1'
+devel_release_date = '2011-05-17'
 devel_exists = True
 _apache_base_url = 'http://www.apache.org'
 _svn_base_url = 'https://svn.apache.org/repos/asf'




svn commit: r1104576 - /cassandra/tags/cassandra-0.8.0-rc1/

2011-05-17 Thread eevans
Author: eevans
Date: Tue May 17 21:40:35 2011
New Revision: 1104576

URL: http://svn.apache.org/viewvc?rev=1104576&view=rev
Log:
tagging 0.8.0 rc1

Added:
cassandra/tags/cassandra-0.8.0-rc1/
  - copied from r1102510, cassandra/branches/cassandra-0.8/



[jira] [Updated] (CASSANDRA-833) fix consistencylevel during bootstrap

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-833:
-

Attachment: 833-v2.txt

v2 tweaks getWriteEndpoints to avoid new Collection creation where possible, 
instead using Iterables.concat.

otherwise lgtm.

> fix consistencylevel during bootstrap
> -
>
> Key: CASSANDRA-833
> URL: https://issues.apache.org/jira/browse/CASSANDRA-833
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.5
>Reporter: Jonathan Ellis
>Assignee: Sylvain Lebresne
> Fix For: 0.8.1
>
> Attachments: 0001-Increase-CL-with-boostrapping-leaving-node.patch, 
> 833-v2.txt
>
>
> As originally designed, bootstrap nodes should *always* get *all* writes 
> under any consistencylevel, so when bootstrap finishes the operator can run 
> cleanup on the old nodes w/o fear that he might lose data.
> but if a bootstrap operation fails or is aborted, that means all writes will 
> fail until the ex-bootstrapping node is decommissioned.  so starting in 
> CASSANDRA-722, we just ignore dead nodes in consistencylevel calculations.
> but this breaks the original design.  CASSANDRA-822 adds a partial fix for 
> this (just adding bootstrap targets into the RF targets and hinting 
> normally), but this is still broken under certain conditions.  The real fix 
> is to consider consistencylevel for two sets of nodes:
>   1. the RF targets as currently existing (no pending ranges)
>   2.  the RF targets as they will exist after all movement ops are done
> If we satisfy CL for both sets then we will always be in good shape.
> I'm not sure if we can easily calculate 2. from the current TokenMetadata, 
> though.
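The two-set check described above can be sketched as follows (all names hypothetical; real token and endpoint handling is much more involved):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative check for the "two sets" rule: a write at a given CL must be
// achievable against both the current replicas and the replicas as they will
// exist after all pending range movements. Not the patch.
public class BootstrapClSketch
{
    static boolean satisfies(Set<String> liveNodes, Set<String> targets, int required)
    {
        int live = 0;
        for (String node : targets)
            if (liveNodes.contains(node))
                live++;
        return live >= required;
    }

    static boolean canWrite(Set<String> liveNodes,
                            Set<String> currentTargets,
                            Set<String> futureTargets,
                            int required)
    {
        // CL must hold for both the pre- and post-movement replica sets.
        return satisfies(liveNodes, currentTargets, required)
            && satisfies(liveNodes, futureTargets, required);
    }

    public static void main(String[] args)
    {
        Set<String> live = new HashSet<String>(Arrays.asList("a", "b", "d"));
        Set<String> current = new HashSet<String>(Arrays.asList("a", "b", "c"));
        Set<String> future = new HashSet<String>(Arrays.asList("a", "b", "d")); // d bootstrapping in
        System.out.println(canWrite(live, current, future, 2)); // true: QUORUM of RF=3 holds for both sets
    }
}
```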



[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-05-17 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035053#comment-13035053
 ] 

T Jake Luciani commented on CASSANDRA-2388:
---

We need to return the list if replicas in the same DC

> ColumnFamilyRecordReader fails for a given split because a host is down, even 
> if records could reasonably be read from other replica.
> -
>
> Key: CASSANDRA-2388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop
>Reporter: Eldon Stegall
>  Labels: hadoop, inputformat
> Fix For: 0.8.1
>
> Attachments: 0002_On_TException_try_next_split.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We 
> should try multiple locations for a given split.



[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-05-17 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035053#comment-13035053
 ] 

T Jake Luciani edited comment on CASSANDRA-2388 at 5/17/11 9:09 PM:


We need to return the list of replicas in the same DC

  was (Author: tjake):
We need to return the list if replicas in the same DC
  
> ColumnFamilyRecordReader fails for a given split because a host is down, even 
> if records could reasonably be read from other replica.
> -
>
> Key: CASSANDRA-2388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hadoop
>Reporter: Eldon Stegall
>  Labels: hadoop, inputformat
> Fix For: 0.8.1
>
> Attachments: 0002_On_TException_try_next_split.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We 
> should try multiple locations for a given split.



[jira] [Commented] (CASSANDRA-2045) Simplify HH to decrease read load when nodes come back

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035041#comment-13035041
 ] 

Jonathan Ellis commented on CASSANDRA-2045:
---

bq. Doesn't this mean that, given a very unstable cluster (e.g. EC2) writes 
using CL.ANY can cause nodes to fill up with data unexpectedly quickly?

Sort of.  It means you can fill up by at most 1/RF faster than you thought, 
yes, since a row can only be stored on at most one node that is not a replica 
(the coordinator). The correct fix for that is "stabilize your cluster." :)

bq. It's probably a good idea to try to retain backwards compatibility here as 
much as possible so that rolling upgrades of a cluster is possible

Right, but as discussed above we're not planning to move to materialized-hints 
entirely, so ripping out "classic" hints isn't an option anyway.

bq. I think Edward's idea of storing hints in a per-node CommitLog is a pretty 
elegant solution, unfortunately it's quite a lot more invasive and would be a 
nightmare for maintaining backwards compatibility.

Serialized mutation objects as columns in a row are pretty close to commitlog 
format, except that you can query them with normal tools.
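A minimal sketch of the materialized-hints layout being discussed, assuming hypothetical names (endpoints as strings, mutations as opaque byte[]); replay becomes a plain read of stored bytes with no per-row lookup:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: store the serialized mutation itself, keyed by target
// endpoint, so delivery just forwards byte[] without reading the row back.
public class HintStoreSketch
{
    private final Map<String, List<byte[]>> hintsByEndpoint = new HashMap<String, List<byte[]>>();

    public void storeHint(String endpoint, byte[] serializedMutation)
    {
        List<byte[]> hints = hintsByEndpoint.get(endpoint);
        if (hints == null)
        {
            hints = new ArrayList<byte[]>();
            hintsByEndpoint.put(endpoint, hints);
        }
        hints.add(serializedMutation);
    }

    // When the endpoint comes back, forward the stored bytes as-is.
    public List<byte[]> drainHints(String endpoint)
    {
        List<byte[]> hints = hintsByEndpoint.remove(endpoint);
        return hints == null ? new ArrayList<byte[]>() : hints;
    }

    public static void main(String[] args)
    {
        HintStoreSketch store = new HintStoreSketch();
        store.storeHint("10.0.0.2", "mutation-1".getBytes(StandardCharsets.UTF_8));
        store.storeHint("10.0.0.2", "mutation-2".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.drainHints("10.0.0.2").size()); // 2
    }
}
```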

> Simplify HH to decrease read load when nodes come back
> --
>
> Key: CASSANDRA-2045
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2045
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Goffinet
> Fix For: 1.0
>
>
> Currently when HH is enabled, hints are stored, and when a node comes back, 
> we begin sending that node data. We do a lookup on the local node for the row 
> to send. To help reduce read load (if a node is offline for a long period of 
> time) we should instead store the data we want to forward to the node 
> locally. We wouldn't have to do any lookups, just take the byte[] and send it 
> to the destination.



[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2433:
--

 Reviewer: stuhood
  Component/s: Core
Affects Version/s: (was: 0.7.4)

> Failed Streams Break Repair
> ---
>
> Key: CASSANDRA-2433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benjamin Coverston
>Assignee: Sylvain Lebresne
>  Labels: repair
> Fix For: 0.8.1
>
> Attachments: 
> 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch, 
> 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch, 
> 0002-Register-in-gossip-to-handle-node-failures-v2.patch, 
> 0002-Register-in-gossip-to-handle-node-failures.patch, 
> 0003-Report-streaming-errors-back-to-repair-v2.patch, 
> 0003-Report-streaming-errors-back-to-repair.patch, 
> 0004-Reports-validation-compaction-errors-back-to-repair-v2.patch, 
> 0004-Reports-validation-compaction-errors-back-to-repair.patch
>
>
> Running repair in cases where a stream fails, we are seeing multiple problems.
> 1. Although retry is initiated and completes, the old stream doesn't seem to 
> clean itself up and repair hangs.
> 2. The temp files are left behind and multiple failures can end up filling up 
> the data partition.
> These issues together are making repair very difficult for nearly everyone 
> running repair on a non-trivial sized data set.
> This issue is also being worked on w.r.t CASSANDRA-2088, however that was 
> moved to 0.8 for a few reasons. This ticket is to fix the immediate issues 
> that we are seeing in 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2643) read repair/reconciliation breaks slice based iteration at QUORUM

2011-05-17 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035030#comment-13035030
 ] 

Peter Schuller commented on CASSANDRA-2643:
---

You're right of course - my example was bogus. I'll also agree about re-try 
being reasonable under the circumstances, though perhaps not optimal.

With regards to the fix: let me just make sure I understand you correctly. So 
given a read command with a limit N that yields [...]

> read repair/reconciliation breaks slice based iteration at QUORUM
> -
>
> Key: CASSANDRA-2643
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2643
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.7.5
>Reporter: Peter Schuller
>Priority: Critical
> Attachments: short_read.sh, slicetest.py
>
>
> In short, I believe iterating over columns is impossible to do reliably with 
> QUORUM due to the way reconciliation works.
> The problem is that the SliceQueryFilter is executing locally when reading on 
> a node, but no attempts seem to be made to consider limits when doing 
> reconciliation and/or read-repair (RowRepairResolver.resolveSuperset() and 
> ColumnFamily.resolve()).
> If a node slices and comes up with 100 columns, and another node slices and 
> comes up with 100 columns, some of which are unique to each side, 
> reconciliation results in > 100 columns in the result set. In this case the 
> effect is limited to "client gets more than asked for", but the columns still 
> accurately represent the range. This is easily triggered by my test-case.
> In addition to the client receiving "too many" columns, I believe some of 
> them will not be satisfying the QUORUM consistency level for the same reasons 
> as with deletions (see discussion below).
> Now, there *should* be a problem for tombstones as well, but it's more 
> subtle. Suppose A has:
>   1
>   2
>   3
>   4
>   5
>   6
> and B has:
>   1
>   del 2
>   del 3
>   del 4
>   5
>   6 
> If you now slice 1-6 with count=3 the tombstones from B will reconcile with 
> those from A - fine. So you end up getting 1,5,6 back. This made it a bit 
> difficult to trigger in a test case until I realized what was going on. At 
> first I was "hoping" to see a "short" iteration result, which would mean that 
> the process of iterating until you get a short result will cause spurious 
> "end of columns" and thus make it impossible to iterate correctly.
> So; due to 5-6 existing (and if they didn't, you legitimately reached 
> end-of-columns) we do indeed get a result of size 3 which contains 1,5 and 6. 
> However, only node B would have contributed columns 5 and 6; so there is 
> actually no QUORUM consistency on the co-ordinating node with respect to 
> these columns. If node A and C also had 5 and 6, they would not have been 
> considered.
> Am I wrong?
> In any case, using the script I'm about to attach, you can trigger the 
> over-delivery case very easily:
> (0) disable hinted hand-off to avoid that interacting with the test
> (1) start three nodes
> (2) create ks 'test' with rf=3 and cf 'slicetest'
> (3) ./slicetest.py hostname_of_node_C insert # let it run for a few seconds, 
> then ctrl-c
> (4) stop node A
> (5) ./slicetest.py hostname_of_node_C insert # let it run for a few seconds, 
> then ctrl-c
> (6) start node A, wait for B and C to consider it up
> (7) ./slicetest.py hostname_of_node_A slice # make A co-ordinator though it 
> doesn't necessarily matter
> You can also pass 'delete' (random deletion of 50% of contents) or 
> 'deleterange' (delete all in [0.2,0.8]) to slicetest, but you don't trigger a 
> short read by doing that (see discussion above).
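The over-delivery described above comes from each replica applying the slice limit locally while the coordinator merges the results without re-applying it. A toy model (not Cassandra code; the function names are illustrative) makes the mechanism concrete:

```python
def local_slice(columns, limit):
    """Each replica applies the slice limit locally (SliceQueryFilter)."""
    return sorted(columns)[:limit]

def reconcile(*slices):
    """Coordinator-side merge, modeling resolveSuperset()/resolve():
    takes the union of the replica results with no re-applied limit."""
    merged = set()
    for s in slices:
        merged.update(s)
    return sorted(merged)

node_a = {1, 2, 3, 4, 5}
node_b = {1, 3, 5, 7, 9}   # holds columns node A never saw
limit = 3
result = reconcile(local_slice(node_a, limit),
                   local_slice(node_b, limit))
# The client asked for 3 columns; the merged superset contains 4.
```

Here the client over-receives, but note (as the description argues) that columns contributed by only one replica in the merged result have not actually met QUORUM.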

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2661) Canary CLHM v1.2

2011-05-17 Thread Benjamin Manes (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Manes updated CASSANDRA-2661:
--

Attachment: clhm-20110517.jar

Packaged at r628

> Canary CLHM v1.2
> 
>
> Key: CASSANDRA-2661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2661
> Project: Cassandra
>  Issue Type: Task
>Reporter: Benjamin Manes
> Attachments: clhm-20110517.jar
>
>
> I am hoping to release ConcurrentLinkedHashMap v1.2 by the end of the week. 
> This task is optional, but gives you the opportunity to canary the library 
> and provide any final feedback. There are currently 285 tests (some threaded) 
> plus a load test, so reliability-wise I'm fairly confident.
> This release has numerous performance improvements. See the change log for 
> details.
> It also includes a few useful features that may be of interest,
>  - Snapshot iteration in order of hotness (CASSANDRA-1966)
>  - Optionally defer LRU maintenance penalty to a background executor (instead 
> of amortized on caller threads)
> http://code.google.com/p/concurrentlinkedhashmap/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2661) Canary CLHM v1.2

2011-05-17 Thread Benjamin Manes (JIRA)
Canary CLHM v1.2


 Key: CASSANDRA-2661
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2661
 Project: Cassandra
  Issue Type: Task
Reporter: Benjamin Manes


I am hoping to release ConcurrentLinkedHashMap v1.2 by the end of the week. 
This task is optional, but gives you the opportunity to canary the library and 
provide any final feedback. There are currently 285 tests (some threaded) plus 
a load test, so reliability-wise I'm fairly confident.

This release has numerous performance improvements. See the change log for 
details.

It also includes a few useful features that may be of interest,
 - Snapshot iteration in order of hotness (CASSANDRA-1966)
 - Optionally defer LRU maintenance penalty to a background executor (instead 
of amortized on caller threads)

http://code.google.com/p/concurrentlinkedhashmap/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

2011-05-17 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2610:


Attachment: 0001-Make-repair-repair-all-hosts.patch

Patch against 0.8.1. It applies on top of CASSANDRA-2433 because it changes 
enough common code that I don't want to deal with rebasing back and forth (and 
it actually reuses some of the refactoring from CASSANDRA-2433 anyway).

> Have the repair of a range repair *all* the replica for that range
> --
>
> Key: CASSANDRA-2610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.8 beta 1
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Minor
> Fix For: 0.8.1
>
> Attachments: 0001-Make-repair-repair-all-hosts.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Say you have a range R whose replicas for that range are A, B and C. If you 
> run repair on node A for that range R, when the repair ends you only know 
> that A is fully repaired; B and C are not. That is, B and C are up to date 
> with A, but not with one another.
> It makes it a pain to schedule "optimal" cluster repairs, that is, repairing 
> a full cluster without doing work twice (because you would still have to run 
> a repair on B or C, which will make A, B and C redo a validation compaction 
> on R, and with more replicas it's even more annoying).
> However, it is fairly easy during the first repair on A to have it compare 
> all the merkle trees, i.e. the ones for B and C as well, and ask B or C to 
> stream between them whichever differences they have. 
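The proposal above amounts to comparing every pair of replica Merkle trees at the coordinator and scheduling streaming between any mismatching pair, not only pairs involving the coordinator. A minimal sketch, assuming a "tree" is just a dict of range -> hash (a deliberate simplification of real Merkle trees):

```python
from itertools import combinations

def schedule_streams(trees):
    """Compare all replica 'trees' pairwise; return one streaming job
    per pair of nodes that disagree, listing the mismatched ranges."""
    jobs = []
    for (n1, t1), (n2, t2) in combinations(trees.items(), 2):
        diffs = [r for r in t1 if t1[r] != t2.get(r)]
        if diffs:
            jobs.append((n1, n2, diffs))
    return jobs

trees = {
    "A": {"r1": "h1", "r2": "h2"},
    "B": {"r1": "h1", "r2": "h2"},
    "C": {"r1": "h1", "r2": "hX"},   # C disagrees on range r2
}
jobs = schedule_streams(trees)
# A<->C and B<->C exchange r2; A and B need no streaming between them.
```

After one such repair on A, all three replicas are in sync for the range, which is exactly what makes scheduling full-cluster repairs cheaper.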

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-05-17 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: 
0004-Reports-validation-compaction-errors-back-to-repair-v2.patch
0003-Report-streaming-errors-back-to-repair-v2.patch
0002-Register-in-gossip-to-handle-node-failures-v2.patch

0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch

Attaching rebased patch (against 0.8.1). It also change the behavior a little 
bit so as to not fail repair right away if a problem occur (it still throw an 
exception at the end if any problem had occured). It turns out to be slightly 
simpler that way. Especially for CASSANDRA-1610.

> Failed Streams Break Repair
> ---
>
> Key: CASSANDRA-2433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.7.4
>Reporter: Benjamin Coverston
>Assignee: Sylvain Lebresne
>  Labels: repair
> Fix For: 0.8.1
>
> Attachments: 
> 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch, 
> 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch, 
> 0002-Register-in-gossip-to-handle-node-failures-v2.patch, 
> 0002-Register-in-gossip-to-handle-node-failures.patch, 
> 0003-Report-streaming-errors-back-to-repair-v2.patch, 
> 0003-Report-streaming-errors-back-to-repair.patch, 
> 0004-Reports-validation-compaction-errors-back-to-repair-v2.patch, 
> 0004-Reports-validation-compaction-errors-back-to-repair.patch
>
>
> When running repair in cases where a stream fails, we are seeing multiple problems.
> 1. Although retry is initiated and completes, the old stream doesn't seem to 
> clean itself up and repair hangs.
> 2. The temp files are left behind and multiple failures can end up filling up 
> the data partition.
> These issues together are making repair very difficult for nearly everyone 
> running repair on a non-trivial sized data set.
> This issue is also being worked on w.r.t CASSANDRA-2088, however that was 
> moved to 0.8 for a few reasons. This ticket is to fix the immediate issues 
> that we are seeing in 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2045) Simplify HH to decrease read load when nodes come back

2011-05-17 Thread Nicholas Telford (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034970#comment-13034970
 ] 

Nicholas Telford commented on CASSANDRA-2045:
-

I've been looking in to this and I have a few observations/questions, although 
I'm still quite new to the Cassandra codebase, so if I'm wrong, please let me 
know.

 * Currently, when a node receives a RowMutation containing a hint, it stores 
it to the application CF and places a hint in the system hints CF. This is fine 
in the general case, but writes using CL.ANY may result in hinted RowMutations 
being sent to nodes that don't own that key. They still write the RowMutation 
to their application CF so they can pass it on to the destination node when it 
recovers. But this data is only ever deleted during a manual cleanup. Doesn't 
this mean that, given a very unstable cluster (e.g. EC2) writes using CL.ANY 
can cause nodes to fill up with data unexpectedly quickly?

* The JavaDoc for HintedHandOffManager mentions another issue caused by the 
current strategy: cleanup compactions on the application CF will cause the 
hints to become invalid. It goes on to suggest a strategy similar to what's 
being discussed here (placing the individual RowMutations in a separate HH CF).

* It's probably a good idea to try to retain backwards compatibility here as 
much as possible so that rolling upgrades of a cluster are possible - hints 
stored by the old version need to be deliverable to nodes coming back up with 
the new version, and vice versa.

* I think Edward's idea of storing hints in a per-node CommitLog is a pretty 
elegant solution; unfortunately it's quite a lot more invasive and would be a 
nightmare for maintaining backwards compatibility. Thoughts?

> Simplify HH to decrease read load when nodes come back
> --
>
> Key: CASSANDRA-2045
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2045
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Goffinet
> Fix For: 1.0
>
>
> Currently when HH is enabled, hints are stored, and when a node comes back, 
> we begin sending that node data. We do a lookup on the local node for the row 
> to send. To help reduce read load (if a node is offline for a long period of 
> time) we should store the data we want to forward to the node locally 
> instead. We wouldn't have to do any lookups, just take the byte[] and send it 
> to the destination.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-1278:
-

Assignee: Sylvain Lebresne  (was: Matthew F. Dennis)

I think we've been over-engineering the problem.  Ed was on the right track:

bq. I would personally like to see a JMX function like 'nodetool addsstable 
mykeyspace mycf mysstable-file' . Most people can generate and move an 
SSTable on their own (sstableWriter + scp)

(This is, btw, the HBase bulk load approach, which despite some clunkiness does 
seem to solve the problem for those users.)

The main drawback is that because of Cassandra's replication strategies, data 
from a naively-written sstable could span many nodes -- even the entire cluster.

So we can improve the experience a lot with a simple tool that just streams 
ranges from a local table to the right nodes. Since it's doing the exact thing 
that existing node movement needs -- sending ranges from an existing sstable -- 
it should not require any new code from Streaming.

Sylvain volunteered to take a stab at this.
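The drawback noted above — that rows in a naively-written sstable can belong to replicas all over the cluster — is exactly what the proposed streaming tool would handle: bucket the local sstable's rows by owning replica set, then stream each bucket to the right nodes. A toy sketch under the assumption that replica placement is given as a map from (exclusive-start, inclusive-end] token ranges to replica lists (the real tool would get this from the replication strategy):

```python
def split_by_replica(rows, ranges):
    """Bucket a locally written sstable's (token, row) pairs by the
    replica set that owns each token, so each bucket can be streamed
    to its nodes."""
    buckets = {}
    for token, row in rows:
        for (start, end), replicas in ranges.items():
            if start < token <= end:
                buckets.setdefault(tuple(replicas), []).append(row)
                break
    return buckets

ranges = {(0, 50): ["node1", "node2"], (50, 100): ["node2", "node3"]}
rows = [(10, "row-a"), (60, "row-b"), (40, "row-c")]
buckets = split_by_replica(rows, ranges)
```

Since each bucket is just "send these ranges to those nodes", the existing node-movement streaming path can do the transfer unchanged, as the comment argues.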

> Make bulk loading into Cassandra less crappy, more pluggable
> 
>
> Key: CASSANDRA-1278
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jeremy Hanna
>Assignee: Sylvain Lebresne
> Fix For: 0.8.1
>
> Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 
> 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>  Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art.  People are either 
> directed to just do it responsibly with thrift or a higher level client, or 
> they have to explore the contrib/bmt example - 
> http://wiki.apache.org/cassandra/BinaryMemtable  That contrib module requires 
> delving into the code to find out how it works and then applying it to the 
> given problem.  Using either method, the user also needs to keep in mind that 
> overloading the cluster is possible - which will hopefully be addressed in 
> CASSANDRA-685
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading.  Perhaps it could include code in the Core to make 
> it more pluggable for external clients of different types.
> It is just that this is something that many who are new to Cassandra need to 
> do - bulk load their data into Cassandra.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[Cassandra Wiki] Update of "Operations" by JonathanEllis

2011-05-17 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: alternating tokens is only viable w/ same number 
of nodes in each DC.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=91&rev2=92

--

  === Token selection ===
  Using a strong hash function means !RandomPartitioner keys will, on average, 
be evenly spread across the Token space, but you can still have imbalances if 
your Tokens do not divide up the range evenly, so you should specify 
!InitialToken to your first nodes as `i * (2**127 / N)` for i = 0 .. N-1. In 
Cassandra 0.7, you should specify `initial_token` in `cassandra.yaml`.
  
- With !NetworkTopologyStrategy, you should alternate data centers when 
assigning tokens. For example, with two nodes in each of two data centers,
+ With !NetworkTopologyStrategy, you should calculate the tokens for the nodes 
in each DC independently. Tokens still need to be unique, so you can add 1 to 
the tokens in the 2nd DC, add 2 in the 3rd, and so on.  Thus, for a 4-node 
cluster in 2 datacenters, you would have
+ {{{
+ DC1
+ node 1 = 0
+ node 2 = 85070591730234615865843651857942052864
  
+ DC2
+ node 3 = 1
+ node 4 = 85070591730234615865843651857942052865
+ }}}
+ 
+ 
+ If you happen to have the same number of nodes in each data center, you can 
also alternate data centers when assigning tokens:
  {{{
  [DC1] node 1 = 0
  [DC2] node 2 = 42535295865117307932921825928971026432
  [DC1] node 3 = 85070591730234615865843651857942052864
  [DC2] node 4 = 127605887595351923798765477786913079296
  }}}
+ 
  With order preserving partitioners, your key distribution will be 
application-dependent.  You should still take your best guess at specifying 
initial tokens (guided by sampling actual data, if possible), but you will be 
more dependent on active load balancing (see below) and/or adding new nodes to 
hot spots.
  
  Once data is placed on the cluster, the partitioner may not be changed 
without wiping and starting over.
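The token arithmetic described above — evenly spaced tokens `i * (2**127 / N)` within each DC, offset by the DC index to keep them unique — can be sketched directly (helper names are illustrative):

```python
def balanced_tokens(n):
    """Evenly spaced RandomPartitioner tokens: i * (2**127 / N)."""
    return [i * (2**127 // n) for i in range(n)]

def nts_tokens(nodes_per_dc):
    """Per-DC token calculation for NetworkTopologyStrategy: compute
    each DC's tokens independently, then add the DC index k to DC k's
    tokens so no two nodes share a token."""
    return {dc: [t + dc for t in balanced_tokens(n)]
            for dc, n in enumerate(nodes_per_dc)}

tokens = nts_tokens([2, 2])  # two DCs, two nodes each
# DC1: 0 and 85070591730234615865843651857942052864
# DC2: 1 and 85070591730234615865843651857942052865
```

This reproduces the 4-node example in the page above; the +1/+2 offsets are negligible relative to the 2**127 token space, so balance is unaffected.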


[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-05-17 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034878#comment-13034878
 ] 

Ryan King commented on CASSANDRA-1610:
--

Agreed.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> The goal of this ticket is to make compaction pluggable enough to support 
> compaction based on max timestamp ordering of the sstables while satisfying 
> max sstable size, min and max compaction thresholds. Another goal is to allow 
> expiration of sstables based on a timestamp.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[Cassandra Wiki] Update of "Operations" by JonathanEllis

2011-05-17 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: change NTS recommendation to alternate DCs.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=90&rev2=91

--

  === Token selection ===
  Using a strong hash function means !RandomPartitioner keys will, on average, 
be evenly spread across the Token space, but you can still have imbalances if 
your Tokens do not divide up the range evenly, so you should specify 
!InitialToken to your first nodes as `i * (2**127 / N)` for i = 0 .. N-1. In 
Cassandra 0.7, you should specify `initial_token` in `cassandra.yaml`.
  
- With !NetworkTopologyStrategy, you should calculate the tokens for the nodes 
in each DC independently. Tokens still need to be unique, so you can add 1 to 
the tokens in the 2nd DC, add 2 in the 3rd, and so on.  Thus, for a 4-node 
cluster in 2 datacenters, you would have
+ With !NetworkTopologyStrategy, you should alternate data centers when 
assigning tokens. For example, with two nodes in each of two data centers,
+ 
  {{{
- DC1
- node 1 = 0
+ [DC1] node 1 = 0
+ [DC2] node 2 = 42535295865117307932921825928971026432
- node 2 = 85070591730234615865843651857942052864
+ [DC1] node 3 = 85070591730234615865843651857942052864
+ [DC2] node 4 = 127605887595351923798765477786913079296
- 
- DC2
- node 1 = 1
- node 2 = 85070591730234615865843651857942052865
  }}}
- 
  With order preserving partitioners, your key distribution will be 
application-dependent.  You should still take your best guess at specifying 
initial tokens (guided by sampling actual data, if possible), but you will be 
more dependent on active load balancing (see below) and/or adding new nodes to 
hot spots.
  
  Once data is placed on the cluster, the partitioner may not be changed 
without wiping and starting over.


[jira] [Updated] (CASSANDRA-2660) BRAF.sync() bug can cause massive commit log write magnification

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2660:
--

  Component/s: Core
 Priority: Minor  (was: Major)
Fix Version/s: 0.8.1
   0.7.7

merged to 0.8 branch cleanly

> BRAF.sync() bug can cause massive commit log write magnification
> 
>
> Key: CASSANDRA-2660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2660
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Peter Schuller
>Assignee: Peter Schuller
>Priority: Minor
> Fix For: 0.7.7, 0.8.1
>
> Attachments: CASSANDRA-2660-075.txt
>
>
> This was discovered, fixed and tested on 0.7.5. Cursory examination shows it 
> should still be an issue on trunk/0.8. If people otherwise agree with the 
> patch I can rebase if necessary.
> Problem:
> BRAF.flush() is actually broken in the sense that it cannot be called without 
> close co-operation with the caller. rebuffer() does the co-op by adjusting 
> bufferOffset and validateBufferBytes appropriately, but sync() doesn't. This 
> means sync() is broken, and sync() is used by the commit log.
> The attached patch moves the bufferOffset/validateBufferBytes handling out 
> into resetBuffer() and has both flush() and rebuffer() call that. This makes 
> sync() safe.
> What happened was that for batched commit log mode, every time sync() was 
> called the data buffered so far would get written to the OS and fsync():ed. 
> But until rebuffer() is called for other reasons as part of the write path, 
> all subsequent sync():s would result in the very same data (plus whatever was 
> written since last time) being re-written and fsync():ed again. So first you 
> write+fsync N bytes, then N+N1, then N+N1+N2... (each N being a batch), until 
> at some point you trigger a rebuffer() and it starts all over again.
> The result is that you see *a lot* more writes to the commit log than are in 
> fact written to the BRAF. And these writes translate into actual real writes 
> to the underlying storage device due to fsync(). We had crazy numbers where 
> we saw spikes upwards of 80 MB/second where the actual throughput was more 
> like ~1 MB/second of data to the commit log.
> (One can make a possibly weak argument that it is also functionally incorrect 
> as I can imagine implementations where re-writing the same blocks does 
> copy-on-write in such a way that you're not necessarily guaranteed to see 
> before-or-after data, particularly in case of partial page writes. However 
> that's probably not a practical issue.)
> Worthy of noting is that this probably causes added difficulties in fsync() 
> latencies since the average fsync() will contain a lot more data. Depending 
> on I/O scheduler and underlying device characteristics, the extra writes 
> *may* not have a detrimental effect, but I think it's pretty easy to point to 
> cases where it will be detrimental - in particular if the commit log is on a 
> non-battery backed drive. Even with a nice battery backed RAID with the 
> commit log on, the size of the writes probably contributes to difficulty in 
> making the write requests propagate down without being starved by reads (but 
> this is speculation, not tested, other than that I've observed commit log 
> writer starvation that seemed excessive).
> This isn't the first subtle BRAF bug. What are people's thoughts on creating 
> separate abstractions for streaming I/O that can perhaps be a lot more 
> simple, and use BRAF only for random reads in response to live traffic? (Not 
> as part of this JIRA, just asking in general.)
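The write magnification described above (each batch re-writes and re-fsyncs everything still sitting in the buffer: N, then N+N1, then N+N1+N2, ...) can be modeled with a few lines; this is a deliberately simplified stand-in for BRAF, not the real class:

```python
class ToyBRAF:
    """Toy model of the bug: sync() flushes the buffer to 'disk' but,
    unless fixed, does not mark it invalid, so every later sync()
    re-writes the same bytes."""

    def __init__(self, fixed=False):
        self.buffer = b""
        self.bytes_fsynced = 0   # total bytes pushed to the device
        self.fixed = fixed

    def write(self, data):
        self.buffer += data

    def sync(self):
        self.bytes_fsynced += len(self.buffer)
        if self.fixed:
            # Models the patch's resetBuffer(): invalidate post-flush.
            self.buffer = b""

broken, patched = ToyBRAF(), ToyBRAF(fixed=True)
for f in (broken, patched):
    for _ in range(3):           # three commit-log batches of 100 bytes
        f.write(b"x" * 100)
        f.sync()
# broken fsyncs 100 + 200 + 300 = 600 bytes; patched fsyncs 300.
```

With many small batches per rebuffer() the broken total grows quadratically in the number of batches, which is consistent with the observed ~80x write inflation.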

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


svn commit: r1104381 - in /cassandra/branches/cassandra-0.8: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/io/util/

2011-05-17 Thread jbellis
Author: jbellis
Date: Tue May 17 16:17:55 2011
New Revision: 1104381

URL: http://svn.apache.org/viewvc?rev=1104381&view=rev
Log:
merge from 0.7

Modified:
cassandra/branches/cassandra-0.8/   (props changed)
cassandra/branches/cassandra-0.8/CHANGES.txt
cassandra/branches/cassandra-0.8/contrib/   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java

Propchange: cassandra/branches/cassandra-0.8/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 16:17:55 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000
-/cassandra/branches/cassandra-0.7:1026516-1103894
+/cassandra/branches/cassandra-0.7:1026516-1104371
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689
 /cassandra/trunk:1090978-1090979

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1104381&r1=1104380&r2=1104381&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue May 17 16:17:55 2011
@@ -2,6 +2,8 @@
  * adjust hinted handoff page size to avoid OOM with large columns 
(CASSANDRA-2652)
  * update CQL consistency levels (CASSANDRA-2566)
+ * mark BRAF buffer invalid post-flush so we don't re-flush partial
+   buffers again, especially on CL writes (CASSANDRA-2660)
 
 
 0.8.0-rc1

Propchange: cassandra/branches/cassandra-0.8/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 16:17:55 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
-/cassandra/branches/cassandra-0.7/contrib:1026516-1103894
+/cassandra/branches/cassandra-0.7/contrib:1026516-1104371
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689
 /cassandra/trunk/contrib:1090978-1090979

Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 16:17:55 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1103894
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1104371
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689
 
/cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090978-1090979

Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 16:17:55 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1103894
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1104371
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654
 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1051699-1053689
 
/cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1090978-1090979


[jira] [Commented] (CASSANDRA-2660) BRAF.sync() bug can cause massive commit log write magnification

2011-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034833#comment-13034833
 ] 

Hudson commented on CASSANDRA-2660:
---

Integrated in Cassandra-0.7 #487 (See 
[https://builds.apache.org/hudson/job/Cassandra-0.7/487/])
mark BRAF buffer invalid post-flush so we don't re-flush partial buffers 
again
patch by Peter Schuller; reviewed by jbellis for CASSANDRA-2660


> BRAF.sync() bug can cause massive commit log write magnification
> 
>
> Key: CASSANDRA-2660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2660
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Peter Schuller
>Assignee: Peter Schuller
> Attachments: CASSANDRA-2660-075.txt
>
>
> This was discovered, fixed and tested on 0.7.5. Cursory examination shows it 
> should still be an issue on trunk/0.8. If people otherwise agree with the 
> patch I can rebase if necessary.
> Problem:
> BRAF.flush() is actually broken in the sense that it cannot be called without 
> close co-operation with the caller. rebuffer() does the co-op by adjusting 
> bufferOffset and validBufferBytes appropriately, but sync() doesn't. This 
> means sync() is broken, and sync() is used by the commit log.
> The attached patch moves the bufferOffset/validBufferBytes handling out 
> into resetBuffer() and has both flush() and rebuffer() call that. This makes 
> sync() safe.
> What happened was that for batched commit log mode, every time sync() was 
> called the data buffered so far would get written to the OS and fsync():ed. 
> But until rebuffer() is called for other reasons as part of the write path, 
> all subsequent sync():s would result in the very same data (plus whatever was 
> written since last time) being re-written and fsync():ed again. So first you 
> write+fsync N bytes, then N+N1, then N+N1+N2... (each N being a batch), until 
> at some point you trigger a rebuffer() and it starts all over again.
> The result is that you see *a lot* more writes to the commit log than are in 
> fact written to the BRAF. And these writes translate into actual real writes 
> to the underlying storage device due to fsync(). We had crazy numbers where 
> we saw spikes upwards of 80 MB/second where the actual throughput was more 
> like ~1 MB/second of data to the commit log.
> (One can make a possibly weak argument that it is also functionally incorrect 
> as I can imagine implementations where re-writing the same blocks does 
> copy-on-write in such a way that you're not necessarily guaranteed to see 
> before-or-after data, particularly in case of partial page writes. However 
> that's probably not a practical issue.)
> Worthy of noting is that this probably causes added difficulties in fsync() 
> latencies since the average fsync() will contain a lot more data. Depending 
> on I/O scheduler and underlying device characteristics, the extra writes 
> *may* not have a detrimental effect, but I think it's pretty easy to point to 
> cases where it will be detrimental - in particular if the commit log is on a 
> non-battery backed drive. Even with a nice battery backed RAID with the 
> commit log on, the size of the writes probably contributes to difficulty in 
> making the write requests propagate down without being starved by reads (but 
> this is speculation, not tested, other than that I've observed commit log 
> writer starvation that seemed excessive).
> This isn't the first subtle BRAF bug. What are people's thoughts on creating 
> separate abstractions for streaming I/O that can perhaps be a lot more 
> simple, and use BRAF only for random reads in response to live traffic? (Not 
> as part of this JIRA, just asking in general.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[Cassandra Wiki] Update of "Operations" by JonathanEllis

2011-05-17 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: add NTS notes to token selection.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=89&rev2=90

--

  === Token selection ===
  Using a strong hash function means !RandomPartitioner keys will, on average, 
be evenly spread across the Token space, but you can still have imbalances if 
your Tokens do not divide up the range evenly, so you should specify 
!InitialToken to your first nodes as `i * (2**127 / N)` for i = 0 .. N-1. In 
Cassandra 0.7, you should specify `initial_token` in `cassandra.yaml`.
  
+ With !NetworkTopologyStrategy, you should calculate the tokens for the nodes in 
each DC independently. Tokens still need to be unique, so you can add 1 to the 
tokens in the 2nd DC, add 2 in the 3rd, and so on.  Thus, for a 4-node cluster 
in 2 datacenters, you would have
+ {{{
+ DC1
+ node 1 = 0
+ node 2 = 85070591730234615865843651857942052864
+ 
+ DC2
+ node 1 = 1
+ node 2 = 85070591730234615865843651857942052865
+ }}}
+ 
  With order preserving partitioners, your key distribution will be 
application-dependent.  You should still take your best guess at specifying 
initial tokens (guided by sampling actual data, if possible), but you will be 
more dependent on active load balancing (see below) and/or adding new nodes to 
hot spots.
  
  Once data is placed on the cluster, the partitioner may not be changed 
without wiping and starting over.
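The token arithmetic described above — `i * (2**127 / N)` for !RandomPartitioner, plus small per-DC offsets for !NetworkTopologyStrategy — can be sketched as follows. This is an illustrative Python 3 port (the wiki's own script targets Python 2's `xrange` and `print` statement); the function names are ours, not part of any Cassandra tool:

```python
def tokens(nodes: int) -> list[int]:
    """Evenly spaced RandomPartitioner tokens: i * (2**127 / N) for i = 0 .. N-1."""
    return [2 ** 127 // nodes * i for i in range(nodes)]

def nts_tokens(nodes_per_dc: int, datacenters: int) -> dict[int, list[int]]:
    """NetworkTopologyStrategy: compute each DC's tokens independently,
    then offset DC d's tokens by d to keep all tokens unique."""
    base = tokens(nodes_per_dc)
    return {dc: [t + dc for t in base] for dc in range(datacenters)}

# 4-node cluster in 2 datacenters, matching the wiki example:
# DC 0 -> [0, 85070591730234615865843651857942052864]
# DC 1 -> [1, 85070591730234615865843651857942052865]
for dc, toks in nts_tokens(2, 2).items():
    print(dc, toks)
```

Python's arbitrary-precision integers make the 2**127 arithmetic exact, so the printed values can be pasted directly into `initial_token`.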
@@ -40, +51 @@

  Replication factor is not really intended to be changed in a live cluster 
either, but increasing it is conceptually simple: update the replication_factor 
from the CLI (see below), then run repair against each node in your cluster so 
that all the new replicas that are supposed to have the data, actually do.
  
  Until repair is finished, you have 3 options:
+ 
   * read at ConsistencyLevel.QUORUM or ALL (depending on your existing 
replication factor) to make sure that a replica that actually has the data is 
consulted
   * continue reading at lower CL, accepting that some requests will fail 
(usually only the first for a given query, if ReadRepair is enabled)
   * take downtime while repair runs
@@ -49, +61 @@

  Reducing replication factor is easily done and only requires running cleanup 
afterwards to remove extra replicas.
  
  To update the replication factor on a live cluster, forget about 
cassandra.yaml. Rather you want to use '''cassandra-cli''':
+ 
- update keyspace Keyspace1 with replication_factor = 3;
+  . update keyspace Keyspace1 with replication_factor = 3;
  
  === Network topology ===
  Besides datacenters, you can also tell Cassandra which nodes are in the same 
rack within a datacenter.  Cassandra will use this to route both reads and data 
movement for Range changes to the nearest replicas.  This is configured by a 
user-pluggable !EndpointSnitch class in the configuration file.
@@ -97, +110 @@

  
  Here's a python program which can be used to calculate new tokens for the 
nodes. There's more info on the subject at Ben Black's presentation at 
Cassandra Summit 2010. 
http://www.datastax.com/blog/slides-and-videos-cassandra-summit-2010
  
-   def tokens(nodes):
+  . def tokens(nodes):
-   for x in xrange(nodes): 
+   . for x in xrange(nodes):
-   print 2 ** 127 / nodes * x
+. print 2 ** 127 / nodes * x
  
- In versions of Cassandra 0.7.* and lower, there's also `nodetool 
loadbalance`: essentially a convenience over decommission + bootstrap, only 
instead of telling the target node where to move on the ring it will choose its 
location based on the same heuristic as Token selection on bootstrap. You 
should not use this as it doesn't rebalance the entire ring. 
+ In versions of Cassandra 0.7.* and lower, there's also `nodetool 
loadbalance`: essentially a convenience over decommission + bootstrap, only 
instead of telling the target node where to move on the ring it will choose its 
location based on the same heuristic as Token selection on bootstrap. You 
should not use this as it doesn't rebalance the entire ring.
  
- The status of move and balancing operations can be monitored using `nodetool` 
with the `netstat` argument. 
+ The status of move and balancing operations can be monitored using `nodetool` 
with the `netstat` argument.  (Cassandra 0.6.* and lower use the `streams` 
argument).
- (Cassandra 0.6.* and lower use the `streams` argument).
  
  == Consistency ==
  Cassandra allows clients to specify the desired consistency level on reads 
and writes.  (See [[API]].)  If R + W > N, where R, W, and N are respectively 
the read replica count, the write replica count, and the replication factor, 
all client reads will see the most recent write.  Otherwise, readers '''may''' 
see older versions, for periods of typic

[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034812#comment-13034812
 ] 

Jonathan Ellis commented on CASSANDRA-1610:
---

Let's keep this to "make compaction pluggable" and add extra strategies 
separately.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> The goal of this ticket is to make compaction pluggable enough to support 
> compaction based on max timestamp ordering of the sstables while satisfying 
> max sstable size, min and max compaction thresholds. Another goal is to allow 
> expiration of sstables based on a timestamp.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CASSANDRA-2660) BRAF.sync() bug can cause massive commit log write magnification

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2660.
---

Resolution: Fixed
  Reviewer: jbellis

Ideally we wouldn't wipe out the buffer for read purposes but since we are 
mixing rw in the same buffer (see comments above) this is the best option.  
Committed.

> BRAF.sync() bug can cause massive commit log write magnification
> 
>
> Key: CASSANDRA-2660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2660
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Peter Schuller
>Assignee: Peter Schuller
> Attachments: CASSANDRA-2660-075.txt
>
>
> This was discovered, fixed and tested on 0.7.5. Cursory examination shows it 
> should still be an issue on trunk/0.8. If people otherwise agree with the 
> patch I can rebase if necessary.
> Problem:
> BRAF.flush() is actually broken in the sense that it cannot be called without 
> close co-operation with the caller. rebuffer() does the co-op by adjusting 
> bufferOffset and validBufferBytes appropriately, but sync() doesn't. This 
> means sync() is broken, and sync() is used by the commit log.
> The attached patch moves the bufferOffset/validBufferBytes handling out 
> into resetBuffer() and has both flush() and rebuffer() call that. This makes 
> sync() safe.
> What happened was that for batched commit log mode, every time sync() was 
> called the data buffered so far would get written to the OS and fsync():ed. 
> But until rebuffer() is called for other reasons as part of the write path, 
> all subsequent sync():s would result in the very same data (plus whatever was 
> written since last time) being re-written and fsync():ed again. So first you 
> write+fsync N bytes, then N+N1, then N+N1+N2... (each N being a batch), until 
> at some point you trigger a rebuffer() and it starts all over again.
> The result is that you see *a lot* more writes to the commit log than are in 
> fact written to the BRAF. And these writes translate into actual real writes 
> to the underlying storage device due to fsync(). We had crazy numbers where 
> we saw spikes upwards of 80 MB/second where the actual throughput was more 
> like ~1 MB/second of data to the commit log.
> (One can make a possibly weak argument that it is also functionally incorrect 
> as I can imagine implementations where re-writing the same blocks does 
> copy-on-write in such a way that you're not necessarily guaranteed to see 
> before-or-after data, particularly in case of partial page writes. However 
> that's probably not a practical issue.)
> Worthy of noting is that this probably causes added difficulties in fsync() 
> latencies since the average fsync() will contain a lot more data. Depending 
> on I/O scheduler and underlying device characteristics, the extra writes 
> *may* not have a detrimental effect, but I think it's pretty easy to point to 
> cases where it will be detrimental - in particular if the commit log is on a 
> non-battery backed drive. Even with a nice battery backed RAID with the 
> commit log on, the size of the writes probably contributes to difficulty in 
> making the write requests propagate down without being starved by reads (but 
> this is speculation, not tested, other than that I've observed commit log 
> writer starvation that seemed excessive).
> This isn't the first subtle BRAF bug. What are people's thoughts on creating 
> separate abstractions for streaming I/O that can perhaps be a lot more 
> simple, and use BRAF only for random reads in response to live traffic? (Not 
> as part of this JIRA, just asking in general.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


svn commit: r1104305 - in /cassandra/branches/cassandra-0.7: CHANGES.txt src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java

2011-05-17 Thread jbellis
Author: jbellis
Date: Tue May 17 14:53:26 2011
New Revision: 1104305

URL: http://svn.apache.org/viewvc?rev=1104305&view=rev
Log:
mark BRAF buffer invalid post-flush so we don't re-flush partial buffers again
patch by Peter Schuller; reviewed by jbellis for CASSANDRA-2660

Modified:
cassandra/branches/cassandra-0.7/CHANGES.txt

cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1104305&r1=1104304&r2=1104305&view=diff
==
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Tue May 17 14:53:26 2011
@@ -1,6 +1,8 @@
 0.7.7
  * adjust hinted handoff page size to avoid OOM with large columns 
(CASSANDRA-2652)
+ * mark BRAF buffer invalid post-flush so we don't re-flush partial
+   buffers again, especially on CL writes (CASSANDRA-2660)
 
 
 0.7.6

Modified: 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java?rev=1104305&r1=1104304&r2=1104305&view=diff
==
--- 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java
 (original)
+++ 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/io/util/BufferedRandomAccessFile.java
 Tue May 17 14:53:26 2011
@@ -128,6 +128,9 @@ public class BufferedRandomAccessFile ex
 fd = CLibrary.getfd(this.getFD());
 }
 
+/**
+ * Flush (flush()) whatever writes are pending, and block until the data 
has been persistently committed (fsync()).
+ */
 public void sync() throws IOException
 {
 if (syncNeeded)
@@ -150,6 +153,11 @@ public class BufferedRandomAccessFile ex
 }
 }
 
+/**
+ * If we are dirty, flush dirty contents to the operating system. Does not 
imply fsync().
+ *
+ * Currently, for implementation reasons, this also invalidates the buffer.
+ */
 public void flush() throws IOException
 {
 if (isDirty)
@@ -181,20 +189,25 @@ public class BufferedRandomAccessFile ex
 
 }
 
+// Remember that we wrote, so we don't write it again on next 
flush().
+resetBuffer();
+
 isDirty = false;
 }
 }
 
+private void resetBuffer()
+{
+bufferOffset = current;
+validBufferBytes = 0;
+}
+
 private void reBuffer() throws IOException
 {
 flush(); // synchronizing buffer and file on disk
-
-bufferOffset = current;
+resetBuffer();
 if (bufferOffset >= channel.size())
-{
-validBufferBytes = 0;
 return;
-}
 
 if (bufferOffset < minBufferOffset)
 minBufferOffset = bufferOffset;




[jira] [Updated] (CASSANDRA-2659) Improve forceDeserialize/getCompactedRow encapsulation

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2659:
--

Attachment: 2659-v2.txt

v2 addresses nits and re-adds ability to use EchoedRow in multi-row compaction. 
(See comments to CC.getCompactedRow and forceDeserialize variable in 
CM.doCompactionWithoutSizeEstimation).

> Improve forceDeserialize/getCompactedRow encapsulation
> --
>
> Key: CASSANDRA-2659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2659
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
> Fix For: 0.8.1
>
> Attachments: 2659-v2.txt, 2659.txt
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CASSANDRA-2547) CQL: support for "create columnfamily" option 'row_cache_provider'

2011-05-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2547.
---

Resolution: Duplicate
  Assignee: (was: Pavel Yaskevich)

> CQL: support for "create columnfamily" option 'row_cache_provider'
> --
>
> Key: CASSANDRA-2547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2547
> Project: Cassandra
>  Issue Type: Improvement
>Affects Versions: 0.8 beta 1
>Reporter: Cathy Daw
>Priority: Minor
>  Labels: cql
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2660) BRAF.sync() bug can cause massive commit log write magnification

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034773#comment-13034773
 ] 

Jonathan Ellis commented on CASSANDRA-2660:
---

bq. What are people's thoughts on creating separate abstractions for streaming 
I/O that can perhaps be a lot more simple, and use BRAF only for random reads 
in response to live traffic? (Not as part of this JIRA, just asking in general.)

Every time I've looked at doing this I've put it aside because making all 
writes two-pass (first pass to compute size, so we don't have to seek back 
after serializing the row itself) is such a pain.

> BRAF.sync() bug can cause massive commit log write magnification
> 
>
> Key: CASSANDRA-2660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2660
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Peter Schuller
>Assignee: Peter Schuller
> Attachments: CASSANDRA-2660-075.txt
>
>
> This was discovered, fixed and tested on 0.7.5. Cursory examination shows it 
> should still be an issue on trunk/0.8. If people otherwise agree with the 
> patch I can rebase if necessary.
> Problem:
> BRAF.flush() is actually broken in the sense that it cannot be called without 
> close co-operation with the caller. rebuffer() does the co-op by adjusting 
> bufferOffset and validBufferBytes appropriately, but sync() doesn't. This 
> means sync() is broken, and sync() is used by the commit log.
> The attached patch moves the bufferOffset/validBufferBytes handling out 
> into resetBuffer() and has both flush() and rebuffer() call that. This makes 
> sync() safe.
> What happened was that for batched commit log mode, every time sync() was 
> called the data buffered so far would get written to the OS and fsync():ed. 
> But until rebuffer() is called for other reasons as part of the write path, 
> all subsequent sync():s would result in the very same data (plus whatever was 
> written since last time) being re-written and fsync():ed again. So first you 
> write+fsync N bytes, then N+N1, then N+N1+N2... (each N being a batch), until 
> at some point you trigger a rebuffer() and it starts all over again.
> The result is that you see *a lot* more writes to the commit log than are in 
> fact written to the BRAF. And these writes translate into actual real writes 
> to the underlying storage device due to fsync(). We had crazy numbers where 
> we saw spikes upwards of 80 MB/second where the actual throughput was more 
> like ~1 MB/second of data to the commit log.
> (One can make a possibly weak argument that it is also functionally incorrect 
> as I can imagine implementations where re-writing the same blocks does 
> copy-on-write in such a way that you're not necessarily guaranteed to see 
> before-or-after data, particularly in case of partial page writes. However 
> that's probably not a practical issue.)
> Worthy of noting is that this probably causes added difficulties in fsync() 
> latencies since the average fsync() will contain a lot more data. Depending 
> on I/O scheduler and underlying device characteristics, the extra writes 
> *may* not have a detrimental effect, but I think it's pretty easy to point to 
> cases where it will be detrimental - in particular if the commit log is on a 
> non-battery backed drive. Even with a nice battery backed RAID with the 
> commit log on, the size of the writes probably contributes to difficulty in 
> making the write requests propagate down without being starved by reads (but 
> this is speculation, not tested, other than that I've observed commit log 
> writer starvation that seemed excessive).
> This isn't the first subtle BRAF bug. What are people's thoughts on creating 
> separate abstractions for streaming I/O that can perhaps be a lot more 
> simple, and use BRAF only for random reads in response to live traffic? (Not 
> as part of this JIRA, just asking in general.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

2011-05-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034771#comment-13034771
 ] 

Jonathan Ellis commented on CASSANDRA-2394:
---

Yes. Here's what the cli has to say about that:

{noformat}
  Note that disabling read repair entirely means that the dynamic snitch
  will not have any latency information from all the replicas to 
recognize
  when one is performing worse than usual.
{noformat}

> Faulty hd kills cluster performance
> ---
>
> Key: CASSANDRA-2394
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.7.4
>Reporter: Thibaut
>Priority: Minor
> Fix For: 0.7.7
>
>
> Hi,
> About every week, a node from our main cluster (>100 nodes) has a faulty hd  
> (Listing the cassandra data storage directoy triggers an input/output error).
> Whenever this occurs, I see many timeoutexceptions in our application on 
> various nodes which cause everything to run very very slowly. Keyrange scans 
> just timeout and will sometimes never succeed. If I stop cassandra on the 
> faulty node, everything runs normal again.
> It would be great to have some kind of monitoring thread in cassandra which 
> marks a node as "down" if there are multiple read/write errors to the data 
> directories. A single faulty hd on 1 node shouldn't affect global cluster 
> performance.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2547) CQL: support for "create columnfamily" option 'row_cache_provider'

2011-05-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034736#comment-13034736
 ] 

Pavel Yaskevich commented on CASSANDRA-2547:


Branches cassandra-0.8/0.8.1 already have support for row_cache_provider in 
CreateColumnFamilyStatement.java, I have tested and it works. 

> CQL: support for "create columnfamily" option 'row_cache_provider'
> --
>
> Key: CASSANDRA-2547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2547
> Project: Cassandra
>  Issue Type: Improvement
>Affects Versions: 0.8 beta 1
>Reporter: Cathy Daw
>Assignee: Pavel Yaskevich
>Priority: Minor
>  Labels: cql
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2660) BRAF.sync() bug can cause massive commit log write magnification

2011-05-17 Thread Peter Schuller (JIRA)
BRAF.sync() bug can cause massive commit log write magnification


 Key: CASSANDRA-2660
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2660
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
 Attachments: CASSANDRA-2660-075.txt

This was discovered, fixed and tested on 0.7.5. Cursory examination shows it 
should still be an issue on trunk/0.8. If people otherwise agree with the patch 
I can rebase if necessary.

Problem:

BRAF.flush() is actually broken in the sense that it cannot be called without 
close co-operation with the caller. rebuffer() does the co-op by adjusting 
bufferOffset and validBufferBytes appropriately, but sync() doesn't. This 
means sync() is broken, and sync() is used by the commit log.

The attached patch moves the bufferOffset/validBufferBytes handling out into 
resetBuffer() and has both flush() and rebuffer() call that. This makes sync() 
safe.

What happened was that for batched commit log mode, every time sync() was 
called the data buffered so far would get written to the OS and fsync():ed. But 
until rebuffer() is called for other reasons as part of the write path, all 
subsequent sync():s would result in the very same data (plus whatever was 
written since last time) being re-written and fsync():ed again. So first you 
write+fsync N bytes, then N+N1, then N+N1+N2... (each N being a batch), until 
at some point you trigger a rebuffer() and it starts all over again.

The result is that you see *a lot* more writes to the commit log than are in 
fact written to the BRAF. And these writes translate into actual real writes to 
the underlying storage device due to fsync(). We had crazy numbers where we saw 
spikes upwards of 80 MB/second where the actual throughput was more like 
~1 MB/second of data to the commit log.

(One can make a possibly weak argument that it is also functionally incorrect 
as I can imagine implementations where re-writing the same blocks does 
copy-on-write in such a way that you're not necessarily guaranteed to see 
before-or-after data, particularly in case of partial page writes. However 
that's probably not a practical issue.)

Worthy of noting is that this probably causes added difficulties in fsync() 
latencies since the average fsync() will contain a lot more data. Depending on 
I/O scheduler and underlying device characteristics, the extra writes *may* not 
have a detrimental effect, but I think it's pretty easy to point to cases where 
it will be detrimental - in particular if the commit log is on a non-battery 
backed drive. Even with a nice battery backed RAID with the commit log on, the 
size of the writes probably contributes to difficulty in making the write 
requests propagate down without being starved by reads (but this is 
speculation, not tested, other than that I've observed commit log writer 
starvation that seemed excessive).

This isn't the first subtle BRAF bug. What are people's thoughts on creating 
separate abstractions for streaming I/O that can perhaps be a lot more simple, 
and use BRAF only for random reads in response to live traffic? (Not as part of 
this JIRA, just asking in general.)
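The write-magnification pattern described above can be illustrated with a toy model. This is a hedged sketch in plain Python, not the actual BRAF code: the class names, batch sizes, and byte counting are ours, and only the control flow (sync() flushing without invalidating the buffer, versus the resetBuffer() fix) mirrors the bug:

```python
class BuggyWriter:
    """Buffered writer whose sync() flushes but never marks the buffer
    clean, so every later sync() re-writes all previously buffered bytes."""
    def __init__(self):
        self.buffer = bytearray()
        self.os_bytes_written = 0  # bytes handed to the "OS" and fsync()ed

    def write(self, data: bytes):
        self.buffer.extend(data)

    def sync(self):
        # Flush whatever is buffered so far...
        self.os_bytes_written += len(self.buffer)
        # ...but forget to reset the buffer (the bug).

class FixedWriter(BuggyWriter):
    def sync(self):
        self.os_bytes_written += len(self.buffer)
        self.buffer.clear()  # resetBuffer(): invalidate the buffer post-flush

def commit_log(writer, batches):
    """Batched commit log mode: write a batch, then sync()."""
    for batch in batches:
        writer.write(batch)
        writer.sync()
    return writer.os_bytes_written

batches = [b"x" * 100] * 10  # ten 100-byte batches
print(commit_log(BuggyWriter(), batches))  # 100 + 200 + ... + 1000 = 5500
print(commit_log(FixedWriter(), batches))  # 10 * 100 = 1000
```

For N equal-sized batches the buggy writer pushes O(N^2) bytes to the device while the fixed one pushes O(N), which is the magnification the ticket reports (80 MB/s observed vs ~1 MB/s of actual commit log data).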


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (CASSANDRA-2660) BRAF.sync() bug can cause massive commit log write magnification

2011-05-17 Thread Peter Schuller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller reassigned CASSANDRA-2660:
-

Assignee: Peter Schuller

> BRAF.sync() bug can cause massive commit log write magnification
> 
>
> Key: CASSANDRA-2660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2660
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Peter Schuller
>Assignee: Peter Schuller
> Attachments: CASSANDRA-2660-075.txt
>
>
> This was discovered, fixed and tested on 0.7.5. Cursory examination shows it 
> should still be an issue on trunk/0.8. If people otherwise agree with the 
> patch I can rebase if necessary.
> Problem:
> BRAF.flush() is actually broken in the sense that it cannot be called without 
> close co-operation with the caller. rebuffer() does the co-op by adjusting 
> bufferOffset and validBufferBytes appropriately, but sync() doesn't. This 
> means sync() is broken, and sync() is used by the commit log.
> The attached patch moves the bufferOffset/validBufferBytes handling out 
> into resetBuffer() and has both flush() and rebuffer() call that. This makes 
> sync() safe.
> What happened was that for batched commit log mode, every time sync() was 
> called the data buffered so far would get written to the OS and fsync():ed. 
> But until rebuffer() is called for other reasons as part of the write path, 
> all subsequent sync():s would result in the very same data (plus whatever was 
> written since last time) being re-written and fsync():ed again. So first you 
> write+fsync N bytes, then N+N1, then N+N1+N2... (each N being a batch), until 
> at some point you trigger a rebuffer() and it starts all over again.
> The result is that you see *a lot* more writes to the commit log than are in 
> fact written to the BRAF. And these writes translate into actual real writes 
> to the underlying storage device due to fsync(). We had crazy numbers where 
> we saw spikes upwards of 80 MB/second where the actual throughput was more 
> like ~1 MB/second of data to the commit log.
> (One can make a possibly weak argument that it is also functionally incorrect 
> as I can imagine implementations where re-writing the same blocks does 
> copy-on-write in such a way that you're not necessarily guaranteed to see 
> before-or-after data, particularly in case of partial page writes. However 
> that's probably not a practical issue.)
> Worthy of noting is that this probably causes added difficulties in fsync() 
> latencies since the average fsync() will contain a lot more data. Depending 
> on I/O scheduler and underlying device characteristics, the extra writes 
> *may* not have a detrimental effect, but I think it's pretty easy to point to 
> cases where it will be detrimental - in particular if the commit log is on a 
> non-battery backed drive. Even with a nice battery backed RAID with the 
> commit log on, the size of the writes probably contributes to difficulty in 
> making the write requests propagate down without being starved by reads (but 
> this is speculation, not tested, other than that I've observed commit log 
> writer starvation that seemed excessive).
> This isn't the first subtle BRAF bug. What are people's thoughts on creating 
> separate abstractions for streaming I/O that can perhaps be a lot more 
> simple, and use BRAF only for random reads in response to live traffic? (Not 
> as part of this JIRA, just asking in general.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2660) BRAF.sync() bug can cause massive commit log write magnification

2011-05-17 Thread Peter Schuller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-2660:
--

Attachment: CASSANDRA-2660-075.txt

> BRAF.sync() bug can cause massive commit log write magnification
> 
>
> Key: CASSANDRA-2660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2660
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Peter Schuller
>Assignee: Peter Schuller
> Attachments: CASSANDRA-2660-075.txt
>
>
> This was discovered, fixed and tested on 0.7.5. Cursory examination shows it 
> should still be an issue on trunk/0.8. If people otherwise agree with the 
> patch I can rebase if necessary.
> Problem:
> BRAF.flush() is actually broken in the sense that it cannot be called without 
> close co-operation with the caller. rebuffer() co-operates by adjusting 
> bufferOffset and validateBufferBytes appropriately, but sync() doesn't. This 
> means sync() is broken, and sync() is used by the commit log.
> The attached patch moves the bufferOffset/validateBufferBytes handling out 
> into resetBuffer() and has both flush() and rebuffer() call that. This makes 
> sync() safe.
> What happened was that for batched commit log mode, every time sync() was 
> called the data buffered so far would get written to the OS and fsync():ed. 
> But until rebuffer() is called for other reasons as part of the write path, 
> all subsequent sync():s would result in the very same data (plus whatever was 
> written since last time) being re-written and fsync():ed again. So first you 
> write+fsync N bytes, then N+N1, then N+N1+N2... (each N being a batch), until 
> at some point you trigger a rebuffer() and it starts all over again.
> The result is that you see *a lot* more writes to the commit log than are in 
> fact written to the BRAF. And these writes translate into actual real writes 
> to the underlying storage device due to fsync(). We had crazy numbers where 
> we saw spikes upwards of 80 mb/second where the actual throughput was more 
> like ~1 mb/second of data to the commit log.
> (One can make a possibly weak argument that it is also functionally incorrect 
> as I can imagine implementations where re-writing the same blocks does 
> copy-on-write in such a way that you're not necessarily guaranteed to see 
> before-or-after data, particularly in case of partial page writes. However 
> that's probably not a practical issue.)
> Worth noting is that this probably causes added difficulties in fsync() 
> latencies since the average fsync() will contain a lot more data. Depending 
> on I/O scheduler and underlying device characteristics, the extra writes 
> *may* not have a detrimental effect, but I think it's pretty easy to point to 
> cases where it will be detrimental - in particular if the commit log is on a 
> non-battery backed drive. Even with a nice battery backed RAID with the 
> commit log on, the size of the writes probably contributes to difficulty in 
> making the write requests propagate down without being starved by reads (but 
> this is speculation, not tested, other than that I've observed commit log 
> writer starvation that seemed excessive).
> This isn't the first subtle BRAF bug. What are people's thoughts on creating 
> separate abstractions for streaming I/O that can perhaps be a lot more 
> simple, and use BRAF only for random reads in response to live traffic? (Not 
> as part of this JIRA, just asking in general.)
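
For illustration, here is a minimal, self-contained sketch of the magnification described above. The `BufferedLog` and `SyncBugDemo` names are hypothetical, not the actual BRAF code: sync() "writes + fsyncs" everything still in the buffer, and only the fixed variant resets the buffer afterwards, which is what the patch's resetBuffer() provides.

```java
/**
 * Hypothetical model of the write-magnification bug: sync() pushes the whole
 * buffer to the "device" every time, and only the fixed variant advances the
 * buffer offset afterwards (as flush() and rebuffer() now share via
 * resetBuffer() in the patch).
 */
class BufferedLog
{
    private int buffered = 0;        // bytes appended since the last resetBuffer()
    private long deviceBytes = 0;    // total bytes written + fsynced to the device
    private final boolean fixed;

    BufferedLog(boolean fixed) { this.fixed = fixed; }

    void append(int n) { buffered += n; }   // pretend to buffer n bytes

    void sync()
    {
        deviceBytes += buffered;     // everything still in the buffer is re-written
        if (fixed)
            resetBuffer();           // the patch: advance past already-synced data
    }

    private void resetBuffer() { buffered = 0; }

    long deviceBytesWritten() { return deviceBytes; }
}

public class SyncBugDemo
{
    public static void main(String[] args)
    {
        BufferedLog buggy = new BufferedLog(false);
        BufferedLog fixed = new BufferedLog(true);
        // 100 batches of 1000 bytes, sync() after each (batched commitlog mode)
        for (int i = 0; i < 100; i++)
        {
            buggy.append(1000); buggy.sync();
            fixed.append(1000); fixed.sync();
        }
        // buggy re-writes 1000 + 2000 + ... + 100000 bytes for 100000 bytes of data
        System.out.println("buggy=" + buggy.deviceBytesWritten()
                + " fixed=" + fixed.deviceBytesWritten());
    }
}
```

With 100 batches of 1000 bytes each, the buggy variant pushes 5,050,000 bytes to the device for 100,000 bytes of actual data, a ~50x magnification of the same order as the 80:1 ratio observed above.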



[jira] [Commented] (CASSANDRA-2268) CQL-enabled stress.java

2011-05-17 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034720#comment-13034720
 ] 

Pavel Yaskevich commented on CASSANDRA-2268:


Thanks! I'm stuck doing other CQL-related stuff right now.

> CQL-enabled stress.java
> ---
>
> Key: CASSANDRA-2268
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Eric Evans
>Assignee: Aaron Morton
>Priority: Minor
>  Labels: cql
> Fix For: 0.8.1
>
>
> It would be great if stress.java had a CQL mode: useful for making the 
> inevitable RPC->CQL comparisons, but also as a basis for measuring 
> optimizations and spotting performance regressions.



[jira] [Commented] (CASSANDRA-2659) Improve forceDeserialize/getCompactedRow encapsulation

2011-05-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034708#comment-13034708
 ] 

Sylvain Lebresne commented on CASSANDRA-2659:
-

nitpicks:
  * we could remove the descriptor argument of the first getCompactedRow() and 
call needDeserialize() for the EchoedRow case.
  * we could use that first getCompactedRow() in SSTableWriter (it's really 
only cosmetic as we forceDeserialize)
  * the comment of that first getCompactedRow() method is not completely 
correct, since the method may purge data (either if the sstable is of an old 
format or if forceDeserialize is set) while the comment suggests it never 
does.

but those are nitpicks, so +1 with or without them

> Improve forceDeserialize/getCompactedRow encapsulation
> --
>
> Key: CASSANDRA-2659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2659
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Jonathan Ellis
>Priority: Minor
> Fix For: 0.8.1
>
> Attachments: 2659.txt
>
>




[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

2011-05-17 Thread Thibaut (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034671#comment-13034671
 ] 

Thibaut commented on CASSANDRA-2394:


I will do this next time and post the results.

Could http://www.mail-archive.com/user@cassandra.apache.org/msg13407.html cause 
this? We are also using a read repair chance of 0.



> Faulty hd kills cluster performance
> ---
>
> Key: CASSANDRA-2394
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.7.4
>Reporter: Thibaut
>Priority: Minor
> Fix For: 0.7.7
>
>
> Hi,
> About every week, a node from our main cluster (>100 nodes) has a faulty hd 
> (listing the cassandra data storage directory triggers an input/output error).
> Whenever this occurs, I see many TimeoutExceptions in our application on 
> various nodes, which cause everything to run very slowly. Keyrange scans 
> just time out and will sometimes never succeed. If I stop cassandra on the 
> faulty node, everything runs normally again.
> It would be great to have some kind of monitoring thread in cassandra which 
> marks a node as "down" if there are multiple read/write errors to the data 
> directories. A single faulty hd on 1 node shouldn't affect global cluster 
> performance.
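
A hedged sketch of the kind of monitoring suggested above (the `FsErrorMonitor` name is hypothetical, not Cassandra code): count filesystem errors in a sliding time window and flag the node once a threshold is crossed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical error-threshold policy: record each read/write failure against
 * the data directories, drop records older than the window, and report that
 * the node should be marked "down" once the threshold is reached.
 */
class FsErrorMonitor
{
    private final int maxErrors;
    private final long windowMillis;
    private final Deque<Long> errorTimes = new ArrayDeque<>();

    FsErrorMonitor(int maxErrors, long windowMillis)
    {
        this.maxErrors = maxErrors;
        this.windowMillis = windowMillis;
    }

    /** Record one failure; returns true once the node should be taken out. */
    synchronized boolean onError(long nowMillis)
    {
        errorTimes.addLast(nowMillis);
        // discard errors that fell out of the sliding window
        while (!errorTimes.isEmpty() && nowMillis - errorTimes.peekFirst() > windowMillis)
            errorTimes.removeFirst();
        return errorTimes.size() >= maxErrors;
    }
}
```

A single transient error would not trip the monitor, but a dying disk producing repeated I/O errors within the window would, letting the operator (or the server itself) take the node out before it drags down cluster-wide latencies.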



buildbot success in ASF Buildbot on cassandra-trunk

2011-05-17 Thread buildbot
The Buildbot has detected a restored build on builder cassandra-trunk while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/cassandra-trunk/builds/1317

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: isis_ubuntu

Build Reason: scheduler
Build Source Stamp: [branch cassandra/trunk] 1104054
Blamelist: slebresne

Build succeeded!

sincerely,
 -The Buildbot



svn commit: r1104054 - in /cassandra/trunk: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/db/marshal/

2011-05-17 Thread slebresne
Author: slebresne
Date: Tue May 17 08:37:17 2011
New Revision: 1104054

URL: http://svn.apache.org/viewvc?rev=1104054&view=rev
Log:
merge from 0.8.1

Modified:
cassandra/trunk/   (props changed)
cassandra/trunk/contrib/   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/trunk/src/java/org/apache/cassandra/db/marshal/ReversedType.java

Propchange: cassandra/trunk/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 08:37:17 2011
@@ -2,7 +2,7 @@
 /cassandra/branches/cassandra-0.7:1026516-1102046,1102337
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
 /cassandra/branches/cassandra-0.8:1090935-1102339,1102345
-/cassandra/branches/cassandra-0.8.1:1101014-1102517
+/cassandra/branches/cassandra-0.8.1:1101014-1102517,1104052
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689
 /incubator/cassandra/branches/cassandra-0.3:774578-796573
 /incubator/cassandra/branches/cassandra-0.4:810145-834239,834349-834350

Propchange: cassandra/trunk/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 08:37:17 2011
@@ -2,7 +2,7 @@
 /cassandra/branches/cassandra-0.7/contrib:1026516-1102046,1102337
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
 /cassandra/branches/cassandra-0.8/contrib:1090935-1102339,1102345
-/cassandra/branches/cassandra-0.8.1/contrib:1101014-1102517
+/cassandra/branches/cassandra-0.8.1/contrib:1101014-1102517,1104052
 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689
 /incubator/cassandra/branches/cassandra-0.3/contrib:774578-796573
 
/incubator/cassandra/branches/cassandra-0.4/contrib:810145-810987,810994-834239,834349-834350

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 08:37:17 2011
@@ -2,7 +2,7 @@
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1102046,1102337
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
 
/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090935-1102339,1102345
-/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1101014-1102517
+/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1101014-1102517,1104052
 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689
 
/incubator/cassandra/branches/cassandra-0.3/interface/gen-java/org/apache/cassandra/service/Cassandra.java:774578-796573
 
/incubator/cassandra/branches/cassandra-0.4/interface/gen-java/org/apache/cassandra/service/Cassandra.java:810145-834239,834349-834350

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue May 17 08:37:17 2011
@@ -2,7 +2,7 @@
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1102046,1102337
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654
 
/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1090935-1102339,1102345
-/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1101014-1102517
+/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1101014-1102517,1104052
 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1051699-1053689
 
/incubator/cassandra/branches/cassandra-0.3/interface/gen-java/org/apache/cassandra/service/column_t.java:774578-792198
 
/incubator/cassandra/branches/cassandra-0.4/interface/gen-java/org/apache/cassandra/service/Column.java:810145-834239,834349-834350

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
---

svn commit: r1104052 - /cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/ReversedType.java

2011-05-17 Thread slebresne
Author: slebresne
Date: Tue May 17 08:36:02 2011
New Revision: 1104052

URL: http://svn.apache.org/viewvc?rev=1104052&view=rev
Log:
Add missing method for CASSANDRA-2355

Modified:

cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/ReversedType.java

Modified: 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/ReversedType.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/ReversedType.java?rev=1104052&r1=1104051&r2=1104052&view=diff
==
--- 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/ReversedType.java
 (original)
+++ 
cassandra/branches/cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/ReversedType.java
 Tue May 17 08:36:02 2011
@@ -21,6 +21,9 @@ package org.apache.cassandra.db.marshal;
 import java.nio.ByteBuffer;
 import java.util.HashMap;
 import java.util.Map;
+import java.util.List;
+
+import org.apache.cassandra.config.ConfigurationException;
 
 public class ReversedType<T> extends AbstractType<T>
 {
@@ -30,6 +33,14 @@ public class ReversedType<T> extends Abs
     // package protected for unit tests sake
     final AbstractType<T> baseType;
 
+    public static <T> ReversedType<T> getInstance(TypeParser parser) throws ConfigurationException
+    {
+        List<AbstractType> types = parser.getTypeParameters();
+        if (types.size() != 1)
+            throw new ConfigurationException("ReversedType takes exactly one argument, " + types.size() + " given");
+        return getInstance(types.get(0));
+    }
+
     public static synchronized <T> ReversedType<T> getInstance(AbstractType<T> baseType)
     {
         ReversedType<T> type = instances.get(baseType);