[jira] [Commented] (CASSANDRA-4176) Support for sharding wide rows in CQL 3.0

2012-04-20 Thread Christoph Tavan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258076#comment-13258076
 ] 

Christoph Tavan commented on CASSANDRA-4176:


We're facing the row-key-sharding problem as well, and our workaround is to
concatenate and split text-type row keys in our application logic (who
wouldn't work around it like that?). However, that feels somewhat hacky given
that we can use a true CompositeType for the column names, which is a huge
step forward in CQL 3.
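
For illustration, a minimal sketch of that kind of application-level
workaround (the table and the 'user_id:day' key layout here are illustrative
assumptions, not the poster's actual schema):
{noformat}
-- The application builds the shard key itself, e.g. 'user42:2012-04-20',
-- and must split it again when reading.
CREATE TABLE timeline (
    shard_key text,      -- user_id + ':' + day, concatenated in application code
    tweet_id uuid,
    author text,
    body text,
    PRIMARY KEY (shard_key, tweet_id)
);

SELECT * FROM timeline WHERE shard_key = 'user42:2012-04-20';
{noformat}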

Since C* allows row keys of CompositeType, I think it would be a nice feature
to have them supported through CQL as well, since that would remove this
concatenation logic from the application and put it into the data model, where
it belongs.

So +1 for some solution to this from my side.

On the syntax, maybe the discussion in CASSANDRA-4004 is also relevant.
Personally, I think that adding an attribute, as you suggest, to the fields
that should be part of the row key in the {{PRIMARY KEY()}} statement would be
fine.

 Support for sharding wide rows in CQL 3.0
 -

 Key: CASSANDRA-4176
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4176
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Nick Bailey
 Fix For: 1.1.1


 CQL 3.0 currently has support for defining wide rows by declaring a composite 
 primary key. For example:
 {noformat}
 CREATE TABLE timeline (
     user_id varchar,
     tweet_id uuid,
     author varchar,
     body varchar,
     PRIMARY KEY (user_id, tweet_id)
 );
 {noformat}
 It would also be useful to manage sharding a wide row through the cql schema. 
 This would require being able to split up the actual row key in the schema 
 definition. In the above example you might want to make the row key a 
 combination of user_id and day_of_tweet, in order to shard timelines by day. 
 This might look something like:
 {noformat}
 CREATE TABLE timeline (
     user_id varchar,
     day_of_tweet date,
     tweet_id uuid,
     author varchar,
     body varchar,
     PRIMARY KEY (user_id REQUIRED, day_of_tweet REQUIRED, tweet_id)
 );
 {noformat}
 That's probably a terrible attempt at how to structure that in CQL, but I
 think I've gotten the point across. I tagged this for CQL 3.0, but I'm
 honestly not sure how much work it might be. As far as I know, built-in
 support for composite keys is limited.
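 As a rough illustration of how reads against such a sharded timeline might
 look (the schema above is a strawman, so this query is equally hypothetical):
 {noformat}
 -- Fetch one day's shard of a user's timeline
 SELECT * FROM timeline
  WHERE user_id = 'user42'
    AND day_of_tweet = '2012-04-20';
 {noformat}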

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4176) Support for sharding wide rows in CQL 3.0

2012-04-20 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258105#comment-13258105
 ] 

Sylvain Lebresne commented on CASSANDRA-4176:
-

I've been meaning to open a similar ticket for some time now but forgot. I've
actually created it as CASSANDRA-4179. It too basically suggests adding support
for composites in the row key. I decided to open a separate ticket, however,
because:
# I didn't mean CASSANDRA-4179 to be specific to sharding, and in particular I
want to discuss there the question of composites in column values.
# I think that adding a nice syntax for composites in the row key is indeed
nice for sharding very wide rows, but maybe it could be worth going even
further. What I mean is that sharding a time series is very common, so we
could imagine making that sharding more automatic. For instance (using a
syntax I haven't given much thought, but reusing one of my syntax suggestions
from CASSANDRA-4179), we could have:
{noformat}
CREATE TABLE timeline (
    user_id varchar,
    day_of_tweet date AUTO(day(tweet_id)),
    tweet_id uuid,
    author varchar,
    body varchar,
    GROUP (user_id, day_of_tweet) as key,
    PRIMARY KEY (key, tweet_id)
);
{noformat}
for which the semantics would be that day_of_tweet is automatically
calculated from tweet_id.
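
Purely to illustrate the intent of that proposal (the AUTO/GROUP syntax above
is a strawman, so the statements below are hypothetical, not working CQL):
{noformat}
-- Hypothetical: day_of_tweet is derived from tweet_id, so writers never supply it
INSERT INTO timeline (user_id, tweet_id, author, body)
VALUES ('user42', 0f6a7b20-8af6-11e1-b0c4-0800200c9a66, 'alice', 'hello');

-- Hypothetical: reads would still address one day's shard explicitly
SELECT * FROM timeline
 WHERE user_id = 'user42' AND day_of_tweet = '2012-04-20';
{noformat}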

I'll admit it's a bit specific in a way, and clearly we could say we leave that
to the client, but time series are a very, very common use case for Cassandra,
and sharding rows at some granularity is very often needed, so ...

Anyway, my suggestion would be to keep the 'composites in row key' discussion 
in CASSANDRA-4179 and maybe discuss deeper support for row sharding here.





[jira] [Commented] (CASSANDRA-4179) Add more general support for composites (to row key, column value)

2012-04-20 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258106#comment-13258106
 ] 

Sylvain Lebresne commented on CASSANDRA-4179:
-

PS: I know that this is related to CASSANDRA-4176 but see my (first) comment 
there.

 Add more general support for composites (to row key, column value)
 --

 Key: CASSANDRA-4179
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4179
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Sylvain Lebresne
 Fix For: 1.1.1


 Currently CQL3 has a nice syntax for using composites in the column name
 (it's more than that in fact, it creates a whole new abstraction, but let's
 say I'm talking implementation here). There are however two other places
 where composites could be used (again, implementation-wise): the row key and
 the column value. This ticket proposes to explore which of those make sense
 for CQL3 and how.
 For the row key, I really think that CQL support makes sense. It's very
 common (and useful) to want to stuff composite information into a row key.
 Sharding a time series (CASSANDRA-4176) is probably the best example, but
 there are others.
 For the column value it is less clear. CQL3 makes it very transparent and
 convenient to store multiple related values in multiple columns, so maybe
 composites in a column value are much less needed. I do still see two cases
 where it could be handy:
 # to save some disk/memory space, if you know it makes no sense to
 insert/read two values separately.
 # to enforce that two values cannot be inserted separately, i.e. a form of
 constraint to avoid programmatic errors.
 Those are not widely useful things, but my reasoning is that if whatever
 syntax we come up with for grouping the row key into a composite trivially
 extends to column values, why not support it.
 As for syntax, I have three suggestions (that are just that, suggestions):
 # If we only care about allowing grouping for row keys:
 {noformat}
 CREATE TABLE timeline (
     name text,
     month int,
     ts timestamp,
     value text,
     PRIMARY KEY ((name, month), ts)
 )
 {noformat}
 # A syntax that could work for both grouping in the row key and the column value:
 {noformat}
 CREATE TABLE timeline (
     name text,
     month int,
     ts timestamp,
     value1 text,
     value2 text,
     GROUP (name, month) as key,
     GROUP (value1, value2),
     PRIMARY KEY (key, ts)
 )
 {noformat}
 # An alternative to the preceding one:
 {noformat}
 CREATE TABLE timeline (
     name text,
     month int,
     ts timestamp,
     value1 text,
     value2 text,
     PRIMARY KEY (key, ts)
 ) WITH GROUP (name, month) AS key
    AND GROUP (value1, value2)
 {noformat}
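 For what it's worth, a hedged sketch of what suggestion #1 would buy at query
 time (the column names reuse the example above; since the grouped columns
 together form the row key, both must be constrained to address one physical
 row):
 {noformat}
 -- All points for one (name, month) shard, i.e. one physical wide row
 SELECT ts, value FROM timeline WHERE name = 'temperature' AND month = 201204;

 -- A time slice within that shard
 SELECT ts, value FROM timeline
  WHERE name = 'temperature' AND month = 201204
    AND ts >= '2012-04-01' AND ts < '2012-05-01';
 {noformat}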





[jira] [Commented] (CASSANDRA-4170) cql3 ALTER TABLE ALTER TYPE has no effect

2012-04-20 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258264#comment-13258264
 ] 

Jonathan Ellis commented on CASSANDRA-4170:
---

+1

 cql3 ALTER TABLE ALTER TYPE has no effect
 -

 Key: CASSANDRA-4170
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4170
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core
Affects Versions: 1.1.0
Reporter: paul cannon
Assignee: Sylvain Lebresne
  Labels: cql3
 Fix For: 1.1.0

 Attachments: 4170.txt


 running the following with cql3:
 {noformat}
 CREATE TABLE test (foo text PRIMARY KEY, bar int);
 ALTER TABLE test ALTER bar TYPE float;
 {noformat}
 does not actually change the column type of bar. It does under cql2.
 Note that on the current cassandra-1.1.0 HEAD, this causes an NPE, fixed by 
 CASSANDRA-4163. But even with that applied, the ALTER shown here has no 
 effect.





[jira] [Commented] (CASSANDRA-4177) Little improvement on the messages of the exceptions thrown by ExternalClient

2012-04-20 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258267#comment-13258267
 ] 

Jonathan Ellis commented on CASSANDRA-4177:
---

Don't you get a "Caused by" later on in the stack trace with the original
approach?

 Little improvement on the messages of the exceptions thrown by ExternalClient
 -

 Key: CASSANDRA-4177
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4177
 Project: Cassandra
  Issue Type: Improvement
Reporter: Michał Michalski
Assignee: Michał Michalski
Priority: Trivial
 Attachments: trunk-4177.txt


 After adding to BulkRecordWriter (or actually ExternalClient) the ability to
 make use of authentication, I've noticed that the exceptions thrown on login
 failure are very misleading - there's always a "Could not retrieve endpoint
 ranges" RuntimeException being thrown, no matter what really happens. This
 hides the real reason for any authentication problem. I've changed this line
 a bit so that all the messages are passed on unchanged, and now I get - for
 example - AuthenticationException(why: Given password in password mode MD5
 could not be validated for user operator) or - in the worst case -
 "Unexpected authentication problem", which is way more helpful, so I submit
 this trivial but useful improvement.





[jira] [Commented] (CASSANDRA-4180) Remove 2-phase compaction

2012-04-20 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258294#comment-13258294
 ] 

Sylvain Lebresne commented on CASSANDRA-4180:
-

I'll note that an initial idea could be to keep the row header as it is (post
CASSANDRA-2319), and during compaction to reserve the space for the row size
and column count, compact all the columns, and then seek back to write those
two values. However, compression forbids us from doing that, so we'll have to
really remove those parts too. We can, however, replace the column count with
a specific marker for the end of a row. As for the data size, we can get it
from the index.

 Remove 2-phase compaction
 -

 Key: CASSANDRA-4180
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4180
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 1.2


 LazilyCompactedRow reads all data twice to compact a row, which is obviously
 inefficient. The main reason we do that is to compute the row header.
 However, CASSANDRA-2319 has removed the main part of that row header. What
 remains is the size in bytes and the number of columns, but it should be
 relatively simple to remove those, which would then remove the need for
 two-phase compaction.





[jira] [Commented] (CASSANDRA-3708) Support composite prefix tombstones

2012-04-20 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258303#comment-13258303
 ] 

Sylvain Lebresne commented on CASSANDRA-3708:
-

I've updated my branch at https://github.com/pcmanus/cassandra/commits/3708 to 
add efficient on-disk handling of the new range tombstones.

The idea is that we don't want to have to read every range tombstone for each
query, but only the ones corresponding to the columns queried. For that, the
idea is to write the range tombstones along with the columns themselves. So the
basic principle of the patch is that if we have a range tombstone RT[x, y]
deleting all columns between x and y, we write a tombstone marker on disk
before column x. Of course in practice it's more complicated, because we want
to be sure to read that tombstone even if we read only, say, y. To ensure that,
such a tombstone marker is repeated at the beginning of every column block
(index block) the range covers (the code is smart enough not to repeat a marker
that is superseded by other ones, so in practice there won't be a lot of
repeated markers at the beginning of each block).

Note that those tombstone markers are specific to the on-disk format (in
memory we use an interval tree), which has two consequences for the patch:
# The on-disk format now diverges a little bit from the wire format. So the
code (hopefully) cleanly separates the serialization functions that deal with
the on-disk format from the others. I don't think it's a bad idea to have that
distinction anyway, since we don't want to break the wire protocol but it's OK
to change the on-disk one.
# On-disk column iterators (SSTable{Slice,Name}Iterator) have to handle those
tombstone markers, which are not columns per se. I.e., after having read them
from disk, we want to store them in the interval tree of the ColumnFamily
object, not as an IColumn in the ColumnFamily map. To make this distinction,
the code introduces an interface called OnDiskAtom, which basically represents
either a column or a range tombstone. The sstable iterators return those
OnDiskAtoms, which are then ultimately added correctly to the resulting
ColumnFamily object. I do think this is the clean way to handle this, but it
is responsible for quite a bit of the code diff.

I'll also note that both of those changes should be useful for CASSANDRA-4180
too, to handle the end-of-row marker described in that issue.

Now, I admit this patch is not a small one, but unit tests are passing and
there are a few basic tests at
https://github.com/pcmanus/cassandra-dtest/commits/3708_tests.

Lastly, I'll add that the CQL3 support for this is minimal as of this patch.
We only allow what is basically the equivalent of the 'delete a whole super
column' behavior. But it would be simple to allow more generic use of range
tombstones, i.e. to allow stuff like:
{noformat}
DELETE FROM test WHERE k=0 AND c > 3 AND c <= 10
{noformat}
But the patch is big enough already that we can see to that later.
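
For concreteness, a hedged sketch of the 'delete a whole super column'
equivalent that is allowed as of this patch (the table and column names are
made up; the point is that only a prefix of the clustering columns is
constrained, so the whole slice under that prefix gets a range tombstone):
{noformat}
CREATE TABLE test (
    k int,
    c1 text,
    c2 int,
    v text,
    PRIMARY KEY (k, c1, c2)
);

-- Deletes every (c2, v) under the prefix (k = 0, c1 = 'event-a')
DELETE FROM test WHERE k = 0 AND c1 = 'event-a';
{noformat}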


 Support composite prefix tombstones
 -

 Key: CASSANDRA-3708
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3708
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne







[jira] [Commented] (CASSANDRA-4171) cql3 ALTER TABLE foo WITH default_validation=int has no effect

2012-04-20 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258312#comment-13258312
 ] 

Jonathan Ellis commented on CASSANDRA-4171:
---

+1

 cql3 ALTER TABLE foo WITH default_validation=int has no effect
 --

 Key: CASSANDRA-4171
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4171
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core
Affects Versions: 1.1.0
Reporter: paul cannon
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: cql3
 Fix For: 1.1.0

 Attachments: 4171.txt


 running the following with cql3:
 {noformat}
 CREATE TABLE test (foo text PRIMARY KEY) WITH default_validation=timestamp;
 ALTER TABLE test WITH default_validation=int;
 {noformat}
 does not actually change the default validation type of the CF. It does under 
 cql2.
 No error is thrown. Some properties *can* be successfully changed using ALTER 
 WITH, such as comment and gc_grace_seconds, but I haven't tested all of them. 
 It seems probable that default_validation is the only problematic one, since 
 it's the only (changeable) property which accepts CQL typenames.
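 For contrast, a hedged illustration of the behavior described above (the
 property values are arbitrary; per this report only default_validation
 misbehaves):
 {noformat}
 -- These ALTERs do take effect under CQL3
 ALTER TABLE test WITH comment = 'testing alter with' AND gc_grace_seconds = 3600;

 -- This one silently has no effect (the bug reported here)
 ALTER TABLE test WITH default_validation=int;
 {noformat}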





[jira] [Commented] (CASSANDRA-4161) CQL 3.0 does not work in cqlsh with uppercase SELECT

2012-04-19 Thread Christoph Tavan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257361#comment-13257361
 ] 

Christoph Tavan commented on CASSANDRA-4161:


I'm wondering why CQL is being parsed in the client at all? Couldn't we just 
handle the exceptions thrown by cassandra? That way we wouldn't have to keep 
cqlsh in sync with CQL development on the C*-side.

 CQL 3.0 does not work in cqlsh with uppercase SELECT
 

 Key: CASSANDRA-4161
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4161
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.1.0
 Environment: cqlsh
Reporter: Jonas Dohse
Priority: Minor
  Labels: cql3, cqlsh
 Attachments: 
 0001-Allow-CQL-3.0-with-uppercase-SELECT-statement.patch, 4161.patch.txt


 Uppercase SELECT prevents usage of CQL 3.0 features like ORDER BY
 Example:
 select * from test ORDER BY number; # works
 SELECT * from test ORDER BY number; # fails





[jira] [Commented] (CASSANDRA-4161) CQL 3.0 does not work in cqlsh with uppercase SELECT

2012-04-19 Thread Jonas Dohse (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257385#comment-13257385
 ] 

Jonas Dohse commented on CASSANDRA-4161:


Works for me™





[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-19 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257455#comment-13257455
 ] 

Sylvain Lebresne commented on CASSANDRA-3909:
-

Is it a big deal if it's only in 1.1.1? I mean, personally I do trust you that
this can't break anything, and I don't object to putting it in 1.1.0. I do
however think that in general there would be some merit to sticking to
stricter rules. But that's not a debate related to this issue in particular,
so let's leave that discussion for some other venue.


 Pig should handle wide rows
 ---

 Key: CASSANDRA-3909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1

 Attachments: 3909.txt


 Pig should be able to use the wide row support in CFIF.





[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType

2012-04-19 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257521#comment-13257521
 ] 

Sylvain Lebresne commented on CASSANDRA-4004:
-

bq. From the standpoint of a new user, this is sophistry.

How is that sophistry, seriously? I think it's very important that the
clustered part of the PK induces an ordering of records (which SQL doesn't
have, but we're not talking about SQL here, right?). It's important because
you don't model things the same way in Cassandra as you traditionally do in
SQL: you denormalize, you model time series, etc., for which the notion that
there is an ordering of records is not an implementation detail, nor something
dealt with at query time (contrarily to SQL), but an important part of the
model. It would be confusing for a brand new user to *not* say that ordering
is part of the data model (and again, yes, that's a difference from SQL). I
also don't see how saying that is in any way related to being a veteran and
whatnot. I 200% agree that CQL3 is an abstraction and that it is A Good Thing.
I'm saying the ordering induced by the PK should be part of that abstraction.

But then it's natural that SELECT without ORDER BY should return records in
that clustering order, which will indeed not be the same as the order with
ORDER BY Y ASC *unless* Y is the first clustered key. But if Y is the first
clustered key, then yes, SELECT and SELECT ORDER BY Y ASC should be the same
(and they are). And then it's not rocket science to say that if the ordering
is 'reversed alphabetical order', then 'z' < 'a' and thus a SELECT ORDER BY ASC
returns 'z' before 'a'.

So I absolutely and strongly refute that this proposal is somehow sophistry,
and even more so that it's a negation of the abstract nature of CQL3 or
influenced by the Thrift API any more than the solution you're pushing for.

bq. The other relevant context here is that we decided not to support arbitrary 
orderings

I'm either misunderstanding what you call 'arbitrary orderings' or I have not
been part of that discussion. Because if you're talking about custom types,
then CQL3 does support them (you can declare CREATE TABLE foo (k myCustomType
PRIMARY KEY)). And I'm -1 on removing that support unless someone has a
compelling reason to do so, because I certainly don't see any and it is
useful. And yes, I do see this as a good reason to go with my proposal, since
it's not very consistent if
{noformat}
CREATE TABLE foo (
    k1 uuid,
    k2 myEfficientComplexNumberType,
    c text,
    PRIMARY KEY (k1, k2)
) WITH CLUSTERING ORDER BY (k2 DESC)
{noformat}
is *not* the same as
{noformat}
CREATE TABLE foo (
    k1 uuid,
    k2 myReversedEfficientComplexNumberType,
    c text,
    PRIMARY KEY (k1, k2)
)
{noformat}

 Add support for ReversedType
 

 Key: CASSANDRA-4004
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4004
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
 Fix For: 1.1.1

 Attachments: 4004.txt


 It would be nice to add a native syntax for the use of ReversedType. I'm not
 sure there is anything in SQL we can take inspiration from, so I would
 propose something like:
 {noformat}
 CREATE TABLE timeseries (
   key text,
   time uuid,
   value text,
   PRIMARY KEY (key, time DESC)
 )
 {noformat}
 Alternatively, the DESC could also be put after the column name definition,
 but one argument for putting it in the PK instead is that it only applies to
 keys.





[jira] [Commented] (CASSANDRA-4165) Generate Digest file for compressed SSTables

2012-04-19 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257534#comment-13257534
 ] 

Sylvain Lebresne commented on CASSANDRA-4165:
-

We haven't really discussed it beyond the reasoning Jonathan explained :). But
if it's for external tools, is it still useful to have it computed during the
sstable write (i.e., you could generate the sha1 yourself before backing up
the file in the first place)? Not that it's much work for us to do it (well,
except maybe for the added CPU usage during the sstable write).

 Generate Digest file for compressed SSTables
 

 Key: CASSANDRA-4165
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4165
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor
 Attachments: 0001-Generate-digest-for-compressed-files-as-well.patch


 We use the generated *Digest.sha1 files to verify backups; it would be nice
 if they were generated for compressed sstables as well.





[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257539#comment-13257539
 ] 

Jonathan Ellis commented on CASSANDRA-4004:
---

bq. I'm either misunderstanding what you call 'arbitrary orderings' or I have 
not been part of that discussion

I think you are misunderstanding.  This is what I'm referring to:

{code}
if (stmt.parameters.orderBy != null)
{
    CFDefinition.Name name = cfDef.get(stmt.parameters.orderBy);
    if (name == null)
        throw new InvalidRequestException(String.format("Order by on unknown column %s", stmt.parameters.orderBy));

    if (name.kind != CFDefinition.Name.Kind.COLUMN_ALIAS || name.position != 0)
        throw new InvalidRequestException(String.format("Order by is currently only supported on the second column of the PRIMARY KEY (if any), got %s", stmt.parameters.orderBy));
}
{code}

bq. How is that sophistry, seriously? 

ORDER BY X DESC does not mean "give me them in the reverse order the Xs are in
on disk"; it means "give me larger values before smaller ones". This isn't
open for debate, it's a very clear requirement.

Remember that clustering is not new ground for databases; SQL has been there, 
done that.  As I mentioned when we were designing the CQL3 schema syntax, 
RDBMSes have had a concept of clustered indexes for a long, long time.  But 
clustering on an index ASC or DESC does not affect the results other than as an 
optimization; when you ORDER BY X, that's what you get.

SQL and CQL are declarative languages: "here is what I want; you figure out
how to give me the results." This has proved a good design. Modifying the
semantics of a query based on index or clustering or other declarations
elsewhere has ZERO precedent and is bad design to boot; you don't want users
to have to consult their DDL when debugging to know what results a query will
give.

Thus, the only design that makes sense in the larger context of a declarative 
language is to treat the clustering as an optimization as I've described (or 
as an index, if you prefer), and continue to reject ORDER BY requests that 
are neither forward- nor reverse-clustered.





[jira] [Commented] (CASSANDRA-4161) CQL 3.0 does not work in cqlsh with uppercase SELECT

2012-04-19 Thread paul cannon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257548#comment-13257548
 ] 

paul cannon commented on CASSANDRA-4161:


bq. I'm wondering why CQL is being parsed in the client at all? Couldn't we 
just handle the exceptions thrown by cassandra? That way we wouldn't have to 
keep cqlsh in sync with CQL development on the C*-side.

cqlsh has to attempt to parse input in order to recognize keyspace switches, 
provide tab-completion, implement the cqlsh-specific commands, separate 
multiple statements, and (in the future) to allow things like CASSANDRA-3799.

Yes, of course, if cqlsh can identify a CQL statement but can't parse it, and 
it doesn't recognize the command word as being cqlsh-specific, it should pass 
the CQL on untouched to Cassandra. The problem in this ticket was with cqlsh 
deciding incorrectly that the user intended to give a cqlsh-only command.





[jira] [Commented] (CASSANDRA-4161) CQL 3.0 does not work in cqlsh with uppercase SELECT

2012-04-19 Thread paul cannon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257549#comment-13257549
 ] 

paul cannon commented on CASSANDRA-4161:


+1 for 4161.patch.txt.





[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-19 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257556#comment-13257556
 ] 

Brandon Williams commented on CASSANDRA-3909:
-

bq. personally I do trust you on that this can't break anything

<3

bq. I do however think that in general there would be some merit to sticking
to stricter rules.

I agree; however, my reasoning is this: if we support wide rows in 1.1.0 (and
we do), then why not Pig?





[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType

2012-04-19 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257568#comment-13257568
 ] 

Sylvain Lebresne commented on CASSANDRA-4004:
-

bq. This is what I'm referring to:

Wait, what happened to "Third (and this is the big one), I strongly suspect
that we're going to start supporting at least limited run-time ordering in the
near future" from CASSANDRA-3925? How can I reconcile that with "The other
relevant context here is that we decided not to support arbitrary orderings"?

bq. ORDER BY X DESC does not mean give me them in the reverse order that Xes 
are in on disk

I *never* suggested that, not even a little. Not more than you did.

bq. it means give me larger values before smaller ones. This isn't open for 
debate, it's a very clear requirement

Sure. But the definition of larger versus smaller depends on which ordering
you are talking about. That isn't open for debate either; math closed that
debate ages ago. And SQL is not excluded from that rule, it just happens that
SQL has default orderings (based on the column type) and you can't define new
ones. But we can do that in CQL. And we can, independently of this ticket,
because of custom types.

Again, once you consider custom types (which we have), you can't hide from the
fact that whether value X is larger than Y depends on the ordering induced by
your custom type. That's the ASC order, and DESC is the reverse of that. If
someone defines their own custom type, say reverseIntegerType, how can you
prevent SELECT ... ORDER BY DESC from returning 1 before 3? You can't, and
returning 1 before 3 absolutely makes sense, because 1 is larger than 3 if the
order is 'reverseInteger'.

bq. SQL and CQL are declarative languages: Here is what I want; you figure out 
how to give me the results. This has proved a good design.

Sure, *nothing* in what I'm suggesting is at odds with that.

bq. Modifying the semantics of a query based on index or clustering

Again, I'm not suggesting any such thing at all. The semantics of SELECT X
ORDER BY Y depend on what ordering relation is defined for Y, *because the
ordering relation is what defines the order*. SQL has a limited,
non-customizable set of types *and* (implicitly) defines an ordering relation
for each one. If one type were 'thing', it would have to define the ordering
of 'thing', otherwise ORDER BY queries wouldn't be properly defined. CQL also
has a default set of types with associated ordering relations. I'm *only*
suggesting we add a simple syntax so that, given a type/relation (a default
one or a custom one, btw), we can define the type/ordering relation that
validates the same values but has the reversed ordering.





[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257580#comment-13257580
 ] 

Jonathan Ellis commented on CASSANDRA-4004:
---

bq. what happened to Third (and this is the big one) I strongly suspect that 
we're going to start supporting at least limited run-time ordering in the near 
future from CASSANDRA-3925

Nothing, except that it's a separate ticket's worth of work.

bq. I never suggested that [ORDER BY depends on disk order], not even a little. 
Not more than you did.

I really don't see the distinction between saying "disk order" and "clustering
order", as in "the clustered part of the PK induces an ordering of records ...
SELECT without ORDER BY should return records in that clustering order ...
SELECT ORDER BY ASC returns 'z' before 'a'".

But disk order or clustering order, I don't care which you call it; I reject
both as modifiers of the semantics of DESC.

bq. the fact that value X is larger than Y depends on the ordering induced by
your custom types

Agreed.  But that's not the same as reverse-clustering on a type: "y int ...
PRIMARY KEY (x, y DESC)" (to use your syntax) is NOT the same as "y ReversedInt
... PRIMARY KEY (x, y)". In the former, ORDER BY Y DESC should give larger Y
before smaller; in the latter, the reverse.






[jira] [Commented] (CASSANDRA-4161) CQL 3.0 does not work in cqlsh with uppercase SELECT

2012-04-19 Thread Jonas Dohse (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257588#comment-13257588
 ] 

Jonas Dohse commented on CASSANDRA-4161:


+1 for 4161.patch.txt





[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType

2012-04-19 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257619#comment-13257619
 ] 

Sylvain Lebresne commented on CASSANDRA-4004:
-

bq. Nothing, except that it's a separate ticket's worth of work.

Oh, OK. For the record, I didn't imply otherwise.

bq. But that's not the same as reverse-clustering on a type: y int ... PRIMARY 
KEY (x, y DESC) (to use your syntax) is NOT the same as y ReversedInt ... 
PRIMARY KEY (x, y). In the former, ORDER BY Y DESC

What?! I said that I wasn't sure my syntax was good. But with all I've said, I
expected it to be clear that what I've wanted to do with this ticket from day
one is to allow defining "y ReversedInt ... PRIMARY KEY" but without having to
write a custom Java class, since we don't have to, and that is *exactly* what
my patch implements. I'm fine saying my syntax sucks and allowing
"y reversed(int) .. PK" instead. But to be clear, I don't think that option is
a bad fit at all for CQL3, and that's not the C* veteran talking.

bq. In the former, ORDER BY Y DESC should give larger Y before smaller (that 
is, 100 before 1); in the latter, the reverse

In my defense, you're attributing *your* semantics to *my* made-up syntax
(which, again, may be counter-intuitive to you with your background but really
is not to me, and I made it clear that it was only a suggestion. I even said
in the description that "Alternatively, the DESC could also be put after the
column name definition").

bq. I really don't see the distinction between saying disk order and
clustering order, as in the clustered part of the PK induces an ordering of
records ...

Maybe the reversed(int) syntax makes it clearer, but when I talk about an
ordering of records, I'm saying that we should say that in CQL the model
defines an ordering of the rows (where rows is in the sense of SQL) in tables,
an order defined as the ordering implied by the types of the clustered keys
(and to be clear, I don't care what clustering means in SQL; I'm reusing the
name because you're using it, but by that term I *only* mean the fields in the
PK after the first one). That doesn't imply the disk order has to respect it
(though it will, but that's an implementation detail). In other words, and
somewhat unrelated to this issue, I think there would be value in saying that
the order of SELECT without any ORDER BY is something defined by CQL (while
SQL does not do that). I think there would be value because it helps in
understanding which models are a good fit for CQL.

Now, to sum up, I think the "y reversed(int)" syntax has the following
advantages over just allowing the on-disk order to be changed (a sketch
contrasting the two options follows below):
# I do think that in most cases it's more natural to define a reversed type
rather than just add an optimization for reversed queries. Typically, it means
that 'y reversed(myCustomType)' is the same as 'y myReversedCustomType', which
has a nice consistency to it. In the alternative, and even though I'm *not*
saying it's ill-defined in any way, I do think that having a form of syntactic
double negation that is not equivalent to removing both is kind of weird.
# Though it seems to be very clear to you, I do think it's not necessarily
clear per se (i.e. to anyone who may not be familiar with SQL clustering, for
instance) that WITH CLUSTERING ORDER (x DESC) does not change the ordering
(and by that I mean 'does not semantically mean x reversed(type)').
# With that solution, we maintain (without doing anything) the fact that a
SELECT without ORDER BY respects the ordering implied by the clustering. I
think that's convenient for C*. Again, lots of efficient models for C* use
that ordering, so it feels like a better idea to say 'oh, and contrarily to
SQL, the order of records in a table is defined (and thus the default ordering
of SELECT), and lots of good modeling patterns for C* rely on this'.
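
To make the two options being debated concrete, a hedged side-by-side sketch
(both forms are proposals from this thread, not shipped syntax; reversed(...)
is the suggestion above and CLUSTERING ORDER BY is the alternative discussed
earlier):
{noformat}
-- Option A (this ticket's direction): the clustering column's *type* is reversed,
-- so 'larger'/'smaller' are redefined and a plain SELECT returns newest first.
CREATE TABLE timeline_a (
    key text,
    time reversed(timeuuid),
    value text,
    PRIMARY KEY (key, time)
);

-- Option B (clustering-order flavour): the type keeps its natural ordering and
-- only the clustering is reversed; ORDER BY time DESC still means larger first.
CREATE TABLE timeline_b (
    key text,
    time timeuuid,
    value text,
    PRIMARY KEY (key, time)
) WITH CLUSTERING ORDER BY (time DESC);
{noformat}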


[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257663#comment-13257663
 ] 

Jonathan Ellis commented on CASSANDRA-4004:
---

bq. the model defines an ordering of the rows (where rows is in the sense of 
SQL) in tables, order that is defined as the ordering implied by the types of 
the clustered keys (and to be clear, I don't care what clustering mean in 
SQL, I'm reusing the name because you're using it, but I only mean by that term 
the fields in the PK after the first one). That doesn't imply the disk order 
has to respect it

I think the mental model of rows as predicates, queries returning sets of rows 
with no inherent order, and ORDER BY as specifying the desired order, is much 
simpler and easier to reason about (see prior point about having to consult DDL 
+ QUERY to figure out what order results are supposed to appear in).

bq. To my defence, you're attributing your semantic to my made up syntax 

I was trying to say that I view ReversedType(Int32Type) as a modification of
Int32Type (which should not affect int ordering) and not a completely new
type, the way the (hypothetical) ReversedInt (or BackwardsInt, or
AlmostNotQuiteInt) type would be, since the latter isn't really related to an
int at all, even though it looks a lot like an int in many respects.

bq. I do think that in most case it's more natural to define a reversed type 
rather than just adding an optim for reversed queries. 

I don't follow.

bq. I do think that have a form of syntactic double negation that is not 
equivalent to removing both is kind of weird... I do think that it's not 
necessarily clear per se (i.e to anyone that may not be familiar with SQL 
clustering for instance) that WITH CLUSTERING ORDER (x DESC) does not change 
the ordering

But saying that {{ORDER BY X DESC}} always gives you higher X first is the
only way to avoid the double negation!  Otherwise, in your original syntax of
PK (X, Y DESC), the only way to get 1 to sort before 100 is to ask for ORDER
BY Y DESC, so the DESCs cancel out!

I just can't agree that ORDER BY Y DESC giving {1, 100} is going to be less
confusing than {100, 1}, no matter how much we tell users, "No, you see, it's
really just reversing the clustering order, which you already reversed..."

Users may not be familiar with clustering, but they're *very* familiar with
ORDER BY, which, as I said above, is very clear about what it does.
Clustering is the closest example of how performance hints should *not* change
the semantics of the query, but indexes fall into the same category.

It may also be worth pointing out that preserving CQL compatibility with Hive
matters; queries that execute on both (and to the best of my knowledge CQL3 is
a strict subset of Hive SQL) should not give different results.





[jira] [Commented] (CASSANDRA-4163) CQL3 ALTER TABLE command causes NPE

2012-04-19 Thread paul cannon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257675#comment-13257675
 ] 

paul cannon commented on CASSANDRA-4163:


Link for the new tag (it's on the same branch, though):

http://github.com/thepaul/cassandra/tree/pending/4163-2

 CQL3 ALTER TABLE command causes NPE
 ---

 Key: CASSANDRA-4163
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4163
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
 Environment: INFO 16:07:11,757 Cassandra version: 1.1.0-rc1-SNAPSHOT
 INFO 16:07:11,757 Thrift API version: 19.30.0
 INFO 16:07:11,758 CQL supported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0)
Reporter: Kristine Hahn
Assignee: paul cannon
  Labels: cql3
 Fix For: 1.1.0

 Attachments: 4163.patch-2.txt, 4163.patch.txt


 To reproduce the problem:
 ./cqlsh --cql3
 Connected to Test Cluster at localhost:9160.
 [cqlsh 2.2.0 | Cassandra 1.1.0-rc1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.30.0]
 Use HELP for help.
 cqlsh> CREATE KEYSPACE test34 WITH strategy_class = 
 'org.apache.cassandra.locator.SimpleStrategy' AND 
 strategy_options:replication_factor='1';
 cqlsh> USE test34;
 cqlsh:test34> CREATE TABLE users (
   ... password varchar,
   ... gender varchar,
   ... session_token varchar,
   ... state varchar,
   ... birth_year bigint,
   ... pk varchar,
   ... PRIMARY KEY (pk)
   ... );
 cqlsh:test34> ALTER TABLE users ADD coupon_code varchar;
 TSocket read 0 bytes





[jira] [Commented] (CASSANDRA-4173) cqlsh: in cql3 mode, use cql3 quoting when outputting cql

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257757#comment-13257757
 ] 

Jonathan Ellis commented on CASSANDRA-4173:
---

Does CQL2 support double quotes?  If so, switching to double quotes everywhere
may be simpler.

 cqlsh: in cql3 mode, use cql3 quoting when outputting cql
 -

 Key: CASSANDRA-4173
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4173
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.1.0
Reporter: paul cannon
Assignee: paul cannon
Priority: Minor
  Labels: cql3, cqlsh

 When cqlsh needs to output a column name or other term which needs quoting
 (say, if you run DESCRIBE KEYSPACE and some column name has a space in it),
 it currently only knows how to quote in the CQL2 way. That is,
 {noformat}
 cqlsh:foo> describe columnfamily bar
 CREATE COLUMNFAMILY bar (
   a int PRIMARY KEY,
   'b c' text
 ) WITH
 ...
 {noformat}
 CQL3 does not recognize single quotes around column names, or around
 columnfamily or keyspace names either. cqlsh ought to learn to use double
 quotes instead when in CQL3 mode.
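 For reference, a hedged sketch of what the CQL3-style output should look like
 instead (double quotes being the CQL3 way of quoting identifiers; the columns
 are just the example above):
 {noformat}
 cqlsh:foo> describe columnfamily bar
 CREATE COLUMNFAMILY bar (
   a int PRIMARY KEY,
   "b c" text
 ) WITH
 ...
 {noformat}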





[jira] [Commented] (CASSANDRA-4174) Unnecessary compaction happens when streaming

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257794#comment-13257794
 ] 

Jonathan Ellis commented on CASSANDRA-4174:
---

Are you proposing we issue a single compaction submission when streaming is 
done, instead?
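
For illustration only, a single submission at the end of a streaming session might look roughly like the sketch below. The class and method names (ColumnFamilyStore#addSSTable, CompactionManager#submitBackground, SSTableReader) come from the code linked in the quoted description; the assumption that adding a streamed sstable no longer triggers its own background compaction is exactly the change under discussion, and the attached patch may be structured differently.

{noformat}
import java.util.Collection;

import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.db.compaction.CompactionManager;
import org.apache.cassandra.io.sstable.SSTableReader;

public class StreamCompactionSketch
{
    // Sketch: register every streamed sstable first, then ask for a single
    // background compaction check for the whole batch, instead of one per file.
    public static void finishStreamSession(ColumnFamilyStore cfs, Collection<SSTableReader> streamed)
    {
        for (SSTableReader sstable : streamed)
            cfs.addSSTable(sstable); // assumed here to only register, not submit compaction
        CompactionManager.instance.submitBackground(cfs); // one submission at the end
    }
}
{noformat}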

 Unnecessary compaction happens when streaming
 -

 Key: CASSANDRA-4174
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4174
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.0
Reporter: Yuki Morishita
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 1.0.10

 Attachments: 4174-1.0.txt


 When a streaming session finishes, streamed sstables are added to the CFS one by one 
 using 
 ColumnFamilyStore#addSSTable(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/streaming/StreamInSession.java#L141).
  This method submits compaction in 
 background(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L946),
  and ends up leaving unnecessary compaction tasks behind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4173) cqlsh: in cql3 mode, use cql3 quoting when outputting cql

2012-04-19 Thread paul cannon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257813#comment-13257813
 ] 

paul cannon commented on CASSANDRA-4173:


No, it doesn't. Definitely that would have been easier.

 cqlsh: in cql3 mode, use cql3 quoting when outputting cql
 -

 Key: CASSANDRA-4173
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4173
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.1.0
Reporter: paul cannon
Assignee: paul cannon
Priority: Minor
  Labels: cql3, cqlsh

 when cqlsh needs to output a column name or other term which needs quoting 
 (say, if you run DESCRIBE KEYSPACE and some column name has a space in it), 
 it currently only knows how to quote in the cql2 way. That is,
 {noformat}
 cqlsh:foo> describe columnfamily bar
 CREATE COLUMNFAMILY bar (
   a int PRIMARY KEY,
   'b c' text
 ) WITH
 ...
 {noformat}
 cql3 does not recognize single quotes around column names, or columnfamily or 
 keyspace names either. cqlsh ought to learn how to use double-quotes instead 
 when in cql3 mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4174) Unnecessary compaction happens when streaming

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257821#comment-13257821
 ] 

Jonathan Ellis commented on CASSANDRA-4174:
---

Devil's advocate for the status quo: starting compaction as soon as I have one 
sstable to work on might smooth out the workload more.  (If we finish the 
first compaction before the next is available, then great; if we don't, then 
they'll stack up and we'll do something closer to the all-at-once approach.)

Thoughts?

 Unnecessary compaction happens when streaming
 -

 Key: CASSANDRA-4174
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4174
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.0
Reporter: Yuki Morishita
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 1.0.10

 Attachments: 4174-1.0.txt


 When a streaming session finishes, streamed sstables are added to the CFS one by one 
 using 
 ColumnFamilyStore#addSSTable(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/streaming/StreamInSession.java#L141).
  This method submits compaction in 
 background(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L946),
  and ends up leaving unnecessary compaction tasks behind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257836#comment-13257836
 ] 

Jonathan Ellis commented on CASSANDRA-4175:
---

The wrinkle here is concurrent schema changes -- how can we make sure each node 
uses the same column ids for each name?  I see two possible approaches:

# embed something like Zookeeper to standardize the id map
# punt: let each node use a node-local map, and translate back and forth to the 
full column name across node boundaries (rough sketch below)
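
A rough sketch of what the second option could look like. Assumptions throughout: the class and method names are invented for illustration and do not come from any patch on this ticket; ids are assigned per node and translated back to full names before anything crosses a node boundary.

{noformat}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeLocalColumnIdMap
{
    private final Map<ByteBuffer, Integer> nameToId = new HashMap<ByteBuffer, Integer>();
    private final List<ByteBuffer> idToName = new ArrayList<ByteBuffer>();

    // Assign a dense, node-local 32-bit id the first time a column name is seen.
    public synchronized int idFor(ByteBuffer name)
    {
        Integer id = nameToId.get(name);
        if (id == null)
        {
            id = idToName.size();
            idToName.add(name);
            nameToId.put(name, id);
        }
        return id;
    }

    // Translate back to the full column name before anything leaves this node.
    public synchronized ByteBuffer nameFor(int id)
    {
        return idToName.get(id);
    }
}
{noformat}

The translation step is what keeps the ids from ever having to agree across nodes, which sidesteps the concurrent-schema-change problem at the cost of a lookup at node boundaries.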


 Reduce memory (and disk) space requirements with a column name/id map
 -

 Key: CASSANDRA-4175
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
 Fix For: 1.2


 We spend a lot of memory on column names, both transiently (during reads) and 
 more permanently (in the row cache).  Compression mitigates this on disk but 
 not on the heap.
 The overhead is significant for typical small column values, e.g., ints.
 Even though we intern once we get to the memtable, this affects writes too 
 via very high allocation rates in the young generation, hence more GC 
 activity.
 Now that CQL3 provides us some guarantees that column names must be defined 
 before they are inserted, we could create a map of (say) 32-bit int column 
 id, to names, and use that internally right up until we return a resultset to 
 the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4174) Unnecessary compaction happens when streaming

2012-04-19 Thread Yuki Morishita (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257852#comment-13257852
 ] 

Yuki Morishita commented on CASSANDRA-4174:
---

bq.  starting compaction as soon as I have one sstable to work on might smooth 
out the workload more.

The current version of cassandra adds sstables and submits compaction when it has 
finished streaming all files, not after each individual file. On my laptop, I 
bulk-loaded 72 sstables into an empty, single-node cassandra; it triggered 
compaction 9 times without the patch, compared to 3 times with the patch applied.

 Unnecessary compaction happens when streaming
 -

 Key: CASSANDRA-4174
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4174
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.0
Reporter: Yuki Morishita
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 1.0.10

 Attachments: 4174-1.0.txt


 When a streaming session finishes, streamed sstables are added to the CFS one by one 
 using 
 ColumnFamilyStore#addSSTable(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/streaming/StreamInSession.java#L141).
  This method submits compaction in 
 background(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L946),
  and ends up leaving unnecessary compaction tasks behind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4174) Unnecessary compaction happens when streaming

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257890#comment-13257890
 ] 

Jonathan Ellis commented on CASSANDRA-4174:
---

I see.

So is this basically a cosmetic change then, to not have redundant tasks 
created?

If so, I think I'd rather commit to 1.1.1.

 Unnecessary compaction happens when streaming
 -

 Key: CASSANDRA-4174
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4174
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.0
Reporter: Yuki Morishita
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 1.0.10

 Attachments: 4174-1.0.txt


 When a streaming session finishes, streamed sstables are added to the CFS one by one 
 using 
 ColumnFamilyStore#addSSTable(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/streaming/StreamInSession.java#L141).
  This method submits compaction in 
 background(https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L946),
  and ends up leaving unnecessary compaction tasks behind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map

2012-04-19 Thread T Jake Luciani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257990#comment-13257990
 ] 

T Jake Luciani commented on CASSANDRA-4175:
---

Can't you use String.hashCode? It's portable.

 Reduce memory (and disk) space requirements with a column name/id map
 -

 Key: CASSANDRA-4175
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
 Fix For: 1.2


 We spend a lot of memory on column names, both transiently (during reads) and 
 more permanently (in the row cache).  Compression mitigates this on disk but 
 not on the heap.
 The overhead is significant for typical small column values, e.g., ints.
 Even though we intern once we get to the memtable, this affects writes too 
 via very high allocation rates in the young generation, hence more GC 
 activity.
 Now that CQL3 provides us some guarantees that column names must be defined 
 before they are inserted, we could create a map of (say) 32-bit int column 
 id, to names, and use that internally right up until we return a resultset to 
 the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258017#comment-13258017
 ] 

Jonathan Ellis commented on CASSANDRA-4175:
---

And extremely collision-prone. :)
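
A tiny illustration of how easily String.hashCode collides (a known example, since the hash is just a rolling 31*h + c over the characters):

{noformat}
public class HashCodeCollision
{
    public static void main(String[] args)
    {
        // Distinct two-character strings already collide:
        // 'A'*31 + 'a' = 2015 + 97 = 2112 and 'B'*31 + 'B' = 2046 + 66 = 2112.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
    }
}
{noformat}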

 Reduce memory (and disk) space requirements with a column name/id map
 -

 Key: CASSANDRA-4175
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
 Fix For: 1.2


 We spend a lot of memory on column names, both transiently (during reads) and 
 more permanently (in the row cache).  Compression mitigates this on disk but 
 not on the heap.
 The overhead is significant for typical small column values, e.g., ints.
 Even though we intern once we get to the memtable, this affects writes too 
 via very high allocation rates in the young generation, hence more GC 
 activity.
 Now that CQL3 provides us some guarantees that column names must be defined 
 before they are inserted, we could create a map of (say) 32-bit int column 
 id, to names, and use that internally right up until we return a resultset to 
 the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map

2012-04-19 Thread Dave Brosius (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258027#comment-13258027
 ] 

Dave Brosius commented on CASSANDRA-4175:
-

how about System.identityHashCode(string) ?

 Reduce memory (and disk) space requirements with a column name/id map
 -

 Key: CASSANDRA-4175
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
 Fix For: 1.2


 We spend a lot of memory on column names, both transiently (during reads) and 
 more permanently (in the row cache).  Compression mitigates this on disk but 
 not on the heap.
 The overhead is significant for typical small column values, e.g., ints.
 Even though we intern once we get to the memtable, this affects writes too 
 via very high allocation rates in the young generation, hence more GC 
 activity.
 Now that CQL3 provides us some guarantees that column names must be defined 
 before they are inserted, we could create a map of (say) 32-bit int column 
 id, to names, and use that internally right up until we return a resultset to 
 the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map

2012-04-19 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258030#comment-13258030
 ] 

Jonathan Ellis commented on CASSANDRA-4175:
---

Hashcode just isn't designed to be collision-resistant; it prioritizes speed.  
Even with a better (from the standpoint of collisions) general-purpose hash 
like Murmur, 32 bits is just too small.  The smallest cryptographic hash I know 
of is MD5, and ballooning to 128 bits puts a serious crimp in the potential 
savings here.
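
To put a number on "32 bits is just too small": even for an ideal 32-bit hash, the birthday bound puts roughly even odds on a collision once the number of distinct column names reaches about 77,000 (a back-of-the-envelope figure, not a property of any particular hash):

{noformat}
public class BirthdayBound
{
    public static void main(String[] args)
    {
        // ~50% collision probability at n ~ sqrt(2 * 2^32 * ln 2) for an ideal 32-bit hash.
        double space = Math.pow(2, 32);
        double n = Math.sqrt(2 * space * Math.log(2));
        System.out.printf("~50%% collision chance at about %.0f distinct names%n", n); // ~77,000
    }
}
{noformat}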

 Reduce memory (and disk) space requirements with a column name/id map
 -

 Key: CASSANDRA-4175
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
 Fix For: 1.2


 We spend a lot of memory on column names, both transiently (during reads) and 
 more permanently (in the row cache).  Compression mitigates this on disk but 
 not on the heap.
 The overhead is significant for typical small column values, e.g., ints.
 Even though we intern once we get to the memtable, this affects writes too 
 via very high allocation rates in the young generation, hence more GC 
 activity.
 Now that CQL3 provides us some guarantees that column names must be defined 
 before they are inserted, we could create a map of (say) 32-bit int column 
 id, to names, and use that internally right up until we return a resultset to 
 the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4155) Make possible to authenticate user when loading data to Cassandra with BulkRecordWriter.

2012-04-18 Thread Michał Michalski (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256237#comment-13256237
 ] 

Michał Michalski commented on CASSANDRA-4155:
-

Thanks for the quick response and review, Brandon! I like your idea, so I'll modify 
the patch this way. It can even be done better: since ConfigHelper (or rather the 
underlying Configuration, to be more precise) returns null if a value was not set, 
I don't have to do any if in BulkRecordWriter.prepareWriter() - I can just retrieve 
the values from the config (null if not set) and pass them, so the only if will 
be in ExternalClient.init(). 

I'll change it and submit a new patch as soon as I get my Hadoop-to-Cassandra 
bulkload working once again, as I've been suffering a (...)ib-1-Data.db is not 
compatible with current version hc problem after moving to RC1, so that I can 
test it, just to be 100% sure it's OK :) 
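
A hedged sketch of the flow described above. The property names come from the ticket description; the getter helpers and the maybeLogin method are illustrative only (not the attached patch), and the credential map keys assume SimpleAuthenticator's conventional "username"/"password" keys.

{noformat}
import java.util.HashMap;
import java.util.Map;

import org.apache.cassandra.thrift.AuthenticationRequest;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.hadoop.conf.Configuration;

public class BulkLoadAuthSketch
{
    // Configuration simply returns null for unset keys, so no 'if' is needed here.
    public static String getOutputKeyspaceUserName(Configuration conf)
    {
        return conf.get("cassandra.output.keyspace.username");
    }

    public static String getOutputKeyspacePassword(Configuration conf)
    {
        return conf.get("cassandra.output.keyspace.passwd");
    }

    // The single null check lives where ExternalClient.init() would use it.
    public static void maybeLogin(Cassandra.Client client, String username, String password) throws Exception
    {
        if (username == null)
            return;
        Map<String, String> credentials = new HashMap<String, String>();
        credentials.put("username", username);
        credentials.put("password", password);
        client.login(new AuthenticationRequest(credentials));
    }
}
{noformat}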

 Make possible to authenticate user when loading data to Cassandra with 
 BulkRecordWriter.
 

 Key: CASSANDRA-4155
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4155
 Project: Cassandra
  Issue Type: Improvement
Reporter: Michał Michalski
Priority: Minor
 Fix For: 1.1.1

 Attachments: trunk-4155-nicer-version.txt, trunk-4155.txt


 I need to use the authorization feature (i.e. provided by SimpleAuthenticator 
 and SimpleAuthority). The problem is that it's impossible now to pass the 
 credentials (cassandra.output.keyspace.username and 
 cassandra.output.keyspace.passwd) to org.apache.cassandra.hadoop.ConfigHelper 
 because there are no setters for these variables. Moreover, even if they could be 
 passed, nothing will change because they are unused - ExternalClient class 
 from org.apache.cassandra.hadoop.BulkRecordWriter is not making use of them; 
 it's not even receiving them and no authorization is provided.
 The proposed improvement is to make it possible to authenticate user when 
 loading data to Cassandra with BulkRecordWriter by adding appropriate setters 
 to ConfigHelper and then passing credentials to ExternalClient class so it 
 could use it for authorization request.
 I have created a patch for this which I attach. 
 This improvement was made in a way that does not change existing 
 ExternalClient interface usage, but I think it would be a bit nicer to 
 call the new constructor every time (optionally with username and password 
 set to null) in this code while keeping the old one, instead of having and 
 using two different constructors in two different cases in one method. 
 However, it's my first patch for Cassandra, so I'm submitting a less aggressive 
 one and waiting for suggestions on how to modify it :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4165) Generate Digest file for compressed SSTables

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256560#comment-13256560
 ] 

Jonathan Ellis commented on CASSANDRA-4165:
---

The thinking was, compressed sstables have a per-block checksum, so there's no 
need to have the less-granular sha.

 Generate Digest file for compressed SSTables
 

 Key: CASSANDRA-4165
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4165
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor
 Attachments: 0001-Generate-digest-for-compressed-files-as-well.patch


 We use the generated *Digest.sha1-files to verify backups, would be nice if 
 they were generated for compressed sstables as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4165) Generate Digest file for compressed SSTables

2012-04-18 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256566#comment-13256566
 ] 

Marcus Eriksson commented on CASSANDRA-4165:


Yes, but when building external tools (like our backup validation thing), it 
would be nice not to have special cases for compressed CFs.


 Generate Digest file for compressed SSTables
 

 Key: CASSANDRA-4165
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4165
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor
 Attachments: 0001-Generate-digest-for-compressed-files-as-well.patch


 We use the generated *Digest.sha1-files to verify backups, would be nice if 
 they were generated for compressed sstables as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256631#comment-13256631
 ] 

Jonathan Ellis commented on CASSANDRA-556:
--

Trivium: this was our oldest open issue.

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: jmx, lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff, cf_snapshots_556_2.diff, 
 cf_snapshots_556_2A.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

2012-04-18 Thread Robert Coli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256705#comment-13256705
 ] 

Robert Coli commented on CASSANDRA-4162:


bq. Restarting with -Dcassandra.join_ring=false will do that.

It will also result in paying a sizable startup penalty, far more severe 
in Cassandra than in most other databases. I can only speak for myself, but I 
don't want to pay a startup penalty (which can in the real world be, say, half 
an hour of clock time!) if I don't have to. I think most operators who use 
disablegossip and disablethrift have the goal of removing a node from the 
cluster while keeping it running, in order to avoid this startup penalty.

While I now understand that "dead" has a very specific meaning in cassandra 
which relates only to gossip state, I think it is unambiguous that, given the 
typical semantic meaning of "dead" and "alive", people do not expect a "dead" 
node to be accepting writes. As explicated in The Princess Bride, there is a 
significant difference between mostly dead and all dead.


Miracle Max: Whoo-hoo-hoo, look who knows so much. It just so happens that your 
friend here is only MOSTLY dead. There's a big difference between mostly dead 
and all dead. Mostly dead is slightly alive. With all dead, well, with all dead 
there's usually only one thing you can do. 

Inigo Montoya: What's that? 

Miracle Max: Go through his clothes and look for loose change.


My goal with this ticket is to establish the best practice for an operator who 
wants to make sure his node is not receiving traffic but is still up and 
capable of compacting or rejoining the cluster without paying the startup penalty. 
So far it seems that the best solution is to use iptables to firewall off port 
7000. 

It is difficult to understand the purpose of disablethrift and 
disablegossip if the combination of the two does not render the node "all 
dead". I believe most operators will expect them to render a node "all dead". 
At the very minimum, it seems inappropriate to state in the help that nodetool 
disablegossip renders a node "dead" when in fact it renders it only "mostly dead".

 nodetool disablegossip does not prevent gossip delivery of writes via 
 already-initiated hinted handoff
 --

 Key: CASSANDRA-4162
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: reported on IRC, believe it was a linux environment, 
 nick rhone, cassandra 1.0.8
Reporter: Robert Coli
Priority: Minor
  Labels: gossip

 This ticket derives from #cassandra, where aaron_morton and I assisted a user who 
 had run disablethrift and disablegossip and was confused as to why he was 
 seeing writes to his node.
 Aaron and I went through a series of debugging questions, user verified that 
 there was traffic on the gossip port. His node was showing as down from the 
 perspective of other nodes, and nodetool also showed that gossip was not 
 active.
 Aaron read the code and had the user turn debug logging on. The user saw 
 Hinted Handoff messages being delivered and Aaron confirmed in the code that 
 a hinted handoff delivery session only checks gossip state when it first 
 starts. As a result, it will continue to deliver hints and disregard gossip 
 state on the target node.
 per nodetool docs
 
 disablegossip  - Disable gossip (effectively marking the node dead)
 
 I believe most people will be using disablegossip and disablethrift for 
 operational reasons, and propose that they do not expect HH delivery to 
 continue, via gossip, when they have run disablegossip.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2246) Enable Pig to use indexed data as described in CASSANDRA-2245

2012-04-18 Thread Dmitriy V. Ryaboy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256770#comment-13256770
 ] 

Dmitriy V. Ryaboy commented on CASSANDRA-2246:
--

FYI I suspect you could do this much nicer by making use of Pig's predicate 
pushdown via LoadMetadata.getPartitionKeys (supply everything you have an index 
on) and LoadMetadata.setPartitionFilter (apply the selection by using secondary 
indexes). 
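
A rough sketch of that suggestion; the indexedColumnNames helper and the filter handling are placeholders (assumptions, not the committed CassandraStorage code), while the LoadMetadata interface itself is Pig's.

{noformat}
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.Expression;
import org.apache.pig.LoadMetadata;

public abstract class IndexedCassandraStorageSketch implements LoadMetadata
{
    private Expression partitionFilter;

    // Advertise every secondary-indexed column as a "partition key" so Pig
    // offers to push equality filters on it down to the loader.
    public String[] getPartitionKeys(String location, Job job) throws IOException
    {
        return indexedColumnNames(location, job);
    }

    // Pig hands back the pushed-down filter; translate it into secondary
    // index expressions when building the input predicate.
    public void setPartitionFilter(Expression partitionFilter) throws IOException
    {
        this.partitionFilter = partitionFilter;
    }

    // Placeholder: look up which columns have secondary indexes.
    protected abstract String[] indexedColumnNames(String location, Job job) throws IOException;
}
{noformat}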

 Enable Pig to use indexed data as described in CASSANDRA-2245
 -

 Key: CASSANDRA-2246
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2246
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.7.2
Reporter: Matt Kennedy
Assignee: Brandon Williams
Priority: Minor
  Labels: hadoop
 Fix For: 1.1.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 in contrib/pig, add query parameters to CassandraStorage keyspace/column 
 family string to specify column search predicates.
 For example:
 rows = LOAD 'cassandra://mykeyspace/mycolumnfamily?country=UK' using 
 CassandraStorage();
 This depends on CASSANDRA-1600

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256774#comment-13256774
 ] 

Jonathan Ellis commented on CASSANDRA-4162:
---

If you're hung up on the nodetool help description, let's fix that.  
Fundamentally disablegossip disables gossip.  That's all.  It's not intended 
to, nor should it, stop all network traffic dead in the water.  I've already 
explained why that is, and brandon and eldon have given workarounds for when 
you really do want to do that.

 nodetool disablegossip does not prevent gossip delivery of writes via 
 already-initiated hinted handoff
 --

 Key: CASSANDRA-4162
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: reported on IRC, believe it was a linux environment, 
 nick rhone, cassandra 1.0.8
Reporter: Robert Coli
Priority: Minor
  Labels: gossip

 This ticket derives from #cassandra, where aaron_morton and I assisted a user who 
 had run disablethrift and disablegossip and was confused as to why he was 
 seeing writes to his node.
 Aaron and I went through a series of debugging questions, user verified that 
 there was traffic on the gossip port. His node was showing as down from the 
 perspective of other nodes, and nodetool also showed that gossip was not 
 active.
 Aaron read the code and had the user turn debug logging on. The user saw 
 Hinted Handoff messages being delivered and Aaron confirmed in the code that 
 a hinted handoff delivery session only checks gossip state when it first 
 starts. As a result, it will continue to deliver hints and disregard gossip 
 state on the target node.
 per nodetool docs
 
 disablegossip  - Disable gossip (effectively marking the node dead)
 
 I believe most people will be using disablegossip and disablethrift for 
 operational reasons, and propose that they do not expect HH delivery to 
 continue, via gossip, when they have run disablegossip.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256782#comment-13256782
 ] 

Jonathan Ellis commented on CASSANDRA-4162:
---

Incidentally, "startup is slow" is definitely on our radar. We're looking at 
that in CASSANDRA-2392 and others.

 nodetool disablegossip does not prevent gossip delivery of writes via 
 already-initiated hinted handoff
 --

 Key: CASSANDRA-4162
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: reported on IRC, believe it was a linux environment, 
 nick rhone, cassandra 1.0.8
Reporter: Robert Coli
Priority: Minor
  Labels: gossip

 This ticket derives from #cassandra, where aaron_morton and I assisted a user who 
 had run disablethrift and disablegossip and was confused as to why he was 
 seeing writes to his node.
 Aaron and I went through a series of debugging questions, user verified that 
 there was traffic on the gossip port. His node was showing as down from the 
 perspective of other nodes, and nodetool also showed that gossip was not 
 active.
 Aaron read the code and had the user turn debug logging on. The user saw 
 Hinted Handoff messages being delivered and Aaron confirmed in the code that 
 a hinted handoff delivery session only checks gossip state when it first 
 starts. As a result, it will continue to deliver hints and disregard gossip 
 state on the target node.
 per nodetool docs
 
 disablegossip  - Disable gossip (effectively marking the node dead)
 
 I believe most people will be using disablegossip and disablethrift for 
 operational reasons, and propose that they do not expect HH delivery to 
 continue, via gossip, when they have run disablegossip.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256793#comment-13256793
 ] 

Jonathan Ellis commented on CASSANDRA-3762:
---

bq. If we want to see the optimal solution for all the use cases I think we 
have to go for the alternative where we can save the KeyCache positions to 
disk and read them back, and whatever is missing, let it fault-fill.

I like this idea.  If you have a lot of rows (i.e., a large index) then this is 
the only thing that's going to save you from doing random i/o.

The only downside I see is the question of how much churn your sstables will 
experience between save and load.  If you have a small data set that is 
constantly being overwritten, for instance, you could basically invalidate the 
whole cache.  But it's quite possible that just reducing the cache save period is 
adequate to address this.

 AutoSaving KeyCache and System load time improvements.
 --

 Key: CASSANDRA-3762
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-SavedKeyCache-load-time-improvements.patch


 CASSANDRA-2392 saves the index summary to the disk... but when we have saved 
 cache we will still scan through the index to get the data out.
 We might be able to separate this from SSTR.load and let it load the index 
 summary, once all the SST's are loaded we might be able to check the 
 bloomfilter and do a random IO on fewer Index's to populate the KeyCache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256804#comment-13256804
 ] 

Jonathan Ellis commented on CASSANDRA-2392:
---

For the record, I'm still fine with saying "loading caches will slow down 
startup, deal with it", but I think we have a good plan of attack on 3762 now, 
and it may be simpler to just do that first, before rebasing this.  Which is 
also fine.

 Saving IndexSummaries to disk
 -

 Key: CASSANDRA-2392
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-2392-v6.patch, 
 0001-re-factor-first-and-last.patch, 0001-save-summaries-to-disk-v4.patch, 
 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch, 
 0002-save-summaries-to-disk-v3.patch, 0002-save-summaries-to-disk.patch, 
 CASSANDRA-2392-v5.patch


 For nodes with millions of keys, doing rolling restarts that take over 10 
 minutes per node can be painful if you have 100 node cluster. All of our time 
 is spent on doing index summary computations on startup. It would be great if 
 we could save those to disk as well. Our indexes are quite large.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.

2012-04-18 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256813#comment-13256813
 ] 

Pavel Yaskevich commented on CASSANDRA-3762:


It seems like saving the cache's data positions (in combination with the SSTable 
index summaries) to disk, to make it independent from sstable loading, is the 
only viable solution we have.

 AutoSaving KeyCache and System load time improvements.
 --

 Key: CASSANDRA-3762
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-SavedKeyCache-load-time-improvements.patch


 CASSANDRA-2392 saves the index summary to the disk... but when we have saved 
 cache we will still scan through the index to get the data out.
 We might be able to separate this from SSTR.load and let it load the index 
 summary, once all the SST's are loaded we might be able to check the 
 bloomfilter and do a random IO on fewer Index's to populate the KeyCache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256817#comment-13256817
 ] 

Jonathan Ellis commented on CASSANDRA-2864:
---

Is the original description here still an accurate guide to the approach taken?

 Alternative Row Cache Implementation
 

 Key: CASSANDRA-2864
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Daniel Doubleday
Assignee: Daniel Doubleday
Priority: Minor

 we have been working on an alternative implementation to the existing row 
 cache(s)
 We have 2 main goals:
 - Decrease memory - get more rows in the cache without suffering a huge 
 performance penalty
 - Reduce gc pressure
 This sounds a lot like we should be using the new serializing cache in 0.8. 
 Unfortunately our workload consists of loads of updates which would 
 invalidate the cache all the time.
 The second unfortunate thing is that the idea we came up with doesn't fit the 
 new cache provider api...
 It looks like this:
 Like the serializing cache we basically only cache the serialized byte 
 buffer. we don't serialize the bloom filter and try to do some other minor 
 compression tricks (var ints etc not done yet). The main difference is that 
 we don't deserialize but use the normal sstable iterators and filters as in 
 the regular uncached case.
 So the read path looks like this:
 return filter.collectCollatedColumns(memtable iter, cached row iter)
 The write path is not affected. It does not update the cache
 During flush we merge all memtable updates with the cached rows.
 The attached patch is based on 0.8 branch r1143352
 It does not replace the existing row cache but sits alongside it. There's an 
 environment switch to choose the implementation. This way it is easy to 
 benchmark performance differences.
 -DuseSSTableCache=true enables the alternative cache. It shares its 
 configuration with the standard row cache. So the cache capacity is shared. 
 We have duplicated a fair amount of code. First we actually refactored the 
 existing sstable filter / reader but then decided to minimize dependencies. 
 Also this way it is easy to customize serialization for in memory sstable 
 rows. 
 We have also experimented a little with compression but since this task at 
 this stage is mainly to kick off discussion we wanted to keep things simple. 
 But there is certainly room for optimizations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256832#comment-13256832
 ] 

Jonathan Ellis commented on CASSANDRA-2864:
---

If so, how do you avoid scanning the sstables?  Does this only work on 
named-column queries?  That is, if I ask for a slice from X to Y, if you have 
data in your cache for X1 X2, how do you know there is not also an X3 on disk 
somewhere?

 Alternative Row Cache Implementation
 

 Key: CASSANDRA-2864
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Daniel Doubleday
Assignee: Daniel Doubleday
Priority: Minor

 we have been working on an alternative implementation to the existing row 
 cache(s)
 We have 2 main goals:
 - Decrease memory - get more rows in the cache without suffering a huge 
 performance penalty
 - Reduce gc pressure
 This sounds a lot like we should be using the new serializing cache in 0.8. 
 Unfortunately our workload consists of loads of updates which would 
 invalidate the cache all the time.
 The second unfortunate thing is that the idea we came up with doesn't fit the 
 new cache provider api...
 It looks like this:
 Like the serializing cache we basically only cache the serialized byte 
 buffer. we don't serialize the bloom filter and try to do some other minor 
 compression tricks (var ints etc not done yet). The main difference is that 
 we don't deserialize but use the normal sstable iterators and filters as in 
 the regular uncached case.
 So the read path looks like this:
 return filter.collectCollatedColumns(memtable iter, cached row iter)
 The write path is not affected. It does not update the cache
 During flush we merge all memtable updates with the cached rows.
 The attached patch is based on 0.8 branch r1143352
 It does not replace the existing row cache but sits alongside it. There's an 
 environment switch to choose the implementation. This way it is easy to 
 benchmark performance differences.
 -DuseSSTableCache=true enables the alternative cache. It shares its 
 configuration with the standard row cache. So the cache capacity is shared. 
 We have duplicated a fair amount of code. First we actually refactored the 
 existing sstable filter / reader but then decided to minimize dependencies. 
 Also this way it is easy to customize serialization for in memory sstable 
 rows. 
 We have also experimented a little with compression but since this task at 
 this stage is mainly to kick off discussion we wanted to keep things simple. 
 But there is certainly room for optimizations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256859#comment-13256859
 ] 

Jonathan Ellis commented on CASSANDRA-2864:
---

I think you might need to write that book, because the commit history is tough 
to follow. :)

 Alternative Row Cache Implementation
 

 Key: CASSANDRA-2864
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Daniel Doubleday
Assignee: Daniel Doubleday
Priority: Minor

 we have been working on an alternative implementation to the existing row 
 cache(s)
 We have 2 main goals:
 - Decrease memory - get more rows in the cache without suffering a huge 
 performance penalty
 - Reduce gc pressure
 This sounds a lot like we should be using the new serializing cache in 0.8. 
 Unfortunately our workload consists of loads of updates which would 
 invalidate the cache all the time.
 The second unfortunate thing is that the idea we came up with doesn't fit the 
 new cache provider api...
 It looks like this:
 Like the serializing cache we basically only cache the serialized byte 
 buffer. we don't serialize the bloom filter and try to do some other minor 
 compression tricks (var ints etc not done yet). The main difference is that 
 we don't deserialize but use the normal sstable iterators and filters as in 
 the regular uncached case.
 So the read path looks like this:
 return filter.collectCollatedColumns(memtable iter, cached row iter)
 The write path is not affected. It does not update the cache
 During flush we merge all memtable updates with the cached rows.
 The attached patch is based on 0.8 branch r1143352
 It does not replace the existing row cache but sits alongside it. There's an 
 environment switch to choose the implementation. This way it is easy to 
 benchmark performance differences.
 -DuseSSTableCache=true enables the alternative cache. It shares its 
 configuration with the standard row cache. So the cache capacity is shared. 
 We have duplicated a fair amount of code. First we actually refactored the 
 existing sstable filter / reader but then decided to minimize dependencies. 
 Also this way it is easy to customize serialization for in memory sstable 
 rows. 
 We have also experimented a little with compression but since this task at 
 this stage is mainly to kick off discussion we wanted to keep things simple. 
 But there is certainly room for optimizations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation

2012-04-18 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256879#comment-13256879
 ] 

Vijay commented on CASSANDRA-2864:
--

Hi Jonathan, when there is a write for X3 we invalidate/update the cache, and 
once the entry is out of the cache the next fetch does the FS scan and 
repopulates it (it is similar to the page cache: if there is a write on a block, 
the whole block is marked dirty and the next fetch will go to the FS). There is a 
configurable block size which, when set high enough, will cache the whole row 
(like the existing cache). The logic around it is roughly what the patch has.

bq. I think you might need to write that book, because the commit history is 
tough to follow

Yeah, I just wrote a prototype, hence... :) I can clean it up if we agree on that 
approach.

 Alternative Row Cache Implementation
 

 Key: CASSANDRA-2864
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Daniel Doubleday
Assignee: Daniel Doubleday
Priority: Minor

 we have been working on an alternative implementation to the existing row 
 cache(s)
 We have 2 main goals:
 - Decrease memory - get more rows in the cache without suffering a huge 
 performance penalty
 - Reduce gc pressure
 This sounds a lot like we should be using the new serializing cache in 0.8. 
 Unfortunately our workload consists of loads of updates which would 
 invalidate the cache all the time.
 The second unfortunate thing is that the idea we came up with doesn't fit the 
 new cache provider api...
 It looks like this:
 Like the serializing cache we basically only cache the serialized byte 
 buffer. we don't serialize the bloom filter and try to do some other minor 
 compression tricks (var ints etc not done yet). The main difference is that 
 we don't deserialize but use the normal sstable iterators and filters as in 
 the regular uncached case.
 So the read path looks like this:
 return filter.collectCollatedColumns(memtable iter, cached row iter)
 The write path is not affected. It does not update the cache
 During flush we merge all memtable updates with the cached rows.
 The attached patch is based on 0.8 branch r1143352
 It does not replace the existing row cache but sits alongside it. There's an 
 environment switch to choose the implementation. This way it is easy to 
 benchmark performance differences.
 -DuseSSTableCache=true enables the alternative cache. It shares its 
 configuration with the standard row cache. So the cache capacity is shared. 
 We have duplicated a fair amount of code. First we actually refactored the 
 existing sstable filter / reader but then decided to minimize dependencies. 
 Also this way it is easy to customize serialization for in memory sstable 
 rows. 
 We have also experimented a little with compression but since this task at 
 this stage is mainly to kick off discussion we wanted to keep things simple. 
 But there is certainly room for optimizations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation

2012-04-18 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256924#comment-13256924
 ] 

Vijay commented on CASSANDRA-2864:
--

Wrote comments thinking it was a different ticket, hence removed the comments...

 Alternative Row Cache Implementation
 

 Key: CASSANDRA-2864
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Daniel Doubleday
Assignee: Daniel Doubleday
Priority: Minor

 we have been working on an alternative implementation to the existing row 
 cache(s)
 We have 2 main goals:
 - Decrease memory - get more rows in the cache without suffering a huge 
 performance penalty
 - Reduce gc pressure
 This sounds a lot like we should be using the new serializing cache in 0.8. 
 Unfortunately our workload consists of loads of updates which would 
 invalidate the cache all the time.
 The second unfortunate thing is that the idea we came up with doesn't fit the 
 new cache provider api...
 It looks like this:
 Like the serializing cache we basically only cache the serialized byte 
 buffer. we don't serialize the bloom filter and try to do some other minor 
 compression tricks (var ints etc not done yet). The main difference is that 
 we don't deserialize but use the normal sstable iterators and filters as in 
 the regular uncached case.
 So the read path looks like this:
 return filter.collectCollatedColumns(memtable iter, cached row iter)
 The write path is not affected. It does not update the cache
 During flush we merge all memtable updates with the cached rows.
 The attached patch is based on 0.8 branch r1143352
 It does not replace the existing row cache but sits alongside it. There's an 
 environment switch to choose the implementation. This way it is easy to 
 benchmark performance differences.
 -DuseSSTableCache=true enables the alternative cache. It shares its 
 configuration with the standard row cache. So the cache capacity is shared. 
 We have duplicated a fair amount of code. First we actually refactored the 
 existing sstable filter / reader but then decided to minimize dependencies. 
 Also this way it is easy to customize serialization for in memory sstable 
 rows. 
 We have also experimented a little with compression but since this task at 
 this stage is mainly to kick off discussion we wanted to keep things simple. 
 But there is certainly room for optimizations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4169) Locale settings on windows can break schema

2012-04-18 Thread Dave Brosius (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256929#comment-13256929
 ] 

Dave Brosius commented on CASSANDRA-4169:
-

Even something silly like mystring.toUpperCase() would access the default locale, 
and Turkish is notorious for creating unexpected characters when upper/lower-
casing a string (adding crazy accents, etc.). 
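
A minimal, self-contained illustration of the Turkish casing trap, matching the "strıng" (dotless ı) in the stack trace quoted below:

{noformat}
import java.util.Locale;

public class TurkishCaseDemo
{
    public static void main(String[] args)
    {
        // In the Turkish locale, upper-case 'I' lower-cases to the dotless 'ı',
        // and lower-case 'i' upper-cases to the dotted 'İ'.
        System.out.println("STRING".toLowerCase(new Locale("tr"))); // strıng
        System.out.println("STRING".toLowerCase(Locale.ROOT));      // string
        System.out.println("string".toUpperCase(new Locale("tr"))); // STRİNG
    }
}
{noformat}

So any type-name comparison done with the default-locale toLowerCase/toUpperCase breaks once Windows is set to Turkish, which is consistent with the Avro error in the description.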

 Locale settings on windows can break schema
 ---

 Key: CASSANDRA-4169
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4169
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows
Reporter: Nick Bailey
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.2


 The locale settings on windows can somehow affect how schema information is 
 either saved or loaded. When setting locale/language settings to Turkish, and 
 then starting cassandra, schema changes can be made successfully. When 
 restarting cassandra though, the following error is seen:
 {noformat}
 INFO [main] 2012-04-18 19:18:59,142 DatabaseDescriptor.java (line 501) 
 Loading schema version 4404f2e0-898b-11e1--242d50cf1fbf
  ERROR [main] 2012-04-18 19:18:59,391 AbstractCassandraDaemon.java (line 373) 
 Exception encountered during startup
  org.apache.avro.SchemaParseException: strıng is not a defined name. The 
 type of the name field must be a defined name or a {type: ...} expression.
   at org.apache.avro.Schema.parse(Schema.java:986)
   at org.apache.avro.Schema.parse(Schema.java:893)
   at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:90)
   at 
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:502)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:180)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
   at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
 {noformat}
 This was reported on the DataStax forums, as well as reproduced by myself. 
 http://www.datastax.com/support-forums/topic/cassandra-service-doesnt-start

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256949#comment-13256949
 ] 

Vijay commented on CASSANDRA-1956:
--

Hi Jonathan, when the user requests X1,Y1 we cache the block from 
(Start-X1-Position - Y1-Position) + N, with the block size being configurable. 
(It should be similar to the page cache: if there is a write on the block, the 
whole block is marked dirty and the next fetch will go to the FS.) The 
configurable block size, when set high enough, will cache the whole row (like 
the existing cache). The logic around it is roughly what the patch has.

We could also use column indexes (if needed) and cache data within them, but 
the simplest approach may be to start from the requested column and cache 
blocks (otherwise updating the cache without invalidating the whole row is 
going to be hard).
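Purely as an illustration of the idea (simplified stand-in types, not the 
classes from the attached patches), a block-oriented cache key would combine 
the row key with the markers that identify the cached block, so that two 
requests for the same slice share one cached block:

{code}
import java.util.Objects;

// Illustrative stand-in for a block-cache key: the row key plus the slice
// markers (e.g. X1/Y1 above) that identify the cached block.
final class BlockCacheKey
{
    final String rowKey;
    final String startMarker;
    final String endMarker;

    BlockCacheKey(String rowKey, String startMarker, String endMarker)
    {
        this.rowKey = rowKey;
        this.startMarker = startMarker;
        this.endMarker = endMarker;
    }

    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof BlockCacheKey))
            return false;
        BlockCacheKey k = (BlockCacheKey) o;
        return rowKey.equals(k.rowKey)
            && startMarker.equals(k.startMarker)
            && endMarker.equals(k.endMarker);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(rowKey, startMarker, endMarker);
    }
}
{code}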

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfectly supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overhead
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256954#comment-13256954
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

How is that different from the query cache I waved my hands about?


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfectly supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overhead
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4163) CQL3 ALTER TABLE command causes NPE

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256957#comment-13256957
 ] 

Jonathan Ellis commented on CASSANDRA-4163:
---

LGTM.

Nit: could we initialize properties to an empty map to avoid having to 
null-check it?
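As a generic illustration of the nit (not the actual grammar/statement code), 
initializing the field up front removes the null check at every use site:

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class StatementWithProperties
{
    // Initialized to an empty map, so callers never have to null-check it.
    private Map<String, String> properties = new HashMap<String, String>();

    void setProperty(String key, String value)
    {
        properties.put(key, value);
    }

    Map<String, String> properties()
    {
        // No "properties == null ? empty map : properties" dance needed here.
        return Collections.unmodifiableMap(properties);
    }
}
{code}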

 CQL3 ALTER TABLE command causes NPE
 ---

 Key: CASSANDRA-4163
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4163
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
 Environment: INFO 16:07:11,757 Cassandra version: 1.1.0-rc1-SNAPSHOT
 INFO 16:07:11,757 Thrift API version: 19.30.0
 INFO 16:07:11,758 CQL supported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0)
Reporter: Kristine Hahn
Assignee: paul cannon
  Labels: cql3
 Fix For: 1.1.0

 Attachments: 4163.patch.txt


 To reproduce the problem:
 ./cqlsh --cql3
 Connected to Test Cluster at localhost:9160.
 [cqlsh 2.2.0 | Cassandra 1.1.0-rc1-SNAPSHOT | CQL spec 3.0.0 | Thrift 
 protocol 19.30.0]
 Use HELP for help.
 cqlsh CREATE KEYSPACE test34 WITH strategy_class = 
 'org.apache.cassandra.locator.SimpleStrategy' AND 
 strategy_options:replication_factor='1';
 cqlsh USE test34;
 cqlsh:test34 CREATE TABLE users (
   ... password varchar,
   ... gender varchar,
   ... session_token varchar,
   ... state varchar,
   ... birth_year bigint,
   ... pk varchar,
   ... PRIMARY KEY (pk)
   ... );
 cqlsh:test34 ALTER TABLE users ADD coupon_code varchar;
 TSocket read 0 bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4163) CQL3 ALTER TABLE command causes NPE

2012-04-18 Thread paul cannon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256959#comment-13256959
 ] 

paul cannon commented on CASSANDRA-4163:


We could, in the grammar definition, but that attribute is only ever updated 
with an assignment, so the empty map would be consed for nothing. Hardly a big 
deal either way though.

 CQL3 ALTER TABLE command causes NPE
 ---

 Key: CASSANDRA-4163
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4163
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
 Environment: INFO 16:07:11,757 Cassandra version: 1.1.0-rc1-SNAPSHOT
 INFO 16:07:11,757 Thrift API version: 19.30.0
 INFO 16:07:11,758 CQL supported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0)
Reporter: Kristine Hahn
Assignee: paul cannon
  Labels: cql3
 Fix For: 1.1.0

 Attachments: 4163.patch.txt


 To reproduce the problem:
 ./cqlsh --cql3
 Connected to Test Cluster at localhost:9160.
 [cqlsh 2.2.0 | Cassandra 1.1.0-rc1-SNAPSHOT | CQL spec 3.0.0 | Thrift 
 protocol 19.30.0]
 Use HELP for help.
 cqlsh CREATE KEYSPACE test34 WITH strategy_class = 
 'org.apache.cassandra.locator.SimpleStrategy' AND 
 strategy_options:replication_factor='1';
 cqlsh USE test34;
 cqlsh:test34 CREATE TABLE users (
   ... password varchar,
   ... gender varchar,
   ... session_token varchar,
   ... state varchar,
   ... birth_year bigint,
   ... pk varchar,
   ... PRIMARY KEY (pk)
   ... );
 cqlsh:test34 ALTER TABLE users ADD coupon_code varchar;
 TSocket read 0 bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256965#comment-13256965
 ] 

Vijay commented on CASSANDRA-1956:
--

:) Quite similar, but a different version with a smaller memory footprint and 
efficient updates.

1) For example, if there are 10 columns which are queried, the key will have 
those names, and so will the returned CF object;
2) If we have 2 kinds of queries with overlaps (slice and column names), then 
we will be caching twice or sometimes more in a pure query cache;
3) If we have an update to one column out of the 10, we have to search to see 
if it is among the cached columns or invalidate the whole row; this way we can 
just update the block and be done with it.

This also allows us to incrementally deserialize some parts of the row when the 
whole row is cached.


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfectly supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overhead
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4163) CQL3 ALTER TABLE command causes NPE

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256970#comment-13256970
 ] 

Jonathan Ellis commented on CASSANDRA-4163:
---

Yeah, I don't see this as performance critical so I'd rather go for cleanliness.

 CQL3 ALTER TABLE command causes NPE
 ---

 Key: CASSANDRA-4163
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4163
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
 Environment: INFO 16:07:11,757 Cassandra version: 1.1.0-rc1-SNAPSHOT
 INFO 16:07:11,757 Thrift API version: 19.30.0
 INFO 16:07:11,758 CQL supported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0)
Reporter: Kristine Hahn
Assignee: paul cannon
  Labels: cql3
 Fix For: 1.1.0

 Attachments: 4163.patch.txt


 To reproduce the problem:
 ./cqlsh --cql3
 Connected to Test Cluster at localhost:9160.
 [cqlsh 2.2.0 | Cassandra 1.1.0-rc1-SNAPSHOT | CQL spec 3.0.0 | Thrift 
 protocol 19.30.0]
 Use HELP for help.
 cqlsh CREATE KEYSPACE test34 WITH strategy_class = 
 'org.apache.cassandra.locator.SimpleStrategy' AND 
 strategy_options:replication_factor='1';
 cqlsh USE test34;
 cqlsh:test34 CREATE TABLE users (
   ... password varchar,
   ... gender varchar,
   ... session_token varchar,
   ... state varchar,
   ... birth_year bigint,
   ... pk varchar,
   ... PRIMARY KEY (pk)
   ... );
 cqlsh:test34 ALTER TABLE users ADD coupon_code varchar;
 TSocket read 0 bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256975#comment-13256975
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

Does this support caching head/tail queries?  Or do X and Y have to be existing 
column values?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfectly supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overhead
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256983#comment-13256983
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

Also, it sounds like this always invalidates on update.  Would it be possible 
to preserve the current row cache behavior?  I.e., update-in-place if using a 
non-copying cache implementation.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfectly supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overhead
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256992#comment-13256992
 ] 

Vijay commented on CASSANDRA-1956:
--

 Does this support caching head/tail queries? Or do X and Y have to be 
 existing column values?
No, X and Y don't need to exist; they are just markers in the RowCacheKey 
(for example, if the query has x* - y*, we will have that in the RCK instead of 
xeon - yum)... It does support head and tail queries.

  it sounds like this always invalidates on update. Would it be possible to 
 preserve the current row cache behavior?
Yeah, the prototype does the update on write, but the problem is that when 
there are a lot of updates the block size will grow beyond what was initially 
cached, and at some point we need to split/re-partition it...

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfectly supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overhead
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4138) Add varint encoding to Serializing Cache

2012-04-18 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257089#comment-13257089
 ] 

Pavel Yaskevich commented on CASSANDRA-4138:


OK, let's give Jonathan a chance to take a final look.

 Add varint encoding to Serializing Cache
 

 Key: CASSANDRA-4138
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4138
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4138-Take1.patch, 
 0001-CASSANDRA-4138-V2.patch, 0001-CASSANDRA-4138-v4.patch, 
 0002-sizeof-changes-on-rest-of-the-code.patch, CASSANDRA-4138-v3.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-17 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255359#comment-13255359
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

Sure, I understand. What I'm saying is that the initial discussion was about 
removing a setting. But we can't. One of the settings controls how sparse the 
main sstable index is (it's already sparse in fact: we're not indexing every 
column, but every min(1 row, column_index_size_in_kb)); the other controls how 
sparse the in-memory summary of the on-disk sparse index is.

 Promote row index
 -

 Key: CASSANDRA-2319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Sylvain Lebresne
  Labels: index, timeseries
 Fix For: 1.2

 Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
 version-g-lzf.txt, version-g.txt


 The row index contains entries for configurably sized blocks of a wide row. 
 For a row of appreciable size, the row index ends up directing the third seek 
 (1. index, 2. row index, 3. content) to nearby the first column of a scan.
 Since the row index is always used for wide rows, and since it contains 
 information that tells us whether or not the 3rd seek is necessary (the 
 column range or name we are trying to slice may not exist in a given 
 sstable), promoting the row index into the sstable index would allow us to 
 drop the maximum number of seeks for wide rows back to 2, and, more 
 importantly, would allow sstables to be eliminated using only the index.
 An example usecase that benefits greatly from this change is time series data 
 in wide rows, where data is appended to the beginning or end of the row. Our 
 existing compaction strategy gets lucky and clusters the oldest data in the 
 oldest sstables: for queries to recently appended data, we would be able to 
 eliminate wide rows using only the sstable index, rather than needing to seek 
 into the data file to determine that it isn't interesting. For narrow rows, 
 this change would have no effect, as they will not reach the threshold for 
 indexing anyway.
 A first cut design for this change would look very similar to the file format 
 design proposed on #674: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
 column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3982) Explore not returning range ghosts

2012-04-17 Thread Christoph Tavan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255463#comment-13255463
 ] 

Christoph Tavan commented on CASSANDRA-3982:


While working on the helenus driver for node.js I stumbled upon this problem 
recently. I found that for static column families I was getting not only 
row ghosts but also column ghosts. See this example, which uses 
{{cqlsh -3}} from the 1.1.0-rc1 release:

{code}
$ cqlsh -3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.0~rc1 | CQL spec 3.0.0 | Thrift protocol 19.30.0]
Use HELP for help.
cqlsh CREATE KEYSPACE helenus_cql3_test WITH strategy_class=SimpleStrategy AND 
strategy_options:replication_factor=1;
cqlsh USE helenus_cql3_test ;
cqlsh:helenus_cql3_test CREATE COLUMNFAMILY cql_test (id text, foo text, 
PRIMARY KEY (id));
cqlsh:helenus_cql3_test UPDATE cql_test SET foo='bar' WHERE id='foobar';
cqlsh:helenus_cql3_test SELECT * FROM cql_test;
 id     | foo
--------+-----
 foobar | bar

cqlsh:helenus_cql3_test DELETE FROM cql_test WHERE id='foobar';
cqlsh:helenus_cql3_test SELECT * FROM cql_test;
 id     | foo
--------+------
 foobar | null
{code}

As you can see, the result contains not only the primary key (i.e. the row key) 
as a ghost; all columns that have been defined in the schema are also returned 
with a value of null.

I think it would be highly desirable if ghosts never showed up in any CQL 
result.

 Explore not returning range ghosts
 --

 Key: CASSANDRA-3982
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3982
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Sylvain Lebresne
 Fix For: 1.2


 This ticket proposes to remove range ghosts in CQL3.
 The basic argument is that range ghosts confuse users a lot and don't add 
 any value, since a range ghost doesn't allow distinguishing between the two 
 following cases:
 * the row is deleted
 * the row is not deleted but doesn't have data for the provided filter

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4151) Apache project branding requirements: DOAP file [PATCH]

2012-04-17 Thread Shane Curcuru (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255542#comment-13255542
 ] 

Shane Curcuru commented on CASSANDRA-4151:
--

Please feel free to change as makes sense for the project - I just want to 
ensure that every PMC has a DOAP checked in, so we can start using projects.a.o 
as a real resource.  Thanks!

 Apache project branding requirements: DOAP file [PATCH]
 ---

 Key: CASSANDRA-4151
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4151
 Project: Cassandra
  Issue Type: Improvement
Reporter: Shane Curcuru
  Labels: branding
 Attachments: doap_Cassandra.rdf


 Attached.  Re: http://www.apache.org/foundation/marks/pmcs
 See Also: http://projects.apache.org/create.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4159) isReadyForBootstrap doesn't compare schema UUID by timestamp as it should

2012-04-17 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255652#comment-13255652
 ] 

Brandon Williams commented on CASSANDRA-4159:
-

+1

 isReadyForBootstrap doesn't compare schema UUID by timestamp as it should
 -

 Key: CASSANDRA-4159
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4159
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 1.0.10

 Attachments: 4159.txt


 CASSANDRA-3629 introduced a wait to be sure the node is up to date on the 
 schema before starting bootstrap. However, the isReadyForBootstrap() method 
 compares schema versions using UUID.compareTo(), which doesn't compare UUIDs 
 by timestamp, while the rest of the code does compare by timestamp 
 (MigrationManager.updateHighestKnown).
 During a test where lots of nodes were bootstrapped simultaneously (and some 
 schema changes were made), we ended up having some nodes stuck in the 
 isReadyForBootstrap loop. Restarting the node fixed it, so while I can't 
 confirm it, I suspect this was the source of that problem.
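For illustration only (not the code from the attached patch), the difference 
between the two comparisons for time-based (version 1) UUIDs is roughly this:

{code}
import java.util.UUID;

public class SchemaVersionCompare
{
    // UUID.compareTo() orders by the raw most-significant bits. For version 1
    // UUIDs the low-order part of the timestamp sits in the most significant
    // bits, so this ordering is not chronological.
    static int byNaturalOrder(UUID a, UUID b)
    {
        return a.compareTo(b);
    }

    // UUID.timestamp() exposes the embedded 60-bit timestamp of a version 1
    // UUID (it throws UnsupportedOperationException for other versions), so
    // comparing those values gives chronological order.
    static int byTimestamp(UUID a, UUID b)
    {
        return Long.compare(a.timestamp(), b.timestamp());
    }
}
{code}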

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-17 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255825#comment-13255825
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. which makes tuning complicated

No, you're right, that part didn't change. However, a very easy fix would be to 
turn index_interval, as I said above, into index_interval_in_kb. Without deeper 
changes, the index interval would have to include at least one key (i.e. the 
index of one key), but aside from that it would give pretty much what we want 
in terms of making tuning work as well for narrow as for wide rows. And that's 
a very simple change, which doesn't preclude doing deeper changes later if we 
so wish, but in the meantime would give us 91.32% (my best guess) of the 
benefits. We could even rename the settings to, say, disk_index_interval_in_kb 
and memory_index_interval_in_kb (and internally both would really mean that the 
interval is min({disk,memory}_index_interval_in_kb, 1 row)).

 Promote row index
 -

 Key: CASSANDRA-2319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Sylvain Lebresne
  Labels: index, timeseries
 Fix For: 1.2

 Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
 version-g-lzf.txt, version-g.txt


 The row index contains entries for configurably sized blocks of a wide row. 
 For a row of appreciable size, the row index ends up directing the third seek 
 (1. index, 2. row index, 3. content) to nearby the first column of a scan.
 Since the row index is always used for wide rows, and since it contains 
 information that tells us whether or not the 3rd seek is necessary (the 
 column range or name we are trying to slice may not exist in a given 
 sstable), promoting the row index into the sstable index would allow us to 
 drop the maximum number of seeks for wide rows back to 2, and, more 
 importantly, would allow sstables to be eliminated using only the index.
 An example usecase that benefits greatly from this change is time series data 
 in wide rows, where data is appended to the beginning or end of the row. Our 
 existing compaction strategy gets lucky and clusters the oldest data in the 
 oldest sstables: for queries to recently appended data, we would be able to 
 eliminate wide rows using only the sstable index, rather than needing to seek 
 into the data file to determine that it isn't interesting. For narrow rows, 
 this change would have no effect, as they will not reach the threshold for 
 indexing anyway.
 A first cut design for this change would look very similar to the file format 
 design proposed on #674: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
 column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3946) BulkRecordWriter shouldn't stream any empty data/index files that might be created at end of flush

2012-04-17 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255883#comment-13255883
 ] 

Jonathan Ellis commented on CASSANDRA-3946:
---

LGTM.  Could you also post a version against 1.0, Yuki?

 BulkRecordWriter shouldn't stream any empty data/index files that might be 
 created at end of flush
 --

 Key: CASSANDRA-3946
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3946
 Project: Cassandra
  Issue Type: Bug
Reporter: Chris Goffinet
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 1.1.1

 Attachments: 0001-Abort-SSTableWriter-when-exception-occured.patch, 
 0001-CASSANDRA-3946-BulkRecordWriter-shouldn-t-stream-any.patch


 If by chance, we flush sstables during BulkRecordWriter (we have seen it 
 happen), I want to make sure we don't try to stream them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-17 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255954#comment-13255954
 ] 

Brandon Williams commented on CASSANDRA-3909:
-

Sylvain, any reason we can't put this in 1.1.0?  It has to be explicitly 
enabled so it can't break anything existing, and it goes well with the hadoop 
wide row support we already put in 1.1.0.

 Pig should handle wide rows
 ---

 Key: CASSANDRA-3909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1

 Attachments: 3909.txt


 Pig should be able to use the wide row support in CFIF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-17 Thread Matthew F. Dennis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255956#comment-13255956
 ] 

Matthew F. Dennis commented on CASSANDRA-3909:
--

+1 on inclusion in 1.1.0 (and if not, ASAP after 1.1.0)

 Pig should handle wide rows
 ---

 Key: CASSANDRA-3909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1

 Attachments: 3909.txt


 Pig should be able to use the wide row support in CFIF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

2012-04-17 Thread Aaron Morton (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255970#comment-13255970
 ] 

Aaron Morton commented on CASSANDRA-4162:
-

Disabling thrift and gossip is seen as a way to isolate a node from clients and 
the other nodes. If it does not stop an in-progress HH, is there another 
approach we can use to effectively remove a running node from the ring?

In this case the reporter assumed that since all the other nodes saw the node 
as down they would stop talking to it.

 nodetool disablegossip does not prevent gossip delivery of writes via 
 already-initiated hinted handoff
 --

 Key: CASSANDRA-4162
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: reported on IRC, believe it was a linux environment, 
 nick rhone, cassandra 1.0.8
Reporter: Robert Coli
Priority: Minor
  Labels: gossip

 This ticket derives from #cassandra, where aaron_morton and I assisted a user who 
 had run disablethrift and disablegossip and was confused as to why he was 
 seeing writes to his node.
 Aaron and I went through a series of debugging questions, user verified that 
 there was traffic on the gossip port. His node was showing as down from the 
 perspective of other nodes, and nodetool also showed that gossip was not 
 active.
 Aaron read the code and had the user turn debug logging on. The user saw 
 Hinted Handoff messages being delivered and Aaron confirmed in the code that 
 a hinted handoff delivery session only checks gossip state when it first 
 starts. As a result, it will continue to deliver hints and disregard gossip 
 state on the target node.
 per nodetool docs
 
 disablegossip  - Disable gossip (effectively marking the node dead)
 
 I believe most people will be using disablegossip and disablethrift for 
 operational reasons, and propose that they do not expect HH delivery to 
 continue, via gossip, when they have run disablegossip.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

2012-04-17 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255976#comment-13255976
 ] 

Jonathan Ellis commented on CASSANDRA-4162:
---

I can easily think of a scenario where you want to let the HH complete (e.g., 
you only want up-to-date nodes serving reads) but I'm having trouble thinking of 
a scenario for the other way around. So no, I don't think that's a good general 
rule...

(If you want it completely cut off ISTM you should kill it and bring it back up 
without joining the ring.)

 nodetool disablegossip does not prevent gossip delivery of writes via 
 already-initiated hinted handoff
 --

 Key: CASSANDRA-4162
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: reported on IRC, believe it was a linux environment, 
 nick rhone, cassandra 1.0.8
Reporter: Robert Coli
Priority: Minor
  Labels: gossip

 This ticket derives from #cassandra, where aaron_morton and I assisted a user who 
 had run disablethrift and disablegossip and was confused as to why he was 
 seeing writes to his node.
 Aaron and I went through a series of debugging questions, user verified that 
 there was traffic on the gossip port. His node was showing as down from the 
 perspective of other nodes, and nodetool also showed that gossip was not 
 active.
 Aaron read the code and had the user turn debug logging on. The user saw 
 Hinted Handoff messages being delivered and Aaron confirmed in the code that 
 a hinted handoff delivery session only checks gossip state when it first 
 starts. As a result, it will continue to deliver hints and disregard gossip 
 state on the target node.
 per nodetool docs
 
 disablegossip  - Disable gossip (effectively marking the node dead)
 
 I believe most people will be using disablegossip and disablethrift for 
 operational reasons, and propose that they do not expect HH delivery to 
 continue, via gossip, when they have run disablegossip.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

2012-04-17 Thread Eldon Stegall (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255993#comment-13255993
 ] 

Eldon Stegall commented on CASSANDRA-4162:
--

I am a bit fuzzy on the internals of when a HH session starts and stops. 
However, I have seen similar behavior: specifically, in situations where a 
very intensive, very long-running compaction is occurring, some sort of 
thrashing appears to happen, and neither the HH nor the compaction finishes. In 
a situation (perhaps an edge case) where you want to isolate a node in order to 
let a very long-running compaction complete, you may not want to kill and 
restart the node, as that could dramatically increase your time to rejoin the 
ring (particularly if you have already finished a significant portion of the 
compaction). I just shut it all off with iptables, like so:
sudo iptables -A INPUT -p tcp --dport 7000 -j DROP
sudo iptables -A INPUT -p tcp --dport 9160 -j DROP
sudo iptables -A OUTPUT -p tcp --dport 9160 -j DROP
sudo iptables -A OUTPUT -p tcp --dport 7000 -j DROP

It's not pretty, but it works, and I think maybe it all goes away with leveldb, 
if only I had the cycles to switch us to that. Forgive me if this seems odd; I 
have had my head out of cassandra for a little while now. My 2 cents.

 nodetool disablegossip does not prevent gossip delivery of writes via 
 already-initiated hinted handoff
 --

 Key: CASSANDRA-4162
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: reported on IRC, believe it was a linux environment, 
 nick rhone, cassandra 1.0.8
Reporter: Robert Coli
Priority: Minor
  Labels: gossip

 This ticket derives from #cassandra, where aaron_morton and I assisted a user who 
 had run disablethrift and disablegossip and was confused as to why he was 
 seeing writes to his node.
 Aaron and I went through a series of debugging questions, user verified that 
 there was traffic on the gossip port. His node was showing as down from the 
 perspective of other nodes, and nodetool also showed that gossip was not 
 active.
 Aaron read the code and had the user turn debug logging on. The user saw 
 Hinted Handoff messages being delivered and Aaron confirmed in the code that 
 a hinted handoff delivery session only checks gossip state when it first 
 starts. As a result, it will continue to deliver hints and disregard gossip 
 state on the target node.
 per nodetool docs
 
 disablegossip  - Disable gossip (effectively marking the node dead)
 
 I believe most people will be using disablegossip and disablethrift for 
 operational reasons, and propose that they do not expect HH delivery to 
 continue, via gossip, when they have run disablegossip.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

2012-04-17 Thread Paul Lathrop (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256126#comment-13256126
 ] 

Paul Lathrop commented on CASSANDRA-4162:
-

If you are not going to actually effectively mark the node dead, you shouldn't 
advertise it as such in the nodetool documentation.

This definitely violates the principle of least surprise, in my opinion. At a 
bare minimum the docs should be updated. However, it would be good to go the 
next step and actually support the use case that production users encounter, 
instead of dismissing it because you can't think of a scenario where you'd use 
it.

Put another way:

As an operator of a cassandra cluster, I want a reliable way to remove a node 
from the cluster and disable traffic to it, so that I can diagnose problems 
with the node while keeping it from participating in the cluster. No, iptables 
is not the correct answer to this use case.

 nodetool disablegossip does not prevent gossip delivery of writes via 
 already-initiated hinted handoff
 --

 Key: CASSANDRA-4162
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9
 Environment: reported on IRC, believe it was a linux environment, 
 nick rhone, cassandra 1.0.8
Reporter: Robert Coli
Priority: Minor
  Labels: gossip

 This ticket derives from #cassandra, where aaron_morton and I assisted a user who 
 had run disablethrift and disablegossip and was confused as to why he was 
 seeing writes to his node.
 Aaron and I went through a series of debugging questions, user verified that 
 there was traffic on the gossip port. His node was showing as down from the 
 perspective of other nodes, and nodetool also showed that gossip was not 
 active.
 Aaron read the code and had the user turn debug logging on. The user saw 
 Hinted Handoff messages being delivered and Aaron confirmed in the code that 
 a hinted handoff delivery session only checks gossip state when it first 
 starts. As a result, it will continue to deliver hints and disregard gossip 
 state on the target node.
 per nodetool docs
 
 disablegossip  - Disable gossip (effectively marking the node dead)
 
 I believe most people will be using disablegossip and disablethrift for 
 operational reasons, and propose that they do not expect HH delivery to 
 continue, via gossip, when they have run disablegossip.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType

2012-04-16 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254684#comment-13254684
 ] 

Sylvain Lebresne commented on CASSANDRA-4004:
-

bq. but I don't think it makes sense for CQL.

Why wouldn't it?  The notion of what is largest or smallest only makes sense 
once you've defined what ordering you're talking about (what I'm calling the 
logical ordering). We do still allow custom orderings in CQL (which is useful), 
so why wouldn't giving a simple syntax to define the reverse of an existing 
ordering make sense? With my first patch, ORDER BY X DESC does *always* 
return the largest X first, given the ordering.

bq. but it shouldn't change the semantics of the query itself

To be precise, it doesn't change the semantics of the query; it changes the 
logical ordering (which happens to be the same as the physical one, but that 
last part is an implementation detail) of records in the table.


Now, looking more closely at the alternative of keeping the logical ordering 
unchanged but changing the physical ordering (in order to get faster reversed 
queries), I think this just doesn't work. And by doesn't work, I mean that as 
soon as we have composites, it would be costly to implement (making it 
useless). Typically, suppose you follow that idea and declare:
{noformat}
CREATE TABLE timeseries (
  key text,
  kind int,
  time timestamp,
  value text,
  PRIMARY KEY (key, kind, time)
) WITH CLUSTERING ORDER BY (kind ASC, time DESC)
{noformat}

Now, if the query is:
{noformat}
SELECT kind, time FROM timeseries WHERE key = somevalue LIMIT 200;
{noformat}
then, if the DESC above is just an optimisation for reversed queries, the 
expected result is (say):
{noformat}
 kind | time
------+------
    0 |    0
    0 |    1
  ...
    0 |   99
    0 |  100
    1 |    0
    1 |    1
    1 |    2
  ...
{noformat}
but the physical layout is now in fact:
{noformat}
 kind | time
------+------
    0 |  100
    0 |   99
  ...
    0 |    1
    0 |    0
    1 |  100
    1 |   99
    1 |   98
  ...
{noformat}
I don't see how to implement that query efficiently (without potentially making 
many queries).

Lastly, while I think that changing the logical ordering is the correct way to 
deal with this, the question of the syntax is another matter. I do happen to 
like the syntax in the description of this ticket, but I don't care too much 
either way.


 Add support for ReversedType
 

 Key: CASSANDRA-4004
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4004
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
 Fix For: 1.1.1

 Attachments: 4004.txt


 It would be nice to add a native syntax for the use of ReversedType. I'm not 
 sure there is anything in SQL that we could take inspiration from, so I would 
 propose something like:
 {noformat}
 CREATE TABLE timeseries (
   key text,
   time uuid,
   value text,
   PRIMARY KEY (key, time DESC)
 )
 {noformat}
 Alternatively, the DESC could also be put after the column name definition, 
 but one argument for putting it in the PK instead is that this only applies 
 to keys.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-16 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254707#comment-13254707
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

I still don't see how that would make us remove one of the 2 config options 
we're talking about. Even if you basically switch to an on-disk format where 
there is no notion of Row, you still have to (1) say how big the blocks you 
index in the index file are, and (2) how many of those index entries you load 
in memory (unless you decide to load the full index in memory, but I don't 
think that's what we're talking about). I.e. the ??raison d'être?? of both 
index_interval and column_index_size_in_kb is *not* because we have the notion 
of rows in the on-disk format.

Don't get me wrong, I'm not saying that having a file format where the row key 
is not special anymore is a bad idea at all (though I'll admit that I'm less 
convinced that moving C* to such a file format should be a priority), but it 
seems to me that it's a completely different debate.


 Promote row index
 -

 Key: CASSANDRA-2319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Sylvain Lebresne
  Labels: index, timeseries
 Fix For: 1.2

 Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
 version-g-lzf.txt, version-g.txt


 The row index contains entries for configurably sized blocks of a wide row. 
 For a row of appreciable size, the row index ends up directing the third seek 
 (1. index, 2. row index, 3. content) to nearby the first column of a scan.
 Since the row index is always used for wide rows, and since it contains 
 information that tells us whether or not the 3rd seek is necessary (the 
 column range or name we are trying to slice may not exist in a given 
 sstable), promoting the row index into the sstable index would allow us to 
 drop the maximum number of seeks for wide rows back to 2, and, more 
 importantly, would allow sstables to be eliminated using only the index.
 An example usecase that benefits greatly from this change is time series data 
 in wide rows, where data is appended to the beginning or end of the row. Our 
 existing compaction strategy gets lucky and clusters the oldest data in the 
 oldest sstables: for queries to recently appended data, we would be able to 
 eliminate wide rows using only the sstable index, rather than needing to seek 
 into the data file to determine that it isn't interesting. For narrow rows, 
 this change would have no effect, as they will not reach the threshold for 
 indexing anyway.
 A first cut design for this change would look very similar to the file format 
 design proposed on #674: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
 column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4079) Check SSTable range before running cleanup

2012-04-16 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254714#comment-13254714
 ] 

Sylvain Lebresne commented on CASSANDRA-4079:
-

A small nit: I know this is a bit of a mess, but we also have ExcludingBounds 
and IncludingExcludingBounds, which extend AbstractBounds and which 
Range.intersects() doesn't handle. I don't think we should bother supporting 
them here since we don't need it, but I'd prefer protecting against future 
misuse of intersects. I'd also prefer not having the method static for no good 
reason. Typically I would add intersects as a method of Bounds (that would take 
a list of Range).

 Check SSTable range before running cleanup
 --

 Key: CASSANDRA-4079
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4079
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Jonathan Ellis
Priority: Minor
  Labels: compaction
 Fix For: 1.1.1

 Attachments: 4079.txt


 Before running a cleanup compaction on an SSTable we should check the range 
 to see if the SSTable falls into the range we want to remove. If it doesn't, 
 we can just mark the SSTable as compacted and be done with it; if it does, we 
 can no-op.
 Will not help with STCS, but for LCS, and perhaps some others we may see a 
 benefit here after topology changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

2012-04-16 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254729#comment-13254729
 ] 

Sylvain Lebresne commented on CASSANDRA-3912:
-

bq.  the solution is for the user to look at the cluster layout, and use exact 
tokens, right?

Well, kinda. The user should look at the layout, grab one of the ranges of the 
node, and then submit a repair on a subset of that range. I fully agree this is 
not for the faint of heart (which is why I prefer not exposing it to nodetool 
just yet), but as it stands I'll admit I'm not sure how to improve that error 
message much.

bq. it should be possible to repair a range that falls on the boundary of two 
getLocalRanges, assuming it can be fully contained in their aggregate

Actually no. Or, more precisely, in general it still has the problem mentioned 
above. Given 2 local ranges, there will be some neighbors that share one but 
not both of those ranges (so with these nodes the repair would be imprecise). 
In other words, to repair a range on the boundary of two local ranges, you'd 
really want to do 2 repairs, one on each subrange, because each time the set of 
neighbors will be different (we could do that splitting at the StorageService 
level, but we should probably keep it simple for now. I see this ticket as just 
exposing existing code to advanced users, not reinventing repair).
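
As a rough illustration of the splitting that would be needed, here is a 
hypothetical helper using simplified non-wrapping intervals instead of real 
Token ranges; it is not the StorageService code, just the idea:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: clamp a requested repair range against each local range so
// that every resulting piece lies inside exactly one local range and therefore has
// a single, well-defined set of neighbors. One repair session per piece.
class RepairRangeSplitter
{
    // Intervals are simple (start, end] pairs of longs, non-wrapping.
    static List<long[]> split(long reqStart, long reqEnd, List<long[]> localRanges)
    {
        List<long[]> pieces = new ArrayList<long[]>();
        for (long[] local : localRanges)
        {
            long start = Math.max(reqStart, local[0]);
            long end = Math.min(reqEnd, local[1]);
            if (start < end) // the requested range overlaps this local range
                pieces.add(new long[]{ start, end });
        }
        return pieces;
    }
}
{code}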

bq. For JMX's sake.

Fixed :) (I've rebased the patch)

bq. Would including the 'future.session.getName' in this log message be useful

That's logged before the session is created.

 support incremental repair controlled by external agent
 ---

 Key: CASSANDRA-3912
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
 Fix For: 1.2

 Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, 
 CASSANDRA-3912-v2-001-add-nodetool-commands.txt, 
 CASSANDRA-3912-v2-002-fix-antientropyservice.txt


 As a poor man's precursor to CASSANDRA-2699, exposing the ability to repair 
 small parts of a range is extremely useful because it allows one (with 
 external scripting logic) to slowly repair a node's content over time. Other 
 than avoiding the bulkiness of complete repairs, it means that you can safely 
 do repairs even if you absolutely cannot afford e.g. disk space spikes (see 
 CASSANDRA-2699 for what the issues are).
 Attaching a patch that exposes a repairincremental command to nodetool, 
 where you specify a step and the number of total steps. Incrementally 
 performing a repair in 100 steps, for example, would be done by:
 {code}
 nodetool repairincremental 0 100
 nodetool repairincremental 1 100
 ...
 nodetool repairincremental 99 100
 {code}
 An external script can be used to keep track of what has been repaired and 
 when. This should (1) allow incremental repair to happen now/soon, and 
 (2) allow experimentation and evaluation of an implementation of 
 CASSANDRA-2699, which I still think is a good idea. This patch does nothing to 
 help the average deployment, but at least makes incremental repair possible 
 given sufficient effort spent on external scripting.
 The big no-no about the patch is that it is entirely specific to 
 RandomPartitioner and BigIntegerToken. If someone can suggest a way to 
 implement this command generically using the Range/Token abstractions, I'd be 
 happy to hear suggestions.
 An alternative would be to provide a nodetool command that allows you to 
 simply specify the specific token ranges on the command line. It makes using 
 it a bit more difficult, but would mean that it works for any partitioner and 
 token type.
 Unless someone can suggest a better way to do this, I think I'll provide a 
 patch that does this. I'm still leaning towards supporting the simple step N 
 out of M form though.
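
For readers wondering what "step N out of M" means in token terms, here is a 
sketch of the arithmetic over RandomPartitioner's 0..2^127 token space; it is 
only an illustration of the idea, not the attached patch:

{code}
import java.math.BigInteger;

// Illustration of the step-N-of-M arithmetic over RandomPartitioner's token space
// (0 .. 2^127); this is a sketch of the idea, not the attached patch.
public class IncrementalRepairSteps
{
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    /** Returns { startToken, endToken } for 0-based step i out of totalSteps. */
    static BigInteger[] stepRange(int i, int totalSteps)
    {
        BigInteger total = BigInteger.valueOf(totalSteps);
        BigInteger start = RING.multiply(BigInteger.valueOf(i)).divide(total);
        BigInteger end = RING.multiply(BigInteger.valueOf(i + 1)).divide(total);
        return new BigInteger[]{ start, end };
    }

    public static void main(String[] args)
    {
        // roughly the slice that "nodetool repairincremental 0 100" would cover
        BigInteger[] r = stepRange(0, 100);
        System.out.println("(" + r[0] + ", " + r[1] + "]");
    }
}
{code}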

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4146) sstableloader should detect and report failures

2012-04-16 Thread Yuki Morishita (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254799#comment-13254799
 ] 

Yuki Morishita commented on CASSANDRA-4146:
---

+1

 sstableloader should detect and report failures
 ---

 Key: CASSANDRA-4146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4146
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.0.9
Reporter: Manish Zope
Assignee: Brandon Williams
Priority: Minor
  Labels: sstableloader, tools
 Fix For: 1.1.1

 Attachments: 4146.txt

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are three cases where we have observed abnormal termination:
 1) An exception occurs while loading.
 2) The user terminates the loading process.
 3) Some node is down or unreachable, in which case sstableloader gets stuck 
 and the user has to terminate the process.
 In case of abnormal termination, the sstables added in that session remain on 
 the cluster. If the user then fixes the problem and starts the process all 
 over again, the data is duplicated until a major compaction is triggered.
 sstableloader could maintain a session while loading sstables into the 
 cluster, so that on abnormal termination it triggers an event that deletes 
 the sstables loaded in that session.
 It would also be great to have a configurable timeout for sstableloader, so 
 that if the process is stuck for longer than the timeout it can terminate 
 itself.
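
The configurable timeout could be as simple as a watchdog that aborts when no 
progress has been reported for too long; a minimal sketch of that idea follows 
(hypothetical helper, not part of the actual sstableloader code):

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical watchdog for the timeout idea above: abort the load (and clean up
// the session) if no streaming progress has been reported for longer than the
// configured timeout. Not part of the actual sstableloader code.
class StreamWatchdog
{
    private final long timeoutMillis;
    private final AtomicLong lastProgress = new AtomicLong(System.currentTimeMillis());

    StreamWatchdog(long timeout, TimeUnit unit) { this.timeoutMillis = unit.toMillis(timeout); }

    /** Call whenever bytes are streamed successfully. */
    void progress() { lastProgress.set(System.currentTimeMillis()); }

    /** True if the loader should give up and roll back its session. */
    boolean timedOut() { return System.currentTimeMillis() - lastProgress.get() > timeoutMillis; }
}
{code}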

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4045) BOF fails when some nodes are down

2012-04-16 Thread Yuki Morishita (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254861#comment-13254861
 ] 

Yuki Morishita commented on CASSANDRA-4045:
---

+1

 BOF fails when some nodes are down
 --

 Key: CASSANDRA-4045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
  Labels: hadoop
 Fix For: 1.1.1

 Attachments: 4045.txt


 As the summary says, we should allow jobs to complete when some targets are 
 unavailable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4156) CQL should support CL.TWO

2012-04-16 Thread Matthew F. Dennis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254945#comment-13254945
 ] 

Matthew F. Dennis commented on CASSANDRA-4156:
--

tested on EC2 with multiple nodes

 CQL should support CL.TWO
 -

 Key: CASSANDRA-4156
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4156
 Project: Cassandra
  Issue Type: Bug
  Components: API
Affects Versions: 1.0.1
Reporter: paul cannon
Assignee: Matthew F. Dennis
Priority: Minor
  Labels: cql, cql3
 Attachments: cassandra-trunk-4156.txt


 CL.TWO was overlooked in creating the CQL language spec. It should probably 
 be added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4079) Check SSTable range before running cleanup

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254948#comment-13254948
 ] 

Jonathan Ellis commented on CASSANDRA-4079:
---

Changed to an abstract method in AB at 
https://github.com/jbellis/cassandra/branches/4079-3

 Check SSTable range before running cleanup
 --

 Key: CASSANDRA-4079
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4079
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Jonathan Ellis
Priority: Minor
  Labels: compaction
 Fix For: 1.1.1

 Attachments: 4079.txt


 Before running a cleanup compaction on an SSTable, we should check the range 
 to see if the SSTable falls into the range we want to remove. If it doesn't, 
 we can just mark the SSTable as compacted and be done with it; if it does, we 
 can no-op.
 This will not help with STCS, but for LCS, and perhaps some other strategies, 
 we may see a benefit here after topology changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4079) Check SSTable range before running cleanup

2012-04-16 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254955#comment-13254955
 ] 

Sylvain Lebresne commented on CASSANDRA-4079:
-

lgtm +1

 Check SSTable range before running cleanup
 --

 Key: CASSANDRA-4079
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4079
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Jonathan Ellis
Priority: Minor
  Labels: compaction
 Fix For: 1.1.1

 Attachments: 4079.txt


 Before running a cleanup compaction on an SSTable, we should check the range 
 to see if the SSTable falls into the range we want to remove. If it doesn't, 
 we can just mark the SSTable as compacted and be done with it; if it does, we 
 can no-op.
 This will not help with STCS, but for LCS, and perhaps some other strategies, 
 we may see a benefit here after topology changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4079) Check SSTable range before running cleanup

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254974#comment-13254974
 ] 

Jonathan Ellis commented on CASSANDRA-4079:
---

Couldn't leave well enough alone... 
https://github.com/jbellis/cassandra/branches/4079-4 makes AB.intersects 
non-abstract and pushes the type check into Range.intersects(AB). I like this 
a little better since it lets me comment on why I'm leaving EB/IEB 
unimplemented in an obvious place.
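
Roughly the shape being described, shown with simplified stand-in types (the 
real classes carry tokens and generics); the point is only that the unsupported 
cases are rejected, and documented, in one obvious place:

{code}
// Simplified stand-ins, not the actual o.a.c.dht classes; the overlap tests are
// elided because only the dispatch/type-check structure matters here.
abstract class AbstractBounds { /* left/right tokens omitted */ }
class Bounds extends AbstractBounds { }
class ExcludingBounds extends AbstractBounds { }
class IncludingExcludingBounds extends AbstractBounds { }

class Range extends AbstractBounds
{
    public boolean intersects(AbstractBounds that)
    {
        if (that instanceof Range)
            return intersects((Range) that);
        if (that instanceof Bounds)
            return intersects((Bounds) that);
        // ExcludingBounds / IncludingExcludingBounds are never produced by the
        // cleanup path, so fail fast instead of silently mis-handling them.
        throw new UnsupportedOperationException("intersection is only supported for Bounds and Range, not " + that.getClass());
    }

    boolean intersects(Range that) { return true; /* real overlap test elided */ }
    boolean intersects(Bounds that) { return true; /* real overlap test elided */ }
}
{code}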

 Check SSTable range before running cleanup
 --

 Key: CASSANDRA-4079
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4079
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Jonathan Ellis
Priority: Minor
  Labels: compaction
 Fix For: 1.1.1

 Attachments: 4079.txt


 Before running a cleanup compaction on an SSTable, we should check the range 
 to see if the SSTable falls into the range we want to remove. If it doesn't, 
 we can just mark the SSTable as compacted and be done with it; if it does, we 
 can no-op.
 This will not help with STCS, but for LCS, and perhaps some other strategies, 
 we may see a benefit here after topology changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255029#comment-13255029
 ] 

Jonathan Ellis commented on CASSANDRA-556:
--

Thanks, Dave!

I think it would be good to split up the method calls at the JMX level as well, 
since it doesn't really make sense to apply a specific CF name AND multiple 
keyspaces at the same time. What do you think?

Nit: the help in NodeCmd adds a second line for snapshot instead of 
snapshot_columnfamily.

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-16 Thread Dave Brosius (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255038#comment-13255038
 ] 

Dave Brosius commented on CASSANDRA-556:


Sure, that's fine; I'll fix it tonight. I just wanted to make sure folks were 
OK with splitting the command as it is.

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4151) Apache project branding requirements: DOAP file [PATCH]

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255040#comment-13255040
 ] 

Jonathan Ellis commented on CASSANDRA-4151:
---

Comments:

- unclear what changed in Description, apart from s/Cassandra/it/, which isn't 
an improvement when the antecedent is unclear
- not a fan of updating this thing for each release.  would prefer to leave out.
- should probably leave out svn repo entirely as well, rather than pointing 
people to the unused (except for site) old one


 Apache project branding requirements: DOAP file [PATCH]
 ---

 Key: CASSANDRA-4151
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4151
 Project: Cassandra
  Issue Type: Improvement
Reporter: Shane Curcuru
  Labels: branding
 Attachments: doap_Cassandra.rdf


 Attached.  Re: http://www.apache.org/foundation/marks/pmcs
 See Also: http://projects.apache.org/create.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255045#comment-13255045
 ] 

Jonathan Ellis commented on CASSANDRA-556:
--

bq. Just wanted to make sure folks were ok with splitting the command as it is

I guess the main alternative would be to add more -flags...  I'm okay breaking 
backwards compatibility there.

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-16 Thread Dave Brosius (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255055#comment-13255055
 ] 

Dave Brosius commented on CASSANDRA-556:


The issue with -flags is that you then have the potential situation of n 
keyspaces with the same cf name, which might be confusing; hopefully people 
don't have the same cf name in multiple keyspaces. -flags is also different 
than the way other commands handle cfs, but I'm fine with doing it that way as 
well. In that case there would be only one JMX call, I would think.

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4152) cache the hashcode of DecoratedKey as it is immutable

2012-04-16 Thread Dave Brosius (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255059#comment-13255059
 ] 

Dave Brosius commented on CASSANDRA-4152:
-

Perhaps consider generating the hashcode from the token rather than the key. 
Granted, the 64KB key is hopefully a corner case, but using the token would 
provide consistency in that case as well.
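
A minimal sketch of the lazy-caching idea (simplified fields, not the actual 
DecoratedKey), with a comment marking where the token-based alternative would 
differ:

{code}
import java.nio.ByteBuffer;

// Sketch of the caching idea with simplified fields: compute the hash once on the
// first call and reuse it, since key and token never change. Not the real class.
class DecoratedKeySketch
{
    final Comparable<?> token;
    final ByteBuffer key;
    private int cachedHash; // 0 means "not computed yet"

    DecoratedKeySketch(Comparable<?> token, ByteBuffer key)
    {
        this.token = token;
        this.key = key;
    }

    @Override
    public int hashCode()
    {
        int h = cachedHash;
        if (h == 0)
        {
            // The alternative suggested above would hash the token here instead of
            // the key, which also avoids walking a potentially 64KB key.
            h = key.hashCode();
            if (h == 0)
                h = 1; // avoid recomputing for the rare zero hash
            cachedHash = h;
        }
        return h;
    }
}
{code}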

 cache the hashcode of DecoratedKey as it is immutable
 -

 Key: CASSANDRA-4152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4152
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dave Brosius
Priority: Trivial
 Attachments: cache_decoratedkey_hash.diff


 Cache the hashcode of the DecoratedKey on the first hashCode() call. 
 DecoratedKey is immutable, so there is no need to run through all the 
 ByteBuffer bytes of the key and redo the hashcode math on every call. 
 Applies to trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255072#comment-13255072
 ] 

Jonathan Ellis commented on CASSANDRA-556:
--

bq. then you have the potential situation of n keyspaces with a cf name

Not sure I follow, could you elaborate?

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-16 Thread Dave Brosius (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255077#comment-13255077
 ] 

Dave Brosius commented on CASSANDRA-556:


If one did

   nodetool snapshot -cf foo

that could potentially take snapshots of multiple 'foo's (one in each of 
several keyspaces), which might be something the admin didn't realize... 
right? Or am I wrong and cf names are unique across the cluster?

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255082#comment-13255082
 ] 

Jonathan Ellis commented on CASSANDRA-556:
--

Ah, I see. Quite right, CF names are not unique. (So what you could do is 
check the schema nodetool-side and spit back a "which KS did you want to 
snapshot the CF in?" error...)
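
A sketch of that nodetool-side check (hypothetical helper; the real 
NodeCmd/NodeProbe plumbing is not shown): resolve which keyspaces define the 
requested column family and refuse to guess when the name is ambiguous.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side check: find which keyspaces define the requested column
// family and error out when the name is ambiguous instead of snapshotting them all.
class SnapshotTargetResolver
{
    /** @param schema keyspace name -> column family names defined in that keyspace */
    static String resolveKeyspace(Map<String, List<String>> schema, String cfName)
    {
        List<String> owners = new ArrayList<String>();
        for (Map.Entry<String, List<String>> ks : schema.entrySet())
            if (ks.getValue().contains(cfName))
                owners.add(ks.getKey());

        if (owners.isEmpty())
            throw new IllegalArgumentException("Unknown column family: " + cfName);
        if (owners.size() > 1)
            throw new IllegalArgumentException("Column family " + cfName + " exists in keyspaces "
                                               + owners + "; please specify which keyspace to snapshot");
        return owners.get(0);
    }
}
{code}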

 nodeprobe snapshot to support specific column families
 --

 Key: CASSANDRA-556
 URL: https://issues.apache.org/jira/browse/CASSANDRA-556
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Were
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: cf_snapshots_556.diff


 It would be good to support dumping specific column families via nodeprobe 
 for backup purposes.
 In my particular case the majority of cassandra data doesn't need to be 
 backed up except for a couple of column families containing user settings / 
 profiles etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them

2012-04-16 Thread Nick Bailey (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255119#comment-13255119
 ] 

Nick Bailey commented on CASSANDRA-4140:


My only wish with the new version is that we add the executable flag to the 
permissions of tools/bin/stress[d]. Besides that it looks good on Linux; I 
haven't tried out the .bat files.

 Build stress classes in a location that allows tools/stress/bin/stress to 
 find them
 ---

 Key: CASSANDRA-4140
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.2
Reporter: Nick Bailey
Assignee: Vijay
Priority: Trivial
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4140-v2.patch, 0001-CASSANDRA-4140.patch


 Right now it's hard to run stress from a checkout of trunk. You need to do 
 'ant artifacts' and then run the stress tool in the generated artifacts.
 A discussion on IRC came up with the proposal to just move stress into the 
 main jar, put the stress/stressd bash scripts in bin/, and drop the tools 
 directory altogether. It will be easier for users to find that way and will 
 make running stress from a checkout much easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them

2012-04-16 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255126#comment-13255126
 ] 

Jonathan Ellis commented on CASSANDRA-4140:
---

Ship it!

 Build stress classes in a location that allows tools/stress/bin/stress to 
 find them
 ---

 Key: CASSANDRA-4140
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.2
Reporter: Nick Bailey
Assignee: Vijay
Priority: Trivial
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4140-v2.patch, 0001-CASSANDRA-4140.patch


 Right now it's hard to run stress from a checkout of trunk. You need to do 
 'ant artifacts' and then run the stress tool in the generated artifacts.
 A discussion on IRC came up with the proposal to just move stress into the 
 main jar, put the stress/stressd bash scripts in bin/, and drop the tools 
 directory altogether. It will be easier for users to find that way and will 
 make running stress from a checkout much easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them

2012-04-16 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255127#comment-13255127
 ] 

Vijay commented on CASSANDRA-4140:
--

Nick,
Are you sure this is an issue with this patch?

The patch has
{code}
create mode 100755 tools/bin/stress
 create mode 100644 tools/bin/stress.bat
 create mode 100755 tools/bin/stressd
 delete mode 100755 tools/stress/bin/stress
 delete mode 100644 tools/stress/bin/stress.bat
 delete mode 100755 tools/stress/bin/stressd
{code}

and build.xml has 
{code}
 <tarfileset dir="${dist.dir}" prefix="${final.name}" mode="755">
   <include name="bin/*"/>
+  <include name="tools/bin/*"/>
   <not>
     <filename name="bin/*.in.sh" />
   </not>
{code}

The file list from the tar shows the same...

vparthasarathy$ tar tvfz 
/Users/vparthasarathy/Documents/workspace/cassandraT11/build/apache-cassandra-1.1.0-rc1-SNAPSHOT-bin.tar.gz
 |grep -i stress
-rw-r--r--  0 0  0  1543 Apr 16 15:25 
apache-cassandra-1.1.0-rc1-SNAPSHOT/tools/bin/stress
-rw-r--r--  0 0  0  1274 Apr 16 15:25 
apache-cassandra-1.1.0-rc1-SNAPSHOT/tools/bin/stress.bat
-rw-r--r--  0 0  0  2194 Apr 16 15:25 
apache-cassandra-1.1.0-rc1-SNAPSHOT/tools/bin/stressd
-rw-r--r--  0 0  0  15312482 Apr 16 15:25 
apache-cassandra-1.1.0-rc1-SNAPSHOT/tools/lib/stress.jar
-rwxr-xr-x  0 0  0  1543 Apr 16 15:25 
apache-cassandra-1.1.0-rc1-SNAPSHOT/tools/bin/stress
-rwxr-xr-x  0 0  0  1274 Apr 16 15:25 
apache-cassandra-1.1.0-rc1-SNAPSHOT/tools/bin/stress.bat
-rwxr-xr-x  0 0  0  2194 Apr 16 15:25 
apache-cassandra-1.1.0-rc1-SNAPSHOT/tools/bin/stressd



 Build stress classes in a location that allows tools/stress/bin/stress to 
 find them
 ---

 Key: CASSANDRA-4140
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.2
Reporter: Nick Bailey
Assignee: Vijay
Priority: Trivial
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4140-v2.patch, 0001-CASSANDRA-4140.patch


 Right now it's hard to run stress from a checkout of trunk. You need to do 
 'ant artifacts' and then run the stress tool in the generated artifacts.
 A discussion on IRC came up with the proposal to just move stress into the 
 main jar, put the stress/stressd bash scripts in bin/, and drop the tools 
 directory altogether. It will be easier for users to find that way and will 
 make running stress from a checkout much easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them

2012-04-16 Thread Nick Bailey (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255131#comment-13255131
 ] 

Nick Bailey commented on CASSANDRA-4140:


My mistake, it looks like 'patch -p1 < patch_file' won't keep permissions. 
'git apply patch_file' works fine though.

 Build stress classes in a location that allows tools/stress/bin/stress to 
 find them
 ---

 Key: CASSANDRA-4140
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.2
Reporter: Nick Bailey
Assignee: Vijay
Priority: Trivial
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4140-v2.patch, 0001-CASSANDRA-4140.patch


 Right now it's hard to run stress from a checkout of trunk. You need to do 
 'ant artifacts' and then run the stress tool in the generated artifacts.
 A discussion on IRC came up with the proposal to just move stress into the 
 main jar, put the stress/stressd bash scripts in bin/, and drop the tools 
 directory altogether. It will be easier for users to find that way and will 
 make running stress from a checkout much easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-16 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255134#comment-13255134
 ] 

Pavel Yaskevich commented on CASSANDRA-3909:


+1

 Pig should handle wide rows
 ---

 Key: CASSANDRA-3909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1

 Attachments: 3909.txt


 Pig should be able to use the wide row support in CFIF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4138) Add varint encoding to Serializing Cache

2012-04-16 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255143#comment-13255143
 ] 

Pavel Yaskevich commented on CASSANDRA-4138:


To avoid confusion related to the naming of the {write, read}VLong methods in 
EDIS and EDOS (it gives the feeling that writeInt doesn't really write an int 
anymore), I propose renaming them to {encode, decode}VInt. Furthermore, we 
could give a better feel for the encoding used by adding VInt as a prefix to 
both classes (alternatively, they could be moved to an o.a.c.u.vint package). 
I also think the DBConstants class should now be changed to only share 
sizeof(type) methods and become something like 
DBConstants.{native, encoded}.sizeof(type)...
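
For context on the encoding being discussed, here is a standalone zig-zag 
varint encoder/decoder for longs; this shows the general technique only, not 
the EDOS/EDIS code in the attached patch.

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// The general technique under discussion, shown standalone: zig-zag + 7-bit
// continuation-byte encoding of longs. Illustrative only, not the patch.
public class VIntSketch
{
    static void encodeVLong(long value, OutputStream out) throws IOException
    {
        long zigzag = (value << 1) ^ (value >> 63); // fold the sign into the low bit
        while ((zigzag & ~0x7FL) != 0)
        {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // 7 payload bits + continuation bit
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    static long decodeVLong(InputStream in) throws IOException
    {
        long zigzag = 0;
        int shift = 0;
        int b;
        do
        {
            b = in.read();
            if (b < 0)
                throw new IOException("unexpected end of stream");
            zigzag |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        while ((b & 0x80) != 0);
        return (zigzag >>> 1) ^ -(zigzag & 1); // undo the zig-zag fold
    }

    public static void main(String[] args) throws IOException
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        encodeVLong(-300L, buf);
        System.out.println(decodeVLong(new ByteArrayInputStream(buf.toByteArray()))); // prints -300
    }
}
{code}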

 Add varint encoding to Serializing Cache
 

 Key: CASSANDRA-4138
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4138
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-4138-Take1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-16 Thread Stu Hood (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255246#comment-13255246
 ] 

Stu Hood commented on CASSANDRA-2319:
-

bq. I.e. the raison d'être of both index_interval and column_index_size_in_kb 
is not because we have the notion of rows in the on-disk format.
If I'm understanding what Ellis is suggesting, it is that the entire sstable 
index could become sparse: that would mean that column_index_size_in_kb could 
be renamed to index_size_in_kb. index_interval would not change.

 Promote row index
 -

 Key: CASSANDRA-2319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Sylvain Lebresne
  Labels: index, timeseries
 Fix For: 1.2

 Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
 version-g-lzf.txt, version-g.txt


 The row index contains entries for configurably sized blocks of a wide row. 
 For a row of appreciable size, the row index ends up directing the third seek 
 (1. index, 2. row index, 3. content) to near the first column of a scan.
 Since the row index is always used for wide rows, and since it contains 
 information that tells us whether or not the 3rd seek is necessary (the 
 column range or name we are trying to slice may not exist in a given 
 sstable), promoting the row index into the sstable index would allow us to 
 drop the maximum number of seeks for wide rows back to 2, and, more 
 importantly, would allow sstables to be eliminated using only the index.
 An example use case that benefits greatly from this change is time series data 
 in wide rows, where data is appended to the beginning or end of the row. Our 
 existing compaction strategy gets lucky and clusters the oldest data in the 
 oldest sstables: for queries to recently appended data, we would be able to 
 eliminate wide rows using only the sstable index, rather than needing to seek 
 into the data file to determine that it isn't interesting. For narrow rows, 
 this change would have no effect, as they will not reach the threshold for 
 indexing anyway.
 A first cut design for this change would look very similar to the file format 
 design proposed on #674: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
 column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4157) Allow KS + CF names up to 48 characters

2012-04-16 Thread Yuki Morishita (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255318#comment-13255318
 ] 

Yuki Morishita commented on CASSANDRA-4157:
---

lgtm.

 Allow KS + CF names up to 48 characters
 ---

 Key: CASSANDRA-4157
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4157
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1.0
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 1.1.0

 Attachments: 4157.txt


 CASSANDRA-2749 imposed a 32-character limit on KS and CF names.  We can be a 
 little more lenient than that and still be safe for path names (see 
 CASSANDRA-4110).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



