Re: Schema versions reflect schemas on unwanted nodes

2011-10-13 Thread Eric Czech
Thanks Brandon!  Out of curiosity, would making schema changes through a
thrift interface (via hector) be any different?  In other words, would using
hector instead of the cli make schema changes possible without upgrading?

On Thu, Oct 13, 2011 at 8:22 AM, Brandon Williams  wrote:

> You're running into https://issues.apache.org/jira/browse/CASSANDRA-3259
>
> Try upgrading and doing a rolling restart.
>
> -Brandon
>
> On Thu, Oct 13, 2011 at 9:11 AM, Eric Czech  wrote:
> > Nope, there was definitely no intersection of the seed nodes between the
> two
> > clusters so I'm fairly certain that the second cluster found out about
> the
> > first through what was in the LocationInfo* system tables.  Also, I don't
> > think that procedure will really help because I don't actually want the
> > schema on cass-analysis-1 to be consistent with the schema in the
> original
> > cluster -- I just want to totally remove it.
> >
> > On Thu, Oct 13, 2011 at 8:01 AM, Mohit Anchlia 
> > wrote:
> >>
> >> Do you have the same seed node specified in cass-analysis-1 as cass-1,2,3?
> >> I am thinking that changing the seed node in cass-analysis-2 and
> >> following the directions in
> >> http://wiki.apache.org/cassandra/FAQ#schema_disagreement might solve
> >> the problem. Someone please correct me.
> >>
> >> On Thu, Oct 13, 2011 at 12:05 AM, Eric Czech 
> >> wrote:
> >> > I don't think that's what I'm after here since the unwanted nodes were
> >> > originally assimilated into the cluster with the same initial_token
> >> > values
> >> > as other nodes that were already in the cluster (that have, and still
> do
> >> > have, useful data).  I know this is an awkward situation so I'll try
> to
> >> > depict it in a simpler way:
> >> > Let's say I have a simplified version of our production cluster that
> >> > looks
> >> > like this -
> >> > cass-1   token = A
> >> > cass-2   token = B
> >> > cass-3   token = C
> >> > Then I tried to create a second cluster that looks like this -
> >> > cass-analysis-1   token = A  (and contains same data as cass-1)
> >> > cass-analysis-2   token = B  (and contains same data as cass-2)
> >> > cass-analysis-3   token = C  (and contains same data as cass-3)
> >> > But after starting the second cluster, things got crossed up between
> the
> >> > clusters and here's what the original cluster now looks like -
> >> > cass-1   token = A   (has data and schema)
> >> > cass-2   token = B   (has data and schema)
> >> > cass-3   token = C   (has data and schema)
> >> > cass-analysis-1   token = A  (has *no* data and is not part of the
> ring,
> >> > but
> >> > is trying to be included in cluster schema)
> >> > A simplified version of "describe cluster"  for the original cluster
> now
> >> > shows:
> >> > Cluster Information:
> >> >Schema versions:
> >> > SCHEMA-UUID-1: [cass-1, cass-2, cass-3]
> >> > SCHEMA-UUID-2: [cass-analysis-1]
> >> > But the simplified ring looks like this (has only 3 nodes instead of
> 4):
> >> > Host   Owns Token
> >> > cass-1 33%   A
> >> > cass-2 33%   B
> >> > cass-3 33%   C
> >> > The original cluster is still working correctly but all live schema
> >> > updates
> >> > are failing because of the inconsistent schema versions introduced by
> >> > the
> >> > unwanted node.
> >> > From my perspective, a simple fix seems to be for cassandra to exclude
> >> > nodes
> >> > that aren't part of the ring from the schema consistency requirements.
> >> >  Any
> >> > reason that wouldn't work?
> >> > And aside from a possible code patch, any recommendations as to how I
> >> > can
> >> > best fix this given the current 0.8.4 release?
> >> >
> >> > On Thu, Oct 13, 2011 at 12:14 AM, Jonathan Ellis 
> >> > wrote:
> >> >>
> >> >> Does nodetool removetoken not work?
> >> >>
> >> >> On Thu, Oct 13, 2011 at 12:59 AM, Eric Czech 
> >> >> wrote:
> >> >> > Not sure if anyone has seen this before but it's really killing me
> >> >> > right
> >> >> > now.  Perhaps that was too long of a description of the issue so
> >> >> > here's
> >> >> > a
> >> >> > more succinct question -- How do I remove nodes associated with a
> >> >> > cluster
> >> >> > that contain no data and have no reason to be associated with the
> >> >> > cluster
> >> >> > whatsoever?
> >> >> > My last resort here is to stop cassandra (after recording all
> tokens
> >> >> > for
> >> >> > each node), set the initial token for each node in the cluster in
> >> >> > cassandra.yaml, manually delete the LocationInfo* sstables in the
> >> >> > system
> >> >> > keyspace, and then restart.  I'm hoping there's a simpler, less
> >> >> > seemingly
> >> >> > risky way to do this so please, please let me know if that's true!
> >> >> > Thanks again.
> >> >> > - Eric
> >> >> > On Tue, Oct 11, 2011 at 11:55 AM, Eric Czech <
> e...@nextbigsound.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi, I'm having what I think is a fairly uncommon schema issue --
> >> >> >> My situation is that I had a cluster with 10 nodes and a
> consistent

Restore snapshots suggestion

2011-10-13 Thread Daning
If I need to restore snapshots on all nodes, but can only shut down one 
node at a time since this is production, is there a way I can temporarily 
stop data syncing between the nodes? I don't want the existing data to 
overwrite the snapshots. I found the undocumented parameter 
DoConsistencyChecksBoolean (http://www.datastax.com/dev/blog/whats-new-cassandra-066) 
for disabling read repair; what is the proper way to do this?



I am on 0.8.6.

Thank you in advance,

Daning
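
A sketch of the per-CF knob involved: read repair can also be turned off from
cassandra-cli (the column family name here is hypothetical; note this does not
stop hinted handoff or normal replica writes):

    update column family Sessions with read_repair_chance = 0;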


Re: Cassandra as session store under heavy load

2011-10-13 Thread Jonathan Ellis
Or upgrade to 1.0 and use leveled compaction
(http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra)
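
For reference, switching an existing column family over in 1.0 is roughly a
one-liner from cassandra-cli (the column family name here is a placeholder):

    update column family Sessions with compaction_strategy = 'LeveledCompactionStrategy';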

On Thu, Oct 13, 2011 at 4:28 PM, aaron morton  wrote:
> They only have a minimum lifetime: gc_grace_seconds for deletes.
>
> If you really want to watch disk space, reduce the compaction thresholds on
> the CF.
>
> Or run a major compaction as part of maintenance.
>
> cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/10/2011, at 10:50 PM, Maciej Miklas wrote:
>
>> durable_writes sounds great - thank you! I really do not need commit log 
>> here.
>>
>> Another question: is it possible to configure the lifetime of tombstones?
>>
>>
>> Regards,
>> Maciej
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cassandra as session store under heavy load

2011-10-13 Thread aaron morton
They only have a minimum lifetime: gc_grace_seconds for deletes.

If you really want to watch disk space, reduce the compaction thresholds on
the CF. 

Or run a major compaction as part of maintenance. 

cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
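
A cassandra-cli sketch of both knobs (names and values are placeholders;
shrinking gc_grace is only safe if repair runs more often than the new window):

    update column family Sessions with gc_grace = 3600;
    update column family Sessions with min_compaction_threshold = 2 and max_compaction_threshold = 4;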

On 13/10/2011, at 10:50 PM, Maciej Miklas wrote:

> durable_writes sounds great - thank you! I really do not need commit log here.
> 
> Another question: is it possible to configure the lifetime of tombstones?
> 
> 
> Regards,
> Maciej



Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
Then just use a soundex function on the first word in the text... that
will shrink it sufficiently and give nice buckets in near sequential
order (http://en.wikipedia.org/wiki/Soundex)
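
A minimal Java sketch of that idea, assuming Apache Commons Codec's Soundex
implementation; the bucket key is stored in the clear while the payload stays
encrypted:

    import org.apache.commons.codec.language.Soundex;

    public class SoundexBucket {
        private static final Soundex SOUNDEX = new Soundex();

        // Coarse, near-sequential bucket derived from the first word only.
        static String bucketFor(String plaintext) {
            String firstWord = plaintext.trim().split("\\s+")[0];
            return SOUNDEX.encode(firstWord);
        }

        public static void main(String[] args) {
            System.out.println(bucketFor("Robert went home"));  // R163
            System.out.println(bucketFor("Rupert went home"));  // R163, same bucket
        }
    }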

On 13 October 2011 21:21, Matthias Pfau  wrote:
> Hi Stephen,
> we are hashing the first 8 bytes (8 US-ASCII characters) of text that has
> been written by humans. Wouldn't it be easy for the attacker to do a
> dictionary attack on this text, especially if he knows the language of the
> text?
>
> Kind regards
> Matthias
>
> On 10/13/2011 08:20 PM, Stephen Connolly wrote:
>>
>> in theory, however they have less than 32 bits of entropy from which
>> they can do that, leaving them with at least 32 more bits of
>> combinations to try... that's 2 billion or so... must be a big dictionary
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on
>> the screen
>>
>> On 13 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>
>>    Hi Stephen,
>>    this sounds very reasonable. But wouldn't this enable an attacker to
>>    execute dictionary attacks in order to "decrypt" the first 8 bytes
>>    of the plain text?
>>
>>    Kind regards
>>    Matthias
>>
>>    On 10/13/2011 05:03 PM, Stephen Connolly wrote:
>>
>>        It wouldn't be unencrypted... which is the point
>>
>>        you use a one way linear hash function to take the first, say 8
>>        bytes,
>>        of unencrypted data and turn it into 4 bytes of a sort prefix.
>>
>>        You've lost half the data in the process, so effectively
>>        each bit
>>        is an OR of two bits and you can only infer from 0 values... so
>> data
>>        is still encrypted, but you have an approximate sorting.
>>
>>        For example, if your data is US-ASCII text with no numbers, you
>>        could
>>        use Soundex to get the pre-key, so that worst case you have a
>> bucket
>>        of values in the range.
>>
>>        Using this technique, a random get will have to get the values
>>        at the
>>        desired prefix +/- a small amount rather than the whole row...
>>        on the
>>        client side you can then decrypt the data and sort that small
>> bucket
>>        to get the correct index position.
>>
>>        You could do a 1 byte prefix, but that only gives you at best 256
>>        buckets and assumes that the first 2 bytes are uniformly
>>        distributed... you've said your data is not uniformly
>>        distributed, so
>>        a linear hash function sounds like your best bet.
>>
>>        your hash function should have the property that hash(A) >= hash(B)
>>        if and only if A >= B
>>
>>        On 13 October 2011 08:47, Matthias Pfau <p...@l3s.de> wrote:
>>
>>            Hi Stephen,
>>            this is a great idea but unfortunately doesn't work for us
>>            either as we can
>>            not store the data in an unencrypted form.
>>
>>            Kind regards
>>            Matthias
>>
>>            On 10/12/2011 07:42 PM, Stephen Connolly wrote:
>>
>>
>>                could you prefix the data with 3-4 bytes of a linear
>>                hash of the
>>                unencypted data? it wouldn't be a perfect sort, but
>>                you'd have less of a
>>                range to query to get the sorted values?
>>
>>                - Stephen
>>
>>                ---
>>                Sent from my Android phone, so random spelling mistakes,
>>                random nonsense
>>                words and other nonsense are a direct result of using
>>                swype to type on
>>                the screen
>>
>>                On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>
>>                    Unfortunately, that is not an option as we have to
>>                store the data in
>>                    a compressed and encrypted, and therefore binary and
>>                non-sortable form.
>>
>>                    On 10/12/2011 06:39 PM, David McNelis wrote:
>>
>>                        Is it an option to not convert the data to
>>                binary prior to
>>                inserting
>>                        into Cassandra?  Also, how large are the strings
>>                you're sorting?
>>                          If its
>>                        viable to not convert to binary before writing
>>                to Cassandra, and
>>                        you use
>>                        one of the string based column ordering
>>                techniques (utf8, ascii,
>>                for
>>                        example), then the data would be sorted without
>>                you  needing to
>>                        specifically worry about that.  Of course, if
>>                the strings are
>>                        lengthy

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau

Hi Stephen,
we are hashing the first 8 bytes (8 US-ASCII characters) of text that has 
been written by humans. Wouldn't it be easy for the attacker to do a 
dictionary attack on this text, especially if he knows the language of 
the text?


Kind regards
Matthias

On 10/13/2011 08:20 PM, Stephen Connolly wrote:

in theory, however they have less than 32 bits of entropy from which
they can do that, leaving them with at least 32 more bits of
combinations to try... that's 2 billion or so... must be a big dictionary

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on
the screen

On 13 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

Hi Stephen,
this sounds very reasonable. But wouldn't this enable an attacker to
execute dictionary attacks in order to "decrypt" the first 8 bytes
of the plain text?

Kind regards
Matthias

On 10/13/2011 05:03 PM, Stephen Connolly wrote:

It wouldn't be unencrypted... which is the point

you use a one way linear hash function to take the first, say 8
bytes,
of unencrypted data and turn it into 4 bytes of a sort prefix.

You've lost half the data in the process, so effectively
each bit
is an OR of two bits and you can only infer from 0 values... so data
is still encrypted, but you have an approximate sorting.

For example, if your data is US-ASCII text with no numbers, you
could
use Soundex to get the pre-key, so that worst case you have a bucket
of values in the range.

Using this technique, a random get will have to get the values
at the
desired prefix +/- a small amount rather than the whole row...
on the
client side you can then decrypt the data and sort that small bucket
to get the correct index position.

You could do a 1 byte prefix, but that only gives you at best 256
buckets and assumes that the first 2 bytes are uniformly
distributed... you've said your data is not uniformly
distributed, so
a linear hash function sounds like your best bet.

your hash function should have the property that hash(A) >= hash(B)
if and only if A >= B

On 13 October 2011 08:47, Matthias Pfau <p...@l3s.de> wrote:

Hi Stephen,
this is a great idea but unfortunately doesn't work for us
either as we can
not store the data in an unencrypted form.

Kind regards
Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote:


could you prefix the data with 3-4 bytes of a linear
hash of the
unencypted data? it wouldn't be a perfect sort, but
you'd have less of a
range to query to get the sorted values?

- Stephen

---
Sent from my Android phone, so random spelling mistakes,
random nonsense
words and other nonsense are a direct result of using
swype to type on
the screen

On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

Unfortunately, that is not an option as we have to
store the data in
a compressed and encrypted, and therefore binary and
non-sortable form.

On 10/12/2011 06:39 PM, David McNelis wrote:

Is it an option to not convert the data to
binary prior to
inserting
into Cassandra?  Also, how large are the strings
you're sorting?
  If its
viable to not convert to binary before writing
to Cassandra, and
you use
one of the string based column ordering
techniques (utf8, ascii,
for
example), then the data would be sorted without
you  needing to
specifically worry about that.  Of course, if
the strings are
lengthy
you could run into  additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:


Re: MapReduce with two ethernet cards

2011-10-13 Thread Brandon Williams
On Thu, Oct 13, 2011 at 1:17 PM, Scott Fines  wrote:
> When I look at the source for ColumnFamilyInputFormat, it appears that it 
> does a call to client.describe_ring; when you do the equivalent call  with 
> nodetool, you get the 10.1.1.* addresses.  This seems to indicate to me that 
> I should open up the firewall and attempt to contact those IPs instead of the 
> normal thrift IPs.
>
> That leads me to think that I need to have thrift listening on both IPs, 
> though. Would that then be the case?

My mistake, I thought I'd committed this:
https://issues.apache.org/jira/browse/CASSANDRA-3214

Can you see if that solves your issue?

-Brandon
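
If the patch is in place, the Hadoop side still has to be pointed at a
reachable rpc address. A hedged sketch using ConfigHelper method names from
the 0.8.x era (addresses, keyspace and column family are placeholders; verify
the method names against your tree):

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;

    public class JobSetup {
        static Configuration cassandraInputConf() {
            Configuration conf = new Configuration();
            // Contact a Thrift rpc_address, not the internal listen_address.
            ConfigHelper.setInitialAddress(conf, "172.28.0.10");
            ConfigHelper.setRpcPort(conf, "9160");
            ConfigHelper.setPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
            ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "MyColumnFamily");
            return conf;
        }
    }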


Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
in theory, however they have less than 32 bits of entropy from which they
can do that, leaving them with at least 32 more bits of combinations to
try... that's 2 billion or so... must be a big dictionary

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 13 Oct 2011 17:57, "Matthias Pfau"  wrote:

> Hi Stephen,
> this sounds very reasonable. But wouldn't this enable an attacker to
> execute dictionary attacks in order to "decrypt" the first 8 bytes of the
> plain text?
>
> Kind regards
> Matthias
>
> On 10/13/2011 05:03 PM, Stephen Connolly wrote:
>
>> It wouldn't be unencrypted... which is the point
>>
>> you use a one way linear hash function to take the first, say 8 bytes,
>> of unencrypted data and turn it into 4 bytes of a sort prefix.
>>
>> You've lost half the data in the process, so effectively each bit
>> is an OR of two bits and you can only infer from 0 values... so data
>> is still encrypted, but you have an approximate sorting.
>>
>> For example, if your data is US-ASCII text with no numbers, you could
>> use Soundex to get the pre-key, so that worst case you have a bucket
>> of values in the range.
>>
>> Using this technique, a random get will have to get the values at the
>> desired prefix +/- a small amount rather than the whole row... on the
>> client side you can then decrypt the data and sort that small bucket
>> to get the correct index position.
>>
>> You could do a 1 byte prefix, but that only gives you at best 256
>> buckets and assumes that the first 2 bytes are uniformly
>> distributed... you've said your data is not uniformly distributed, so
>> a linear hash function sounds like your best bet.
>>
>> your hash function should have the property that hash(A) >= hash(B) if
>> and only if A >= B
>>
>> On 13 October 2011 08:47, Matthias Pfau  wrote:
>>
>>> Hi Stephen,
>>> this is a great idea but unfortunately doesn't work for us either as we
>>> can
>>> not store the data in an unencrypted form.
>>>
>>> Kind regards
>>> Matthias
>>>
>>> On 10/12/2011 07:42 PM, Stephen Connolly wrote:
>>>

 could you prefix the data with 3-4 bytes of a linear hash of the
 unencypted data? it wouldn't be a perfect sort, but you'd have less of a
 range to query to get the sorted values?

 - Stephen

 ---
 Sent from my Android phone, so random spelling mistakes, random nonsense
 words and other nonsense are a direct result of using swype to type on
 the screen

 On 12 Oct 2011 17:57, "Matthias Pfau" <pfau@l3s.de> wrote:

Unfortunately, that is not an option as we have to store the data in
a compressed and encrypted, and therefore binary and non-sortable
 form.

On 10/12/2011 06:39 PM, David McNelis wrote:

Is it an option to not convert the data to binary prior to
 inserting
into Cassandra?  Also, how large are the strings you're sorting?
  If its
viable to not convert to binary before writing to Cassandra, and
you use
one of the string based column ordering techniques (utf8, ascii,
 for
example), then the data would be sorted without you  needing to
specifically worry about that.  Of course, if the strings are
lengthy
you could run into  additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:

Hi there,
we are currently building a prototype based on cassandra and
came
into problems on implementing sorted lists containing
millions of items.

The special thing about the items of our lists is, that
cassandra is
not able to sort them as the data is stored in a binary
format which
is not sortable. However, we are able to sort the data
before the
plain data gets encoded (our application is responsible for
the order).

First Approach: Storing Lists in ColumnFamilies
***
We first tried to map the list to a single row of a
ColumnFamily in
a way that the index of the list is mapped to the column
names and
the items of the list to the column values. The column names
 are
increasing numbers which define the sort order.
This has the major drawback that big parts of the list have
to be
rewritten on inserts (because the column names are numbered
by their
index), which are quite common.


Second Approach: Storing the wh

RE: MapReduce with two ethernet cards

2011-10-13 Thread Scott Fines
When I look at the source for ColumnFamilyInputFormat, it appears that it does 
a call to client.describe_ring; when you do the equivalent call  with nodetool, 
you get the 10.1.1.* addresses.  This seems to indicate to me that I should 
open up the firewall and attempt to contact those IPs instead of the normal 
thrift IPs. 

That leads me to think that I need to have thrift listening on both IPs, 
though. Would that then be the case?

Scott

From: Scott Fines [scott.fi...@nisc.coop]
Sent: Thursday, October 13, 2011 12:40 PM
To: user@cassandra.apache.org
Subject: RE: MapReduce with two ethernet cards

The listen address on all machines are set to the 10.1.1.* addresses, while the 
thrift rpc address is the 172.28.* addresses


From: Brandon Williams [dri...@gmail.com]
Sent: Thursday, October 13, 2011 12:28 PM
To: user@cassandra.apache.org
Subject: Re: MapReduce with two ethernet cards

What is your rpc_address set to?  If it's 0.0.0.0 (bind everything)
then that's not going to work if listen_address is blocked.

-Brandon

On Thu, Oct 13, 2011 at 11:13 AM, Scott Fines  wrote:
> I upgraded to cassandra 0.8.7, and the problem persists.
>
> Scott
> 
> From: Brandon Williams [dri...@gmail.com]
> Sent: Monday, October 10, 2011 12:28 PM
> To: user@cassandra.apache.org
> Subject: Re: MapReduce with two ethernet cards
>
> On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines  wrote:
>> Hi all,
>> This may be a silly question, but I'm at a bit of a loss, and was hoping for
>> some help.
> I have a Cassandra cluster set up with two NICs--one for internal
>> communication between cassandra machines (10.1.1.*), and one to respond to
>> Thrift RPC (172.28.*.*).
>> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
>> remain separate from Cassandra, so I've written a little MapReduce job to
>> copy data from Cassandra to Hadoop. However, when I try to run my job, I
>> get
>> java.io.IOException: failed connecting to all endpoints
>> 10.1.1.24,10.1.1.17,10.1.1.16
>> which is puzzling to me. It seems like the MR is attempting to connect to
>> the internal communication IPs instead of the external Thrift IPs. Since I
>> set up a firewall to block external access to the internal IPs of Cassandra,
>> this is obviously going to fail.
>> So my question is: why does Cassandra MR seem to be grabbing the
>> listen_address instead of the Thrift one. Presuming it's not a funky
>> configuration error or something on my part, is that strictly necessary? All
>> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
>> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
>> Thanks for your help,
>> Scott
>
> Your cassandra is old, upgrade to the latest version.
>
> -Brandon
>


RE: MapReduce with two ethernet cards

2011-10-13 Thread Scott Fines
The listen address on all machines are set to the 10.1.1.* addresses, while the 
thrift rpc address is the 172.28.* addresses


From: Brandon Williams [dri...@gmail.com]
Sent: Thursday, October 13, 2011 12:28 PM
To: user@cassandra.apache.org
Subject: Re: MapReduce with two ethernet cards

What is your rpc_address set to?  If it's 0.0.0.0 (bind everything)
then that's not going to work if listen_address is blocked.

-Brandon

On Thu, Oct 13, 2011 at 11:13 AM, Scott Fines  wrote:
> I upgraded to cassandra 0.8.7, and the problem persists.
>
> Scott
> 
> From: Brandon Williams [dri...@gmail.com]
> Sent: Monday, October 10, 2011 12:28 PM
> To: user@cassandra.apache.org
> Subject: Re: MapReduce with two ethernet cards
>
> On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines  wrote:
>> Hi all,
>> This may be a silly question, but I'm at a bit of a loss, and was hoping for
>> some help.
>> I have a Cassandra cluster set up with two NICs--one for internal
>> communication between cassandra machines (10.1.1.*), and one to respond to
>> Thrift RPC (172.28.*.*).
>> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
>> remain separate from Cassandra, so I've written a little MapReduce job to
>> copy data from Cassandra to Hadoop. However, when I try to run my job, I
>> get
>> java.io.IOException: failed connecting to all endpoints
>> 10.1.1.24,10.1.1.17,10.1.1.16
>> which is puzzling to me. It seems like the MR is attempting to connect to
>> the internal communication IPs instead of the external Thrift IPs. Since I
>> set up a firewall to block external access to the internal IPs of Cassandra,
>> this is obviously going to fail.
>> So my question is: why does Cassandra MR seem to be grabbing the
>> listen_address instead of the Thrift one. Presuming it's not a funky
>> configuration error or something on my part, is that strictly necessary? All
>> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
>> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
>> Thanks for your help,
>> Scott
>
> Your cassandra is old, upgrade to the latest version.
>
> -Brandon
>


Re: MapReduce with two ethernet cards

2011-10-13 Thread Brandon Williams
What is your rpc_address set to?  If it's 0.0.0.0 (bind everything)
then that's not going to work if listen_address is blocked.

-Brandon

On Thu, Oct 13, 2011 at 11:13 AM, Scott Fines  wrote:
> I upgraded to cassandra 0.8.7, and the problem persists.
>
> Scott
> 
> From: Brandon Williams [dri...@gmail.com]
> Sent: Monday, October 10, 2011 12:28 PM
> To: user@cassandra.apache.org
> Subject: Re: MapReduce with two ethernet cards
>
> On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines  wrote:
>> Hi all,
>> This may be a silly question, but I'm at a bit of a loss, and was hoping for
>> some help.
>> I have a Cassandra cluster set up with two NICs--one for internal
>> communication between cassandra machines (10.1.1.*), and one to respond to
>> Thrift RPC (172.28.*.*).
>> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
>> remain separate from Cassandra, so I've written a little MapReduce job to
>> copy data from Cassandra to Hadoop. However, when I try to run my job, I
>> get
>> java.io.IOException: failed connecting to all endpoints
>> 10.1.1.24,10.1.1.17,10.1.1.16
>> which is puzzling to me. It seems like the MR is attempting to connect to
>> the internal communication IPs instead of the external Thrift IPs. Since I
>> set up a firewall to block external access to the internal IPs of Cassandra,
>> this is obviously going to fail.
>> So my question is: why does Cassandra MR seem to be grabbing the
>> listen_address instead of the Thrift one. Presuming it's not a funky
>> configuration error or something on my part, is that strictly necessary? All
>> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
>> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
>> Thanks for your help,
>> Scott
>
> Your cassandra is old, upgrade to the latest version.
>
> -Brandon
>


Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau

Hi Stephen,
this sounds very reasonable. But wouldn't this enable an attacker to 
execute dictionary attacks in order to "decrypt" the first 8 bytes of 
the plain text?


Kind regards
Matthias

On 10/13/2011 05:03 PM, Stephen Connolly wrote:

It wouldn't be unencrypted... which is the point

you use a one way linear hash function to take the first, say 8 bytes,
of unencrypted data and turn it into 4 bytes of a sort prefix.

You've lost half the data in the process, so effectively each bit
is an OR of two bits and you can only infer from 0 values... so data
is still encrypted, but you have an approximate sorting.

For example, if your data is US-ASCII text with no numbers, you could
use Soundex to get the pre-key, so that worst case you have a bucket
of values in the range.

Using this technique, a random get will have to get the values at the
desired prefix +/- a small amount rather than the whole row... on the
client side you can then decrypt the data and sort that small bucket
to get the correct index position.

You could do a 1 byte prefix, but that only gives you at best 256
buckets and assumes that the first 2 bytes are uniformly
distributed... you've said your data is not uniformly distributed, so
a linear hash function sounds like your best bet.

your hash function should have the property that hash(A) >= hash(B) if
and only if A >= B

On 13 October 2011 08:47, Matthias Pfau  wrote:

Hi Stephen,
this is a great idea but unfortunately doesn't work for us either as we can
not store the data in an unencrypted form.

Kind regards
Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote:


could you prefix the data with 3-4 bytes of a linear hash of the
unencypted data? it wouldn't be a perfect sort, but you'd have less of a
range to query to get the sorted values?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on
the screen

On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

Unfortunately, that is not an option as we have to store the data in
a compressed and encrypted, and therefore binary and non-sortable form.

On 10/12/2011 06:39 PM, David McNelis wrote:

Is it an option to not convert the data to binary prior to
inserting
into Cassandra?  Also, how large are the strings you're sorting?
  If its
viable to not convert to binary before writing to Cassandra, and
you use
one of the string based column ordering techniques (utf8, ascii,
for
example), then the data would be sorted without you  needing to
specifically worry about that.  Of course, if the strings are
lengthy
you could run into  additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:

Hi there,
we are currently building a prototype based on cassandra and
came
into problems on implementing sorted lists containing
millions of items.

The special thing about the items of our lists is, that
cassandra is
not able to sort them as the data is stored in a binary
format which
is not sortable. However, we are able to sort the data
before the
plain data gets encoded (our application is responsible for
the order).

First Approach: Storing Lists in ColumnFamilies
***
We first tried to map the list to a single row of a
ColumnFamily in
a way that the index of the list is mapped to the column
names and
the items of the list to the column values. The column names
are
increasing numbers which define the sort order.
This has the major drawback that big parts of the list have
to be
rewritten on inserts (because the column names are numbered
by their
index), which are quite common.


Second Approach: Storing the whole List as Binary Data:
***
We tried to store the compressed list in a single column.
However,
this is only feasible for smaller lists. Our lists are far
too big
leading to multi megabyte reads and writes. As we need to
read and
update the lists quite often, this would put our Cassandra
cluster
under a lot of pressure.

Ideal Solution: Native support for storing lists
***
We would be very happy with a way to store a list of sorted
values
without making improper use of column names for the list
index. This
implies that we would need a possibility to insert values at
defined
positions. We know that this could lead to problems with
concurrent
inserts

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau

Hi Zach,
thanks for your additional input. You are absolutely right: The long 
namespace should be big enough. We are going to insert up to 2^32 values 
into the list.


We only need support for get(index), insert(index) and remove(index) 
while get and insert will be used very often. Remove is also needed but 
used very rarely.


Kind regards
Matthias

On 10/13/2011 04:49 PM, Zach Richardson wrote:

Matthias,

Answers below.

On Thu, Oct 13, 2011 at 9:03 AM, Matthias Pfau  wrote:

Hi Zach,
thanks for that good idea. Unfortunately, our list needs to be rewritten
often because our data is far away from being evenly distributed.


This shouldn't be a problem if you use long's.  If you were to space
them at original write (with N objects) at a distance of
Long.MAX_VALUE / N, and N was 10,000,000 you could still fit another
1844674407370 entries in between.


However, we could get this under control but there is a more severe problem:
Random access is very hard to implement on a structure with undefined
distances between two following index numbers. We absolutely need random
access because the lists are too big to do this on the application side :-(


I'm guessing you need to be able to implement all of the traditional
get(index), set(index), insert(index) type operations on the "list."
Once you start trying to do that, you start to hit all of the same
problems you get with different in memory list implementations based
on which operation is most important.

Could you provide some more information on what operations will be
performed the most, and how important they are.  I think that would
help anyone recommend a path to take.

Zach


Kind regards
Matthias

On 10/13/2011 02:30 PM, Zach Richardson wrote:


Matthias,

This is an interesting problem.

I would consider using long's as the column type, where your column
names are evenly distributed longs in sort order when you first write
your list out.  So if you have items A and C with the long column
names 1000 and 2000, and then you have to insert B, it gets inserted
at 1500.  Once you run out of room between any two column name
>> entries, i.e. 1000, 1001, 1002 entries are all taken at any spot in the
list, go ahead and re-write the list.

If your unencrypted data is uniformly distributed, you will have very
few collisions on your column names and should not have to re-write
>> the list too often.

If your lists are small enough, then you could use ints to save space,
but will then have to re-write the list more often.

Thanks,

Zach

On Thu, Oct 13, 2011 at 2:47 AM, Matthias Pfau wrote:


Hi Stephen,
this is a great idea but unfortunately doesn't work for us either as we
can
not store the data in an unencrypted form.

Kind regards
Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote:


could you prefix the data with 3-4 bytes of a linear hash of the
unencypted data? it wouldn't be a perfect sort, but you'd have less of a
range to query to get the sorted values?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on
the screen

On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

Unfortunately, that is not an option as we have to store the data in
a compressed and encrypted, and therefore binary and non-sortable
form.

On 10/12/2011 06:39 PM, David McNelis wrote:

Is it an option to not convert the data to binary prior to
inserting
into Cassandra?  Also, how large are the strings you're sorting?
  If its
viable to not convert to binary before writing to Cassandra, and
you use
one of the string based column ordering techniques (utf8, ascii,
for
example), then the data would be sorted without you  needing to
specifically worry about that.  Of course, if the strings are
lengthy
you could run into  additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:

Hi there,
we are currently building a prototype based on cassandra and
came
into problems on implementing sorted lists containing
millions of items.

The special thing about the items of our lists is, that
cassandra is
not able to sort them as the data is stored in a binary
format which
is not sortable. However, we are able to sort the data
before the
plain data gets encoded (our application is responsible for
the order).

First Approach: Storing Lists in ColumnFamilies
***
We first tried to map the list to a single row of a
ColumnFamily in
a way that the index of the list is mapped to the column
names and
the items of the list to the column values. The column names
are
increasing numbers which define the sort order.

Re: Efficiency of hector's setRowCount (and setStartKey!)

2011-10-13 Thread Patricio Echagüe
On Thu, Oct 13, 2011 at 9:39 AM, Don Smith  wrote:

> **
> It's actually setStartKey that's the important method call (in combination
> with setRowCount). So I should have been clearer.
>
> The following code performs as expected, as far as returning the expected
> data in the expected order.  I believe that the use of IndexedSlicesQuery's
> setStartKey will support efficient queries -- avoiding repulling the entire
> data set from cassandra. Correct?
>

correct

>
>
> void demoPaging() {
> String lastKey = processPage("don","");  // get first
> batch, starting with "" (smallest key)
> lastKey = processPage("don",lastKey);// get second
> batch starting with previous last key
> lastKey = processPage("don",lastKey);// get third batch
> starting with previous last key
>//
> }
>
> // return last key processed, null when no records left
> String processPage(String username, String startKey) {
>     String lastKey = null;
>     IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
>         HFactory.createIndexedSlicesQuery(keyspace,
>             stringSerializer, stringSerializer, stringSerializer);
>
>     indexedSlicesQuery.addEqualsExpression("user", username);
>     indexedSlicesQuery.setColumnNames("source", "ip");
>     indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
>     indexedSlicesQuery.setStartKey(startKey);   // <<<<<
>     indexedSlicesQuery.setRowCount(batchSize);
>
>     QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
>     OrderedRows<String, String, String> rows = result.get();
>
>     for (Row<String, String, String> row : rows) {
>         if (row == null) { continue; }
>         totalCount++;
>         String key = row.getKey();
>         if (!startKey.equals(key)) { lastKey = key; }
>     }
>     totalCount--;
>     return lastKey;
> }
>
>
>
>
>
>
> On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
>
> Hi Don. No it will not. IndexedSlicesQuery will read just the amount of
> rows specified by RowCount and will go to the DB to get the new page when
> needed.
>
>  SetRowCount is doing indexClause.setCount(rowCount);
>
> On Mon, Oct 10, 2011 at 3:52 PM, Don Smith  wrote:
>
>> Hector's IndexedSlicesQuery has a setRowCount method that you can use to
>> page through the results, as described in
>> https://github.com/rantav/hector/wiki/User-Guide .
>>
>> rangeSlicesQuery.setRowCount(1001);
>>  .
>> rangeSlicesQuery.setKeys(lastRow.getKey(),  "");
>>
>> Is it efficient?  Specifically, suppose my query returns 100,000 results
>> and I page through batches of 1000 at a time (making 100 executes of the
>> query). Will it internally retrieve all the results each time (but pass only
>> the desired set of 1000 or so to me)? Or will it optimize queries to avoid
>> the duplication?  I presume the latter. :)
>>
>> Can IndexedSlicesQuery's setStartKey method be used for the same effect?
>>
>>   Thanks,  Don
>>
>
>
>


Re: Efficiency of hector's setRowCount (and setStartKey!)

2011-10-13 Thread Don Smith
It's actually setStartKey that's the important method call (in 
combination with setRowCount). So I should have been clearer.


The following code performs as expected, as far as returning the 
expected data in the expected order.  I believe that the use of 
IndexedSlicesQuery's setStartKey will support efficient queries -- 
avoiding repulling the entire data set from cassandra. Correct?



void demoPaging() {
String lastKey = processPage("don","");  // get first 
batch, starting with "" (smallest key)
lastKey = processPage("don",lastKey);// get second 
batch starting with previous last key
lastKey = processPage("don",lastKey);// get third 
batch starting with previous last key

   //
}

// return last key processed, null when no records left
String processPage(String username, String startKey) {
    String lastKey = null;
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace,
            stringSerializer, stringSerializer, stringSerializer);

    indexedSlicesQuery.addEqualsExpression("user", username);
    indexedSlicesQuery.setColumnNames("source", "ip");
    indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
    indexedSlicesQuery.setStartKey(startKey);   // <<<<<
    indexedSlicesQuery.setRowCount(batchSize);

    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    OrderedRows<String, String, String> rows = result.get();

    for (Row<String, String, String> row : rows) {
        if (row == null) { continue; }
        totalCount++;
        String key = row.getKey();
        if (!startKey.equals(key)) { lastKey = key; }
    }
    totalCount--;
    return lastKey;
}
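
For completeness, a caller that pages to exhaustion rather than a fixed three
calls could look like this (a sketch against the method above, which returns
null once a page yields no new keys):

    void pageAll() {
        String lastKey = "";
        while (lastKey != null) {
            lastKey = processPage("don", lastKey);
        }
    }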






On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
Hi Don. No it will not. IndexedSlicesQuery will read just the amount 
of rows specified by RowCount and will go to the DB to get the new 
page when needed.


SetRowCount is doing indexClause.setCount(rowCount);

On Mon, Oct 10, 2011 at 3:52 PM, Don Smith > wrote:


Hector's IndexedSlicesQuery has a setRowCount method that you can
use to page through the results, as described in
https://github.com/rantav/hector/wiki/User-Guide .

rangeSlicesQuery.setRowCount(1001);
 .
rangeSlicesQuery.setKeys(lastRow.getKey(),  "");

Is it efficient?  Specifically, suppose my query returns 100,000
results and I page through batches of 1000 at a time (making 100
executes of the query). Will it internally retrieve all the
results each time (but pass only the desired set of 1000 or so to
me)? Or will it optimize queries to avoid the duplication?  I
presume the latter. :)

Can IndexedSlicesQuery's setStartKey method be used for the same
effect?

  Thanks,  Don






Re: Hector has a website

2011-10-13 Thread Patricio Echagüe
Hi Aaron, does it still happen? We didn't set up any password on the page.

On Tue, Oct 11, 2011 at 9:15 AM, Aaron Turner  wrote:

> Just a FYI:
>
> http://hector-client.org is requesting a username/pass
> http://www.hector-client.org is working fine
>
> On Fri, Oct 7, 2011 at 12:51 AM, aaron morton 
> wrote:
> > Thanks, will be handy for new peeps.
> > A
> > -
> > Aaron Morton
> > Freelance Cassandra Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> > On 7/10/2011, at 12:00 PM, Patricio Echagüe wrote:
> >
> > Hi, I wanted to let you all know that Hector client has a website.
> > http://hector-client.org
> > There are links to documentation, Javadoc and resources from the
> community.
> > If you have a personal blog and want us to include the link, let us know.
> > Feedback is always welcome.
> > Thanks!
> > Hector Team.
> >
>
>
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>


Re: Efficiency of hector's setRowCount

2011-10-13 Thread Patricio Echagüe
Hi Don. No it will not. IndexedSlicesQuery will read just the amount of rows
specified by RowCount and will go to the DB to get the new page when needed.

SetRowCount is doing indexClause.setCount(rowCount);

On Mon, Oct 10, 2011 at 3:52 PM, Don Smith  wrote:

> Hector's IndexedSlicesQuery has a setRowCount method that you can use to
> page through the results, as described in
> https://github.com/rantav/hector/wiki/User-Guide .
>
> rangeSlicesQuery.setRowCount(1001);
>  .
> rangeSlicesQuery.setKeys(lastRow.getKey(), "");
>
> Is it efficient?  Specifically, suppose my query returns 100,000 results
> and I page through batches of 1000 at a time (making 100 executes of the
> query). Will it internally retrieve all the results each time (but pass only
> the desired set of 1000 or so to me)? Or will it optimize queries to avoid
> the duplication?  I presume the latter. :)
>
> Can IndexedSlicesQuery's setStartKey method be used for the same effect?
>
>   Thanks,  Don
>


RE: MapReduce with two ethernet cards

2011-10-13 Thread Scott Fines
I upgraded to cassandra 0.8.7, and the problem persists.

Scott

From: Brandon Williams [dri...@gmail.com]
Sent: Monday, October 10, 2011 12:28 PM
To: user@cassandra.apache.org
Subject: Re: MapReduce with two ethernet cards

On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines  wrote:
> Hi all,
> This may be a silly question, but I'm at a bit of a loss, and was hoping for
> some help.
> I have a Cassandra cluster set up with two NICs--one for internal
> communication between cassandra machines (10.1.1.*), and one to respond to
> Thrift RPC (172.28.*.*).
> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
> remain separate from Cassandra, so I've written a little MapReduce job to
> copy data from Cassandra to Hadoop. However, when I try to run my job, I
> get
> java.io.IOException: failed connecting to all endpoints
> 10.1.1.24,10.1.1.17,10.1.1.16
> which is puzzling to me. It seems like the MR is attempting to connect to
> the internal communication IPs instead of the external Thrift IPs. Since I
> set up a firewall to block external access to the internal IPs of Cassandra,
> this is obviously going to fail.
> So my question is: why does Cassandra MR seem to be grabbing the
> listen_address instead of the Thrift one. Presuming it's not a funky
> configuration error or something on my part, is that strictly necessary? All
> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
> Thanks for your help,
> Scott

Your cassandra is old, upgrade to the latest version.

-Brandon


Re: Hector Problem Basic one

2011-10-13 Thread Patricio Echagüe
Hi, Hector does not retry on a down server. In the unit tests where you have
just one server, Hector will pass the exception to the client.

Can you please tell us what your test looks like?
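
For reference, a minimal Hector bootstrap showing where the hosts (and the
downed-host retry delay discussed below) get configured; host, cluster and
keyspace names are placeholders:

    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class HectorBootstrap {
        public static void main(String[] args) {
            // Thrift rpc_address:rpc_port of a live node.
            CassandraHostConfigurator hosts = new CassandraHostConfigurator("127.0.0.1:9160");
            hosts.setRetryDownedHosts(true);
            hosts.setRetryDownedHostsDelayInSeconds(30);
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", hosts);
            Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);
            System.out.println(keyspace.getKeyspaceName());
        }
    }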

2011/10/12 Wangpei (Peter) 

>  I only saw this error message when all Cassandra nodes are down.
>
> How do you get the Cluster and how do you set the hosts?
>
> From: CASSANDRA learner [mailto:cassandralear...@gmail.com]
> Sent: 12 October 2011 14:30
> To: user@cassandra.apache.org
> Subject: Re: Hector Problem Basic one
>
> Thanks for the reply, Ben.
>
> Actually, the problem is that I was not able to run a basic Hector example
> from Eclipse. It's throwing "me.prettyprint.hector.api.exceptions.HectorException:
> All host pools marked down. Retry burden pushed out to client".
> Can you please let me know why I am getting this?
>
> 
>
> On Tue, Oct 11, 2011 at 3:54 PM, Ben Ashton  wrote:*
> ***
>
> Hey,
>
> We had this one too; even though the Hector documentation says that it
> retries failed servers every 30 seconds by default, it doesn't.
>
> Once we explicitly set it to X seconds, whenever there is a failure,
> e.g. with the network (AWS), it will retry and add the host back into the pool.
>
> Ben
>
>
> On 11 October 2011 11:09, CASSANDRA learner 
> wrote:
> > Hi Every One,
> >
> > Actually I was using Cassandra a long time back, and when I tried today I am
> > getting a problem from Eclipse. When I am trying to run a basic Hector
> > (Java) example, I am getting an exception:
> > me.prettyprint.hector.api.exceptions.HectorException: All host pools marked
> > down. Retry burden pushed out to client. But my server is up, and nodetool
> > also shows that it is up. I don't know what's happening.
> >
> > 1.) Is it anything to do with the JMX port?
> > 2.) What are the storage port in cassandra.yaml and the JMX port in
> > cassandra-env.sh?
> >
> >
> >
>
> ** **
>


Re: supercolumns vs. prefixing columns of same data type?

2011-10-13 Thread Dean Hiller
great video, thanks!

On Thu, Oct 13, 2011 at 7:45 AM, hani elabed  wrote:

> Hi Dean,
> I don't have an answer to your question, but just in case you haven't
> seen this screencast by Ed Anuff on Cassandra Indexes, it helped me a lot.
> http://blip.tv/datastax/indexing-in-cassandra-5495633
>
> Hani
>
>
> On Wed, Oct 12, 2011 at 12:18 PM, Dean Hiller  wrote:
>
>> I heard cassandra may be going the direction of removing super column and
>> users are starting to just use prefixes in front of the column.
>>
>> The reason I ask is I was going the way of only using supercolumns and
>> then many tables were fixed with just one supercolumn per row as the
>> structure for that table was simplethis kept the api we have on top of
>> Hector extremely simple not having to deal with columns vs. supercolumns.
>> What are people's thoughts on this?
>>
>> Dealing in columnfamilies where some have supercolumns and some don't I
>> think personally is a painful way to go... going with just one way and
>> sticking with it sure makes the apis easier and it's much easier to apply
>> AOP type stuff to that ONE insert method rather than having two insert
>> methods.  So what is the direction of the Cassandra project and the
>> recommendation?
>>
>> thanks,
>> Dean
>>
>
>
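
For what it's worth, the prefix pattern discussed above usually amounts to
flattening the (supercolumn, subcolumn) pair into one delimited column name;
a hypothetical helper:

    // Emulate SuperCF[row][superCol][subCol] with StandardCF[row]["superCol:subCol"].
    // Assumes ':' never occurs in names; composite column names are the sturdier variant.
    static String flatName(String superColumn, String subColumn) {
        return superColumn + ":" + subColumn;
    }
    // e.g. mutator.insert(rowKey, "Users",
    //          HFactory.createStringColumn(flatName("address", "city"), "Wellington"));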


Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
It wouldn't be unencrypted... which is the point

you use a one way linear hash function to take the first, say 8 bytes,
of unencrypted data and turn it into 4 bytes of a sort prefix.

You've lost half the data in the process, so effectively each bit
is an OR of two bits and you can only infer from 0 values... so data
is still encrypted, but you have an approximate sorting.

For example, if your data is US-ASCII text with no numbers, you could
use Soundex to get the pre-key, so that worst case you have a bucket
of values in the range.

Using this technique, a random get will have to get the values at the
desired prefix +/- a small amount rather than the whole row... on the
client side you can then decrypt the data and sort that small bucket
to get the correct index position.

You could do a 1 byte prefix, but that only gives you at best 256
buckets and assumes that the first 2 bytes are uniformly
distributed... you've said your data is not uniformly distributed, so
a linear hash function sounds like your best bet.

your hash function should have the property that hash(A) >= hash(B) if
and only if A >= B
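
An editorial sketch of the folding step as described, assuming the first 8
plaintext bytes are available before encryption; note the OR-fold preserves
order only approximately, hence the +/- slop on reads mentioned above:

    // Fold the first 8 plaintext bytes into a 32-bit sort prefix.
    static int sortPrefix(byte[] first8) {
        long x = 0;
        for (int i = 0; i < 8; i++) {
            x = (x << 8) | (first8[i] & 0xFFL);
        }
        int folded = 0;
        for (int i = 0; i < 32; i++) {
            if (((x >>> (2 * i)) & 0x3L) != 0) {  // OR of input bits 2i and 2i+1
                folded |= 1 << i;
            }
        }
        return folded;  // compare as unsigned, e.g. Integer.compareUnsigned
    }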

On 13 October 2011 08:47, Matthias Pfau  wrote:
> Hi Stephen,
> this is a great idea but unfortunately doesn't work for us either as we can
> not store the data in an unencrypted form.
>
> Kind regards
> Matthias
>
> On 10/12/2011 07:42 PM, Stephen Connolly wrote:
>>
>> could you prefix the data with 3-4 bytes of a linear hash of the
>> unencypted data? it wouldn't be a perfect sort, but you'd have less of a
>> range to query to get the sorted values?
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on
>> the screen
>>
>> On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>
>>    Unfortunately, that is not an option as we have to store the data in
>>    a compressed and encrypted, and therefore binary and non-sortable form.
>>
>>    On 10/12/2011 06:39 PM, David McNelis wrote:
>>
>>        Is it an option to not convert the data to binary prior to
>> inserting
>>        into Cassandra?  Also, how large are the strings you're sorting?
>>          If its
>>        viable to not convert to binary before writing to Cassandra, and
>>        you use
>>        one of the string based column ordering techniques (utf8, ascii,
>> for
>>        example), then the data would be sorted without you  needing to
>>        specifically worry about that.  Of course, if the strings are
>>        lengthy
>>        you could run into  additional issues.
>>
>>        On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:
>>
>>            Hi there,
>>            we are currently building a prototype based on cassandra and
>>        came
>>            into problems on implementing sorted lists containing
>>        millions of items.
>>
>>            The special thing about the items of our lists is, that
>>        cassandra is
>>            not able to sort them as the data is stored in a binary
>>        format which
>>            is not sortable. However, we are able to sort the data
>>        before the
>>            plain data gets encoded (our application is responsible for
>>        the order).
>>
>>            First Approach: Storing Lists in ColumnFamilies
>>            ***
>>            We first tried to map the list to a single row of a
>>        ColumnFamily in
>>            a way that the index of the list is mapped to the column
>>        names and
>>            the items of the list to the column values. The column names
>> are
>>            increasing numbers which define the sort order.
>>            This has the major drawback that big parts of the list have
>>        to be
>>            rewritten on inserts (because the column names are numbered
>>        by their
>>            index), which are quite common.
>>
>>
>>            Second Approach: Storing the whole List as Binary Data:
>>            ***
>>            We tried to store the compressed list in a single column.
>>        However,
>>            this is only feasible for smaller lists. Our lists are far
>>        too big
>>            leading to multi megabyte reads and writes. As we need to
>>        read and
>>            update the lists quite often, this would put our Cassandra
>>        cluster
>>            under a lot of pressure.
>>
>>            Ideal Solution: Native support for storing lists
>>            ***
>>            We would be very happy with a way to store a list of sorted
>>        values
>>            without making improper use of column names for the list
>>        index. This
>>            implies that we would need a possibility to insert values at
>>        defined
>>            positions. We know that this could lead to problems with
>>        concurrent
>>            inserts in a distributed environment

Re: Storing pre-sorted data

2011-10-13 Thread Zach Richardson
Matthias,

Answers below.

On Thu, Oct 13, 2011 at 9:03 AM, Matthias Pfau  wrote:
> Hi Zach,
> thanks for that good idea. Unfortunately, our list needs to be rewritten
> often because our data is far away from being evenly distributed.

This shouldn't be a problem if you use long's.  If you were to space
them at original write (with N objects) at a distance of
Long.MAX_VALUE / N, and N was 10,000,000 you could still fit another
1844674407370 entries in between.

> However, we could get this under control but there is a more severe problem:
> Random access is very hard to implement on a structure with undefined
> distances between two following index numbers. We absolutely need random
> access because the lists are too big to do this on the application side :-(

I'm guessing you need to be able to implement all of the traditional
get(index), set(index), insert(index) type operations on the "list."
Once you start trying to do that, you start to hit all of the same
problems you get with different in memory list implementations based
on which operation is most important.

Could you provide some more information on what operations will be
performed the most, and how important they are.  I think that would
help anyone recommend a path to take.

Zach
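
To make the long-spacing scheme quoted below concrete, a client-side sketch
(an in-memory stand-in in which the map keys play the role of the long column
names):

    import java.util.TreeMap;

    public class GappedList {
        private final TreeMap<Long, String> columns = new TreeMap<>();

        // Insert between two existing column names using the midpoint.
        void insertBetween(long lo, long hi, String value) {
            if (hi - lo < 2) {
                respace();  // gap exhausted: rewrite the whole row
                return;     // caller re-reads its neighbours and retries
            }
            columns.put(lo + (hi - lo) / 2, value);
        }

        // The costly path: rewrite all columns evenly across the long range.
        private void respace() {
            long step = Long.MAX_VALUE / (columns.size() + 1);
            TreeMap<Long, String> fresh = new TreeMap<>();
            long name = step;
            for (String v : columns.values()) {
                fresh.put(name, v);
                name += step;
            }
            columns.clear();
            columns.putAll(fresh);
        }
    }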

> Kind regards
> Matthias
>
> On 10/13/2011 02:30 PM, Zach Richardson wrote:
>>
>> Matthias,
>>
>> This is an interesting problem.
>>
>> I would consider using long's as the column type, where your column
>> names are evenly distributed longs in sort order when you first write
>> your list out.  So if you have items A and C with the long column
>> names 1000 and 2000, and then you have to insert B, it gets inserted
>> at 1500.  Once you run out of room between any two column name
>> entries, i.e. 1000, 1001, 1002 entries are all taken at any spot in the
>> list, go ahead and re-write the list.
>>
>> If your unencrypted data is uniformly distributed, you will have very
>> few collisions on your column names and should not have to re-write
>> the list too often.
>>
>> If your lists are small enough, then you could use ints to save space,
>> but will then have to re-write the list more often.
>>
>> Thanks,
>>
>> Zach
>>
>> On Thu, Oct 13, 2011 at 2:47 AM, Matthias Pfau  wrote:
>>>
>>> Hi Stephen,
>>> this is a great idea but unfortunately doesn't work for us either, as we
>>> cannot store the data in an unencrypted form.
>>>
>>> Kind regards
>>> Matthias
>>>
>>> On 10/12/2011 07:42 PM, Stephen Connolly wrote:

 could you prefix the data with 3-4 bytes of a linear hash of the
 unencrypted data? it wouldn't be a perfect sort, but you'd have less of a
 range to query to get the sorted values?

 - Stephen

 ---
 Sent from my Android phone, so random spelling mistakes, random nonsense
 words and other nonsense are a direct result of using swype to type on
 the screen

 On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

    Unfortunately, that is not an option as we have to store the data in
    a compressed and encrypted, and therefore binary and non-sortable form.

    On 10/12/2011 06:39 PM, David McNelis wrote:

        Is it an option to not convert the data to binary prior to
        inserting into Cassandra?  Also, how large are the strings you're
        sorting?  If it's viable to not convert to binary before writing
        to Cassandra, and you use one of the string-based column ordering
        techniques (utf8, ascii, for example), then the data would be
        sorted without you needing to specifically worry about that.  Of
        course, if the strings are lengthy you could run into additional
        issues.

            On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:

            Hi there,
            we are currently building a prototype based on cassandra and
        came
            into problems on implementing sorted lists containing
        millions of items.

            The special thing about the items of our lists is that
        cassandra is
            not able to sort them as the data is stored in a binary
        format which
            is not sortable. However, we are able to sort the data
        before the
            plain data gets encoded (our application is responsible for
        the order).

            First Approach: Storing Lists in ColumnFamilies
            ***
            We first tried to map the list to a single row of a
        ColumnFamily in
            a way that the index of the list is mapped to the column
        names and
            the items of the list to the column values. The column names
 are
            increasing numbers which define the sort order

Re: [Solved] column index offset miscalculation

2011-10-13 Thread Thomas Richter
Thanks for the hint.

Ticket created: https://issues.apache.org/jira/browse/CASSANDRA-3358

Best,

Thomas

On 10/13/2011 03:27 PM, Sylvain Lebresne wrote:
> JIRA is not read-only, you should be able to create a ticket at
> https://issues.apache.org/jira/browse/CASSANDRA, though
> that probably requires that you create an account.
> 
> --
> Sylvain
> 
> On Thu, Oct 13, 2011 at 3:20 PM, Thomas Richter  wrote:
>> Hi Aaron,
>>
>> the fix does the trick. I wonder why nobody else ran into this before...
>> I checked org/apache/cassandra/db/ColumnIndexer.java in 0.7.9, 0.8.7 and
>> 1.0.0-rc2 and all seem to be affected.
>>
>> Looks like public Jira is readonly - so I'm not sure about how to continue.
>>
>> Best,
>>
>> Thomas
>>
>> On 10/13/2011 10:52 AM, Thomas Richter wrote:
>>> Hi Aaron,
>>>
>>> I guess i found it :-).
>>>
>>> I added logging for the used IndexInfo to
>>> SSTableNamesIterator.readIndexedColumns and got negative index positions
>>> for the missing columns. This is the reason why the columns are not
>>> loaded from sstable.
>>>
>>> So I had a look at ColumnIndexer.serializeInternal and there it is:
>>>
>>> int endPosition = 0, startPosition = -1;
>>>
>>> Should be:
>>>
>>> long endPosition = 0, startPosition = -1;
>>>
>>> I'm currently running a compaction with a fixed version to verify.
>>>
>>> Best,
>>>
>>> Thomas
>>>
>>> On 10/12/2011 11:54 PM, aaron morton wrote:
 Sounds a lot like the column is deleted.

 IIRC this is where the columns from various SSTables are reduced
 https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/db/filter/QueryFilter.java#L117

 The call to ColumnFamily.addColumn() is where the column instance may be 
 merged with other instances.

 A

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com


 On 13/10/2011, at 5:33 AM, Thomas Richter wrote:

> Hi Aaron,
>
> I cannot read the column with a slice query.
> The slice query only returns data till a certain column and after that i
> only get empty results.
>
> I added log output to QueryFilter.isRelevant to see if the filter is
> dropping the column(s) but it doesn't even show up there.
>
> Next thing I will check is the diff between columns contained in
> json export and columns fetched with the slice query, maybe this gives
> more clues...
>
> Any other ideas where to place more debugging output to see what's
> happening?
>
> Best,
>
> Thomas
>
> On 10/11/2011 12:46 PM, aaron morton wrote:
>> kewl,
>>
>>> * Row is not deleted (other columns can be read, row survives compaction
>>> with GCGraceSeconds=0)
>>
>> IIRC row tombstones can hang around for a while (until gc grace has 
>> passed), and they only have an effect on columns that have a lower 
>> timestamp. So it's possible to read columns from a row with a tombstone.
>>
>> Can you read the column using a slice range rather than specifying its
>> name?
>>
>> Aaron
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 11/10/2011, at 11:15 PM, Thomas Richter wrote:
>>
>>> Hi Aaron,
>>>
>>> i invalidated the caches but nothing changed. I didn't get the mentioned
>>> log line either, but as I read the code SliceByNamesReadCommand uses
>>> NamesQueryFilter and not SliceQueryFilter.
>>>
>>> Next, there is only one SSTable.
>>>
>>> I can rule out that the row is deleted because I deleted all other rows
>>> in that CF to reduce data size and speed up testing. I set
>>> GCGraceSeconds to zero and ran a compaction. All other rows are gone,
>>> but i can still access at least one column from the left row.
>>> So as far as I understand it, there should not be a tombstone on row 
>>> level.
>>>
>>> To make it a list:
>>>
>>> * One SSTable, one row
>>> * Row is not deleted (other columns can be read, row survives compaction
>>> with GCGraceSeconds=0)
>>> * Most columns can be read by get['row']['col'] from cassandra-cli
>>> * Some columns can not be read by get['row']['col'] from cassandra-cli
>>> but can be found in output of sstable2json
>>> * unreadable data survives compaction with GCGraceSeconds=0 (checked
>>> with sstable2json)
>>> * Invalidating caches does not help
>>> * Nothing in the logs
>>>
>>> Does that point into any direction where i should look next?
>>>
>>> Best,
>>>
>>> Thomas
>>>
>>> On 10/11/2011 10:30 AM, aaron morton wrote:
 Nothing jumps out. The obvious answer is that the column has been deleted. Did you check all the SSTables ?

Re: Schema versions reflect schemas on unwanted nodes

2011-10-13 Thread Brandon Williams
You're running into https://issues.apache.org/jira/browse/CASSANDRA-3259

Try upgrading and doing a rolling restart.

-Brandon

On Thu, Oct 13, 2011 at 9:11 AM, Eric Czech  wrote:
> Nope, there was definitely no intersection of the seed nodes between the two
> clusters so I'm fairly certain that the second cluster found out about the
> first through what was in the LocationInfo* system tables.  Also, I don't
> think that procedure will really help because I don't actually want the
> schema on cass-analysis-1 to be consistent with the schema in the original
> cluster -- I just want to totally remove it.
>
> On Thu, Oct 13, 2011 at 8:01 AM, Mohit Anchlia 
> wrote:
>>
>> Do you have same seed node specified in cass-analysis-1 as cass-1,2,3?
>> I am thinking that changing the seed node in cass-analysis-2 and
>> following the directions in
>> http://wiki.apache.org/cassandra/FAQ#schema_disagreement might solve
>> the problem. Someone please correct me.
>>
>> On Thu, Oct 13, 2011 at 12:05 AM, Eric Czech 
>> wrote:
>> > I don't think that's what I'm after here since the unwanted nodes were
>> > originally assimilated into the cluster with the same initial_token
>> > values
>> > as other nodes that were already in the cluster (that have, and still do
>> > have, useful data).  I know this is an awkward situation so I'll try to
>> > depict it in a simpler way:
>> > Let's say I have a simplified version of our production cluster that
>> > looks
>> > like this -
>> > cass-1   token = A
>> > cass-2   token = B
>> > cass-3   token = C
>> > Then I tried to create a second cluster that looks like this -
>> > cass-analysis-1   token = A  (and contains same data as cass-1)
>> > cass-analysis-2   token = B  (and contains same data as cass-2)
>> > cass-analysis-3   token = C  (and contains same data as cass-3)
>> > But after starting the second cluster, things got crossed up between the
>> > clusters and here's what the original cluster now looks like -
>> > cass-1   token = A   (has data and schema)
>> > cass-2   token = B   (has data and schema)
>> > cass-3   token = C   (had data and schema)
>> > cass-analysis-1   token = A  (has *no* data and is not part of the ring,
>> > but
>> > is trying to be included in cluster schema)
>> > A simplified version of "describe cluster"  for the original cluster now
>> > shows:
>> > Cluster Information:
>> >    Schema versions:
>> > SCHEMA-UUID-1: [cass-1, cass-2, cass-3]
>> > SCHEMA-UUID-2: [cass-analysis-1]
>> > But the simplified ring looks like this (has only 3 nodes instead of 4):
>> > Host       Owns     Token
>> > cass-1     33%       A
>> > cass-2     33%       B
>> > cass-3     33%       C
>> > The original cluster is still working correctly but all live schema
>> > updates
>> > are failing because of the inconsistent schema versions introduced by
>> > the
>> > unwanted node.
>> > From my perspective, a simple fix seems to be for cassandra to exclude
>> > nodes
>> > that aren't part of the ring from the schema consistency requirements.
>> >  Any
>> > reason that wouldn't work?
>> > And aside from a possible code patch, any recommendations as to how I
>> > can
>> > best fix this given the current 8.4 release?
>> >
>> > On Thu, Oct 13, 2011 at 12:14 AM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> Does nodetool removetoken not work?
>> >>
>> >> On Thu, Oct 13, 2011 at 12:59 AM, Eric Czech 
>> >> wrote:
>> >> > Not sure if anyone has seen this before but it's really killing me
>> >> > right
>> >> > now.  Perhaps that was too long of a description of the issue so
>> >> > here's
>> >> > a
>> >> > more succinct question -- How do I remove nodes associated with a
>> >> > cluster
>> >> > that contain no data and have no reason to be associated with the
>> >> > cluster
>> >> > whatsoever?
>> >> > My last resort here is to stop cassandra (after recording all tokens
>> >> > for
>> >> > each node), set the initial token for each node in the cluster in
>> >> > cassandra.yaml, manually delete the LocationInfo* sstables in the
>> >> > system
>> >> > keyspace, and then restart.  I'm hoping there's a simpler, less
>> >> > seemingly
>> >> > risky way to do this so please, please let me know if that's true!
>> >> > Thanks again.
>> >> > - Eric
>> >> > On Tue, Oct 11, 2011 at 11:55 AM, Eric Czech 
>> >> > wrote:
>> >> >>
>> >> >> Hi, I'm having what I think is a fairly uncommon schema issue --
>> >> >> My situation is that I had a cluster with 10 nodes and a consistent
>> >> >> schema.  Then, in an experiment to setup a second cluster with the
>> >> >> same
>> >> >> information (by copying the raw sstables), I left the LocationInfo*
>> >> >> sstables
>> >> >> in the system keyspace in the new cluster and after starting the
>> >> >> second
>> >> >> cluster, I realized that the two clusters were discovering each
>> >> >> other
>> >> >> when
>> >> >> they shouldn't have been.  Since then, I changed the cluster name
>> >> >> for
>> >> >> the
>> >> >> second cluster and made sure to delete the LocationInfo* sstables

Re: Schema versions reflect schemas on unwanted nodes

2011-10-13 Thread Eric Czech
Nope, there was definitely no intersection of the seed nodes between the two
clusters so I'm fairly certain that the second cluster found out about the
first through what was in the LocationInfo* system tables.  Also, I don't
think that procedure will really help because I don't actually want the
schema on cass-analysis-1 to be consistent with the schema in the original
cluster -- I just want to totally remove it.

On Thu, Oct 13, 2011 at 8:01 AM, Mohit Anchlia wrote:

> Do you have same seed node specified in cass-analysis-1 as cass-1,2,3?
> I am thinking that changing the seed node in cass-analysis-2 and
> following the directions in
> http://wiki.apache.org/cassandra/FAQ#schema_disagreement might solve
> the problem. Someone please correct me.
>
> On Thu, Oct 13, 2011 at 12:05 AM, Eric Czech 
> wrote:
> > I don't think that's what I'm after here since the unwanted nodes were
> > originally assimilated into the cluster with the same initial_token
> values
> > as other nodes that were already in the cluster (that have, and still do
> > have, useful data).  I know this is an awkward situation so I'll try to
> > depict it in a simpler way:
> > Let's say I have a simplified version of our production cluster that
> looks
> > like this -
> > cass-1   token = A
> > cass-2   token = B
> > cass-3   token = C
> > Then I tried to create a second cluster that looks like this -
> > cass-analysis-1   token = A  (and contains same data as cass-1)
> > cass-analysis-2   token = B  (and contains same data as cass-2)
> > cass-analysis-3   token = C  (and contains same data as cass-3)
> > But after starting the second cluster, things got crossed up between the
> > clusters and here's what the original cluster now looks like -
> > cass-1   token = A   (has data and schema)
> > cass-2   token = B   (has data and schema)
> > cass-3   token = C   (had data and schema)
> > cass-analysis-1   token = A  (has *no* data and is not part of the ring,
> but
> > is trying to be included in cluster schema)
> > A simplified version of "describe cluster"  for the original cluster now
> > shows:
> > Cluster Information:
> >Schema versions:
> > SCHEMA-UUID-1: [cass-1, cass-2, cass-3]
> > SCHEMA-UUID-2: [cass-analysis-1]
> > But the simplified ring looks like this (has only 3 nodes instead of 4):
> > Host   Owns Token
> > cass-1 33%   A
> > cass-2 33%   B
> > cass-3 33%   C
> > The original cluster is still working correctly but all live schema
> updates
> > are failing because of the inconsistent schema versions introduced by the
> > unwanted node.
> > From my perspective, a simple fix seems to be for cassandra to exclude
> nodes
> > that aren't part of the ring from the schema consistency requirements.
>  Any
> > reason that wouldn't work?
> > And aside from a possible code patch, any recommendations as to how I can
> > best fix this given the current 8.4 release?
> >
> > On Thu, Oct 13, 2011 at 12:14 AM, Jonathan Ellis 
> wrote:
> >>
> >> Does nodetool removetoken not work?
> >>
> >> On Thu, Oct 13, 2011 at 12:59 AM, Eric Czech 
> >> wrote:
> >> > Not sure if anyone has seen this before but it's really killing me
> right
> >> > now.  Perhaps that was too long of a description of the issue so
> here's
> >> > a
> >> > more succinct question -- How do I remove nodes associated with a
> >> > cluster
> >> > that contain no data and have no reason to be associated with the
> >> > cluster
> >> > whatsoever?
> >> > My last resort here is to stop cassandra (after recording all tokens
> for
> >> > each node), set the initial token for each node in the cluster in
> >> > cassandra.yaml, manually delete the LocationInfo* sstables in the
> system
> >> > keyspace, and then restart.  I'm hoping there's a simpler, less
> >> > seemingly
> >> > risky way to do this so please, please let me know if that's true!
> >> > Thanks again.
> >> > - Eric
> >> > On Tue, Oct 11, 2011 at 11:55 AM, Eric Czech 
> >> > wrote:
> >> >>
> >> >> Hi, I'm having what I think is a fairly uncommon schema issue --
> >> >> My situation is that I had a cluster with 10 nodes and a consistent
> >> >> schema.  Then, in an experiment to setup a second cluster with the
> same
> >> >> information (by copying the raw sstables), I left the LocationInfo*
> >> >> sstables
> >> >> in the system keyspace in the new cluster and after starting the
> second
> >> >> cluster, I realized that the two clusters were discovering each other
> >> >> when
> >> >> they shouldn't have been.  Since then, I changed the cluster name for
> >> >> the
> >> >> second cluster and made sure to delete the LocationInfo* sstables
> >> >> before
> >> >> starting it and the two clusters are now operating independent of one
> >> >> another for the most part.  The only remaining connection between the
> >> >> two
> >> >> seems to be that the first cluster is still maintaining references to
> >> >> nodes
> >> >> in the second cluster in the schema versions despite those nodes not
> >> >> actually being part of the ring.

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau

Hi Zach,
thanks for that good idea. Unfortunately, our list needs to be rewritten 
often because our data is far away from being evenly distributed.


However, we could get this under control but there is a more severe 
problem: Random access is very hard to implement on a structure with 
undefined distances between two following index numbers. We absolutely 
need random access because the lists are too big to do this on the 
application side :-(


Kind regards
Matthias

On 10/13/2011 02:30 PM, Zach Richardson wrote:

Matthias,

This is an interesting problem.

I would consider using longs as the column type, where your column
names are evenly distributed longs in sort order when you first write
your list out.  So if you have items A and C with the long column
names 1000 and 2000, and then you have to insert B, it gets inserted
at 1500.  Once you run out of room between any two column name
entries, i.e. 1000, 1001, 1002 entries are all taken at any spot in the
list, go ahead and re-write the list.

If your unencrypted data is uniformly distributed, you will have very
few collisions on your column names and should not have to re-write
the list too often.

If your lists are small enough, then you could use ints to save space,
but will then have to re-write the list more often.

Thanks,

Zach

On Thu, Oct 13, 2011 at 2:47 AM, Matthias Pfau  wrote:

Hi Stephen,
this is a great idea but unfortunately doesn't work for us either, as we
cannot store the data in an unencrypted form.

Kind regards
Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote:


could you prefix the data with 3-4 bytes of a linear hash of the
unencrypted data? it wouldn't be a perfect sort, but you'd have less of a
range to query to get the sorted values?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on
the screen

On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

Unfortunately, that is not an option as we have to store the data in
a compressed and encrypted, and therefore binary and non-sortable form.

On 10/12/2011 06:39 PM, David McNelis wrote:

Is it an option to not convert the data to binary prior to
inserting
into Cassandra?  Also, how large are the strings you're sorting?
  If its
viable to not convert to binary before writing to Cassandra, and
you use
one of the string based column ordering techniques (utf8, ascii,
for
example), then the data would be sorted without you  needing to
specifically worry about that.  Of course, if the strings are
lengthy
you could run into  additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:

Hi there,
we are currently building a prototype based on cassandra and
came
into problems on implementing sorted lists containing
millions of items.

The special thing about the items of our lists is that
cassandra is
not able to sort them as the data is stored in a binary
format which
is not sortable. However, we are able to sort the data
before the
plain data gets encoded (our application is responsible for
the order).

First Approach: Storing Lists in ColumnFamilies
***
We first tried to map the list to a single row of a
ColumnFamily in
a way that the index of the list is mapped to the column
names and
the items of the list to the column values. The column names
are
increasing numbers which define the sort order.
This has the major drawback that big parts of the list have
to be
rewritten on inserts (because the column names are numbered
by their
index), which are quite common.


Second Approach: Storing the whole List as Binary Data:
***
We tried to store the compressed list in a single column.
However,
this is only feasible for smaller lists. Our lists are far
too big
leading to multi megabyte reads and writes. As we need to
read and
update the lists quite often, this would put our Cassandra
cluster
under a lot of pressure.

Ideal Solution: Native support for storing lists
***
We would be very happy with a way to store a list of sorted
values
without making improper use of column names for the list
index. This
implies that we would need a possibility to insert values at
defined
positions. We know that this could lead to problems with
concurrent
inserts in a distributed environment, but this is handled by our application logic.

Re: Schema versions reflect schemas on unwanted nodes

2011-10-13 Thread Mohit Anchlia
Do you have same seed node specified in cass-analysis-1 as cass-1,2,3?
I am thinking that changing the seed node in cass-analysis-2 and
following the directions in
http://wiki.apache.org/cassandra/FAQ#schema_disagreement might solve
the problem. Someone please correct me.

On Thu, Oct 13, 2011 at 12:05 AM, Eric Czech  wrote:
> I don't think that's what I'm after here since the unwanted nodes were
> originally assimilated into the cluster with the same initial_token values
> as other nodes that were already in the cluster (that have, and still do
> have, useful data).  I know this is an awkward situation so I'll try to
> depict it in a simpler way:
> Let's say I have a simplified version of our production cluster that looks
> like this -
> cass-1   token = A
> cass-2   token = B
> cass-3   token = C
> Then I tried to create a second cluster that looks like this -
> cass-analysis-1   token = A  (and contains same data as cass-1)
> cass-analysis-2   token = B  (and contains same data as cass-2)
> cass-analysis-3   token = C  (and contains same data as cass-3)
> But after starting the second cluster, things got crossed up between the
> clusters and here's what the original cluster now looks like -
> cass-1   token = A   (has data and schema)
> cass-2   token = B   (has data and schema)
> cass-3   token = C   (had data and schema)
> cass-analysis-1   token = A  (has *no* data and is not part of the ring, but
> is trying to be included in cluster schema)
> A simplified version of "describe cluster"  for the original cluster now
> shows:
> Cluster Information:
>    Schema versions:
> SCHEMA-UUID-1: [cass-1, cass-2, cass-3]
> SCHEMA-UUID-2: [cass-analysis-1]
> But the simplified ring looks like this (has only 3 nodes instead of 4):
> Host       Owns     Token
> cass-1     33%       A
> cass-2     33%       B
> cass-3     33%       C
> The original cluster is still working correctly but all live schema updates
> are failing because of the inconsistent schema versions introduced by the
> unwanted node.
> From my perspective, a simple fix seems to be for cassandra to exclude nodes
> that aren't part of the ring from the schema consistency requirements.  Any
> reason that wouldn't work?
> And aside from a possible code patch, any recommendations as to how I can
> best fix this given the current 8.4 release?
>
> On Thu, Oct 13, 2011 at 12:14 AM, Jonathan Ellis  wrote:
>>
>> Does nodetool removetoken not work?
>>
>> On Thu, Oct 13, 2011 at 12:59 AM, Eric Czech 
>> wrote:
>> > Not sure if anyone has seen this before but it's really killing me right
>> > now.  Perhaps that was too long of a description of the issue so here's
>> > a
>> > more succinct question -- How do I remove nodes associated with a
>> > cluster
>> > that contain no data and have no reason to be associated with the
>> > cluster
>> > whatsoever?
>> > My last resort here is to stop cassandra (after recording all tokens for
>> > each node), set the initial token for each node in the cluster in
>> > cassandra.yaml, manually delete the LocationInfo* sstables in the system
>> > keyspace, and then restart.  I'm hoping there's a simpler, less
>> > seemingly
>> > risky way to do this so please, please let me know if that's true!
>> > Thanks again.
>> > - Eric
>> > On Tue, Oct 11, 2011 at 11:55 AM, Eric Czech 
>> > wrote:
>> >>
>> >> Hi, I'm having what I think is a fairly uncommon schema issue --
>> >> My situation is that I had a cluster with 10 nodes and a consistent
>> >> schema.  Then, in an experiment to setup a second cluster with the same
>> >> information (by copying the raw sstables), I left the LocationInfo*
>> >> sstables
>> >> in the system keyspace in the new cluster and after starting the second
>> >> cluster, I realized that the two clusters were discovering each other
>> >> when
>> >> they shouldn't have been.  Since then, I changed the cluster name for
>> >> the
>> >> second cluster and made sure to delete the LocationInfo* sstables
>> >> before
>> >> starting it and the two clusters are now operating independent of one
>> >> another for the most part.  The only remaining connection between the
>> >> two
>> >> seems to be that the first cluster is still maintaining references to
>> >> nodes
>> >> in the second cluster in the schema versions despite those nodes not
>> >> actually being part of the ring.
>> >> Here's what my "describe cluster" looks like on the original cluster:
>> >> Cluster Information:
>> >>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>> >>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>> >>    Schema versions:
>> >> 48971cb0-e9ff-11e0--eb9eab7d90bf: [,
>> >> , ..., ]
>> >> 848bcfc0-eddf-11e0--8a3bb58f08ff: [,
>> >> ]
>> >> The second cluster, however, contains no schema versions involving
>> >> nodes
>> >> from the first cluster.
>> >> My question then is, how can I remove those schema versions from the
>> >> original cluster that are associated with the unwanted nodes from the
>> >> second
> >> >> cluster?

Re: supercolumns vs. prefixing columns of same data type?

2011-10-13 Thread hani elabed
Hi Dean,
I don't have an answer to your question, but just in case you haven't
seen this screencast by Ed Anuff on Cassandra Indexes, it helped me a lot.
http://blip.tv/datastax/indexing-in-cassandra-5495633

Hani

On Wed, Oct 12, 2011 at 12:18 PM, Dean Hiller  wrote:

> I heard cassandra may be going the direction of removing super columns, and
> users are starting to just use prefixes in front of the column name.
>
> The reason I ask is I was going the way of only using supercolumns, and then
> many tables were fixed with just one supercolumn per row as the structure
> for that table was simple... this kept the api we have on top of Hector
> extremely simple, not having to deal with columns vs. supercolumns.  What are
> people's thoughts on this?
>
> Dealing in columnfamilies where some have supercolumns and some don't I
> think personally is a painful way to go... going with just one way and
> sticking with it sure makes the apis easier, and it's much easier to apply
> AOP-type stuff to that ONE insert method rather than having two insert
> methods.  So what is the direction of the cassandra project and the
> recommendation?
>
> thanks,
> Dean
>
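
For reference, a minimal Hector-style sketch of the prefixing idea Dean
describes, flattening row -> supercolumn -> column into plain columns (the
separator, keyspace and names are assumptions, not an established
convention):

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class PrefixedColumns {
        // One insert path for every column family: the would-be supercolumn
        // name becomes a prefix on the column name.
        static void insert(Keyspace keyspace, String cf, String rowKey,
                           String superColumn, String column, String value) {
            Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
            m.addInsertion(rowKey, cf,
                    HFactory.createStringColumn(superColumn + ":" + column, value));
            m.execute();
        }
    }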


Re: [Solved] column index offset miscalculation (was: Existing column(s) not readable)

2011-10-13 Thread Sylvain Lebresne
JIRA is not read-only, you should be able to create a ticket at
https://issues.apache.org/jira/browse/CASSANDRA, though
that probably requires that you create an account.

--
Sylvain

On Thu, Oct 13, 2011 at 3:20 PM, Thomas Richter  wrote:
> Hi Aaron,
>
> the fix does the trick. I wonder why nobody else ran into this before...
> I checked org/apache/cassandra/db/ColumnIndexer.java in 0.7.9, 0.8.7 and
> 1.0.0-rc2 and all seem to be affected.
>
> Looks like public Jira is readonly - so I'm not sure about how to continue.
>
> Best,
>
> Thomas
>
> On 10/13/2011 10:52 AM, Thomas Richter wrote:
>> Hi Aaron,
>>
>> I guess i found it :-).
>>
>> I added logging for the used IndexInfo to
>> SSTableNamesIterator.readIndexedColumns and got negative index positions
>> for the missing columns. This is the reason why the columns are not
>> loaded from sstable.
>>
>> So I had a look at ColumnIndexer.serializeInternal and there it is:
>>
>> int endPosition = 0, startPosition = -1;
>>
>> Should be:
>>
>> long endPosition = 0, startPosition = -1;
>>
>> I'm currently running a compaction with a fixed version to verify.
>>
>> Best,
>>
>> Thomas
>>
>> On 10/12/2011 11:54 PM, aaron morton wrote:
>>> Sounds a lot like the column is deleted.
>>>
>>> IIRC this is where the columns from various SSTables are reduced
>>> https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/db/filter/QueryFilter.java#L117
>>>
>>> The call to ColumnFamily.addColumn() is where the column instance may be 
>>> merged with other instances.
>>>
>>> A
>>>
>>> -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 13/10/2011, at 5:33 AM, Thomas Richter wrote:
>>>
 Hi Aaron,

 I cannot read the column with a slice query.
 The slice query only returns data till a certain column and after that i
 only get empty results.

 I added log output to QueryFilter.isRelevant to see if the filter is
 dropping the column(s) but it doesn't even show up there.

 Next thing I will check is the diff between columns contained in
 json export and columns fetched with the slice query, maybe this gives
 more clues...

 Any other ideas where to place more debugging output to see what's
 happening?

 Best,

 Thomas

 On 10/11/2011 12:46 PM, aaron morton wrote:
> kewl,
>
>> * Row is not deleted (other columns can be read, row survives compaction
>> with GCGraceSeconds=0)
>
> IIRC row tombstones can hang around for a while (until gc grace has 
> passed), and they only have an effect on columns that have a lower 
> timestamp. So it's possible to read columns from a row with a tombstone.
>
> Can you read the column using a slice range rather than specifying its
> name?
>
> Aaron
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/10/2011, at 11:15 PM, Thomas Richter wrote:
>
>> Hi Aaron,
>>
>> i invalidated the caches but nothing changed. I didn't get the mentioned
>> log line either, but as I read the code SliceByNamesReadCommand uses
>> NamesQueryFilter and not SliceQueryFilter.
>>
>> Next, there is only one SSTable.
>>
>> I can rule out that the row is deleted because I deleted all other rows
>> in that CF to reduce data size and speed up testing. I set
>> GCGraceSeconds to zero and ran a compaction. All other rows are gone,
>> but i can still access at least one column from the left row.
>> So as far as I understand it, there should not be a tombstone on row 
>> level.
>>
>> To make it a list:
>>
>> * One SSTable, one row
>> * Row is not deleted (other columns can be read, row survives compaction
>> with GCGraceSeconds=0)
>> * Most columns can be read by get['row']['col'] from cassandra-cli
>> * Some columns can not be read by get['row']['col'] from cassandra-cli
>> but can be found in output of sstable2json
>> * unreadable data survives compaction with GCGraceSeconds=0 (checked
>> with sstable2json)
>> * Invalidating caches does not help
>> * Nothing in the logs
>>
>> Does that point into any direction where i should look next?
>>
>> Best,
>>
>> Thomas
>>
>> On 10/11/2011 10:30 AM, aaron morton wrote:
>>> Nothing jumps out. The obvious answer is that the column has been 
>>> deleted. Did you check all the SSTables ?
>>>
>>> It looks like query returned from row cache, otherwise you would see 
>>> this as well…
>>>
>>> DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java 
>>> (line 123) collecting 0 of 2147483647: 
>>

[Solved] column index offset miscalculation (was: Existing column(s) not readable)

2011-10-13 Thread Thomas Richter
Hi Aaron,

the fix does the trick. I wonder why nobody else ran into this before...
I checked org/apache/cassandra/db/ColumnIndexer.java in 0.7.9, 0.8.7 and
1.0.0-rc2 and all seem to be affected.

Looks like public Jira is readonly - so I'm not sure about how to continue.

Best,

Thomas

On 10/13/2011 10:52 AM, Thomas Richter wrote:
> Hi Aaron,
> 
> I guess i found it :-).
> 
> I added logging for the used IndexInfo to
> SSTableNamesIterator.readIndexedColumns and got negative index positions
> for the missing columns. This is the reason why the columns are not
> loaded from sstable.
> 
> So I had a look at ColumnIndexer.serializeInternal and there it is:
> 
> int endPosition = 0, startPosition = -1;
> 
> Should be:
> 
> long endPosition = 0, startPosition = -1;
> 
> I'm currently running a compaction with a fixed version to verify.
> 
> Best,
> 
> Thomas
> 
> On 10/12/2011 11:54 PM, aaron morton wrote:
>> Sounds a lot like the column is deleted. 
>>
>> IIRC this is where the columns from various SSTables are reduced
>> https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/db/filter/QueryFilter.java#L117
>>
>> The call to ColumnFamily.addColumn() is where the column instance may be 
>> merged with other instances. 
>>
>> A 
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 13/10/2011, at 5:33 AM, Thomas Richter wrote:
>>
>>> Hi Aaron,
>>>
>>> I cannot read the column with a slice query.
>>> The slice query only returns data till a certain column and after that i
>>> only get empty results.
>>>
>>> I added log output to QueryFilter.isRelevant to see if the filter is
>>> dropping the column(s) but it doesn't even show up there.
>>>
>>> Next thing I will check is the diff between columns contained in
>>> json export and columns fetched with the slice query, maybe this gives
>>> more clues...
>>>
>>> Any other ideas where to place more debugging output to see what's
>>> happening?
>>>
>>> Best,
>>>
>>> Thomas
>>>
>>> On 10/11/2011 12:46 PM, aaron morton wrote:
 kewl, 

> * Row is not deleted (other columns can be read, row survives compaction
> with GCGraceSeconds=0)

 IIRC row tombstones can hang around for a while (until gc grace has 
 passed), and they only have an effect on columns that have a lower 
 timestamp. So it's possible to read columns from a row with a tombstone.

 Can you read the column using a slice range rather than specifying its
 name?

 Aaron

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 11/10/2011, at 11:15 PM, Thomas Richter wrote:

> Hi Aaron,
>
> i invalidated the caches but nothing changed. I didn't get the mentioned
> log line either, but as I read the code SliceByNamesReadCommand uses
> NamesQueryFilter and not SliceQueryFilter.
>
> Next, there is only one SSTable.
>
> I can rule out that the row is deleted because I deleted all other rows
> in that CF to reduce data size and speed up testing. I set
> GCGraceSeconds to zero and ran a compaction. All other rows are gone,
> but i can still access at least one column from the left row.
> So as far as I understand it, there should not be a tombstone on row 
> level.
>
> To make it a list:
>
> * One SSTable, one row
> * Row is not deleted (other columns can be read, row survives compaction
> with GCGraceSeconds=0)
> * Most columns can be read by get['row']['col'] from cassandra-cli
> * Some columns can not be read by get['row']['col'] from cassandra-cli
> but can be found in output of sstable2json
> * unreadable data survives compaction with GCGraceSeconds=0 (checked
> with sstable2json)
> * Invalidating caches does not help
> * Nothing in the logs
>
> Does that point into any direction where i should look next?
>
> Best,
>
> Thomas
>
> On 10/11/2011 10:30 AM, aaron morton wrote:
>> Nothing jumps out. The obvious answer is that the column has been 
>> deleted. Did you check all the SSTables ?
>>
>> It looks like query returned from row cache, otherwise you would see 
>> this as well…
>>
>> DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java (line 
>> 123) collecting 0 of 2147483647: 
>> 1318294191654059:false:354@1318294191654861
>>
>> Which would mean a version of the column was found. 
>>
>> If you invalidate the cache with nodetool and run the query and the log 
>> message appears it will mean the column was read from (all of the) 
>> sstables. If you do not get a column returned I would say there is a 
>> tombstone in place. It's either a row level or a column level one.

Re: Storing pre-sorted data

2011-10-13 Thread Zach Richardson
Matthias,

This is an interesting problem.

I would consider using longs as the column type, where your column
names are evenly distributed longs in sort order when you first write
your list out.  So if you have items A and C with the long column
names 1000 and 2000, and then you have to insert B, it gets inserted
at 1500.  Once you run out of room between any two column name
entries, i.e. 1000, 1001, 1002 entries are all taken at any spot in the
list, go ahead and re-write the list.

If your unencrypted data is uniformly distributed, you will have very
few collisions on your column names and should not have to re-write
the list too often.

If your lists are small enough, then you could use ints to save space,
but will then have to re-write the list more often.

Thanks,

Zach

On Thu, Oct 13, 2011 at 2:47 AM, Matthias Pfau  wrote:
> Hi Stephen,
> this is a great idea but unfortunately doesn't work for us either, as we
> cannot store the data in an unencrypted form.
>
> Kind regards
> Matthias
>
> On 10/12/2011 07:42 PM, Stephen Connolly wrote:
>>
>> could you prefix the data with 3-4 bytes of a linear hash of the
>> unencrypted data? it wouldn't be a perfect sort, but you'd have less of a
>> range to query to get the sorted values?
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on
>> the screen
>>
>> On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>
>>    Unfortunately, that is not an option as we have to store the data in
>>    a compressed and encrypted, and therefore binary and non-sortable form.
>>
>>    On 10/12/2011 06:39 PM, David McNelis wrote:
>>
>>        Is it an option to not convert the data to binary prior to
>> inserting
>>        into Cassandra?  Also, how large are the strings you're sorting?
>>          If its
>>        viable to not convert to binary before writing to Cassandra, and
>>        you use
>>        one of the string based column ordering techniques (utf8, ascii,
>> for
>>        example), then the data would be sorted without you  needing to
>>        specifically worry about that.  Of course, if the strings are
>>        lengthy
>>        you could run into  additional issues.
>>
>>        On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:
>>
>>            Hi there,
>>            we are currently building a prototype based on cassandra and
>>        came
>>            into problems on implementing sorted lists containing
>>        millions of items.
>>
>>            The special thing about the items of our lists is that
>>        cassandra is
>>            not able to sort them as the data is stored in a binary
>>        format which
>>            is not sortable. However, we are able to sort the data
>>        before the
>>            plain data gets encoded (our application is responsible for
>>        the order).
>>
>>            First Approach: Storing Lists in ColumnFamilies
>>            ***
>>            We first tried to map the list to a single row of a
>>        ColumnFamily in
>>            a way that the index of the list is mapped to the column
>>        names and
>>            the items of the list to the column values. The column names
>> are
>>            increasing numbers which define the sort order.
>>            This has the major drawback that big parts of the list have
>>        to be
>>            rewritten on inserts (because the column names are numbered
>>        by their
>>            index), which are quite common.
>>
>>
>>            Second Approach: Storing the whole List as Binary Data:
>>            ***
>>            We tried to store the compressed list in a single column.
>>        However,
>>            this is only feasible for smaller lists. Our lists are far
>>        too big
>>            leading to multi megabyte reads and writes. As we need to
>>        read and
>>            update the lists quite often, this would put our Cassandra
>>        cluster
>>            under a lot of pressure.
>>
>>            Ideal Solution: Native support for storing lists
>>            ***
>>            We would be very happy with a way to store a list of sorted
>>        values
>>            without making improper use of column names for the list
>>        index. This
>>            implies that we would need a possibility to insert values at
>>        defined
>>            positions. We know that this could lead to problems with
>>        concurrent
>>            inserts in a distributed environment, but this is handled by
>> our
>>            application logic.
>>
>>
>>            What are your ideas on that?
>>
>>            Thanks
>>            Matthias
>>
>>
>>
>>
>>        --
>>        *David McNelis*
>>        Lead Software Engineer
>>        Agentis Energy
>>        www.agentisenergy.com 

Re: Cassandra as session store under heavy load

2011-10-13 Thread Maciej Miklas
durable_writes sounds great - thank you! I really do not need the commit log
here.

Another question: is it possible to configure the lifetime of tombstones?


Regards,
Maciej
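
On the tombstone question: their lifetime is governed per column family by
GCGraceSeconds (the same setting tuned elsewhere in this digest). A hedged
Hector-style sketch of lowering it; the cluster, keyspace and column family
names are hypothetical, and a real update should start from the existing
definition rather than a fresh one:

    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class TombstoneLifetime {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            ColumnFamilyDefinition cfDef =
                    HFactory.createColumnFamilyDefinition("MyKeyspace", "Sessions");
            // Tombstones become eligible for collection this many seconds
            // after a delete (the default is 864000, i.e. ten days).
            cfDef.setGcGraceSeconds(3600);
            cluster.updateColumnFamily(cfDef);
        }
    }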


Re: Existing column(s) not readable

2011-10-13 Thread Thomas Richter
Hi Aaron,

I guess i found it :-).

I added logging for the used IndexInfo to
SSTableNamesIterator.readIndexedColumns and got negative index positions
for the missing columns. This is the reason why the columns are not
loaded from sstable.

So I had a look at ColumnIndexer.serializeInternal and there it is:

int endPosition = 0, startPosition = -1;

Should be:

long endPosition = 0, startPosition = -1;

I'm currently running a compaction with a fixed version to verify.
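
A minimal illustration of the sign flip (the real serializeInternal
arithmetic is more involved; this only shows why offsets past 2 GB in a
wide row come out negative when tracked in an int):

    public class IndexOffsetOverflow {
        public static void main(String[] args) {
            long realPosition = 3000000000L; // a column offset past 2 GB
            int stored = (int) realPosition; // what an int field ends up holding
            System.out.println(stored);      // prints -1294967296
        }
    }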

Best,

Thomas

On 10/12/2011 11:54 PM, aaron morton wrote:
> Sounds a lot like the column is deleted. 
> 
> IIRC this is where the columns from various SSTables are reduced
> https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/db/filter/QueryFilter.java#L117
> 
> The call to ColumnFamily.addColumn() is where the column instance may be 
> merged with other instances. 
> 
> A 
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 13/10/2011, at 5:33 AM, Thomas Richter wrote:
> 
>> Hi Aaron,
>>
>> I cannot read the column with a slice query.
>> The slice query only returns data till a certain column and after that i
>> only get empty results.
>>
>> I added log output to QueryFilter.isRelevant to see if the filter is
>> dropping the column(s) but it doesn't even show up there.
>>
>> Next thing I will check is the diff between columns contained in
>> json export and columns fetched with the slice query, maybe this gives
>> more clues...
>>
>> Any other ideas where to place more debugging output to see what's
>> happening?
>>
>> Best,
>>
>> Thomas
>>
>> On 10/11/2011 12:46 PM, aaron morton wrote:
>>> kewl, 
>>>
 * Row is not deleted (other columns can be read, row survives compaction
 with GCGraceSeconds=0)
>>>
>>> IIRC row tombstones can hang around for a while (until gc grace has 
>>> passed), and they only have an effect on columns that have a lower 
>>> timestamp. So it's possible to read columns from a row with a tombstone.
>>>
>>> Can you read the column using a slice range rather than specifying its
>>> name?
>>>
>>> Aaron
>>>
>>> -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 11/10/2011, at 11:15 PM, Thomas Richter wrote:
>>>
 Hi Aaron,

 i invalidated the caches but nothing changed. I didn't get the mentioned
 log line either, but as I read the code SliceByNamesReadCommand uses
 NamesQueryFilter and not SliceQueryFilter.

 Next, there is only one SSTable.

 I can rule out that the row is deleted because I deleted all other rows
 in that CF to reduce data size and speed up testing. I set
 GCGraceSeconds to zero and ran a compaction. All other rows are gone,
 but i can still access at least one column from the left row.
 So as far as I understand it, there should not be a tombstone on row level.

 To make it a list:

 * One SSTable, one row
 * Row is not deleted (other columns can be read, row survives compaction
 with GCGraceSeconds=0)
 * Most columns can be read by get['row']['col'] from cassandra-cli
 * Some columns can not be read by get['row']['col'] from cassandra-cli
 but can be found in output of sstable2json
 * unreadable data survives compaction with GCGraceSeconds=0 (checked
 with sstable2json)
 * Invalidating caches does not help
 * Nothing in the logs

 Does that point into any direction where i should look next?

 Best,

 Thomas

 On 10/11/2011 10:30 AM, aaron morton wrote:
> Nothing jumps out. The obvious answer is that the column has been 
> deleted. Did you check all the SSTables ?
>
> It looks like query returned from row cache, otherwise you would see this 
> as well…
>
> DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java (line 
> 123) collecting 0 of 2147483647: 
> 1318294191654059:false:354@1318294191654861
>
> Which would mean a version of the column was found. 
>
> If you invalidate the cache with nodetool and run the query and the log 
> message appears it will mean the column was read from (all of the) 
> sstables. If you do not get a column returned I would say there is a 
> tombstone in place. It's either a row level or a column level one.  
>
> Hope that helps. 
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/10/2011, at 10:35 AM, Thomas Richter wrote:
>
>> Hi Aaron,
>>
>> normally we use hector to access cassandra, but for debugging I switched
>> to cassandra-cli.
>>
>> Column can not be read by a simple
>> get CFName['rowkey']['colna
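
Earlier in this thread Aaron suggests reading with a slice range instead of
by name; a minimal Hector sketch of such a query (the column family, row
key and count are assumptions):

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class SliceInsteadOfName {
        static ColumnSlice<String, String> readSlice(Keyspace keyspace) {
            SliceQuery<String, String, String> q = HFactory.createSliceQuery(
                    keyspace, StringSerializer.get(), StringSerializer.get(),
                    StringSerializer.get());
            q.setColumnFamily("CFName");
            q.setKey("rowkey");
            // Full range rather than a by-name lookup: empty start and
            // finish mean "first column to last", up to 1000 columns.
            q.setRange("", "", false, 1000);
            return q.execute().get();
        }
    }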

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau

Hi Stephen,
this is a great idea but unfortunately doesn't work for us either, as we
cannot store the data in an unencrypted form.


Kind regards
Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote:

could you prefix the data with 3-4 bytes of a linear hash of the
unencrypted data? it wouldn't be a perfect sort, but you'd have less of a
range to query to get the sorted values?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on
the screen

On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

Unfortunately, that is not an option as we have to store the data in
a compressed and encrypted, and therefore binary and non-sortable form.

On 10/12/2011 06:39 PM, David McNelis wrote:

Is it an option to not convert the data to binary prior to inserting
into Cassandra?  Also, how large are the strings you're sorting?
  If its
viable to not convert to binary before writing to Cassandra, and
you use
one of the string based column ordering techniques (utf8, ascii, for
example), then the data would be sorted without you  needing to
specifically worry about that.  Of course, if the strings are
lengthy
you could run into  additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:

Hi there,
we are currently building a prototype based on cassandra and
came
into problems on implementing sorted lists containing
millions of items.

The special thing about the items of our lists is that
cassandra is
not able to sort them as the data is stored in a binary
format which
is not sortable. However, we are able to sort the data
before the
plain data gets encoded (our application is responsible for
the order).

First Approach: Storing Lists in ColumnFamilies
***
We first tried to map the list to a single row of a
ColumnFamily in
a way that the index of the list is mapped to the column
names and
the items of the list to the column values. The column names are
increasing numbers which define the sort order.
This has the major drawback that big parts of the list have
to be
rewritten on inserts (because the column names are numbered
by their
index), which are quite common.


Second Approach: Storing the whole List as Binary Data:
***
We tried to store the compressed list in a single column.
However,
this is only feasible for smaller lists. Our lists are far
too big
leading to multi megabyte reads and writes. As we need to
read and
update the lists quite often, this would put our Cassandra
cluster
under a lot of pressure.

Ideal Solution: Native support for storing lists
***
We would be very happy with a way to store a list of sorted
values
without making improper use of column names for the list
index. This
implies that we would need a possibility to insert values at
defined
positions. We know that this could lead to problems with
concurrent
inserts in a distributed environment, but this is handled by our
application logic.


What are your ideas on that?

Thanks
Matthias




--
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com 

c: 219.384.5143 

/A Smart Grid technology company focused on helping consumers of
energy
control an often under-managed resource./







Re: Schema versions reflect schemas on unwanted nodes

2011-10-13 Thread Eric Czech
I don't think that's what I'm after here since the unwanted nodes were
originally assimilated into the cluster with the same initial_token values
as other nodes that were already in the cluster (that have, and still do
have, useful data).  I know this is an awkward situation so I'll try to
depict it in a simpler way:

Let's say I have a simplified version of our production cluster that looks
like this -

cass-1   token = A
cass-2   token = B
cass-3   token = C

Then I tried to create a second cluster that looks like this -

cass-analysis-1   token = A  (and contains same data as cass-1)
cass-analysis-2   token = B  (and contains same data as cass-2)
cass-analysis-3   token = C  (and contains same data as cass-3)

But after starting the second cluster, things got crossed up between the
clusters and here's what the original cluster now looks like -

cass-1   token = A   (has data and schema)
cass-2   token = B   (has data and schema)
cass-3   token = C   (had data and schema)
cass-analysis-1   token = A  (has *no* data and is not part of the ring, but
is trying to be included in cluster schema)

A simplified version of "describe cluster"  for the original cluster now
shows:

Cluster Information:
   Schema versions:
     SCHEMA-UUID-1: [cass-1, cass-2, cass-3]
     SCHEMA-UUID-2: [cass-analysis-1]

But the simplified ring looks like this (has only 3 nodes instead of 4):

Host   Owns Token
cass-1 33%   A
cass-2 33%   B
cass-3 33%   C

The original cluster is still working correctly but all live schema updates
are failing because of the inconsistent schema versions introduced by the
unwanted node.

From my perspective, a simple fix seems to be for cassandra to exclude nodes
that aren't part of the ring from the schema consistency requirements.  Any
reason that wouldn't work?

And aside from a possible code patch, any recommendations as to how I can
best fix this given the current 8.4 release?
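
For checking where the disagreement sits from code, a minimal raw-Thrift
sketch (host and port are assumptions); describe_schema_versions maps each
schema UUID to the endpoints reporting it, so more than one key means
disagreement:

    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class SchemaVersionCheck {
        public static void main(String[] args) throws Exception {
            TFramedTransport transport = new TFramedTransport(new TSocket("cass-1", 9160));
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            Map<String, List<String>> versions = client.describe_schema_versions();
            for (Map.Entry<String, List<String>> e : versions.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
            transport.close();
        }
    }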


On Thu, Oct 13, 2011 at 12:14 AM, Jonathan Ellis  wrote:

> Does nodetool removetoken not work?
>
> On Thu, Oct 13, 2011 at 12:59 AM, Eric Czech 
> wrote:
> > Not sure if anyone has seen this before but it's really killing me right
> > now.  Perhaps that was too long of a description of the issue so here's a
> > more succinct question -- How do I remove nodes associated with a cluster
> > that contain no data and have no reason to be associated with the cluster
> > whatsoever?
> > My last resort here is to stop cassandra (after recording all tokens for
> > each node), set the initial token for each node in the cluster in
> > cassandra.yaml, manually delete the LocationInfo* sstables in the system
> > keyspace, and then restart.  I'm hoping there's a simpler, less seemingly
> > risky way to do this so please, please let me know if that's true!
> > Thanks again.
> > - Eric
> > On Tue, Oct 11, 2011 at 11:55 AM, Eric Czech 
> wrote:
> >>
> >> Hi, I'm having what I think is a fairly uncommon schema issue --
> >> My situation is that I had a cluster with 10 nodes and a consistent
> >> schema.  Then, in an experiment to setup a second cluster with the same
> >> information (by copying the raw sstables), I left the LocationInfo*
> sstables
> >> in the system keyspace in the new cluster and after starting the second
> >> cluster, I realized that the two clusters were discovering each other
> when
> >> they shouldn't have been.  Since then, I changed the cluster name for
> the
> >> second cluster and made sure to delete the LocationInfo* sstables before
> >> starting it and the two clusters are now operating independent of one
> >> another for the most part.  The only remaining connection between the
> two
> >> seems to be that the first cluster is still maintaining references to
> nodes
> >> in the second cluster in the schema versions despite those nodes not
> >> actually being part of the ring.
> >> Here's what my "describe cluster" looks like on the original cluster:
> >> Cluster Information:
> >>Snitch: org.apache.cassandra.locator.SimpleSnitch
> >>Partitioner: org.apache.cassandra.dht.RandomPartitioner
> >>Schema versions:
> >> 48971cb0-e9ff-11e0--eb9eab7d90bf: [,
> >> , ..., ]
> >> 848bcfc0-eddf-11e0--8a3bb58f08ff: [,
> >> ]
> >> The second cluster, however, contains no schema versions involving nodes
> >> from the first cluster.
> >> My question then is, how can I remove those schema versions from the
> >> original cluster that are associated with the unwanted nodes from the
> second
> >> cluster?  Is there any way to remove or evict an IP from a cluster
> instead
> >> of just a token?
> >> Thanks in advance!
> >> - Eric
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>