Re: deleted counters keeps their value?

2011-09-21 Thread David Boxenhorn
The reason why counters work is that addition is commutative, i.e.

x + y = y + x

but deletes are not commutative, i.e.

x + delete ≠ delete + x

so the result depends on the order in which the messages arrive.
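
A concrete illustration of the ordering problem, as a sketch using 0.8-style CLI
counter commands (the CF and names are hypothetical, and a counter column family
with default_validation_class = CounterColumnType is assumed):

    incr Counters['page1']['hits'];
    del Counters['page1']['hits'];
    incr Counters['page1']['hits'];

A replica that applies these in the order they were sent ends up at 1 (increment,
wipe, increment). A replica that happens to receive the delete last applies
increment, increment, delete and ends up at 0. Pure increments can be applied in
any order and still converge; once a delete is mixed in, the final value depends
on which message arrived last.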


2011/9/21 Radim Kolar 

> On 21.9.2011 12:07, aaron morton wrote:
>
>> see technical limitations for deleting counters:
>> http://wiki.apache.org/cassandra/Counters
>>
> For instance, if you issue the sequence "increment, remove, increment" very
> quickly, it is possible for the removal to be lost (if for some reason the
> remove happens to be the last message received).
>
> But I do not remove them very quickly. It does that even with 60 seconds
> between the delete and the increment. I do not understand what "the remove
> happens to be the last message received" means.
>


Read failure when adding node + move; Or: What is the right way to add a node?

2011-09-21 Thread David Boxenhorn
Initial state: 3 nodes, RF=3, version = 0.7.8, some queries are with
CL=QUORUM

1. Add a node with the correct token for 4 nodes, repair
2. Move first node to balance 4 nodes, repair
3. Move second node

===> Start getting timeouts, Hector warning: WARNING - Error:
me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be
enough replicas present to handle consistency level.

What is going on? My traffic isn't high. None of my nodes' logs show
ANYTHING during the move.

4. When the node finishes moving, the timeouts stop happening

Is there some state in the above scenario in which I don't have the required
replication factor of at least 2?
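
For reference, the arithmetic behind the Hector warning: with RF=3, QUORUM needs

    quorum = floor(RF / 2) + 1 = floor(3 / 2) + 1 = 2

live replicas for every key it touches. One plausible reading of the symptom
(a guess, not a diagnosis) is that while a token is being moved, the replica
placement for some ranges is in transition, so the coordinator can briefly see
fewer than 2 usable replicas for a range and report UnavailableException even
under light traffic; once the move finishes, placement is stable again and the
errors stop, which matches step 4.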


What causes dropped messages?

2011-08-16 Thread David Boxenhorn
How can I tell what's causing dropped messages?

Is it just too much activity? I'm not getting any other, more specific
messages, just these:

WARN [ScheduledTasks:1] 2011-08-15 11:33:26,136 MessagingService.java (line
504) Dropped 1534 MUTATION messages in the last 5000ms
WARN [ScheduledTasks:1] 2011-08-15 11:33:26,137 MessagingService.java (line
504) Dropped 58 READ_REPAIR messages in the last 5000ms
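
For reference, these warnings mean the node shed requests that had already waited
longer than rpc_timeout rather than answering them late, so the usual next step is
to find out which stage is falling behind. A minimal sketch (the host is a
placeholder, and the exact output format varies a little by version):

    nodetool -h <host> tpstats

If MutationStage or ReadStage shows large pending counts while the drops are being
logged, the node is simply behind; GC pauses, compaction, or plain overload are the
usual suspects.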


Re: Changing the CLI, not a great idea!

2011-07-28 Thread David Boxenhorn
This is part of a much bigger problem, one which has many parts, among them:

1. Cassandra is complex. Getting a gestalt understanding of it makes me
think I understand how Alzheimer's patients must feel.
2. There is no official documentation. Perhaps everything is out there
somewhere, who knows?
3. Cassandra is a moving target. Books are out of date before they hit the
press.
4. Most of the important knowledge about Cassandra exists in a kind of oral
history, that is hard to keep up with, and even harder to understand once
it's long past.

I think it is clear that we need a better one-stop-shop for good
documentation. What hasn't been talked about much - but I think it's just as
important - is a good one-stop-shop for Cassandra's oral history.

(You might think this list is the place, but it's too noisy to be useful,
except at the very tip of the cowcatcher. Cassandra needs a canonized
version of its oral history.)


On Thu, Jul 28, 2011 at 7:24 AM, Edward Capriolo wrote:

>
>
> On Thu, Jul 28, 2011 at 12:01 AM, Jonathan Ellis wrote:
>
>> On Wed, Jul 27, 2011 at 10:53 PM, Edward Capriolo 
>> wrote:
>> > You can not even put two statements on the same line. So the ';' is semi
>> > useless syntax.
>>
>> Nobody ever asked for that, but lots of people asked to allow
>> statements spanning multiple lines.
>>
>> > Is there a way to move things forward without hurting backwards
>> > compatibility of the CLI?
>>
>> Yes.  Create a new one based on CQL but leave the old one around.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
> On a semi related note. How can you update a column family and add an
> index?
>
> [default@app] create column family people;
> 4e3310c0-b8d1-11e0--242d50cf1f9f
> Waiting for schema agreement...
> ... schemas agree across the cluster
> [default@app] update column family people with column_metadata = [{
> column_name : ascii(inserted_at), validation_class : LongType , index_type :
> 0 , index_name : ins_idx}];
> org.apache.cassandra.db.marshal.MarshalException: cannot parse
> 'FUNCTION_CALL' as hex bytes
> [default@app] update column family people with column_metadata = [{
> column_name : inserted_at, validation_class : LongType , index_type : 0 ,
> index_name : ins_idx}];
> org.apache.cassandra.db.marshal.MarshalException: cannot parse
> 'inserted_at' as hex bytes
>
> Edward
>
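
For reference, the "cannot parse ... as hex bytes" errors above happen because the
column family was created with the default BytesType comparator, so the CLI insists
on hex column names in column_metadata. A minimal sketch of one way around it
(assuming the CF can be created fresh with a UTF8 comparator):

    create column family people
        with comparator = UTF8Type
        and column_metadata = [{column_name : inserted_at,
                                validation_class : LongType,
                                index_type : KEYS,
                                index_name : ins_idx}];

If the CF must keep its BytesType comparator, giving column_name as the hex form of
the name (696e7365727465645f6174 for 'inserted_at') should also get past the parser.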


Re: Counter consistency - are counters idempotent?

2011-07-26 Thread David Boxenhorn
I think that Yang Yang's comment on

https://issues.apache.org/jira/browse/CASSANDRA-2495

is correct. TTL could be used to expire the history. The TTL value could
either be a configurable parameter, or part of the counter API.

On Mon, Jul 25, 2011 at 9:48 PM, Aaron Turner  wrote:

> On Mon, Jul 25, 2011 at 11:24 AM, Sylvain Lebresne 
> wrote:
> > On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner 
> wrote:
> >> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton 
> wrote:
> >>> What's your use case ? There are people out there having good times
> with counters, see
> >>>
> >>>
> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
> >>> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter
> >>
> >> It's actually pretty similar to Twitter's click counting, but
> >> apparently we have different requirements for accuracy.  It's possible
> >> Rainbird does something on the front end to solve for this issue- I'm
> >> honestly not sure since they haven't released the code yet.
> >>
> >> Anyways, when you're building network aggregate graphs and fail to add
> >> the +100G of traffic from one switch to your site or metro aggregate,
> >> people around here notice.  And people quickly start distrusting
> >> graphs which don't look "real" and either ignore them completely or
> >> complain.
> >>
> >> Obviously, one should manage their Cassandra cluster to limit the
> >> occurrence of Timeouts, but frankly I don't want to be paged at 2am to
> >> "fix" these kind of problems.  If I knew "timeout" meant "failed to
> >> increment counter", I could spool my changes on the client and try
> >> again later, but that's not what timeout means.  Without any means to
> >> recover I've actually lost a lot of reliability that I currently have
> >> with my single PostgreSQL server backed data store.
> >
> > Just to make it very clear: *nobody* is arguing this is not a limitation.
> >
> > The thing is, some find counters useful even while perfectly aware of
> > that limitation and seem to be very productive with them, so we have
> > added them. Truth is, if you can live with the limitations and keep
> > the timeouts to a bare minimum (hopefully 0), then you won't find many
> > systems that are able to scale counting both in terms of number of
> > counters and number of ops/s on each counter, and that across
> > datacenters, like Cassandra counters do. And let's recall that
> > while you don't know what happened on a timeout, you at least know
> > when it happens, so you can compute the error margin.
> >
> > Again, this does not mean we don't want to fix the limitations, nor
> > that we want you to wake up at 2am, and there is actually a ticket
> > open for that:
> > https://issues.apache.org/jira/browse/CASSANDRA-2495
> > The problem is, so far, we haven't found any satisfying solution to
> > that problem. If someone has a solution, please, please, share!
> >
> > But yes, counters in their current state don't fit everyone's needs
> > and we certainly don't want to hide it.
>
> I think the Cassandra community has been pretty open about the
> limitations and I can see there are some uses for them in their
> current state.  Probably my biggest concern is that I'm pretty new to
> Cassandra and don't understand why occasionally I see timeouts even
> under very low load (one single threaded client).  Once I understood
> the impacts wrt counters it went from "annoying" to "oh crap".
>
> Anyways, as I said earlier, I understand this problem is "hard" and I
> don't expect a fix in 0.8.2 :)
>
> Mostly right now I'm just bummed because I'm pretty much back at
> square one trying to create a scalable solution which meets our needs.
>  Not to say Cassandra won't be a part of it, but just that the
> solution design has become a lot less obvious.
>
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>


Re: CompositeType for row Keys

2011-07-24 Thread David Boxenhorn
Why do you need another CF? Is there something wrong with repeating the key
as a column and indexing it?
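
A minimal sketch of that suggestion (CF and column names are hypothetical, and the
exact attributes differ slightly between 0.7 and 0.8). Note that the built-in
secondary index serves equality lookups, so this answers "find the rows with this
key value" rather than a true range scan:

    create column family Events
        with comparator = UTF8Type
        and key_validation_class = UTF8Type
        and column_metadata = [{column_name : row_key,
                                validation_class : UTF8Type,
                                index_type : KEYS}];

    set Events['20110722-host1']['row_key'] = '20110722-host1';
    get Events where row_key = '20110722-host1';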

On Fri, Jul 22, 2011 at 7:40 PM, Patrick Julien  wrote:

> Exactly.  In any case, I just answered my own question.  If I need
> range, I can just make another column family where the column name are
> these keys
>
> On Fri, Jul 22, 2011 at 12:37 PM, Nate McCall  wrote:
> >> yes,but why would you use CompositeType if you don't need range query?
> >
> > If you were doing composite keys anyway (common approach with time
> > series data for example), you would not have to write parsing and
> > concatenation code. Particularly useful if you had mixed types in the
> > key.
> >
>


Re: Repair taking a long, long time

2011-07-20 Thread David Boxenhorn
As I indicated below (but didn't say specifically), another option is to set
read repair chance to 1.0 for all your CFs and loop over all your data,
since a read triggers read repair.
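
A sketch of the knob in question (the CF name is a placeholder; on the 0.7-era CLI
the attribute is read_repair_chance):

    update column family MyCF with read_repair_chance = 1.0;

With that at 1.0, every read of a row also repairs that row on the other replicas,
so one full pass over the data brings the replicas back in line; remember to turn
it back down afterwards.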

On Wed, Jul 20, 2011 at 4:58 PM, Maxim Potekhin  wrote:

> **
> I can re-load all data that I have in the cluster, from a flat-file cache I
> have
> on NFS, many times faster than the nodetool repair takes. And that's not
> even accurate because as other noted nodetool repair eats up disk space
> for breakfast and takes more than 24hrs on 200GB data load, at which point
> I have to cancel. That's not acceptable. I simply don't know what to do
> now.
>
>
>
> On 7/20/2011 8:47 AM, David Boxenhorn wrote:
>
> I have this problem too, and I don't understand why.
>
> I can repair my nodes very quickly by looping through all my data (when you
> read your data it does read-repair), but nodetool repair takes forever. I
> understand that nodetool repair builds merkle trees, etc. etc., so it's a
> different algorithm, but why can't nodetool repair be smart enough to choose
> the best algorithm? Also, I don't understand what's special about my data
> that makes nodetool repair so much slower than looping through all my data.
>
>
> On Wed, Jul 20, 2011 at 12:18 AM, Maxim Potekhin  wrote:
>
>> Thanks Edward. I'm told by our IT that the switch connecting the nodes is
>> pretty fast.
>> Seriously, in my house I copy complete DVD images from my bedroom to
>> the living room downstairs via WiFi, and a dozen of GB does not seem like
>> a
>> problem, on dirt cheap hardware (Patriot Box Office).
>>
>> I also have just _one_ column major family but caveat emptor -- 8 indexes
>> attached to
>> it (and there will be more). There is one accounting CF which is small,
>> can't possibly
>> make a difference.
>>
>> By contrast, compaction (as in nodetool) performs quite well on this
>> cluster. I start suspecting some
>> sort of malfunction.
>>
>> Looked at the system log during the "repair", there is some compaction
>> agent doing
>> work that I'm not sure makes sense (and I didn't call for it). Disk
>> utilization all of a sudden goes up to 40%
>> per Ganglia, and stays there, this is pretty silly considering the cluster
>> is IDLE and we have SSDs. No external writes,
>> no reads. There are occasional GC stoppages, but these I can live with.
>>
>> This repair debacle happens 2nd time in a row. Cr@p. I need to go to
>> production soon
>> and that doesn't look good at all. If I can't manage a system that simple
>> (and/or get help
>> on this list) I may have to cut losses i.e. stay with Oracle.
>>
>> Regards,
>>
>> Maxim
>>
>>
>>
>>
>> On 7/19/2011 12:16 PM, Edward Capriolo wrote:
>>
>>>
>>> Well most SSD's are pretty fast. There is one more to consider. If
>>> Cassandra determines nodes are out of sync it has to transfer data across
>>> the network. If that is the case you have to look at 'nodetool streams' and
>>> determine how much data is being transferred between nodes. There are some
>>> open tickets where with larger tables repair is streaming more than it needs
>>> to. But even if the transfers are only 10% of your 200GB. Transferring 20 GB
>>> is not trivial.
>>>
>>> If you have multiple keyspaces and column families repair one at a time
>>> might make the process more manageable.
>>>
>>
>>
>
>
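
On that last point, repair can be scoped instead of run for everything at once; a
sketch (keyspace and CF names are placeholders, and the exact argument form depends
on the nodetool version):

    nodetool -h <host> repair MyKeyspace MyColumnFamily

Running it per column family, and one node at a time, keeps the Merkle-tree building
and streaming down to a size that is easier to watch and to cancel.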


Re: Repair taking a long, long time

2011-07-20 Thread David Boxenhorn
I have this problem too, and I don't understand why.

I can repair my nodes very quickly by looping through all my data (when you
read your data it does read-repair), but nodetool repair takes forever. I
understand that nodetool repair builds merkle trees, etc. etc., so it's a
different algorithm, but why can't nodetool repair be smart enough to choose
the best algorithm? Also, I don't understand what's special about my data
that makes nodetool repair so much slower than looping through all my data.


On Wed, Jul 20, 2011 at 12:18 AM, Maxim Potekhin  wrote:

> Thanks Edward. I'm told by our IT that the switch connecting the nodes is
> pretty fast.
> Seriously, in my house I copy complete DVD images from my bedroom to
> the living room downstairs via WiFi, and a dozen of GB does not seem like a
> problem, on dirt cheap hardware (Patriot Box Office).
>
> I also have just _one_ column major family but caveat emptor -- 8 indexes
> attached to
> it (and there will be more). There is one accounting CF which is small,
> can't possibly
> make a difference.
>
> By contrast, compaction (as in nodetool) performs quite well on this
> cluster. I start suspecting some
> sort of malfunction.
>
> Looked at the system log during the "repair", there is some compaction
> agent doing
> work that I'm not sure makes sense (and I didn't call for it). Disk
> utilization all of a sudden goes up to 40%
> per Ganglia, and stays there, this is pretty silly considering the cluster
> is IDLE and we have SSDs. No external writes,
> no reads. There are occasional GC stoppages, but these I can live with.
>
> This repair debacle happens 2nd time in a row. Cr@p. I need to go to
> production soon
> and that doesn't look good at all. If I can't manage a system that simple
> (and/or get help
> on this list) I may have to cut losses i.e. stay with Oracle.
>
> Regards,
>
> Maxim
>
>
>
>
> On 7/19/2011 12:16 PM, Edward Capriolo wrote:
>
>>
>> Well most SSD's are pretty fast. There is one more to consider. If
>> Cassandra determines nodes are out of sync it has to transfer data across
>> the network. If that is the case you have to look at 'nodetool streams' and
>> determine how much data is being transferred between nodes. There are some
>> open tickets where with larger tables repair is streaming more than it needs
>> to. But even if the transfers are only 10% of your 200GB. Transferring 20 GB
>> is not trivial.
>>
>> If you have multiple keyspaces and column families repair one at a time
>> might make the process more manageable.
>>
>
>


Re: Default behavior of generate index_name for columns...

2011-07-18 Thread David Boxenhorn
It would be nice if this were fixed before I move up to 0.8...

On Mon, Jul 18, 2011 at 3:19 PM, Boris Yen  wrote:

> If it would not cause the dev team too much trouble, I think Cassandra
> should maintain backward compatibility regarding the generation of the
> default index_name, otherwise when people start dropping columns indices,
> the result might not be what they want.
>
>
> On Mon, Jul 18, 2011 at 7:59 PM, Jonathan Ellis  wrote:
>
>> On Mon, Jul 18, 2011 at 12:20 AM, Boris Yen  wrote:
>> > Will this have any side effect when doing a get_indexed_slices
>>
>> No
>>
>> > or when a
>> > user wants to drop an index by any means?
>>
>> Sort of; one of the indexes with the name will be dropped, but not all.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>


Re: Default behavior of generate index_name for columns...

2011-07-18 Thread David Boxenhorn
Ah, that's it. I'm on 0.7

On Mon, Jul 18, 2011 at 1:27 PM, Boris Yen  wrote:

> which version of cassandra do you use? What I mentioned here only happens
> on 0.8.1.
>
>
> On Mon, Jul 18, 2011 at 4:44 PM, David Boxenhorn wrote:
>
>> I have lots of indexes on columns with the same name. Why don't I have
>> this problem?
>>
>> For example:
>>
>> Keyspace: City:
>>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>> Replication Factor: 3
>>   Column Families:
>> ColumnFamily: AttractionCheckins
>>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>   Row cache size / save period: 0.0/0
>>   Key cache size / save period: 0.1/14400
>>   Memtable thresholds: 0.3/64/60
>>   GC grace seconds: 864000
>>   Compaction min/max thresholds: 4/64
>>   Read repair chance: 0.01
>>   Column Metadata:
>> Column Name: 09partition (09partition)
>>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>   Index Type: KEYS
>> ColumnFamily: Attractions
>>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>   Row cache size / save period: 3.0/14400
>>   Key cache size / save period: 3.0/14400
>>   Memtable thresholds: 0.3/64/60
>>   GC grace seconds: 864000
>>   Compaction min/max thresholds: 4/64
>>   Read repair chance: 0.01
>>   Column Metadata:
>> Column Name: 09partition (09partition)
>>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>   Index Type: KEYS
>> ColumnFamily: CityResources
>>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>   Row cache size / save period: 5000.0/14400
>>   Key cache size / save period: 5000.0/14400
>>   Memtable thresholds: 0.3/64/60
>>   GC grace seconds: 864000
>>   Compaction min/max thresholds: 4/64
>>   Read repair chance: 0.01
>>   Column Metadata:
>> Column Name: 09partition (09partition)
>>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>   Index Type: KEYS
>>
>>
>> On Mon, Jul 18, 2011 at 8:20 AM, Boris Yen  wrote:
>>
>>> Will this have any side effect when doing a get_indexed_slices or when a
>>> user wants to drop an index by any means?
>>>
>>> Boris
>>>
>>>
>>> On Mon, Jul 18, 2011 at 1:13 PM, Jonathan Ellis wrote:
>>>
>>>> 0.8.0 didn't check for name conflicts correctly.  0.8.1 does, but it
>>>> can't fix the ones 0.8.0 allowed, retroactively.
>>>>
>>>> On Sun, Jul 17, 2011 at 11:52 PM, Boris Yen  wrote:
>>>> > I have tested another case, not sure if this is a bug.
>>>> > I created a few column families on 0.8.0 each has user_name column, in
>>>> > addition, I also enabled secondary index on this column.  Then, I
>>>> upgraded
>>>> > to 0.8.1, when I used cassandra-cli: show keyspaces, I saw index name
>>>> > "user_name_idx" appears for different columns families. It seems the
>>>> > validation rule for index_name on 0.8.1 has been skipped completely.
>>>> >
>>>> > Is this a bug? or is it intentional?
>>>> > Regards
>>>> > Boris
>>>> > On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen 
>>>> wrote:
>>>> >>
>>>> >> Done. It is CASSANDRA-2903.
>>>> >> On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis 
>>>> wrote:
>>>> >>>
>>>> >>> Please.
>>>> >>>
>>>> >>> On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen 
>>>> wrote:
>>>> >>> > Hi Jonathan,
>>>> >>> > Do I need to open a ticket for this?
>>>> >>> > Regards
>>>> >>> > Boris
>>>> >>> >
>>>> >>> > On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis <
>>>> jbel...@gmail.com>
>>>> >>> > wrote:
>>>> >>> >>
>>>> >>> >> Sounds reasonable to me.
>>>> >>> >>
>>>> >>> >> On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen 
>>>> wrote:
>>>> >>> >> > Hi,
>>>> >>> >> > I have a few column families, each has a column call

Re: Default behavior of generate index_name for columns...

2011-07-18 Thread David Boxenhorn
I have lots of indexes on columns with the same name. Why don't I have this
problem?

For example:

Keyspace: City:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 3
  Column Families:
ColumnFamily: AttractionCheckins
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 0.1/14400
  Memtable thresholds: 0.3/64/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/64
  Read repair chance: 0.01
  Column Metadata:
Column Name: 09partition (09partition)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS
ColumnFamily: Attractions
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 3.0/14400
  Key cache size / save period: 3.0/14400
  Memtable thresholds: 0.3/64/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/64
  Read repair chance: 0.01
  Column Metadata:
Column Name: 09partition (09partition)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS
ColumnFamily: CityResources
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 5000.0/14400
  Key cache size / save period: 5000.0/14400
  Memtable thresholds: 0.3/64/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/64
  Read repair chance: 0.01
  Column Metadata:
Column Name: 09partition (09partition)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS

On Mon, Jul 18, 2011 at 8:20 AM, Boris Yen  wrote:

> Will this have any side effect when doing a get_indexed_slices or when a
> user wants to drop an index by any means?
>
> Boris
>
>
> On Mon, Jul 18, 2011 at 1:13 PM, Jonathan Ellis  wrote:
>
>> 0.8.0 didn't check for name conflicts correctly.  0.8.1 does, but it
>> can't fix the ones 0.8.0 allowed, retroactively.
>>
>> On Sun, Jul 17, 2011 at 11:52 PM, Boris Yen  wrote:
>> > I have tested another case, not sure if this is a bug.
>> > I created a few column families on 0.8.0 each has user_name column, in
>> > addition, I also enabled secondary index on this column.  Then, I
>> upgraded
>> > to 0.8.1, when I used cassandra-cli: show keyspaces, I saw index name
>> > "user_name_idx" appears for different columns families. It seems the
>> > validation rule for index_name on 0.8.1 has been skipped completely.
>> >
>> > Is this a bug? or is it intentional?
>> > Regards
>> > Boris
>> > On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen  wrote:
>> >>
>> >> Done. It is CASSANDRA-2903.
>> >> On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis 
>> wrote:
>> >>>
>> >>> Please.
>> >>>
>> >>> On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen 
>> wrote:
>> >>> > Hi Jonathan,
>> >>> > Do I need to open a ticket for this?
>> >>> > Regards
>> >>> > Boris
>> >>> >
>> >>> > On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis 
>> >>> > wrote:
>> >>> >>
>> >>> >> Sounds reasonable to me.
>> >>> >>
>> >>> >> On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen 
>> wrote:
>> >>> >> > Hi,
>> >>> >> > I have a few column families, each has a column called user_name.
>> I
>> >>> >> > tried to
>> >>> >> > use secondary index on user_name column for each of the column
>> >>> >> > family.
>> >>> >> > However, when creating these column families, cassandra keeps
>> >>> >> > reporting
>> >>> >> > "Duplicate index name..." exception. I finally figured out that
>> it
>> >>> >> > seems
>> >>> >> > the
>> >>> >> > default index name is "column name"+"_idx", this make my column
>> >>> >> > family
>> >>> >> > violate the "uniqueness of index name" rule.
>> >>> >> > I was wondering if the default index_name generating rule could
>> be
>> >>> >> > like
>> >>> >> > "column name"+"cf name", so the index name would not collide with
>> >>> >> > each
>> >>> >> > other
>> >>> >> > that easily, if the user do not assign "index_name" when creating
>> a
>> >>> >> > column
>> >>> >> > family.
>> >>> >> > Regards
>> >>> >> > Boris
>> >>> >> >
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Jonathan Ellis
>> >>> >> Project Chair, Apache Cassandra
>> >>> >> co-founder of DataStax, the source for professional Cassandra
>> support
>> >>> >> http://www.datastax.com
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Jonathan Ellis
>> >>> Project Chair, Apache Cassandra
>> >>> co-founder of DataStax, the source for professional Cassandra support
>> >>> http://www.datastax.com
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
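
For reference, a sketch of the workaround implied above: give every index an explicit,
per-CF index_name at creation time, so the generated default ("column name" + "_idx")
never gets a chance to collide. All names here are hypothetical:

    create column family Users
        with comparator = UTF8Type
        and column_metadata = [{column_name : user_name,
                                validation_class : UTF8Type,
                                index_type : KEYS,
                                index_name : users_user_name_idx}];

    create column family Orders
        with comparator = UTF8Type
        and column_metadata = [{column_name : user_name,
                                validation_class : UTF8Type,
                                index_type : KEYS,
                                index_name : orders_user_name_idx}];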


Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
Got it.

Thanks!

On Wed, Jul 13, 2011 at 6:05 PM, Jonathan Ellis  wrote:

> (1) the hash calculation is a small amount of CPU -- MD5 is
> specifically designed to be efficient in this kind of situation
> (2) we compute one hash per query, so for multiple columns the
> advantage over timestamp-per-column gets large quickly.
>
> On Wed, Jul 13, 2011 at 7:31 AM, David Boxenhorn 
> wrote:
> > Is that the actual reason?
> >
> > This seems like a big inefficiency to me. For those of us who don't worry
> > about this extreme edge case (that probably will NEVER happen in real
> life,
> > for most applications), is there a way to turn this off?
> >
> > Or am I wrong about this making the operation MUCH more expensive?
> >
> >
> > On Wed, Jul 13, 2011 at 3:20 PM, Boris Yen  wrote:
> >>
> >> For a specific column, If there are two versions with the same
> timestamp,
> >> the value of the column is used to break the tie.
> >> if v1.value().compareTo(v2.value()) < 0, it means that v2 wins.
> >> On Wed, Jul 13, 2011 at 7:13 PM, David Boxenhorn 
> >> wrote:
> >>>
> >>> How would you know which data is correct, if they both have the same
> >>> timestamp?
> >>>
> >>> On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen 
> wrote:
> >>>>
> >>>> I can only say, "data" does matter, that is why the developers use
> hash
> >>>> instead of timestamp. If hash value comes from other node is not a
> match, a
> >>>> read repair would perform. so that correct data can be returned.
> >>>>
> >>>> On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn 
> >>>> wrote:
> >>>>>
> >>>>> If you have two pieces of data that are different but have the same
> >>>>> timestamp, how can you resolve consistency?
> >>>>>
> >>>>> This is a pathological situation to begin with, why should you waste
> >>>>> effort to (not) solve it?
> >>>>>
> >>>>> On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen 
> wrote:
> >>>>>>
> >>>>>> I guess it is because the timestamp does not guarantee data
> >>>>>> consistency, but hash does.
> >>>>>> Boris
> >>>>>>
> >>>>>> On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn <
> da...@citypath.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> I just saw this
> >>>>>>>
> >>>>>>> http://wiki.apache.org/cassandra/DigestQueries
> >>>>>>>
> >>>>>>> and I was wondering why it returns a hash of the data. Wouldn't it
> be
> >>>>>>> better and easier to return the timestamp? You don't really care
> what the
> >>>>>>> data is, you only care whether it is more or less recent than
> another piece
> >>>>>>> of data.
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
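
To put rough, illustrative numbers on point (2) above: a digest read returns a single
16-byte MD5 over the whole result, whereas a timestamp-per-column reply for, say, a
100-column slice would be on the order of

    100 columns x 8 bytes = 800 bytes of timestamps

plus the column names needed to match them up, and a per-column comparison on the
coordinator instead of one byte-for-byte digest check.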


Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
Is that the actual reason?

This seems like a big inefficiency to me. For those of us who don't worry
about this extreme edge case (that probably will NEVER happen in real life,
for most applications), is there a way to turn this off?

Or am I wrong about this making the operation MUCH more expensive?


On Wed, Jul 13, 2011 at 3:20 PM, Boris Yen  wrote:

> For a specific column, If there are two versions with the same timestamp,
> the value of the column is used to break the tie.
>
> if v1.value().compareTo(v2.value()) < 0, it means that v2 wins.
>
> On Wed, Jul 13, 2011 at 7:13 PM, David Boxenhorn wrote:
>
>> How would you know which data is correct, if they both have the same
>> timestamp?
>>
>> On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen  wrote:
>>
>>> I can only say, "data" does matter; that is why the developers use a hash
>>> instead of a timestamp. If the hash value that comes from the other node is
>>> not a match, a read repair is performed, so that the correct data can be returned.
>>>
>>>
>>> On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn wrote:
>>>
>>>> If you have two pieces of data that are different but have the same
>>>> timestamp, how can you resolve consistency?
>>>>
>>>> This is a pathological situation to begin with, why should you waste
>>>> effort to (not) solve it?
>>>>
>>>> On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen  wrote:
>>>>
>>>>> I guess it is because the timestamp does not guarantee data
>>>>> consistency, but hash does.
>>>>>
>>>>> Boris
>>>>>
>>>>>
>>>>> On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn 
>>>>> wrote:
>>>>>
>>>>>> I just saw this
>>>>>>
>>>>>> http://wiki.apache.org/cassandra/DigestQueries
>>>>>>
>>>>>> and I was wondering why it returns a hash of the data. Wouldn't it be
>>>>>> better and easier to return the timestamp? You don't really care what the
>>>>>> data is, you only care whether it is more or less recent than another 
>>>>>> piece
>>>>>> of data.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
How would you know which data is correct, if they both have the same
timestamp?

On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen  wrote:

> I can only say, "data" does matter; that is why the developers use a hash
> instead of a timestamp. If the hash value that comes from the other node is
> not a match, a read repair is performed, so that the correct data can be returned.
>
>
> On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn wrote:
>
>> If you have two pieces of data that are different but have the same
>> timestamp, how can you resolve consistency?
>>
>> This is a pathological situation to begin with, why should you waste
>> effort to (not) solve it?
>>
>> On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen  wrote:
>>
>>> I guess it is because the timestamp does not guarantee data consistency,
>>> but hash does.
>>>
>>> Boris
>>>
>>>
>>> On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn wrote:
>>>
>>>> I just saw this
>>>>
>>>> http://wiki.apache.org/cassandra/DigestQueries
>>>>
>>>> and I was wondering why it returns a hash of the data. Wouldn't it be
>>>> better and easier to return the timestamp? You don't really care what the
>>>> data is, you only care whether it is more or less recent than another piece
>>>> of data.
>>>>
>>>
>>>
>>
>


Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
If you have two pieces of data that are different but have the same
timestamp, how can you resolve consistency?

This is a pathological situation to begin with, why should you waste effort
to (not) solve it?

On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen  wrote:

> I guess it is because the timestamp does not guarantee data consistency,
> but hash does.
>
> Boris
>
>
> On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn wrote:
>
>> I just saw this
>>
>> http://wiki.apache.org/cassandra/DigestQueries
>>
>> and I was wondering why it returns a hash of the data. Wouldn't it be
>> better and easier to return the timestamp? You don't really care what the
>> data is, you only care whether it is more or less recent than another piece
>> of data.
>>
>
>


Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
I just saw this

http://wiki.apache.org/cassandra/DigestQueries

and I was wondering why it returns a hash of the data. Wouldn't it be better
and easier to return the timestamp? You don't really care what the data is,
you only care whether it is more or less recent than another piece of data.


Re: Questions about Cassandra reads

2011-07-03 Thread David Boxenhorn
Ah, I get it. Your normal access pattern should be one row at a time.
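
A small sketch of that access pattern (CF and names are hypothetical; a UTF8Type
comparator is assumed so columns sort by name): everything you want to read together
lives as columns of one row, so a single row read returns it already ordered.

    set Timeline['user42']['20110703-1141:event-a'] = '...';
    set Timeline['user42']['20110703-1142:event-b'] = '...';
    get Timeline['user42'];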

On Sun, Jul 3, 2011 at 11:41 AM, David Boxenhorn  wrote:
>>> What do you think ?
>>
>> I think you should strongly consider denormalizing so that you can
>> read ranges from a single row instead.
>
> Why do you recommend denormalizing instead of secondary indexes?
>


Re: Questions about Cassandra reads

2011-07-03 Thread David Boxenhorn
>> What do you think ?
>
> I think you should strongly consider denormalizing so that you can
> read ranges from a single row instead.

Why do you recommend denormalizing instead of secondary indexes?


Re: Truncate introspection

2011-06-28 Thread David Boxenhorn
Does drop work in a similar way?

When I drop a CF and add it back with a different schema, it seems to work.

But I notice that in between the drop and adding it back, when the CLI
tells me the CF doesn't exist, the old data is still there.

I've been assuming that this works, but just wanted to make sure...

On Tue, Jun 28, 2011 at 12:56 AM, Jonathan Ellis  wrote:
> Each node (independently) has logic that guarantees that any writes
> processed before the truncate, will be wiped out.
>
> This does not mean that each node will wipe out the same data, or even
> that each node will process the truncate (which would result in a
> timedoutexception).
>
> It also does not mean you can't have writes immediately after the
> truncate that would race w/ a "truncate, check for zero sstables"
> procedure.
>
> On Mon, Jun 27, 2011 at 3:35 PM, Ethan Rowe  wrote:
>> If those went to zero, it would certainly tell me something happened.  :)  I
>> guess watching that would be a way of seeing something was going on.
>> Is the truncate itself propagating a ring-wide marker or anything so the CF
>> is logically "empty" before being physically removed?  That's the impression
>> I got from the docs but it wasn't totally clear to me.
>>
>> On Mon, Jun 27, 2011 at 3:33 PM, Jonathan Ellis  wrote:
>>>
>>> There's a JMX method to get the number of sstables in a CF, is that
>>> what you're looking for?
>>>
>>> On Mon, Jun 27, 2011 at 1:04 PM, Ethan Rowe  wrote:
>>> > Is there any straightforward means of seeing what's going on after
>>> > issuing a
>>> > truncate (on 0.7.5)?  I'm not seeing evidence that anything actually
>>> > happened.  I've disabled read repair on the column family in question
>>> > and
>>> > don't have anything actively reading/writing at present, apart from my
>>> > one-off tests to see if rows have disappeared.
>>> > Thanks in advance.
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: UnsupportedOperationException: Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-24 Thread David Boxenhorn
"it can cause index corruption IF the row delete timestamp is higher
than the column update's."

By "higher" you mean later, i.e. some modifications to a row, then delete?

I have not seen this error in our logs, but it could happen. I have a
process where I insert historical data into Cassandra, in batches.
Modifying something, then deleting it is a normal scenario.

On Fri, Jun 24, 2011 at 12:41 AM, Jim Ancona  wrote:
> I've reopened the issue. On our 0.7.6-2 cluster, system.log is filling
> with repeated instances of the UnsupportedOperationException. When
> we've attempted to restart a node, the restart fails with the same
> exception. Luckily we found this as part of our pre-deploy testing of
> 0.7.6, not in production, but this is not "mostly a non-problem" here.
>
> Jim
>
> On Thu, Jun 23, 2011 at 3:25 PM, Jonathan Ellis  wrote:
>> The patch probably applies as-is but I don't want to take any risks
>> with 0.7 to solve what is mostly a non-problem.
>>
>> On Thu, Jun 23, 2011 at 2:16 PM, Jim Ancona  wrote:
>>> Is there any reason this fix can't be back-ported to 0.7?
>>>
>>> Jim
>>>
>>> On Thu, Jun 23, 2011 at 3:00 PM, Jonathan Ellis  wrote:
 Sorry, 0.8.2 is correct.

 On Thu, Jun 23, 2011 at 1:36 PM, Les Hazlewood  wrote:
> The issue has the fix version as 0.8.2, not 0.7.7.  Is that incorrect?
> Cheers,
> Les
>



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com

>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>


Re: Re : get_range_slices result

2011-06-24 Thread David Boxenhorn
You can get the best of both worlds by repeating the key in a column,
and creating a secondary index on that column.

On Fri, Jun 24, 2011 at 1:16 PM, Sylvain Lebresne  wrote:
> On Fri, Jun 24, 2011 at 10:21 AM, karim abbouh  wrote:
>> I want the get_range_slices() function to return records sorted (ordered) by the
>> key (rowId) used during the insertion.
>> is it possible?
>
> You will have to use the OrderPreservingPartitioner. This is no
> without inconvenience however.
> See for instance
> http://wiki.apache.org/cassandra/StorageConfiguration#line-100 or
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
> that give more details on the pros and cons (the short version being
> that the main advantage of
> OrderPreservingPartitioner is what you're asking for, but it's main
> drawback is that load-balancing
> the cluster will likely be very very hard).
>
> In general the advice is to stick with RandomPartitioner and design a
> data model that avoids needing
> range slices (or at least needing that the result is sorted). This is
> very often not too hard and more
> efficient, and much more simpler than to deal with the load balancing
> problems of OrderPreservingPartitioner.
>
> --
> Sylvain
>
>>
>> 
>> De : aaron morton 
>> À : user@cassandra.apache.org
>> Envoyé le : Jeudi 23 Juin 2011 20h30
>> Objet : Re: get_range_slices result
>>
>> Not sure what your question is.
>> Does this help ? http://wiki.apache.org/cassandra/FAQ#range_rp
>> Cheers
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 23 Jun 2011, at 21:59, karim abbouh wrote:
>>
>> how can get_range_slices() function returns sorting key ?
>> BR
>>
>>
>>
>>
>


Re: 99.999% uptime - Operations Best Practices?

2011-06-23 Thread David Boxenhorn
I think very high uptime and very low data loss are achievable in
Cassandra, but for new users there are TONS of gotchas. You really
have to know what you're doing, and I doubt that many people acquire
that knowledge without making a lot of mistakes.

I see above that most people are talking about configuration issues.
But, the first thing that you will probably do, before you have any
experience with Cassandra(!), is architect your system. Architecture
is not easily changed when you bump into a gotcha, and for some reason
you really have to search the literature well to find out about them.
So, my contributions:

The too many CFs problem. Cassandra doesn't do well with many column
families. If you come from a relational world, a real application can
easily have hundreds of tables. Even if you combine them into entities
(which is the Cassandra way), you can easily end up with dozens of
entities. The most natural thing for someone with a relational
background is to have one CF per entity, plus indexes according to your
needs. Don't do it. You need to store multiple entities in the same
CF. Group them together according to access patterns (i.e. when you
use X,  you probably also need Y), and distinguish them by adding a
prefix to their keys (e.g. entityName@key).
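
A tiny sketch of that key-prefix convention (all names hypothetical): two logical
entities share one CF and are told apart only by the prefix baked into the row key.

    set Entities['user@42']['name'] = 'alice';
    set Entities['user@42']['city'] = 'tel aviv';
    set Entities['order@1001']['user'] = 'user@42';
    set Entities['order@1001']['total'] = '99';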

Don't use supercolumns, use composite columns. Supercolumns are
disfavored by the Cassandra community and are slowly being orphaned.
For example, secondary indexes don't work on supercolumns. Nor does
CQL. Bugs crop up with supercolumns that don't happen with regular
columns because internally there's a huge separate code base for
supercolumns, and every new feature is designed and implemented for
regular columns and then retrofitted for supercolumns (or not).

There should really be a database of gotchas somewhere, and how they
were solved...

On Thu, Jun 23, 2011 at 6:57 AM, Les Hazlewood  wrote:
> Edward,
> Thank you so much for this reply - this is great stuff, and I really
> appreciate it.
> You'll be happy to know that I've already pre-ordered your book.  I'm
> looking forward to it! (When is the ship date?)
> Best regards,
> Les
>
> On Wed, Jun 22, 2011 at 7:03 PM, Edward Capriolo 
> wrote:
>>
>>
>> On Wed, Jun 22, 2011 at 8:31 PM, Les Hazlewood  wrote:
>>>
>>> Hi Thoku,
>>> You were able to more concisely represent my intentions (and their
>>> reasoning) in this thread than I was able to do so myself.  Thanks!
>>>
>>> On Wed, Jun 22, 2011 at 5:14 PM, Thoku Hansen  wrote:

 I think that Les's question was reasonable. Why *not* ask the community
 for the 'gotchas'?
 Whether the info is already documented or not, it could be an
 opportunity to improve the documentation based on users' perception.
 The "you just have to learn" responses are fair also, but that reminds
 me of the days when running Oracle was a black art, and accumulated wisdom
 made DBAs irreplaceable.
>>>
>>> Yes, this was my initial concern.  I know that Cassandra is still young,
>>> and I expect this to be the norm for a while, but I was hoping to make that
>>> process a bit easier (for me and anyone else reading this thread in the
>>> future).

 Some recommendations *are* documented, but they are dispersed / stale /
 contradictory / or counter-intuitive.
 Others have not been documented in the wiki nor in DataStax's doco, and
 are instead learned anecdotally or The Hard Way.
 For example, whether documented or not, some of the 'gotchas' that I
 encountered when I first started working with Cassandra were:
 * Don't use OpenJDK. Prefer the Sun JDK. (Wiki says this, Jira says
 that).
 * Its not viable to run without JNA installed.
 * Disable swap memory.
 * Need to run nodetool repair on a regular basis.
 I'm looking forward to Edward Capriolo's Cassandra book which Les will
 probably find helpful.
>>>
>>> Thanks for linking to this.  I'm pre-ordering right away.
>>> And thanks for the pointers, they are exactly the kind of enumerated
>>> things I was looking to elicit.  These are the kinds of things that are hard
>>> to track down in a single place.  I think it'd be nice for the community to
>>> contribute this stuff to a single page ('best practices', 'checklist',
>>> whatever you want to call it).  It would certainly make things easier when
>>> getting started.
>>> Thanks again,
>>> Les
>>
>> Since I got a plug on the book I will chip in again to the thread :)
>>
>> Some things that were mentioned already:
>>
>> Install JNA absolutely (without JNA the snapshot command has to fork to
>> hard link the sstables, I have seen clients backoff from this). Also the
>> performance focused Cassandra devs always try to squeeze out performance by
>> utilizing more native features.
>>
>> OpenJDK vs Sun. I agree, almost always try to do what 'most others' do in
>> production, this way you get surprised less.
>>
>> Other stuff:
>>
>> RAID. You might want to go RAID 1+0 if you are aiming for uptime. RAID 0

Re: Replication-aware compaction

2011-06-07 Thread David Boxenhorn
Thanks! I'm actually on vacation now, so I hope to look into this next week.

On Mon, Jun 6, 2011 at 10:25 PM, aaron morton  wrote:
> You should consider upgrading to 0.7.6 to get a fix to Gossip. Earlier 0.7 
> releases were prone to marking nodes up and down when they should not have 
> been. See 
> https://github.com/apache/cassandra/blob/cassandra-0.7/CHANGES.txt#L22
>
> Are the TimedOutExceptions to the client for read or write requests? During
> the burst times, which stages are backing up (nodetool tpstats)? Compaction
> should not affect writes too much (assuming different log and data spindles).
>
> You could also take a look at the read and write latency stats for a 
> particular CF using nodetool cfstats or JConsole. These will give you the 
> stats for the local operations. You could also take a look at the iostats on 
> the box http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7 Jun 2011, at 00:30, David Boxenhorn wrote:
>
>> Version 0.7.3.
>>
>> Yes, I am talking about minor compactions. I have three nodes, RF=3.
>> 3G data (before replication). Not many users (yet). It seems like 3
>> nodes should be plenty. But when all 3 nodes are compacting, I
>> sometimes get timeouts on the client, and I see in my logs that each
>> one is full of notifications that the other nodes have died (and come
>> back to life after about a second). My cluster can tolerate one node
>> being out of commission, so I would rather have longer compactions one
>> at a time than shorter compactions all at the same time.
>>
>> I think that our usage pattern of bursty writes causes the three nodes
>> to decide to compact at the same time. These bursts are followed by
>> periods of relative quiet, so there should be time for the other two
>> nodes to compact one at a time.
>>
>>
>> On Mon, Jun 6, 2011 at 3:27 PM, David Boxenhorn  wrote:
>>>
>>> Version 0.7.3.
>>>
>>> Yes, I am talking about minor compactions. I have three nodes, RF=3. 3G 
>>> data (before replication). Not many users (yet). It seems like 3 nodes 
>>> should be plenty. But when all 3 nodes are compacting, I sometimes get 
>>> timeouts on the client, and I see in my logs that each one is full of 
>>> notifications that the other nodes have died (and come back to life after 
>>> about a second). My cluster can tolerate one node being out of commission, 
>>> so I would rather have longer compactions one at a time than shorter 
>>> compactions all at the same time.
>>>
>>> I think that our usage pattern of bursty writes causes the three nodes to 
>>> decide to compact at the same time. These bursts are followed by periods of 
>>> relative quiet, so there should be time for the other two nodes to compact 
>>> one at a time.
>>>
>>>
>>> On Mon, Jun 6, 2011 at 2:36 PM, aaron morton  
>>> wrote:
>>>>
>>>> Are you talking about minor (automatic) compactions ? Can you provide some 
>>>> more information on what's happening to make the node unusable and what 
>>>> version you are using? It's not lightweight process, but it should not 
>>>> hurt the node that badly. It is considered an online operation.
>>>>
>>>> Delaying compaction will only make it run for longer and take more 
>>>> resources.
>>>>
>>>> Cheers
>>>>
>>>> -
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 6 Jun 2011, at 20:14, David Boxenhorn wrote:
>>>>
>>>>> Is there some deep architectural reason why compaction can't be
>>>>> replication-aware?
>>>>>
>>>>> What I mean is, if one node is doing compaction, its replicas
>>>>> shouldn't be doing compaction at the same time. Or, at least a quorum
>>>>> of nodes should be available at all times.
>>>>>
>>>>> For example, if RF=3, and one node is doing compaction, the nodes to
>>>>> its right and left in the ring should wait on compaction until that
>>>>> node is done.
>>>>>
>>>>> Of course, my real problem is that compaction makes a node pretty much
>>>>> unavailable. If we can fix that problem then this is not necessary.
>>>>
>>>
>
>


Re: Replication-aware compaction

2011-06-06 Thread David Boxenhorn
Version 0.7.3.

Yes, I am talking about minor compactions. I have three nodes, RF=3.
3G data (before replication). Not many users (yet). It seems like 3
nodes should be plenty. But when all 3 nodes are compacting, I
sometimes get timeouts on the client, and I see in my logs that each
one is full of notifications that the other nodes have died (and come
back to life after about a second). My cluster can tolerate one node
being out of commission, so I would rather have longer compactions one
at a time than shorter compactions all at the same time.

I think that our usage pattern of bursty writes causes the three nodes
to decide to compact at the same time. These bursts are followed by
periods of relative quiet, so there should be time for the other two
nodes to compact one at a time.


On Mon, Jun 6, 2011 at 3:27 PM, David Boxenhorn  wrote:
>
> Version 0.7.3.
>
> Yes, I am talking about minor compactions. I have three nodes, RF=3. 3G data 
> (before replication). Not many users (yet). It seems like 3 nodes should be 
> plenty. But when all 3 nodes are compacting, I sometimes get timeouts on the 
> client, and I see in my logs that each one is full of notifications that the 
> other nodes have died (and come back to life after about a second). My 
> cluster can tolerate one node being out of commission, so I would rather have 
> longer compactions one at a time than shorter compactions all at the same 
> time.
>
> I think that our usage pattern of bursty writes causes the three nodes to 
> decide to compact at the same time. These bursts are followed by periods of 
> relative quiet, so there should be time for the other two nodes to compact 
> one at a time.
>
>
> On Mon, Jun 6, 2011 at 2:36 PM, aaron morton  wrote:
>>
>> Are you talking about minor (automatic) compactions ? Can you provide some 
>> more information on what's happening to make the node unusable and what 
>> version you are using? It's not lightweight process, but it should not hurt 
>> the node that badly. It is considered an online operation.
>>
>> Delaying compaction will only make it run for longer and take more resources.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6 Jun 2011, at 20:14, David Boxenhorn wrote:
>>
>> > Is there some deep architectural reason why compaction can't be
>> > replication-aware?
>> >
>> > What I mean is, if one node is doing compaction, its replicas
>> > shouldn't be doing compaction at the same time. Or, at least a quorum
>> > of nodes should be available at all times.
>> >
>> > For example, if RF=3, and one node is doing compaction, the nodes to
>> > its right and left in the ring should wait on compaction until that
>> > node is done.
>> >
>> > Of course, my real problem is that compaction makes a node pretty much
>> > unavailable. If we can fix that problem then this is not necessary.
>>
>


Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread David Boxenhorn
Jonathan, are Donal Zang's results (10x slowdown) typical?

On Mon, Jun 6, 2011 at 3:14 PM, Jonathan Ellis  wrote:

> On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang  wrote:
> > Another thing I noticed is : if you first do insertion, and then build
> the
> > secondary index use "update column family ...", and then do select based
> on
> > the index, the result is not right (seems the index is still being built
> > though the "update" commands returns quickly).
>
> That is correct. "describe keyspace" from the cli tells you when an
> index has finished building.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread David Boxenhorn
Is there really a 10x difference between indexed CFs and non-indexed CFs?

On Mon, Jun 6, 2011 at 11:05 AM, Donal Zang  wrote:

> On 06/06/2011 05:38, Jonathan Ellis wrote:
>
>> Index updates require read-before-write (to find out what the prior
>> version was, if any, and update the index accordingly).  This is
>> random i/o.
>>
>> Index creation on the other hand is a lot of sequential i/o, hence
>> more efficient.
>>
>> So, the classic bulk load advice to ingest data prior to creating
>> indexes applies.
>>
> Thanks for the explanation!
>
> --
> Donal Zang
> Computing Center, IHEP
> 19B YuquanLu, Shijingshan District,Beijing, 100049
> zan...@ihep.ac.cn
> 86 010 8823 6018
>
>
>
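
A sketch of that bulk-load-then-index order of operations on the CLI (CF and column
names are hypothetical; as noted above, "describe keyspace" is how to check that the
index build has finished):

    create column family events with comparator = UTF8Type;

    ... bulk load the data ...

    update column family events with column_metadata =
        [{column_name : user_name, validation_class : UTF8Type, index_type : KEYS}];

    describe keyspace;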


Replication-aware compaction

2011-06-06 Thread David Boxenhorn
Is there some deep architectural reason why compaction can't be
replication-aware?

What I mean is, if one node is doing compaction, its replicas
shouldn't be doing compaction at the same time. Or, at least a quorum
of nodes should be available at all times.

For example, if RF=3, and one node is doing compaction, the nodes to
its right and left in the ring should wait on compaction until that
node is done.

Of course, my real problem is that compaction makes a node pretty much
unavailable. If we can fix that problem then this is not necessary.


CQL: Select for multiple ranges

2011-05-20 Thread David Boxenhorn
In order to fully implement the functionality of super columns using
compound columns I need to be able to select multiple column ranges - this
would be functionally equivalent to selecting multiple super columns (and
more!).

I would like to request the following CQL syntax:

SELECT [FIRST N] [REVERSED] name1..nameN1, name2..nameN2... FROM ...

I am heading into my weekend here. If no one has created a JIRA ticket for
this by Sunday, and I am not talked out of it, I will create one myself.


Re: Using composite column names in the CLI

2011-05-17 Thread David Boxenhorn
Excellent!

(I presume there is some way of representing ":", like "\:"?)


On Tue, May 17, 2011 at 11:44 AM, Sylvain Lebresne wrote:

> Provided you're working on a branch that has CASSANDRA-2231 applied (that's
> either the cassandra-0.8.1 branch or trunk), this work 'out of the box':
>
> The setup will look like:
> [default@unknown] create keyspace test;
> [default@unknown] use test;
> [default@test] create column family testCF with
> comparator='CompositeType(AsciiType, IntegerType(reversed=true),
> IntegerType)' and default_validation_class=AsciiType;
>
> Then:
> [default@test] set testCF[a]['foo:24:24'] = 'v1';
> Value inserted.
> [default@test] set testCF[a]['foo:42:24'] = 'v2';
> Value inserted.
> [default@test] set testCF[a]['foobar:42:24'] = 'v3';
> Value inserted.
> [default@test] set testCF[a]['boobar:42:24'] = 'v4';
> Value inserted.
> [default@test] set testCF[a]['boobar:42:42'] = 'v5';
> Value inserted.
> [default@test] get testCF[a];
> => (column=boobar:42:24, value=v4, timestamp=1305621115813000)
> => (column=boobar:42:42, value=v5, timestamp=1305621125563000)
> => (column=foo:42:24, value=v2, timestamp=1305621096473000)
> => (column=foo:24:24, value=v1, timestamp=1305621085548000)
> => (column=foobar:42:24, value=v3, timestamp=1305621110813000)
> Returned 5 results.
>
> --
> Sylvain
>
> On Tue, May 17, 2011 at 9:20 AM, David Boxenhorn 
> wrote:
> > This is what I'm talking about
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-2231
> >
> > The on-disk format is
> >
> > <(short)length><component><end-of-component byte = 0><(short)length><component><end-of-component byte = 0>...
> >
> > I would like to be able to input these kinds of keys into the CLI,
> something
> > like
> >
> > set cf[key]['constituent1':'constituent2':'constituent3'] = val
> >
> >
> > On Tue, May 17, 2011 at 2:15 AM, Sameer Farooqui <
> cassandral...@gmail.com>
> > wrote:
> >>
> >> Cassandra wouldn't know that the column name is composite of two
> different
> >> things. So you could just request the column names and values for a
> specific
> >> key like this and then just look at the column names that get returned:
> >> [default@MyKeyspace] get DemoCF[ascii('key_42')];
> >> => (column=CA_SanJose, value=50, timestamp=1305236885112000)
> >> => (column=CA_PaloAlto, value=49, timestamp=1305236885192000)
> >> => (column=FL_Orlando, value=45, timestamp=130523688528)
> >> => (column=NY_NYC, value=40, timestamp=1305236885361000)
> >>
> >> And I'm not sure what you mean by inputting composite column names. You
> >> just input them like any other column name:
> >> [default@MyKeyspace] set DemoCF['key_42']['CA_SanJose']='51';
> >> Value inserted.
> >>
> >>
> >>
> >>
> >> On Mon, May 16, 2011 at 2:34 PM, Aaron Morton 
> >> wrote:
> >>>
> >>> What do you mean by composite column names?
> >>>
> >>> Do the data type functions supported by get and set help? Or the assume
> >>> statement?
> >>>
> >>> Aaron
> >>> On 17/05/2011, at 3:21 AM, David Boxenhorn  wrote:
> >>>
> >>> > Is there a way to view composite column names in the CLI?
> >>> >
> >>> > Is there a way to input them (i.e. in the set command)?
> >>> >
> >>
> >
> >
>


Re: Using composite column names in the CLI

2011-05-17 Thread David Boxenhorn
This is what I'm talking about

https://issues.apache.org/jira/browse/CASSANDRA-2231

The on-disk format is

<(short)length><value><end-of-component byte = 0><(short)length><value><end-of-component byte = 0>...

I would like to be able to input these kinds of keys into the CLI, something
like

set cf[key]['constituent1':'constituent2':'constituent3'] = val


On Tue, May 17, 2011 at 2:15 AM, Sameer Farooqui wrote:

> Cassandra wouldn't know that the column name is composite of two different
> things. So you could just request the column names and values for a specific
> key like this and then just look at the column names that get returned:
>
> [default@MyKeyspace] get DemoCF[ascii('key_42')];
> => (column=CA_SanJose, value=50, timestamp=1305236885112000)
> => (column=CA_PaloAlto, value=49, timestamp=1305236885192000)
> => (column=FL_Orlando, value=45, timestamp=130523688528)
> => (column=NY_NYC, value=40, timestamp=1305236885361000)
>
>
> And I'm not sure what you mean by inputting composite column names. You
> just input them like any other column name:
>
> [default@MyKeyspace] set DemoCF['key_42']['CA_SanJose']='51';
> Value inserted.
>
>
>
>
>
> On Mon, May 16, 2011 at 2:34 PM, Aaron Morton wrote:
>
>> What do you mean by composite column names?
>>
>> Do the data type functions supported by get and set help? Or the assume
>> statement?
>>
>> Aaron
>> On 17/05/2011, at 3:21 AM, David Boxenhorn  wrote:
>>
>> > Is there a way to view composite column names in the CLI?
>> >
>> > Is there a way to input them (i.e. in the set command)?
>> >
>>
>
>


Using composite column names in the CLI

2011-05-16 Thread David Boxenhorn
Is there a way to view composite column names in the CLI?

Is there a way to input them (i.e. in the set command)?


Re: Import/Export of Schema Migrations

2011-05-16 Thread David Boxenhorn
What you describe below sounds like what I want to do. I think that the only
additional thing I am requesting is to export the migrations from the dev
cluster (since Cassandra already has a table that saves them - I just want
that information!) so I can import it to the other clusters. This would
ensure that my migrations are exactly right, without being dependent on
error-prone human intervention.

To really get rid of human intervention it would be nice to be able to mark
a certain migration with a version name. Then I could say something like,
"export migrations version1.2.3 to version1.2.4" and I would get the exact
migration path from one version to another.


On Mon, May 16, 2011 at 1:04 AM, aaron morton wrote:

> 
>
> Not sure what sort of changes you are making, but this is my approach.
>
> I've always managed database (my sql, sql server whatever) schema as source
> code (SQL DDL statements, CLI script etc). It makes it a lot easier to cold
> start the system, test changes and see who changed what.
>
> Once you have your initial schema you can hand roll a CLI script to update
>  / drop existing CF's. For the update column family statement all the
> attributes are delta to the current setting, i.e. you do not need to say
> comparator is ascii again. Other than the indexes, you need to specify all
> the indexes again those not included will be dropped.
>
> If you want to be able to replay multiple schema changes made during dev
> against other clusters my personal approach would be:
>
> - create a cli script for every change (using update and delete CF),
> prefixed with 000X so you can see the order.
> - manage the scripts in source control
> - sanity check to see if they can be collapsed
> - replay the changes in order when applying them to a cluster.
> (you will still need to manually delete data from dropped cf's)
>
> changes to conf/cassandra.yaml can be managed using chef
> 
>
> Others will have different ideas
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14 May 2011, at 00:15, David Boxenhorn wrote:
>
> Actually, I want a way to propagate *any* changes from development to
> staging to production, but schema changes are the most important.
>
> Could I use 2221 to propagate schema changes by deleting the schema in the
> target cluster, doing "show schema" in the source cluster, redirecting to a
> file, and running the file as a script in the target cluster?
>
> Of course, I would have to delete the files of dropped CFs by hand
> (something I *hate* to do, because I'm afraid of making a mistake), but it
> would be a big improvement.
>
> I am open to any other ideas of how to propagate changes from one cluster
> to another in an efficient non-error-prone fashion. Our development
> environment (i.e. development, staging, production) is pretty standard, so
> I'm sure that I'm not the only one with this problem!
>
>
> On Fri, May 13, 2011 at 12:51 PM, aaron morton wrote:
>
>> What sort of schema changes are you making?  can you manage them as a CLI
>> script under source control ?
>>
>>
>> You may also be interested in  CASSANDRA-2221.
>>
>> Cheers
>> Aaron
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 12 May 2011, at 20:45, David Boxenhorn wrote:
>>
>> My use case is like this: I have a development cluster, a staging cluster
>> and a production cluster. When I finish a set of migrations (i.e. changes)
>> on the development cluster, I want to apply them to the staging cluster, and
>> eventually the production cluster. I don't want to do it by hand, because
>> it's a painful and error-prone process. What I would like to do is export
>> the last N migrations from the development cluster as a text file, with
>> exactly the same format as the original text commands, and import them to
>> the staging and production clusters.
>>
>> I think the best place to do this might be the CLI, since you would
>> probably want to view your migrations before exporting them. Something like
>> this:
>>
>> show migrations N;               Shows the last N migrations.
>> export migrations N <fileName>;  Exports the last N migrations to
>> file fileName.
>> import migrations <fileName>;    Imports migrations from fileName.
>>
>> The import process would apply the migrations one at a time giving you
>> feedback like, "applying migration: update column family...". If a migration
>> fails, the process should give an appropriate message and stop.
>>
>> Is anyone else interested in this? I have created a Jira ticket for it
>> here:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2636
>>
>>
>>
>>
>
>


Re: Import/Export of Schema Migrations

2011-05-13 Thread David Boxenhorn
Actually, I want a way to propagate *any* changes from development to
staging to production, but schema changes are the most important.

Could I use 2221 to propagate schema changes by deleting the schema in the
target cluster, doing "show schema" in the source cluster, redirecting to a
file, and running the file as a script in the target cluster?

Of course, I would have to delete the files of dropped CFs by hand
(something I *hate* to do, because I'm afraid of making a mistake), but it
would be a big improvement.

I am open to any other ideas of how to propagate changes from one cluster to
another in an efficient non-error-prone fashion. Our development environment
(i.e. development, staging, production) is pretty standard, so I'm sure that
I'm not the only one with this problem!


On Fri, May 13, 2011 at 12:51 PM, aaron morton wrote:

> What sort of schema changes are you making?  can you manage them as a CLI
> script under source control ?
>
>
> You may also be interested in  CASSANDRA-2221.
>
> Cheers
> Aaron
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12 May 2011, at 20:45, David Boxenhorn wrote:
>
> My use case is like this: I have a development cluster, a staging cluster
> and a production cluster. When I finish a set of migrations (i.e. changes)
> on the development cluster, I want to apply them to the staging cluster, and
> eventually the production cluster. I don't want to do it by hand, because
> it's a painful and error-prone process. What I would like to do is export
> the last N migrations from the development cluster as a text file, with
> exactly the same format as the original text commands, and import them to
> the staging and production clusters.
>
> I think the best place to do this might be the CLI, since you would
> probably want to view your migrations before exporting them. Something like
> this:
>
> show migrations N;               Shows the last N migrations.
> export migrations N <fileName>;  Exports the last N migrations to file
> fileName.
> import migrations <fileName>;    Imports migrations from fileName.
>
> The import process would apply the migrations one at a time giving you
> feedback like, "applying migration: update column family...". If a migration
> fails, the process should give an appropriate message and stop.
>
> Is anyone else interested in this? I have created a Jira ticket for it
> here:
>
> https://issues.apache.org/jira/browse/CASSANDRA-2636
>
>
>
>


Import/Export of Schema Migrations

2011-05-12 Thread David Boxenhorn
My use case is like this: I have a development cluster, a staging cluster
and a production cluster. When I finish a set of migrations (i.e. changes)
on the development cluster, I want to apply them to the staging cluster, and
eventually the production cluster. I don't want to do it by hand, because
it's a painful and error-prone process. What I would like to do is export
the last N migrations from the development cluster as a text file, with
exactly the same format as the original text commands, and import them to
the staging and production clusters.

I think the best place to do this might be the CLI, since you would probably
want to view your migrations before exporting them. Something like this:

show migrations N;               Shows the last N migrations.
export migrations N <fileName>;  Exports the last N migrations to file
fileName.
import migrations <fileName>;    Imports migrations from fileName.

The import process would apply the migrations one at a time giving you
feedback like, "applying migration: update column family...". If a migration
fails, the process should give an appropriate message and stop.

Is anyone else interested in this? I have created a Jira ticket for it here:

https://issues.apache.org/jira/browse/CASSANDRA-2636


Re: compaction strategy

2011-05-09 Thread David Boxenhorn
If they each have their own copy of the data, then they are *not*
non-overlapping!

If you have non-overlapping SSTables (and you know the min/max keys), it's
like having one big SSTable because you know exactly where each row is, and
it becomes easy to merge a new SSTable in small batches, rather than in one
huge batch.

The only step that you have to add to the current merge process is, when you
are about to write a new SSTable that would be too big, to write N
(non-overlapping!) pieces instead.
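
To make that concrete, here is a rough sketch of the splitting step (plain
Java; the Row type, the size cap and the method names are placeholders I made
up, this is not Cassandra's actual compaction code):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class NonOverlappingSplit {
    // Stand-in for a row: just its key and serialized size.
    static class Row {
        final String key; final long size;
        Row(String key, long size) { this.key = key; this.size = size; }
    }

    // Split a key-ordered merge stream into outputs of at most maxBytes each.
    // Because the input arrives in key order, consecutive outputs cover
    // disjoint key ranges - they are non-overlapping by construction.
    static List<List<Row>> split(Iterator<Row> mergedRowsInKeyOrder, long maxBytes) {
        List<List<Row>> outputs = new ArrayList<List<Row>>();
        List<Row> current = new ArrayList<Row>();
        long written = 0;
        while (mergedRowsInKeyOrder.hasNext()) {
            Row r = mergedRowsInKeyOrder.next();
            current.add(r);
            written += r.size;
            if (written >= maxBytes) {   // "close" this output, start the next one
                outputs.add(current);
                current = new ArrayList<Row>();
                written = 0;
            }
        }
        if (!current.isEmpty()) outputs.add(current);
        return outputs;
    }
}

Each output only needs its (min key, max key) recorded so that reads know
exactly which file to look in.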


On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen  wrote:

> Yes, agreed.
>
> I actually think cassandra has to.
>
> And if you do not go down to that single file, how do you avoid getting
> into a situation where you can very realistically end up with 4-5 big
> sstables each having its own copy of the same data massively increasing disk
> requirements?
>
> Terje
>
> On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn  wrote:
>
>> "I'm also not too much in favor of triggering major compactions, because
>> it mostly have a nasty effect (create one huge sstable)."
>>
>> If that is the case, why can't major compactions create many,
>> non-overlapping SSTables?
>>
>> In general, it seems to me that non-overlapping SSTables have all the
>> advantages of big SSTables (i.e. you know exactly where the data is) without
>> the disadvantages that come with being big. Why doesn't Cassandra take
>> advantage of that in a major way?
>>
>
>


Re: compaction strategy

2011-05-09 Thread David Boxenhorn
"I'm also not too much in favor of triggering major compactions, because it
mostly have a nasty effect (create one huge sstable)."

If that is the case, why can't major compactions create many,
non-overlapping SSTables?

In general, it seems to me that non-overlapping SSTables have all the
advantages of big SSTables (i.e. you know exactly where the data is) without
the disadvantages that come with being big. Why doesn't Cassandra take
advantage of that in a major way?


Cassandra and JCR

2011-05-06 Thread David Boxenhorn
I think this is a question specifically for Patricio Echagüe, though I
welcome answers from anyone else who can contribute...

We are considering using Magnolia as a CMS. Magnolia uses Jackrabbit for its
data storage. Jackrabbit is a JCR implementation.

Questions:

1. Can we plug Cassandra into JCR/Jackrabbit as its data storage?

2. I see that some work has already been done on this issue (specifically, I
see that Patricio was involved in this). Where does that work stand now? Is
this a viable option for us?

3. How much work would it be for us?

4. What are the issues involved?


Re: Compound columns spec

2011-05-05 Thread David Boxenhorn
What is the format of  ?

On Thu, May 5, 2011 at 6:14 PM, Eric Evans  wrote:

> On Thu, 2011-05-05 at 17:44 +0300, David Boxenhorn wrote:
> > Is there a spec for compound columns?
> >
> > I want to know the exact format of compound columns so I can adhere to
> > it. For example, what is the separator - or is some other format used
> > (e.g. length:value or type:length:value)?
>
> Tentatively, CQL will use colon delimited terms like this, yes
> (tentatively).
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: Compound columns spec

2011-05-05 Thread David Boxenhorn
Thanks, yes, I was referring to the "compound columns" in this quote (from a
previous thread):

"No CQL will never support super columns, but later versions (not 1.0.0)
will support compound columns.  Compound columns are better; instead of
a two-deep structure, you can have one of arbitrary depth."

I would like to design my keys to take advantage of this future development,
when it comes.

On Thu, May 5, 2011 at 5:53 PM, Sylvain Lebresne wrote:

> I suppose it depends what you are referring to by "compound columns".
> If you're talking
> about the CompositeType of CASSANDRA-2231 (which is my only guess), then
> the
> format is in the javadoc and is:
> /*
> * The encoding of a CompositeType column name should be:
> *   <component><component><component> ...
> * where <component> is:
> *   <length of value><value><'end-of-component' byte>
> * where the 'end-of-component' byte should always be 0 for actual column
> * name.  However, it can set to 1 for query bounds. This allows to query
> for
> * the equivalent of 'give me the full super-column'. That is, if during a
> * slice query uses:
> *   start = <3><"foo".getBytes()><0>
> *   end   = <3><"foo".getBytes()><1>
> * then he will be sure to get *all* the columns whose first component is
> "foo".
> * If for a component, the 'end-of-component' is != 0, there should not be
> any
> * following component.
> */
>
> I'll mention that this is not committed code yet (but soon hopefully
> and the format
> shouldn't change).
>
> --
> Sylvain
>
> On Thu, May 5, 2011 at 4:44 PM, David Boxenhorn  wrote:
> > Is there a spec for compound columns?
> >
> > I want to know the exact format of compound columns so I can adhere to
> it.
> > For example, what is the separator - or is some other format used (e.g.
> > length:value or type:length:value)?
> >
>
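
For reference, a minimal sketch of packing a name in the layout quoted above
(plain Java; this is only an illustration of the described format, not code
from CASSANDRA-2231, and the class and method names are mine):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CompositeNameSketch {
    // Each component is written as <2-byte length><value><'end-of-component' byte>.
    // The end-of-component byte is 0 for a real column name; per the javadoc
    // above, a query bound would use 1 on its last component instead.
    static ByteBuffer pack(byte[]... components) {
        int size = 0;
        for (byte[] c : components) size += 2 + c.length + 1;
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] c : components) {
            out.putShort((short) c.length);  // <length of value>
            out.put(c);                      // <value>
            out.put((byte) 0);               // <'end-of-component' byte>
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        // e.g. something like the 'foo:42' names from the CLI examples earlier
        ByteBuffer name = pack("foo".getBytes(StandardCharsets.US_ASCII),
                               java.math.BigInteger.valueOf(42).toByteArray());
        System.out.println("encoded bytes: " + name.remaining());  // (2+3+1) + (2+1+1) = 10
    }
}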


Compound columns spec

2011-05-05 Thread David Boxenhorn
Is there a spec for compound columns?

I want to know the exact format of compound columns so I can adhere to it.
For example, what is the separator - or is some other format used (e.g.
length:value or type:length:value)?


Re: Cassandra CMS

2011-05-05 Thread David Boxenhorn
I'm looking at Magnolia at the moment (as in, this second). At first glance,
it looks like I should be able to use Cassandra as the database:

http://documentation.magnolia-cms.com/technical-guide/content-storage-and-structure.html#Persistent_storage

If it can use a filesystem as its database, it can use Cassandra, no?

On Thu, May 5, 2011 at 2:01 PM, aaron morton wrote:

> Would you think of Django as a CMS ?
>
> http://stackoverflow.com/questions/2369793/how-to-use-cassandra-in-django-framework
>
>
> 
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5 May 2011, at 22:54, Eric tamme wrote:
>
> Does anyone know of a content management system that can be easily
>
> customized to use Cassandra as its database?
>
>
> (Even better, if it can use Cassandra without customization!)
>
>
>
> I think your best bet will be to look for a CMS that uses an ORM for
> the storage layer and write a specific ORM for Cassandra that you can
> plugin to whatever frame work the CMS uses.
>
> -Eric
>
>
>


Cassandra CMS

2011-05-05 Thread David Boxenhorn
Does anyone know of a content management system that can be easily
customized to use Cassandra as its database?

(Even better, if it can use Cassandra without customization!)


One cluster or many?

2011-05-03 Thread David Boxenhorn
If I have a database that partitions naturally into non-overlapping
datasets, in which there are no references between datasets, where each
dataset is quite large (i.e. large enough to merit its own cluster from the
point of view of quantity of data), should I set up one cluster per database
or one large cluster for everything together?

As I see it:

The primary advantage of separate clusters is total isolation: if I have a
problem with one dataset, my application will continue working normally for
all other datasets.

The primary advantage of one big cluster is usage pooling: when one server
goes down in a large cluster it's much less important than when one server
goes down in a small cluster. Also, different temporal usage patterns of the
different datasets (i.e. there will be different peak hours on different
datasets) can be combined to ease capacity requirements.

Any thoughts?


Re: Combining all CFs into one big one

2011-05-02 Thread David Boxenhorn
I guess I'm still feeling fuzzy on this because my actual use-case isn't so
black-and-white. I don't have any CFs that are accessed purely, or even
mostly, in once-through batch mode. What I have is CFs with more and less
data, and CFs that are accessed more and less frequently.


On Mon, May 2, 2011 at 7:52 PM, Tyler Hobbs  wrote:

> On Mon, May 2, 2011 at 5:05 AM, David Boxenhorn  wrote:
>
>> Wouldn't it be the case that the once-used rows in your batch process
>> would quickly be traded out of the cache, and replaced by frequently-used
>> rows?
>>
>
> Yes, and you'll pay a cache miss penalty for each of the replacements.
>
>
>> This would be the case even if your batch process goes on for a long time,
>> since caching is done on a row-by-row basis. In effect, it would mean that
>> part of your cache is taken up by the batch process, much as if you
>> dedicated a permanent cache to the batch - except that it isn't permanent,
>> so it's better!
>>
>
> Right, but we didn't want to cache any of the batch CF in the first place,
> because caching that CF is worth very little.  With separate CFs, we could
> explicitly give it no cache.  Now we have no control over how much of the
> cache it evicts.
>
>


Terrible CQL idea: > and < aliases of >= and <=

2011-05-02 Thread David Boxenhorn
Is this still true?

Note: The greater-than and less-than operators (> and <) result in key
ranges that are inclusive of the terms. There is no supported notion of
“strictly” greater-than or less-than; these operators are merely supported
as aliases to >= and <=.

I think that making > and < aliases of >= and <= is a terrible idea!

First of all, it is very misleading. Second, what will happen to old code
when > and < are really supported? (*Some* day they will be supported!)


Re: Combining all CFs into one big one

2011-05-02 Thread David Boxenhorn
Wouldn't it be the case that the once-used rows in your batch process would
quickly be traded out of the cache, and replaced by frequently-used rows?
This would be the case even if your batch process goes on for a long time,
since caching is done on a row-by-row basis. In effect, it would mean that
part of your cache is taken up by the batch process, much as if you
dedicated a permanent cache to the batch - except that it isn't permanent,
so it's better!


On Mon, May 2, 2011 at 7:50 AM, Tyler Hobbs  wrote:

> If you had one big cache, wouldn't it be the case that it's mostly
>> populated with frequently accessed rows, and less populated with rarely
>> accessed rows?
>>
>
> Yes.
>
> In fact, wouldn't one big cache dynamically and automatically give you
>> exactly what you want? If you try to partition the same amount of memory
>> manually, by guesswork, among many tables, aren't you always going to do a
>> worse job?
>>
>
> Suppose you have one CF that's used constantly through interaction by
> users.  Suppose you have another CF that's only used periodically by a batch
> process, you tend to access most or all of the rows during the batch
> process, and it's too large to cache all of the rows.  Normally, you would
> dedicate cache space to the first CF as anything with human interaction
> tends to have good temporal locality and you want to keep latencies there
> low.  On the other hand, caching the second CF provides little to no real
> benefit.  When you combine these two CFs, every time your batch process
> runs, rows from the second CF will populate the cache and will cause
> eviction of rows from the first CF, even though having those rows in the
> cache provides little benefit to you.
>
> As another example, if you mix a CF with wide rows and a CF with small
> rows, you no longer have the option of using a row cache, even if it makes
> great sense for the small-row CF data.
>
> Knowledge of data and access patterns gives you a very good advantage when
> it comes to caching your data effectively.
>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax 
> Maintainer of the pycassa  Cassandra
> Python client library
>
>


Re: Combining all CFs into one big one

2011-05-01 Thread David Boxenhorn
If you had one big cache, wouldn't it be the case that it's mostly populated
with frequently accessed rows, and less populated with rarely accessed rows?

In fact, wouldn't one big cache dynamically and automatically give you
exactly what you want? If you try to partition the same amount of memory
manually, by guesswork, among many tables, aren't you always going to do a
worse job?


On Sun, May 1, 2011 at 10:43 PM, Tyler Hobbs  wrote:

> On Sun, May 1, 2011 at 2:16 PM, Jake Luciani  wrote:
>
>>
>>
>> On Sun, May 1, 2011 at 2:58 PM, shimi  wrote:
>>
>>> On Sun, May 1, 2011 at 9:48 PM, Jake Luciani  wrote:
>>>
 If you have N column families you need N * memtable size of RAM to
 support this.  If that's not an option you can merge them into one as you
 suggest but then you will have much larger SSTables, slower compactions,
 etc.
>>>
>>>
>>>
 I don't necessarily agree with Tyler that the OS cache will be less
 effective... But I do agree that if the sizes of sstables are too large for
 you then more hardware is the solution...
>>>
>>>
>>> If you merge CFs which are hardly accessed with one which are accessed
>>> frequently, when you read the SSTable you load data that is hardly accessed
>>> to the OS cache.
>>>
>>
>>  Only the rows or portions of rows you read will be loaded into the OS
>> cache.  Just because different rows are in the same file doesn't mean the
>> entire file is loaded into the OS cache.  The bloom filter and index file
>> will be loaded but those are not large files.
>>
>
> Right -- it does depend on the page size and the average amount of data
> read.  The effect will be more pronounced on CFs with small rows that those
> with wide rows.
>


Re: Combining all CFs into one big one

2011-05-01 Thread David Boxenhorn
Shouldn't these kinds of problems be solved by Cassandra? Isn't there a
maximum SSTable size?

On Sun, May 1, 2011 at 3:24 PM, shimi  wrote:

> Big sstables, long compactions, in major compaction you will need to have
> free disk space in the size of all the sstables (which you should have
> anyway).
>
> Shimi
>
>
> On Sun, May 1, 2011 at 2:03 PM, David Boxenhorn  wrote:
>
>> I'm having problems administering my cluster because I have too many CFs
>> (~40).
>>
>> I'm thinking of combining them all into one big CF. I would prefix the
>> current CF name to the keys, repeat the CF name in a column, and index the
>> column (so I can loop over all rows, which I have to do sometimes, for some
>> CFs).
>>
>> Can anyone think of any disadvantages to this approach?
>>
>>
>


Combining all CFs into one big one

2011-05-01 Thread David Boxenhorn
I'm having problems administering my cluster because I have too many CFs
(~40).

I'm thinking of combining them all into one big CF. I would prefix the
current CF name to the keys, repeat the CF name in a column, and index the
column (so I can loop over all rows, which I have to do sometimes, for some
CFs).

Can anyone think of any disadvantages to this approach?
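
To make the scheme concrete (names are only illustrative): a row that lives
today in CF "Users" under key "u123" would become, in the combined CF,

    row key:  "Users:u123"
    column:   cf = "Users"   (secondary-indexed, so all "Users" rows can still be looped over)

plus the row's original columns unchanged.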


Re: encryption_options & 0.8

2011-04-27 Thread David Boxenhorn
How about a more general (and encrypted!) solution: Add a password
decryption class to the YAML. If it is not defined, the passwords are not
encrypted; if it is defined, use it to decrypt the passwords.

That way, you need to steal both the YAML and the decryption class if you
want to steal the passwords.
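
As a rough illustration of what such a class could look like (the class name
and method are hypothetical, not an existing Cassandra hook; the algorithm is
the XOR-with-'_' plus base64 obfuscation Sasha quotes below):

import java.util.Base64;

// Hypothetical example only - not a real Cassandra interface.
class XorPasswordDecryptor {
    // Undo "XOR each character with '_', then base64-encode".
    String decrypt(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        char[] out = new char[raw.length];
        for (int i = 0; i < raw.length; i++)
            out[i] = (char) (raw[i] ^ '_');
        return new String(out);
    }
}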

On Wed, Apr 27, 2011 at 1:56 PM, Sasha Dolgy  wrote:

> "IBM WebSphere applies a hardcoded XOR. Each caracter is XOR'd with
> the caracter ‘_’, and the resulting string is encoded in base64. This
> is not cryptography, it is just enough encoding so that a casual
> glance at the file will not reveal the password."
>
> I'm sure there are many different options.  Key point here is the
> 'casual glance' reference.  In terms of adding additional overhead
> when a node starts up ... it shouldn't be that prohibitive.  when the
> cassandra.yaml is loaded in and the "encryption_properties", if
> keystore_password.clear or truststore_password.clear exists, rewrite
> these properties in the yaml to the encrypted values of the cleartext
> string, removing the ".clear" suffix and continue on as normal.  the
> default within cassandra should be looking at decrypting an encrypted
> value.  if the decrypted values are wrong, throw an error as you would
> normally ...
>
> Yes, all of this can be circumvented if the cassandra.yaml is set to
> read only for user and no one else, but really ... do i want anyone in
> our organization who may have access to restart a cassandra node, but
> may not be privileged to know what the keystore / truststore passwords
> are to easily find out by looking at the cassandra.yaml ?
>
> -sd
>
>
> > On Wed, Apr 27, 2011 at 5:09 AM, David Strauss 
> wrote:
> >
> > If the passwords are encrypted, when and how would they be decrypted?
>


Re: Indexes on heterogeneous rows

2011-04-17 Thread David Boxenhorn
Thanks, Jonathan. I think I understand now.

To sum up: Everything would work, but if your only equality is on "type"
(all the rest being inequalities), it could be very inefficient.

Is that right?

On Thu, Apr 14, 2011 at 7:22 PM, Jonathan Ellis  wrote:

> On Thu, Apr 14, 2011 at 6:48 AM, David Boxenhorn 
> wrote:
> > The reason why I put "type" first is that queries on type will
> > always be an exact match, whereas the other clauses might be
> inequalities.
>
> Expression order doesn't matter, but as you imply, non-equalities
> can't be used in an index lookup and have to be checked in a nested
> loop phase afterwards.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Indexes on heterogeneous rows

2011-04-14 Thread David Boxenhorn
Thanks. I'm aware that I can roll my own. I wanted to avoid that, for ease
of use, but especially for atomicity concerns.

I thought that the secondary index would bring into memory all keys where
type=2, and then iterate over them to find keys where e=5. (This is a case
where 1/3 of the rows are of type 2, but, say, only a few hundred rows of type
2 have e=5.) The reason why I put "type" first is that queries on type will
always be an exact match, whereas the other clauses might be inequalities.
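
(For reference, the roll-your-own index Aaron describes below would look
roughly like this, illustration only:

    row key:  "e=5-type=2"
    columns:  <key of matching object 1>, <key of matching object 2>, ...

i.e. one index row per value combination you want to query, with the matching
object keys as column names, maintained by the application alongside the data.)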

On Thu, Apr 14, 2011 at 2:07 PM, aaron morton wrote:

> You could make your own inverted index by using keys like  "e=5-type=2"
> where the columns are either the keys for the object or the objects
> themselves. Then just grab the full row back. If you know you always want to
> run queries like that.
>
> This recent discussion and blog post from Ed is good background
> http://www.mail-archive.com/user@cassandra.apache.org/msg12136.html
>
> I'm not sure how efficient the join from "e" to type would be. AFAIK it
> will iterate all keys where e=5 and lookup corresponding rows to find out if
> type = 2.
>
> If know how you want to read things back and need to deal with lots-o-data
> I would start testing with custom indexes. Then compare to the built in
> ones, it should be reasonably simple add them for a test.
>
> Hope
> that helps.
> Aaron
>
> On 14 Apr 2011, at 22:33, David Boxenhorn wrote:
>
> Thank you for your answer, and sorry about the sloppy terminology.
>
> I'm thinking of the scenario where there are a small number of results in
> the result set, but there are billions of rows in the first of your
> secondary indexes.
>
> That is, I want to do something like (not sure of the CQL syntax):
>
> select * where type=2 and e=5
>
> where there are billions of rows of type 2, but some manageable number of
> those rows have e=5.
>
> As I understand it, secondary indexes are like column families, where each
> value is a column. So the billions of rows where type=2 would go into a
> single row of the secondary index. This sounds like a problem to me, is it?
>
>
> I'm assuming that the billions of rows that don't have column "e" at all
> (those rows of other types) are not a problem at all...
>
> On Thu, Apr 14, 2011 at 12:12 PM, aaron morton wrote:
>
>> Need to clear up some terminology here.
>>
>> Rows have a key and can be retrieved by key. This is *sort of* the primary
>> index, but not primary in the normal RDBMS sense.
>> Rows can have different columns and the column names are sorted and can be
>> efficiently selected.
>> There are "secondary indexes" in cassandra 0.7 based on column values
>> http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>>
>> So you could create secondary indexes on the a,e, and h columns and get
>> rows that have specific values. There are some limitations to secondary
>> indexes, read the linked article.
>>
>> Or you can make your own secondary indexes using row keys as the index
>> values.
>>
>> If you have billions of rows, how many do you need to read back at once?
>>
>> Hope that helps
>> Aaron
>>
>> On 14 Apr 2011, at 04:23, David Boxenhorn wrote:
>>
>> Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
>> different sets of columns?
>>
>> For example, let's say you have three types of objects (1, 2, 3) which
>> each had three members. If your rows had the following pattern
>>
>> type=1 a=? b=? c=?
>> type=2 d=? e=? f=?
>> type=3 g=? h=? i=?
>>
>> could you index "type" as your primary index, and also index "a", "e", "h"
>> as secondary indexes, to get the objects of that type that you are looking
>> for?
>>
>> Would it work if you had billions of rows of each type?
>>
>>
>>
>
>


Re: Indexes on heterogeneous rows

2011-04-14 Thread David Boxenhorn
Thank you for your answer, and sorry about the sloppy terminology.

I'm thinking of the scenario where there are a small number of results in
the result set, but there are billions of rows in the first of your
secondary indexes.

That is, I want to do something like (not sure of the CQL syntax):

select * where type=2 and e=5

where there are billions of rows of type 2, but some manageable number of
those rows have e=5.

As I understand it, secondary indexes are like column families, where each
value is a column. So the billions of rows where type=2 would go into a
single row of the secondary index. This sounds like a problem to me, is it?


I'm assuming that the billions of rows that don't have column "e" at all
(those rows of other types) are not a problem at all...

On Thu, Apr 14, 2011 at 12:12 PM, aaron morton wrote:

> Need to clear up some terminology here.
>
> Rows have a key and can be retrieved by key. This is *sort of* the primary
> index, but not primary in the normal RDBMS sense.
> Rows can have different columns and the column names are sorted and can be
> efficiently selected.
> There are "secondary indexes" in cassandra 0.7 based on column values
> http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>
> So you could create secondary indexes on the a,e, and h columns and get
> rows that have specific values. There are some limitations to secondary
> indexes, read the linked article.
>
> Or you can make your own secondary indexes using row keys as the index
> values.
>
> If you have billions of rows, how many do you need to read back at once?
>
> Hope that helps
> Aaron
>
> On 14 Apr 2011, at 04:23, David Boxenhorn wrote:
>
> Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
> different sets of columns?
>
> For example, let's say you have three types of objects (1, 2, 3) which each
> had three members. If your rows had the following pattern
>
> type=1 a=? b=? c=?
> type=2 d=? e=? f=?
> type=3 g=? h=? i=?
>
> could you index "type" as your primary index, and also index "a", "e", "h"
> as secondary indexes, to get the objects of that type that you are looking
> for?
>
> Would it work if you had billions of rows of each type?
>
>
>


Indexes on heterogeneous rows

2011-04-13 Thread David Boxenhorn
Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
different sets of columns?

For example, let's say you have three types of objects (1, 2, 3) which each
had three members. If your rows had the following pattern

type=1 a=? b=? c=?
type=2 d=? e=? f=?
type=3 g=? h=? i=?

could you index "type" as your primary index, and also index "a", "e", "h"
as secondary indexes, to get the objects of that type that you are looking
for?

Would it work if you had billions of rows of each type?


Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-14 Thread David Boxenhorn
How do you write to two versions of Cassandra from the same client? Two
versions of Hector?

On Mon, Mar 14, 2011 at 6:46 PM, Robert Coli  wrote:

> On Mon, Mar 14, 2011 at 8:39 AM, Jedd Rashbrooke 
> wrote:
> >  But more importantly for us it would mean we'd have just the
> >  one major outage, rather than two (relocation and 0.6 -> 0.7)
>
> Take zero major outages instead? :D
>
> a) Set up new cluster on new version.
> b) Fork application writes, so all writes go to both clusters.
> c) Backfill old data to new cluster via API writes.
> d) Flip the switch to read from the new cluster.
> e) Turn off old cluster.
>
> =Rob
>


Re: Double ColumnType and comparing

2011-03-14 Thread David Boxenhorn
If you do it, I'd recommend BigDecimal. It's an exact type, and usually what
you want.
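
A sketch of what the comparison could look like, in the spirit of the
DoubleType below (the serialization, a 4-byte scale followed by the unscaled
value bytes, is an assumption of mine, not an existing Cassandra format):

import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

class BigDecimalCompareSketch {
    // Assumed layout: <4-byte scale><unscaled BigInteger bytes>.
    static BigDecimal read(ByteBuffer b) {
        ByteBuffer d = b.duplicate();            // don't disturb the caller's position
        int scale = d.getInt();
        byte[] unscaled = new byte[d.remaining()];
        d.get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    static int compare(ByteBuffer o1, ByteBuffer o2) {
        return read(o1).compareTo(read(o2));
    }
}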

On Mon, Mar 14, 2011 at 3:40 PM, Jonathan Ellis  wrote:

> We'd be happy to commit a patch contributing a DoubleType.
>
> On Sun, Mar 13, 2011 at 7:36 PM, Paul Teasdale 
> wrote:
> > I am quite new to Cassandra and am trying to model a simple Column Family
> > which uses Doubles as column names:
> > Datalines: { // ColumnFamilly
> > dataline-1:{ // row key
> > 23.5: 'someValue',
> > 23.6: 'someValue',
> > ...
> >4334.99: 'someValue'
> > },
> > dataline-2:{
> > 10.5: 'someValue',
> > 12.6: 'someValue',
> > ...
> >23334.99: 'someValue'
> > },
> > ...
> > dataline-n:{
> > 10.5: 'someValue',
> > 12.6: 'someValue',
> > ...
> >23334.99: 'someValue'
> >   }
> > }
> > In declaring this column family, I need to specify a 'CompareWith'
> attribute
> > for a Double type, but the only available values I found for this
> attribute
> > are:
> >  * BytesType
> >  * AsciiType
> >  * UTF8Type
> >  * LongType
> >  * LexicalUUIDType
> >  * TimeUUIDType
> > Is there any support anywere for double values (there has to be
> something)?
> > And if not, does this mean we need to extend
> >  org.apache.cassandra.db.marshal.AbstractType?
> > package  com.mycom.types;
> > class  DoubleType extends
> > org.apache.cassandra.db.marshal.AbstractType {
> >  public int compare(ByteBuffer o1, ByteBuffer o2){
> >// trivial implementation
> >Double d1  = o1.getDouble(0);
> >   Double d2 = o2.getDoube(0);
> >   return d1.compareTo(d2);
> >  }
> >  //...
> > }
> > And declare the column family:
> > <ColumnFamily CompareWith="com.mycom.types.DoubleType" Name="Datalines"/>
> > Thanks,
> > Paul
> >
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Nodes frozen in GC

2011-03-08 Thread David Boxenhorn
If RF=2 and CL=QUORUM, you're getting no benefit from replication. When a
node is in GC it stops everything. Set RF=3, so that when one node is busy the
cluster will still work.
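
The arithmetic, for reference: QUORUM = floor(RF/2) + 1, so

    RF=2  ->  QUORUM=2: both replicas must answer, and one node paused in GC blocks the request.
    RF=3  ->  QUORUM=2: one replica can be paused in GC and QUORUM reads and writes still succeed.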

On Tue, Mar 8, 2011 at 11:46 AM, ruslan usifov wrote:

>
>
> 2011/3/8 Chris Goffinet 
>
>> How large are your SSTables on disk? My thought was because you have so
>> many on disk, we have to store the bloom filter + every 128 keys from index
>> in memory.
>>
>>
> 0.5GB
>  But as I understand store in memory happens only when read happens, i do
> only inserts. And i think that memory doesn't problem, because heap
> allocations looks like saw (in max Heap allocations get about 5,5 GB then
> they reduce to 2GB)
>
>
> Also when i increase Heap Size to 7GB, situation stay mach better, but
> nodes frozen still happens, and in gc.log I steel see:
>
> Total time for which application threads were stopped: 20.0686307 seconds
>
> lines (right not so often, like before)
>


Re: Exceptions on 0.7.0

2011-02-22 Thread David Boxenhorn
Thanks, Shimi. I'll keep you posted if we make progress. Riptano is working
on this problem too.

On Tue, Feb 22, 2011 at 3:30 PM, shimi  wrote:

> I didn't solved it.
> Since it is a test cluster I deleted all the data. I copied some sstables
> from my production cluster and I tried again, this time I didn't have this
> problem.
> I am planing on removing everything from this test cluster. I will start
> all over again with 0.6.x , then I will load it with 10th of GB of data (not
> sstable copy) and test the upgrade again.
>
> I did a mistake that I didn't backup the data files before I upgraded.
>
> Shimi
>
> On Tue, Feb 22, 2011 at 2:24 PM, David Boxenhorn wrote:
>
>> Shimi,
>>
>> I am getting the same error that you report here. What did you do to solve
>> it?
>>
>> David
>>
>>
>> On Thu, Feb 10, 2011 at 2:54 PM, shimi  wrote:
>>
>>> I upgraded the version on all the nodes but I still gets the Exceptions.
>>> I run cleanup on one of the nodes but I don't think there is any cleanup
>>> going on.
>>>
>>> Another weird thing that I see is:
>>> INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353
>>> CompactionIterator.java (line 135) Compacting large row
>>> 333531353730363835363237353338383836383035363036393135323132383
>>> 73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273
>>> (725849473109 bytes) incrementally
>>>
>>> In my production version the largest row is 10259. It shouldn't be
>>> different in this case.
>>>
>>> The first Exception is been thrown on 3 nodes during compaction.
>>> The second Exception (Internal error processing get_range_slices) is
>>> been thrown all the time by a forth node. I disabled gossip and any client
>>> traffic to it and I still get the Exceptions.
>>> Is it possible to boot a node with gossip disable?
>>>
>>> Shimi
>>>
>>> On Thu, Feb 10, 2011 at 11:11 AM, aaron morton 
>>> wrote:
>>>
>>>> I should be able to repair, install the new version and kick off
>>>> nodetool repair .
>>>>
>>>> If you are uncertain search for cassandra-1992 on the list, there has
>>>> been some discussion. You can also wait till some peeps in the states wake
>>>> up if you want to be extra sure.
>>>>
>>>>  The number if the number of columns the iterator is going to return
>>>> from the row. I'm guessing that because this happening during compaction
>>>> it's using asked for the maximum possible number of columns.
>>>>
>>>> Aaron
>>>>
>>>>
>>>>
>>>> On 10 Feb 2011, at 21:37, shimi wrote:
>>>>
>>>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>>>
>>>>  Out of curiosity, do you really have on the order of 1,986,622,313
>>>> elements (I believe elements=keys) in the cf?
>>>>
>>>> Dan
>>>>
>>>> No. I was too puzzled by the numbers
>>>>
>>>>
>>>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton >>> > wrote:
>>>>
>>>>> Shimi,
>>>>> You may be seeing the result of CASSANDRA-1992, are you able to test
>>>>> with the most recent 0.7 build ?
>>>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>>>
>>>>>
>>>>> Aaron
>>>>>
>>>> I will. I hope the data was not corrupted.
>>>>
>>>>
>>>>
>>>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton >>> > wrote:
>>>>
>>>>> Shimi,
>>>>> You may be seeing the result of CASSANDRA-1992, are you able to test
>>>>> with the most recent 0.7 build ?
>>>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>>>
>>>>>
>>>>> Aaron
>>>>>
>>>>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>>>>
>>>>> Out of curiosity, do you really have on the order of 1,986,622,313
>>>>> elements (I believe elements=keys) in the cf?
>>>>>
>>>>> Dan
>>>>>
>>>>>  *From:* shimi [mailto:shim...@gmail.com]
>>>>> *Sent:* February-09-11 15:06
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* Exceptions on 0.7.0

Re: Exceptions on 0.7.0

2011-02-22 Thread David Boxenhorn
Shimi,

I am getting the same error that you report here. What did you do to solve
it?

David

On Thu, Feb 10, 2011 at 2:54 PM, shimi  wrote:

> I upgraded the version on all the nodes but I still gets the Exceptions.
> I run cleanup on one of the nodes but I don't think there is any cleanup
> going on.
>
> Another weird thing that I see is:
> INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353 CompactionIterator.java
> (line 135) Compacting large row
> 333531353730363835363237353338383836383035363036393135323132383
> 73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273
> (725849473109 bytes) incrementally
>
> In my production version the largest row is 10259. It shouldn't be
> different in this case.
>
> The first Exception is been thrown on 3 nodes during compaction.
> The second Exception (Internal error processing get_range_slices) is been
> thrown all the time by a forth node. I disabled gossip and any client
> traffic to it and I still get the Exceptions.
> Is it possible to boot a node with gossip disable?
>
> Shimi
>
> On Thu, Feb 10, 2011 at 11:11 AM, aaron morton wrote:
>
>> I should be able to repair, install the new version and kick off nodetool
>> repair .
>>
>> If you are uncertain search for cassandra-1992 on the list, there has been
>> some discussion. You can also wait till some peeps in the states wake up if
>> you want to be extra sure.
>>
>>  The number if the number of columns the iterator is going to return from
>> the row. I'm guessing that because this happening during compaction it's
>> using asked for the maximum possible number of columns.
>>
>> Aaron
>>
>>
>>
>> On 10 Feb 2011, at 21:37, shimi wrote:
>>
>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>
>>  Out of curiosity, do you really have on the order of 1,986,622,313
>> elements (I believe elements=keys) in the cf?
>>
>> Dan
>>
>> No. I was too puzzled by the numbers
>>
>>
>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton 
>>  wrote:
>>
>>> Shimi,
>>> You may be seeing the result of CASSANDRA-1992, are you able to test with
>>> the most recent 0.7 build ?
>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>
>>>
>>> Aaron
>>>
>> I will. I hope the data was not corrupted.
>>
>>
>>
>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton 
>> wrote:
>>
>>> Shimi,
>>> You may be seeing the result of CASSANDRA-1992, are you able to test with
>>> the most recent 0.7 build ?
>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>
>>>
>>> Aaron
>>>
>>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>>
>>> Out of curiosity, do you really have on the order of 1,986,622,313
>>> elements (I believe elements=keys) in the cf?
>>>
>>> Dan
>>>
>>>  *From:* shimi [mailto:shim...@gmail.com]
>>> *Sent:* February-09-11 15:06
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Exceptions on 0.7.0
>>>
>>> I have a 4 node test cluster were I test the port to 0.7.0 from 0.6.X
>>> On 3 out of the 4 nodes I get exceptions in the log.
>>> I am using RP.
>>> Changes that I did:
>>> 1. changed the replication factor from 3 to 4
>>> 2. configured the nodes to use Dynamic Snitch
>>> 3. RR of 0.33
>>>
>>> I run repair on 2 nodes  before I noticed the errors. One of them is
>>> having the first error and the other the second.
>>> I restart the nodes but I still get the exceptions.
>>>
>>> The following Exception I get from 2 nodes:
>>>  WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java
>>> (line 84) Cannot provide an optimal Bloom
>>> Filter for 1986622313 elements (1/4 buckets per element).
>>> ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
>>> AbstractCassandraDaemon.java (line 91) Fatal exception in
>>> thread Thread[CompactionExecutor:1,1,main]
>>> java.io.IOError: java.io.EOFException
>>> at
>>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
>>> at
>>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
>>> at
>>> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
>>> at
>>> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
>>> at
>>> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>>> at
>>> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
>>> at
>>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>> at
>>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>> at
>>> com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
>>> at
>>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>> at
>>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>> at
>>> org.apache.cassandra.db.ColumnIndexer.serializeInt

Re: Distribution Factor: part of the solution to many-CF problem?

2011-02-21 Thread David Boxenhorn
No, that's not what I mean at all.

That message is about the ability to use different partitioners for
different CFs, say, RandomPartitioner for one, OPP for another.

I'm talking about defining how many nodes a CF should be distributed over,
which would be useful if you have a lot of nodes and a lot of small CFs
(small relative to the total amount of data).


On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton wrote:

> Sounds a bit like this idea
> http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html
>
> Aaron
>
> On 22/02/2011, at 1:28 AM, David Boxenhorn  wrote:
>
> > Cassandra is both distributed and replicated. We have Replication Factor
> but no Distribution Factor!
> >
> > Distribution Factor would define over how many nodes a CF should be
> distributed.
> >
> > Say you want to support millions of multi-tenant users in clusters with
> thousands of nodes, where you don't know the user's schema in advance, so
> you can't have users share CFs.
> >
> > In this case you wouldn't want to spread out each user's Column Families
> over thousands of nodes! You would want something like: RF=3, DF=10 i.e.
> distribute each CF over 10 nodes, within those nodes replicate 3 times.
> >
> > One implementation of DF would be to hash the CF name, and use the same
> strategies defined for RF to choose the N nodes in DF=N.
> >
>


Distribution Factor: part of the solution to many-CF problem?

2011-02-21 Thread David Boxenhorn
Cassandra is both distributed and replicated. We have Replication Factor but
no Distribution Factor!

Distribution Factor would define over how many nodes a CF should be
distributed.

Say you want to support millions of multi-tenant users in clusters with
thousands of nodes, where you don't know the user's schema in advance, so
you can't have users share CFs.

In this case you wouldn't want to spread out each user's Column Families
over thousands of nodes! You would want something like: RF=3, DF=10 i.e.
distribute each CF over 10 nodes, within those nodes replicate 3 times.

One implementation of DF would be to hash the CF name, and use the same
strategies defined for RF to choose the N nodes in DF=N.
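
A minimal sketch of that idea (plain Java; the Node type parameter and the
ring representation are placeholders, this is not Cassandra code):

import java.util.ArrayList;
import java.util.List;

class DistributionFactorSketch {
    // Hash the CF name to a starting position on the ring and take DF
    // consecutive nodes; the usual RF placement strategy would then be
    // applied within this subset only.
    static <Node> List<Node> dfNodes(String cfName, List<Node> ring, int df) {
        int start = Math.floorMod(cfName.hashCode(), ring.size());
        List<Node> subset = new ArrayList<Node>();
        for (int i = 0; i < df; i++)
            subset.add(ring.get((start + i) % ring.size()));
        return subset;
    }
}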


Re: Do supercolumns have a purpose?

2011-02-13 Thread David Boxenhorn
I agree, that is the way to go. Then each piece of new functionality will
not have to be implemented twice.

On Sat, Feb 12, 2011 at 9:41 AM, Stu Hood  wrote:

> I would like to continue to support super columns, but to slowly convert
> them into "compound column names", since that is really all they really are.
>
>
> On Thu, Feb 10, 2011 at 10:16 AM, Frank LoVecchio wrote:
>
>> I've found super column families quite useful when using
>> RandomOrderedPartioner on a low-maintenance cluster (as opposed to
>> Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type;
>> try doing that with one regular column family and secondary indexes (you
>> could obviously sort on the client side, but that is tedious and not logical
>> for older data).
>>
>> On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn wrote:
>>
>>> Mike, my problem is that I have an database and codebase that already
>>> uses supercolumns. If I had to do it over, it wouldn't use them, for the
>>> reasons you point out. In fact, I have a feeling that over time supercolumns
>>> will become deprecated de facto, if not de jure. That's why I would like to
>>> see them represented internally as regular columns, with an upgrade path for
>>> backward compatibility.
>>>
>>> I would love to do it myself! (I haven't looked at the code base, but I
>>> don't understand why it should be so hard.) But my employer has other
>>> ideas...
>>>
>>>
>>> On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone  wrote:
>>>
>>>> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn wrote:
>>>>
>>>>> Shaun, I agree with you, but marking them as deprecated is not good
>>>>> enough for me. I can't easily stop using supercolumns. I need an upgrade
>>>>> path.
>>>>>
>>>>
>>>> David,
>>>>
>>>> Cassandra is open source and community developed. The right thing to do
>>>> is what's best for the community, which sometimes conflicts with what's 
>>>> best
>>>> for individual users. Such strife should be minimized, it will never be
>>>> eliminated. Luckily, because this is an open source, liberal licensed
>>>> project, if you feel strongly about something you should feel free to add
>>>> whatever features you want yourself. I'm sure other people in your 
>>>> situation
>>>> will thank you for it.
>>>>
>>>> At a minimum I think it would behoove you to re-read some of the
>>>> comments here re: why super columns aren't really needed and take another
>>>> look at your data model and code. I would actually be quite surprised to
>>>> find a use of super columns that could not be trivially converted to normal
>>>> columns. In fact, it should be possible to do at the framework/client
>>>> library layer - you probably wouldn't even need to change any application
>>>> code.
>>>>
>>>> Mike
>>>>
>>>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts wrote:
>>>>>
>>>>>>
>>>>>> I'm a newbie here, but, with apologies for my presumptuousness, I
>>>>>> think you should deprecate SuperColumns. They are already distracting 
>>>>>> you,
>>>>>> and as the years go by the cost of supporting them as you add more and 
>>>>>> more
>>>>>> functionality is only likely to get worse. It would be better to 
>>>>>> concentrate
>>>>>> on making the "core" column families better (and I'm sure we can all 
>>>>>> think
>>>>>> of lots of things we'd like).
>>>>>>
>>>>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>>>>> users like David who are currently using them. But if you mark them 
>>>>>> clearly
>>>>>> as deprecated and explain why and what to do instead (perhaps putting a 
>>>>>> bit
>>>>>> of effort into migration tools... or even a "virtual" layer supporting
>>>>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>>>>> you get to 1.0, say), without people feeling betrayed.
>>>>>>
>>>>>> -- Shaun
>>>>>>
>>>>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>

Re: Do supercolumns have a purpose?

2011-02-09 Thread David Boxenhorn
Mike, my problem is that I have a database and codebase that already use
supercolumns. If I had to do it over, I wouldn't use them, for the reasons
you point out. In fact, I have a feeling that over time supercolumns will
become deprecated de facto, if not de jure. That's why I would like to see
them represented internally as regular columns, with an upgrade path for
backward compatibility.

I would love to do it myself! (I haven't looked at the code base, but I
don't understand why it should be so hard.) But my employer has other
ideas...


On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone  wrote:

> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn  wrote:
>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>>
>
> David,
>
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized, it will never be
> eliminated. Luckily, because this is an open source, liberal licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
>
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
>
> Mike
>
> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts  wrote:
>>
>>>
>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>> you should deprecate SuperColumns. They are already distracting you, and as
>>> the years go by the cost of supporting them as you add more and more
>>> functionality is only likely to get worse. It would be better to concentrate
>>> on making the "core" column families better (and I'm sure we can all think
>>> of lots of things we'd like).
>>>
>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>> users like David who are currently using them. But if you mark them clearly
>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>> of effort into migration tools... or even a "virtual" layer supporting
>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>> you get to 1.0, say), without people feeling betrayed.
>>>
>>> -- Shaun
>>>
>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>
>>> "My main point was to say that it's think it is better to create tickets
>>> for what you want, rather than for something else completely different that
>>> would, as a by-product, give you what you want."
>>>
>>> Then let me say what I want: I want supercolumn families to have any
>>> feature that regular column families have.
>>>
>>> My data model is full of supercolumns. I used them, even though I knew it
>>> didn't *have to*, "because they were there", which implied to me that I was
>>> supposed to use them for some good reason. Now I suspect that they will
>>> gradually become less and less functional, as features are added to regular
>>> column families and not supported for supercolumn families.
>>>
>>>
>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
>>> wrote:
>>>
>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote:
>>>>
>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne >>>> > wrote:
>>>>>
>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>>>>>>
>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>> families.
>>>>>>>
>>>>>>
>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>> supercolumn families and voting on it. This will be 1 or 2 order of
>>>>>> magnitude less work than getting rid of super column internally, and
>>>>>> probably a much better solution anyway.
>>>>>>

Re: time to live rows

2011-02-08 Thread David Boxenhorn
I hope you don't consider this a hijack of the thread...

What I'd like to know is the following:

The GC removes TTL rows some time after they expire, at its convenience. But
will they stop being returned as soon as they expire? (This is the expected
behavior...)
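
To make the expected behavior concrete, this is the model I have in mind - a
plain Java sketch, purely illustrative, not Cassandra's actual code:

import java.util.concurrent.TimeUnit;

// Illustrative model only: a column stops being visible as soon as its TTL
// elapses, even though the physical data is only purged later, at compaction.
class ExpiringColumn {
    final byte[] value;
    final long writtenAtMillis;
    final int ttlSeconds;

    ExpiringColumn(byte[] value, long writtenAtMillis, int ttlSeconds) {
        this.value = value;
        this.writtenAtMillis = writtenAtMillis;
        this.ttlSeconds = ttlSeconds;
    }

    // Reads would consult this check; an expired column is treated as deleted
    // immediately, regardless of when GC/compaction removes it from disk.
    boolean isLive(long nowMillis) {
        return nowMillis < writtenAtMillis + TimeUnit.SECONDS.toMillis(ttlSeconds);
    }
}

Is that the right mental model?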

On Tue, Feb 8, 2011 at 5:11 PM, Kallin Nagelberg  wrote:

> So the empty row will be ultimately removed then? Is there a way to
> for the GC to verify this?
>
> Thanks,
> -Kal
>
> On Tue, Feb 8, 2011 at 2:21 AM, Stu Hood  wrote:
> > The expired columns were converted into tombstones, which will live for
> the
> > GC timeout. The "empty" row will be cleaned up when those tombstones are
> > removed.
> > Returning the empty row is unfortunate... we'd love to find a more
> > appropriate solution that might not involve endless scanning.
> > See
> > http://wiki.apache.org/cassandra/FAQ#i_deleted_what_gives
> > http://wiki.apache.org/cassandra/FAQ#range_ghosts
> >
> > On Mon, Feb 7, 2011 at 1:49 PM, Kallin Nagelberg
> >  wrote:
> >>
> >> I also tried forcing a major compaction on the column family using JMX
> >> but the row remains.
> >>
> >> On Mon, Feb 7, 2011 at 4:43 PM, Kallin Nagelberg
> >>  wrote:
> >> > I tried that but I still see the row coming back on a list
> >> >  in the CLI. My concern is that there will be a pointer
> >> > to an empty row for all eternity.
> >> >
> >> > -Kal
> >> >
> >> > On Mon, Feb 7, 2011 at 4:38 PM, Aaron Morton  >
> >> > wrote:
> >> >> Deleting all the columns in a row via TTL has the same effect as
> >> >> deleting the
> >> >> row, the data will physically be removed during compaction.
> >> >>
> >> >> Aaron
> >> >>
> >> >>
> >> >> On 08 Feb, 2011,at 10:24 AM, Bill Speirs 
> wrote:
> >> >>
> >> >> I don't think this is supported (but I could be completely wrong).
> >> >> However, I'd love to see this functionality as well.
> >> >>
> >> >> How would one go about requesting such a feature?
> >> >>
> >> >> Bill-
> >> >>
> >> >> On Mon, Feb 7, 2011 at 4:15 PM, Kallin Nagelberg
> >> >>  wrote:
> >> >>> Hey,
> >> >>>
> >> >>> I have read about the new TTL columns in Cassandra 0.7. In my case
> I'd
> >> >>> like to expire an entire row automatically after a certain amount of
> >> >>> time. Is this possible as well?
> >> >>>
> >> >>> Thanks,
> >> >>> -Kal
> >> >>>
> >> >>
> >> >
> >
> >
>


Re: Do supercolumns have a purpose?

2011-02-08 Thread David Boxenhorn
Shaun, I agree with you, but marking them as deprecated is not good enough
for me. I can't easily stop using supercolumns. I need an upgrade path.

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts  wrote:

>
> I'm a newbie here, but, with apologies for my presumptuousness, I think you
> should deprecate SuperColumns. They are already distracting you, and as the
> years go by the cost of supporting them as you add more and more
> functionality is only likely to get worse. It would be better to concentrate
> on making the "core" column families better (and I'm sure we can all think
> of lots of things we'd like).
>
> Just dropping SuperColumns would be bad for your reputation -- and for
> users like David who are currently using them. But if you mark them clearly
> as deprecated and explain why and what to do instead (perhaps putting a bit
> of effort into migration tools... or even a "virtual" layer supporting
> arbitrary hierarchical data), then you can drop them in a few years (when
> you get to 1.0, say), without people feeling betrayed.
>
> -- Shaun
>
> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>
> "My main point was to say that it's think it is better to create tickets
> for what you want, rather than for something else completely different that
> would, as a by-product, give you what you want."
>
> Then let me say what I want: I want supercolumn families to have any
> feature that regular column families have.
>
> My data model is full of supercolumns. I used them, even though I knew I
> didn't *have to*, "because they were there", which implied to me that I was
> supposed to use them for some good reason. Now I suspect that they will
> gradually become less and less functional, as features are added to regular
> column families and not supported for supercolumn families.
>
>
> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne wrote:
>
>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone  wrote:
>>
>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne 
>>> wrote:
>>>
>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>>>>
>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>> families.
>>>>>
>>>>
>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>> supercolumn families and voting on it. This will be 1 or 2 order of
>>>> magnitude less work than getting rid of super column internally, and
>>>> probably a much better solution anyway.
>>>>
>>>
>>> I realize that this is largely subjective, and on such matters code
>>> speaks louder than words, but I don't think I agree with you on the issue of
>>> which alternative is less work, or even which is a better solution.
>>>
>>
>> You are right, I probably put too much emphasis in that sentence. My main
>> point was to say that I think it is better to create tickets for what you
>> want, rather than for something else completely different that would, as a
>> by-product, give you what you want.
>> Then I suspect that *if* the only goal is to get secondary indexes on
>> super columns, then there is a good chance this would be less work than
>> getting rid of super columns. But to be fair, secondary indexes on super
>> columns may not make too much sense without #598, which itself would require
>> quite some work, so clearly I spoke a bit quickly.
>>
>>
>>> If the goal is to have a hierarchical model, limiting the depth to two
>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>> hierarchy?
>>>
>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>> impractical, allowing a depth of two seems inconsistent and
>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>> has a similar architecture and goes even further [2].
>>>
>>> It seems to me that super columns are a historical artifact from
>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>> posting lists of messages, sharded by user. So that's what they built. In my
>>> dealings with the Cassandra code, super columns end up making a mess all
>>> over the place when algorithms need to be special cased and branch based on
>>> the column/supercolumn distinction.

Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-07 Thread David Boxenhorn
Why not synchronize on the client side? Make sure that the process that
allocates user ids runs on only a single machine, in a synchronized method,
and uses QUORUM for its reads and writes to Cassandra?
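
Something along these lines is what I mean - a sketch only, where the helper
methods are hypothetical stand-ins for the actual QUORUM reads and writes
against Cassandra:

// Runs in exactly one process; the synchronized method serializes allocations.
public class UserIdAllocator {
    private long nextId = -1;

    public synchronized long allocateUserId() {
        if (nextId < 0) {
            // hypothetical helper: QUORUM read of the highest allocated id
            nextId = readMaxIdAtQuorum() + 1;
        }
        long id = nextId++;
        // hypothetical helper: QUORUM write of the new highest allocated id
        writeMaxIdAtQuorum(id);
        return id;
    }

    private long readMaxIdAtQuorum() { /* QUORUM read of the counter row */ return 0L; }
    private void writeMaxIdAtQuorum(long id) { /* QUORUM write of the counter row */ }
}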

On Sun, Feb 6, 2011 at 11:02 PM, Aaron Morton wrote:

> If you mix mysql and Cassandra you risk creating a single point of failure
> around the mysql system.
>
> If you have use data that changes infrequently, a row cache in cassandra
> will give you fast reads.
>
> Aaron
>
> On 5/02/2011, at 8:13 AM, Aklin_81  wrote:
>
> > Thanks so much Ryan for the links; I'll definitely take them into
> > consideration.
> >
> > Just another thought which came to my mind:-
> > perhaps it may be beneficial to store(or duplicate) some of the data
> > like the Login credentials & particularly userId to User's Name
> > mapping, etc (which is very heavily read), in a fast MyISAM table.
> > This could solve the problem of keys though auto-generated unique &
> > sequential primary keys. I could use the same keys for Cassandra rows
> > for that user. And also since Cassandra reads are relatively slow, it
> > makes sense to store data like userId to Name mapping in MyISAM as
> > this data would be required after almost all queries to the database.
> >
> > Regards
> > -Asil
> >
> >
> >
> > On Fri, Feb 4, 2011 at 10:14 PM, Ryan King  wrote:
> >> On Thu, Feb 3, 2011 at 9:12 PM, Aklin_81  wrote:
> >>> Thanks Matthew & Ryan,
> >>>
> >>> The main inspiration behind me trying to generate Ids in sequential
> >>> manner is to reduce the size of the userId, since I am using it for
> >>> heavy denormalization. UUIDs are 16 bytes long, but I can also have a
> >>> unique Id in just 4 bytes, and since this is just a one time process
> >>> when the user signs-up, it makes sense to try cutting down the space
> >>> requirements, if it is feasible "without any downsides"(!?).
> >>>
> >>> I am also using userIds to attach to Id of the other data of the user
> >>> on my application. If I could reduce the userId size that I can also
> >>> reduce the size of other Ids, I could drastically cut down the space
> >>> requirements.
> >>>
> >>>
> >>> [Sorry for this question is not directly related to cassandra but I
> >>> think Cassandra factors here because of its  tuneable consistency]
> >>
> >> Don't generate these ids in cassandra. Use something like snowflake,
> >> flickr's ticket servers [2] or zookeeper sequential nodes.
> >>
> >> -ryan
> >>
> >>
> >> 1. http://github.com/twitter/snowflake
> >> 2.
> http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
> >>
>


Re: Do supercolumns have a purpose?

2011-02-06 Thread David Boxenhorn
"My main point was to say that it's think it is better to create tickets for
what you want, rather than for something else completely different that
would, as a by-product, give you what you want."

Then let me say what I want: I want supercolumn families to have any feature
that regular column families have.

My data model is full of supercolumns. I used them, even though I knew I
didn't *have to*, "because they were there", which implied to me that I was
supposed to use them for some good reason. Now I suspect that they will
gradually become less and less functional, as features are added to regular
column families and not supported for supercolumn families.


On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne wrote:

> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone  wrote:
>
>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote:
>>
>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>>>
>>>> The advantage would be to enable secondary indexes on supercolumn
>>>> families.
>>>>
>>>
>>> Then I suggest opening a ticket for adding secondary indexes to
>>> supercolumn families and voting on it. This will be 1 or 2 order of
>>> magnitude less work than getting rid of super column internally, and
>>> probably a much better solution anyway.
>>>
>>
>> I realize that this is largely subjective, and on such matters code speaks
>> louder than words, but I don't think I agree with you on the issue of which
>> alternative is less work, or even which is a better solution.
>>
>
> You are right, I probably put too much emphasis in that sentence. My main
> point was to say that I think it is better to create tickets for what you
> want, rather than for something else completely different that would, as a
> by-product, give you what you want.
> Then I suspect that *if* the only goal is to get secondary indexes on super
> columns, then there is a good chance this would be less work than getting
> rid of super columns. But to be fair, secondary indexes on super columns may
> not make too much sense without #598, which itself would require quite some
> work, so clearly I spoke a bit quickly.
>
>
>> If the goal is to have a hierarchical model, limiting the depth to two
>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>> hierarchy?
>>
>> If a more sophisticated hierarchical model is deemed unnecessary, or
>> impractical, allowing a depth of two seems inconsistent and
>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>> implemented a custom comparator that does the job [1]. Google's Megastore
>> has a similar architecture and goes even further [2].
>>
>> It seems to me that super columns are a historical artifact from
>> Cassandra's early life as Facebook's inbox storage system. They needed
>> posting lists of messages, sharded by user. So that's what they built. In my
>> dealings with the Cassandra code, super columns end up making a mess all
>> over the place when algorithms need to be special cased and branch based on
>> the column/supercolumn distinction.
>>
>> I won't even mention what it does to the thrift interface.
>>
>
> Actually, I agree with you, more than you know. If I were to start coding
> Cassandra now, I wouldn't include super columns (and I would probably not go
> for a depth unlimited hierarchical model either). But it's there and I'm not
> sure getting rid of them fully (meaning, including in thrift) is an option
> (it would be a big compatibility breakage). And (even though I certainly
> though about this more than once :)) I'm slightly less enthusiastic about
> keeping them in thrift but encoding them in regular column family
> internally: it would still be a lot of work but we would still probably end
> up with nasty tricks to stick to the thrift api.
>
> --
> Sylvain
>
>
>> Mike
>>
>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>
>
>


Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Well, I am an "actual active developer" and I have "managed to do pretty
nice stuffs with Cassandra" - without secondary indexes so far. But I'm
looking forward to having secondary indexes in my arsenal when new
functional requirements come up, and I'm bummed out that my early design
decision to use supercolumns wherever I could, instead of concatenating keys,
has closed off a whole lot of possibilities. I knew when I started that
secondary indexes were in the future; if I had known that they would be only
for regular column families, I wouldn't have used supercolumn families in the
first place. Now I'm pretty much stuck (too late to go back - we're
launching in March).


On Thu, Feb 3, 2011 at 4:44 PM, Sylvain Lebresne wrote:

> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn  wrote:
>
>> The advantage would be to enable secondary indexes on supercolumn
>> families.
>>
>
> Then I suggest opening a ticket for adding secondary indexes to supercolumn
> families and voting on it. This will be 1 or 2 order of magnitude less work
> than getting rid of super column internally, and probably a much better
> solution anyway.
>
>
>> I understand from this thread that indexes on supercolumn families are
>> not going to be implemented:
>>
>> http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html
>>
>
> I should maybe let Jonathan answer this one, but the way I understand it is
> that adding secondary indexes to super column is not a top priority to
> actual active developers. Not that it will never ever happen. And voting for
> tickets in JIRA is one way to help make it raise its priority.
>
> In any case, if the goal you're pursuing is adding secondary indexes to
> super column, then that's the ticket you should open, and if after careful
> consideration it is decided that getting rid of super column is the best way
> to reach that goal then so be it (spoiler: it is not).
>
>
>> Which, it seems to me, effectively deprecates supercolumn families. (I
>> don't see any of the three problems you brought up as overcoming this
>> problem, except, perhaps, for special cases.)
>>
>
> You're entitled to your opinions obviously but I doubt everyone shares that
> feeling (I don't for instance). Before 0.7, there were no secondary indexes
> at all and still a bunch of people managed to do pretty nice stuffs with
> Cassandra. In particular denormalized views are sometimes (often?)
> preferable to secondary indexes for performance reasons. For that super
> columns are quite handy.
>
> --
> Sylvain
>
>
>>
>>
>>  On Thu, Feb 3, 2011 at 3:32 PM, Sylvain Lebresne 
>> wrote:
>>
>>> On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn wrote:
>>>
>>>> Thanks Sylvain!
>>>>
>>>> Can I vote for internally implementing supercolumn families as regular
>>>> column families? (With a smooth upgrade process that doesn't require
>>>> shutting down a live cluster.)
>>>>
>>>
>>> I forgot to add that I don't know if this make a lot of sense. That would
>>> be a fairly major refactor (so error prone), you'd still have to deal with
>>> the point I mentioned in my previous mail (for range deletes you would have
>>> to change the on-disk format for instance), and all this for no actual
>>> benefits, even downsides actually (encoded supercolumn will take more space
>>> on-disk (and on-memory)). Super columns are there and work fairly well, so
>>> what would be the point ?
>>>
>>> I'm only just saying that 'in theory', super columns are not the super
>>> shiny magical feature that gives you stuff you can't hope to have with only
>>> regular column families. That doesn't make them any less nice.
>>>
>>> That being said, you are free to create whatever ticket you want and vote
>>> for it. Don't expect too much support though :)
>>>
>>>
>>>> What if supercolumn families were supported as regular column families +
>>>> an index (on what used to be supercolumn keys)? Would that solve some
>>>> problems?
>>>>
>>>
>>> You'd still have to remember for each CF if it has this index on what
>>> used to be supercolumn keys and handle those differently. Really not
>>> convinced this would make the code cleaner than how it is now. And making the
>>> code cleaner is really the only reason I can think of for wanting to get rid
>>> of super columns internally, so ...
>>>
>>>
>>>>
>>>>

Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
The advantage would be to enable secondary indexes on supercolumn families.

I understand from this thread that indexes on supercolumn families are not
going to be implemented:

http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html

Which, it seems to me, effectively deprecates supercolumn families. (I don't
see any of the three problems you brought up as overcoming this problem,
except, perhaps, for special cases.)


On Thu, Feb 3, 2011 at 3:32 PM, Sylvain Lebresne wrote:

> On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn  wrote:
>
>> Thanks Sylvain!
>>
>> Can I vote for internally implementing supercolumn families as regular
>> column families? (With a smooth upgrade process that doesn't require
>> shutting down a live cluster.)
>>
>
> I forgot to add that I don't know if this make a lot of sense. That would
> be a fairly major refactor (so error prone), you'd still have to deal with
> the point I mentioned in my previous mail (for range deletes you would have
> to change the on-disk format for instance), and all this for no actual
> benefits, even downsides actually (encoded supercolumn will take more space
> on-disk (and on-memory)). Super columns are there and work fairly well, so
> what would be the point ?
>
> I'm only just saying that 'in theory', super columns are not the super
> shiny magical feature that gives you stuff you can't hope to have with only
> regular column families. That doesn't make them any less nice.
>
> That being said, you are free to create whatever ticket you want and vote
> for it. Don't expect too much support though :)
>
>
>> What if supercolumn families were supported as regular column families +
>> an index (on what used to be supercolumn keys)? Would that solve some
>> problems?
>>
>
> You'd still have to remember for each CF if it has this index on what used
> to be supercolumn keys and handle those differently. Really not convinced
> this would make the code cleaner than how it is now. And making the code
> cleaner is really the only reason I can think of for wanting to get rid of
> super columns internally, so ...
>
>
>>
>>
>> On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne wrote:
>>
>>> > Is there any advantage to using supercolumns
>>> > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
>>> > columns with concatenated keys
>>> > (columnFamilyName[superColumnName@columnName[val]])?
>>> >
>>> > When I designed my data model, I used supercolumns wherever I needed
>>> two
>>> > levels of key depth - just because they were there, and I figured that
>>> they
>>> > must be there for a reason.
>>> >
>>> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
>>> > subcolumns (is that right?), which seems to me like a very serious
>>> > limitation of supercolumn families.
>>> >
>>> > It raises the question: Is there anything that supercolumn families are
>>> good
>>> > for?
>>>
>>> There is a bunch of queries that you cannot do (or less conveniently) if
>>> you
>>> encode super columns using regular columns with concatenated keys:
>>>
>>> 1) If you use regular columns with concatenated keys, the count argument
>>> count simple columns. With super columns it counts super columns. It
>>> means
>>> that you can't do "give me the 10 first super columns of this row".
>>>
>>> 2) If you need to get x super columns by name, you'll have to issue x
>>> get_slice query (one of each super column). On the client side it sucks.
>>> Internally in Cassandra we could do it reasonably well though.
>>>
>>> 3) You cannot remove entire super columns since there is no support for
>>> range
>>> deletions.
>>>
>>> Moreover, the encoding with concatenated keys uses more disk space (and
>>> less
>>> disk used for the same information means less things to read so it may
>>> have
>>> a slight impact on read performance too -- it's probably really slight on
>>> most
>>> usage but nevertheless).
>>>
>>> > And here's a related question: Why can't Cassandra implement
>>> supercolumn
>>> > families as regular column families, internally, and give you that
>>> > functionality?
>>>
>>> For the 1) and 2) above, we could deal with those internally fairly
>>> easily I
>>> think and rather well (which means it wouldn't be much worse
>>> performance-wise
>>> than with the actual implementation of super columns, not that it would be
>>> better). For 3), range deletes are harder and would require more
>>> significant
>>> changes (that doesn't mean that Cassandra will never have it). Even
>>> without
>>> that, there would be the disk space lost.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>
>


Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Thanks Sylvain!

Can I vote for internally implementing supercolumn families as regular
column families? (With a smooth upgrade process that doesn't require
shutting down a live cluster.)

What if supercolumn families were supported as regular column families + an
index (on what used to be supercolumn keys)? Would that solve some problems?


On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne wrote:

> > Is there any advantage to using supercolumns
> > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
> > columns with concatenated keys
> > (columnFamilyName[superColumnName@columnName[val]])?
> >
> > When I designed my data model, I used supercolumns wherever I needed two
> > levels of key depth - just because they were there, and I figured that
> they
> > must be there for a reason.
> >
> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
> > subcolumns (is that right?), which seems to me like a very serious
> > limitation of supercolumn families.
> >
> > It raises the question: Is there anything that supercolumn families are
> good
> > for?
>
> There is a bunch of queries that you cannot do (or less conveniently) if
> you
> encode super columns using regular columns with concatenated keys:
>
> 1) If you use regular columns with concatenated keys, the count argument
> counts simple columns. With super columns it counts super columns. It means
> that you can't do "give me the 10 first super columns of this row".
>
> 2) If you need to get x super columns by name, you'll have to issue x
> get_slice queries (one for each super column). On the client side it sucks.
> Internally in Cassandra we could do it reasonably well though.
>
> 3) You cannot remove entire super columns since there is no support for
> range
> deletions.
>
> Moreover, the encoding with concatenated keys uses more disk space (and
> less
> disk used for the same information means less things to read so it may have
> a slight impact on read performance too -- it's probably really slight on
> most
> usage but nevertheless).
>
> > And here's a related question: Why can't Cassandra implement supercolumn
> > families as regular column families, internally, and give you that
> > functionality?
>
> For the 1) and 2) above, we could deal with those internally fairly easily
> I
> think and rather well (which means it wouldn't be much worse
> performance-wise
> than with the actual implementation of super columns, not that it would be
> better). For 3), range deletes are harder and would require more
> significant
> changes (that doesn't mean that Cassandra will never have it). Even without
> that, there would be the disk space lost.
>
> --
> Sylvain
>
>


Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Is there any advantage to using supercolumns
(columnFamilyName[superColumnName[columnName[val]]]) instead of regular
columns with concatenated keys
(columnFamilyName[superColumnName@columnName[val]])?
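
To be concrete about what I mean by concatenation, here is a minimal sketch;
the separator character and class name are just illustrative:

// Encode two key levels into a single column name instead of a supercolumn.
// '@' is an arbitrary separator that must not occur in the key parts.
public final class CompositeNames {
    private static final char SEP = '@';

    public static String encode(String superColumnName, String columnName) {
        return superColumnName + SEP + columnName;
    }

    public static String[] decode(String compositeName) {
        int i = compositeName.indexOf(SEP);
        return new String[] { compositeName.substring(0, i), compositeName.substring(i + 1) };
    }
}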


When I designed my data model, I used supercolumns wherever I needed two
levels of key depth - just because they were there, and I figured that they
must be there for a reason.

Now I see that in 0.7 secondary indexes don't work on supercolumns or
subcolumns (is that right?), which seems to me like a very serious
limitation of supercolumn families.

It raises the question: Is there anything that supercolumn families are good
for?

And here's a related question: Why can't Cassandra implement supercolumn
families as regular column families, internally, and give you that
functionality?


Re: How to monitor Cassandra's throughput?

2011-02-01 Thread David Boxenhorn
You should bring this up in the Hector user group. It sounds like in the
case of a tie, a random winner should be chosen, instead of taking the first
one.
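
The tie-break I have in mind is easy to sketch (generic Java, not Hector's
actual code; Host and getActiveCount() are placeholders for whatever the
client exposes):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// "Least active, break ties at random" host selection, assuming a non-empty
// host list.
class LeastActiveWithRandomTieBreak {
    private final Random random = new Random();

    interface Host {
        int getActiveCount();
    }

    Host pick(List<Host> hosts) {
        int min = Integer.MAX_VALUE;
        for (Host h : hosts) {
            min = Math.min(min, h.getActiveCount());
        }
        List<Host> ties = new ArrayList<Host>();
        for (Host h : hosts) {
            if (h.getActiveCount() == min) {
                ties.add(h);
            }
        }
        // Picking randomly among the tied hosts avoids always routing to the
        // first node in the list.
        return ties.get(random.nextInt(ties.size()));
    }
}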

On Tue, Feb 1, 2011 at 5:50 PM, Oleg Proudnikov wrote:

> Thanks for the insight, Jonathan!
>
> As it turns out using single threaded clients with Hector's
> LeastActiveBalancingPolicy leads to the first node always winning :-)
>
> Is StorageProxy bean the only way to detect this, considering that all
> nodes are
> evenly loaded?
>
> Oleg
>
>
>
>
>
>


Re: Use Cassandra to store 2 million records of persons

2011-01-20 Thread David Boxenhorn
Cassandra is not a good solution for data mining type problems, since it
doesn't have ad-hoc queries. Cassandra is designed to maximize throughput,
which is not usually a problem for data mining.

On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh wrote:

> Hi All
>
> I want to use Apache Cassandra to store information (like first name, last
> name, gender, address) about 2 million people. Then I need to perform
> analytics and reporting on that data.
> Do I need to store information about 2 million people in MySQL and then
> transfer that information into Cassandra?
>
> Please help me, as I'm new to Apache Cassandra.
>
> if you have any use case like that, please share.
>
> Thanks and regards
> Surender Singh
>
>


Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread David Boxenhorn
I have added my comments to this issue:

https://issues.apache.org/jira/browse/CASSANDRA-2006

Good luck!

On Thu, Jan 20, 2011 at 1:53 PM, indika kumara wrote:

> Thanks David We decided to do it at our client-side as the initial
> implementation. I will investigate the approaches for supporting the fine
> grained control of the resources consumed by a sever, tenant, and CF.
>
> Thanks,
>
> Indika
>
> On Thu, Jan 20, 2011 at 3:20 PM, David Boxenhorn wrote:
>
>> As far as I can tell, if Cassandra supports three levels of configuration
>> (server, keyspace, column family) we can support multi-tenancy. It is
>> trivial to give each tenant their own keyspace (e.g. just use the tenant's
>> id as the keyspace name) and let them go wild. (Any out-of-bounds behavior
>> on the CF level will be stopped at the keyspace and server level before
>> doing any damage.)
>>
>> I don't think Cassandra needs to know about end-users. From Cassandra's
>> point of view the tenant is the user.
>>
>> On Thu, Jan 20, 2011 at 7:00 AM, indika kumara wrote:
>>
>>> +1   Are there JIRAs for these requirements? I would like to contribute
>>> from my capacity.
>>>
>>> As per my understanding, to support some muti-tenant models, it is needed
>>> to qualified keyspaces' names, Cfs' names, etc. with the tenant namespace
>>> (or id). The easiest way to do this would be to modify corresponding
>>> constructs transparently. I tought of a stage (optional and configurable)
>>> prior to authorization. Is there any better solutions? I appreciate the
>>> community's suggestions.
>>>
>>> Moreover, It is needed to send the tenant NS(id) with the user
>>> credentials (A users belongs to this tenant (org.)). For that purpose, I
>>> thought of using the user credentials in the AuthenticationRequest. s there
>>> any better solution?
>>>
>>> I would like to have a MT support at the Cassandra level which is
>>> optional and configurable.
>>>
>>> Thanks,
>>>
>>> Indika
>>>
>>>
>>> On Wed, Jan 19, 2011 at 7:40 PM, David Boxenhorn wrote:
>>>
>>>> Yes, the way I see it - and it becomes even more necessary for a
>>>> multi-tenant configuration - there should be completely separate
>>>> configurations for applications and for servers.
>>>>
>>>> - Application configuration is based on data and usage characteristics
>>>> of your application.
>>>> - Server configuration is based on the specific hardware limitations of
>>>> the server.
>>>>
>>>> Obviously, server limitations take priority over application
>>>> configuration.
>>>>
>>>> Assuming that each tenant in a multi-tenant environment gets one
>>>> keyspace, you would also want to enforce limitations based on keyspace
>>>> (which correspond to parameters that the tenant paid for).
>>>>
>>>> So now we have three levels:
>>>>
>>>> 1. Server configuration (top priority)
>>>> 2. Keyspace configuration (paid-for service - second priority)
>>>> 3. Column family configuration (configuration provided by tenant - third
>>>> priority)
>>>>
>>>>
>>>> On Wed, Jan 19, 2011 at 3:15 PM, indika kumara 
>>>> wrote:
>>>>
>>>>> As the actual problem is mostly related to the number of CFs in the
>>>>> system (may be number of the columns), I still believe that supporting
>>>>> exposing the Cassandra ‘as-is’ to a tenant is doable and suitable though
>>>>> need some fixes.  That multi-tenancy model allows a tenant to use the
>>>>> programming model of the Cassandra ‘as-is’, enabling the seamless 
>>>>> migration
>>>>> of an application that uses the Cassandra into the cloud. Moreover, In 
>>>>> order
>>>>> to support different SLA requirements of different tenants, the
>>>>> configurability of keyspaces, cfs, etc., per tenant may be critical.
>>>>> However, there are trade-offs among usability, memory consumption, and
>>>>> performance. I believe that it is important to consider the SLA 
>>>>> requirements
>>>>> of different tenants when deciding the strategies for controlling resource
>>>>> consumption.
>>>>>
>>>>> I like to the idea of system-wide parameters for controlling resource
>>>>>

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread David Boxenhorn
As far as I can tell, if Cassandra supports three levels of configuration
(server, keyspace, column family) we can support multi-tenancy. It is
trivial to give each tenant their own keyspace (e.g. just use the tenant's
id as the keyspace name) and let them go wild. (Any out-of-bounds behavior
on the CF level will be stopped at the keyspace and server level before
doing any damage.)
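
The mapping itself can be trivial - something like this sketch, where the
sanitizing rule is only an illustration:

// One keyspace per tenant, named after the tenant id. Keyspace names must be
// valid identifiers, so this example replaces anything outside [A-Za-z0-9_].
public final class TenantKeyspaces {
    public static String keyspaceFor(String tenantId) {
        return "tenant_" + tenantId.replaceAll("[^A-Za-z0-9_]", "_");
    }
}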

I don't think Cassandra needs to know about end-users. From Cassandra's
point of view the tenant is the user.

On Thu, Jan 20, 2011 at 7:00 AM, indika kumara wrote:

> +1   Are there JIRAs for these requirements? I would like to contribute
> from my capacity.
>
> As per my understanding, to support some multi-tenant models, it is necessary
> to qualify keyspaces' names, CFs' names, etc. with the tenant namespace
> (or id). The easiest way to do this would be to modify the corresponding
> constructs transparently. I thought of a stage (optional and configurable)
> prior to authorization. Are there any better solutions? I appreciate the
> community's suggestions.
>
> Moreover, It is needed to send the tenant NS(id) with the user credentials
> (A users belongs to this tenant (org.)). For that purpose, I thought of
> using the user credentials in the AuthenticationRequest. Is there any better
> solution?
>
> I would like to have a MT support at the Cassandra level which is optional
> and configurable.
>
> Thanks,
>
> Indika
>
>
> On Wed, Jan 19, 2011 at 7:40 PM, David Boxenhorn wrote:
>
>> Yes, the way I see it - and it becomes even more necessary for a
>> multi-tenant configuration - there should be completely separate
>> configurations for applications and for servers.
>>
>> - Application configuration is based on data and usage characteristics of
>> your application.
>> - Server configuration is based on the specific hardware limitations of
>> the server.
>>
>> Obviously, server limitations take priority over application
>> configuration.
>>
>> Assuming that each tenant in a multi-tenant environment gets one keyspace,
>> you would also want to enforce limitations based on keyspace (which
>> correspond to parameters that the tenant paid for).
>>
>> So now we have three levels:
>>
>> 1. Server configuration (top priority)
>> 2. Keyspace configuration (paid-for service - second priority)
>> 3. Column family configuration (configuration provided by tenant - third
>> priority)
>>
>>
>> On Wed, Jan 19, 2011 at 3:15 PM, indika kumara wrote:
>>
>>> As the actual problem is mostly related to the number of CFs in the
>>> system (may be number of the columns), I still believe that supporting
>>> exposing the Cassandra ‘as-is’ to a tenant is doable and suitable though
>>> need some fixes.  That multi-tenancy model allows a tenant to use the
>>> programming model of the Cassandra ‘as-is’, enabling the seamless migration
>>> of an application that uses the Cassandra into the cloud. Moreover, In order
>>> to support different SLA requirements of different tenants, the
>>> configurability of keyspaces, cfs, etc., per tenant may be critical.
>>> However, there are trade-offs among usability, memory consumption, and
>>> performance. I believe that it is important to consider the SLA requirements
>>> of different tenants when deciding the strategies for controlling resource
>>> consumption.
>>>
>>> I like the idea of system-wide parameters for controlling resource
>>> usage. I believe that the tenant-specific parameters are equally important.
>>> There are resources, and each tenant can claim a portion of them based on
>>> SLA. For instance, if there is a threshold on the number of columns per a
>>> node, it should be able to decide how many columns a particular tenant can
>>> have.  It allows selecting a suitable Cassandra cluster for a tenant based
>>> on his or her SLA. I believe the capability to configure resource
>>> controlling parameters per keyspace would be important to support a keyspace
>>> per tenant model. Furthermore, In order to maximize the resource sharing
>>> among tenants, a threshold (on a resource) per keyspace should not be a hard
>>> limit. Rather, it should be oscillated between a hard minimum and a maximum.
>>> For example, if a particular tenant needs more resources at a given time, he
>>> or she should be possible to borrow from the others up to the maximum. The
>>> threshold is only considered when a tenant is assigned to a cluster - the
>>> remaining resources of a cluster should be equal or higher than the resourc

Re: Getting the version number

2011-01-19 Thread David Boxenhorn
Yet another reason to move up to 0.7...

Thanks.

On Wed, Jan 19, 2011 at 5:27 PM, Daniel Lundin  wrote:

> in 0.7 nodetool has a `version` command.
>
> On Wed, Jan 19, 2011 at 4:09 PM, David Boxenhorn 
> wrote:
> > Is there any way to use nodetool (or anything else) to get the Cassandra
> > version number of a deployed cluster?
> >
>


Getting the version number

2011-01-19 Thread David Boxenhorn
Is there any way to use nodetool (or anything else) to get the Cassandra
version number of a deployed cluster?


Re: Multi-tenancy, and authentication and authorization

2011-01-19 Thread David Boxenhorn
Yes, the way I see it - and it becomes even more necessary for a
multi-tenant configuration - there should be completely separate
configurations for applications and for servers.

- Application configuration is based on data and usage characteristics of
your application.
- Server configuration is based on the specific hardware limitations of the
server.

Obviously, server limitations take priority over application configuration.

Assuming that each tenant in a multi-tenant environment gets one keyspace,
you would also want to enforce limitations based on keyspace (which
correspond to parameters that the tenant paid for).

So now we have three levels:

1. Server configuration (top priority)
2. Keyspace configuration (paid-for service - second priority)
3. Column family configuration (configuration provided by tenant - third
priority)
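
If it helps make the priorities concrete, resolving an effective per-CF limit
could look like this sketch (all names and numbers are illustrative):

// The tenant's CF-level request is honored only up to the keyspace quota,
// which in turn is capped by the server-wide limit.
public final class EffectiveLimits {
    public static long effectiveLimit(long serverLimit, long keyspaceLimit, long cfRequested) {
        return Math.min(serverLimit, Math.min(keyspaceLimit, cfRequested));
    }

    public static void main(String[] args) {
        // Server allows 1000000, the tenant's keyspace quota is 200000,
        // the CF asks for 500000 -> the keyspace quota wins.
        System.out.println(effectiveLimit(1000000, 200000, 500000)); // prints 200000
    }
}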


On Wed, Jan 19, 2011 at 3:15 PM, indika kumara wrote:

> As the actual problem is mostly related to the number of CFs in the system
> (may be number of the columns), I still believe that supporting exposing the
> Cassandra ‘as-is’ to a tenant is doable and suitable though need some
> fixes.  That multi-tenancy model allows a tenant to use the programming
> model of the Cassandra ‘as-is’, enabling the seamless migration of an
> application that uses the Cassandra into the cloud. Moreover, In order to
> support different SLA requirements of different tenants, the configurability
> of keyspaces, cfs, etc., per tenant may be critical. However, there are
> trade-offs among usability, memory consumption, and performance. I believe
> that it is important to consider the SLA requirements of different tenants
> when deciding the strategies for controlling resource consumption.
>
> I like the idea of system-wide parameters for controlling resource
> usage. I believe that the tenant-specific parameters are equally important.
> There are resources, and each tenant can claim a portion of them based on
> SLA. For instance, if there is a threshold on the number of columns per a
> node, it should be able to decide how many columns a particular tenant can
> have.  It allows selecting a suitable Cassandra cluster for a tenant based
> on his or her SLA. I believe the capability to configure resource
> controlling parameters per keyspace would be important to support a keyspace
> per tenant model. Furthermore, In order to maximize the resource sharing
> among tenants, a threshold (on a resource) per keyspace should not be a hard
> limit. Rather, it should be oscillated between a hard minimum and a maximum.
> For example, if a particular tenant needs more resources at a given time, he
> or she should be possible to borrow from the others up to the maximum. The
> threshold is only considered when a tenant is assigned to a cluster - the
> remaining resources of a cluster should be equal or higher than the resource
> limit of the tenant. It may need to spread a single keyspace across multiple
> clusters; especially when there are no enough resources in a single
> cluster.
>
> I believe that it would be better to have a flexibility to change
> seamlessly multi-tenancy implementation models such as the Cassadra ‘as-is’,
> the keyspace per tenant model, a keyspace for all tenants, and so on.  Based
> on what I have learnt, each model requires adding tenant id (name space) to
> a keyspace’s name or cf’s name or raw key, or column’s name.  Would it be
> better to have a kind of pluggable handler that can access those resources
> prior to doing the actual operation so that the required changes can be
> done? May be prior to authorization.
>
> Thanks,
>
> Indika
>


Re: Multi-tenancy, and authentication and authorization

2011-01-19 Thread David Boxenhorn
+1



On Wed, Jan 19, 2011 at 10:35 AM, Stu Hood  wrote:

> Opened https://issues.apache.org/jira/browse/CASSANDRA-2006 with the
> solution we had suggested on the MultiTenant wiki page.
>
>
> On Tue, Jan 18, 2011 at 11:56 PM, David Boxenhorn wrote:
>
>> I think tuning of Cassandra is overly complex, and even with a single
>> tenant you can run into problems with too many CFs.
>>
>> Right now there is a one-to-one mapping between memtables and SSTables.
>> Instead of that, would it be possible to have one giant memtable for each
>> Cassandra instance, with partial flushing to SSTs?
>>
>> It seems to me like a single memtable would make it MUCH easier to tune
>> Cassandra, since the decision whether to (partially) flush the memtable to
>> disk could be made on a node-wide basis, based on the resources you really
>> have, instead of the guess-work that we are forced to do today.
>>
>
>


Re: Multi-tenancy, and authentication and authorization

2011-01-19 Thread David Boxenhorn
I'm not sure that "you'd still want to retain the ability to individually
control how flushing happens on a per-cf basis in order to cater to
different workloads that benefit from different flushing behavior". It seems
to me like a good system-wide algorithm that works dynamically, and takes
into account moment-by-moment usage, can do this better than a human who is
guessing and making decisions on a static basis.

Having said that, my suggestion doesn't really depend so much on having one
memtable or many. Rather, it depends on making flushing behavior dependent
on system-wide parameters, which reflect the actual physical resources
available per node, rather than per-CF parameters (though per-CF tuning can
be taken into account, it should be a suggestion that gets overridden by
system-wide needs).



On Wed, Jan 19, 2011 at 10:48 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> > Right now there is a one-to-one mapping between memtables and SSTables.
> > Instead of that, would it be possible to have one giant memtable for each
> > Cassandra instance, with partial flushing to SSTs?
>
> I think a complication here is that, although I agree things need to
> be easier to tweak at least for the common case, I'm pretty sure you'd
> still want to retain the ability to individually control how flushing
> happens on a per-cf basis in order to cater to different workloads
> that benefit from different flushing behavior.
>
> I suspect the main concern here may be that there is a desire to have
> better overal control over how flushing happens and when writes start
> blocking, rather than necessarily implying that there can't be more
> than one memtable (the ticket Stu posted seems to address one such
> means of control).
>
> --
> / Peter Schuller
>


Re: Multi-tenancy, and authentication and authorization

2011-01-18 Thread David Boxenhorn
I think tuning of Cassandra is overly complex, and even with a single tenant
you can run into problems with too many CFs.

Right now there is a one-to-one mapping between memtables and SSTables.
Instead of that, would it be possible to have one giant memtable for each
Cassandra instance, with partial flushing to SSTs?

It seems to me like a single memtable would make it MUCH easier to tune
Cassandra, since the decision whether to (partially) flush the memtable to
disk could be made on a node-wide basis, based on the resources you really
have, instead of the guess-work that we are forced to do today.


Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread David Boxenhorn
Thanks.

On Tue, Jan 18, 2011 at 3:55 PM, Sylvain Lebresne wrote:

> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn 
> wrote:
> > Thanks, Aaron, but I'm not 100% clear.
> >
> > My situation is this: My use case spins off rows (not columns) that I no
> > longer need and want to delete. It is possible that these rows were never
> > created in the first place, or were already deleted. This is a very large
> > cleanup task that normally deletes a lot of rows, and the last thing that
> I
> > want to do is create tombstones for rows that didn't exist in the first
> > place, or lengthen the life on disk of tombstones of rows that are
> already
> > deleted.
> >
> > So the question is: before I delete, do I have to retrieve the row to see
> if
> > it exists in the first place?
>
> Yes, in your situation you do.
>
> >
> >
> >
> > On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton 
> > wrote:
> >>
> >> AFAIK that's not necessary, there is no need to worry about previous
> >> deletes. You can delete stuff that does not even exist, neither
> batch_mutate
> >> or remove are going to throw an error.
> >> All the columns that were (roughly speaking) present at your first
> >> deletion will be available for GC at the end of the first tombstones
> life.
> >> Same for the second.
> >> Say you were to write a col between the two deletes with the same name
> as
> >> one present at the start. The first version of the col is avail for GC
> after
> >> tombstone 1, and the second after tombstone 2.
> >> Hope that helps
> >> Aaron
> >> On 18/01/2011, at 9:37 PM, David Boxenhorn  wrote:
> >>
> >> Thanks. In other words, before I delete something, I should check to see
> >> whether it exists as a live row in the first place.
> >>
> >> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King  wrote:
> >>>
> >>> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn 
> >>> wrote:
> >>> > If I delete a row, and later on delete it again, before
> GCGraceSeconds
> >>> > has
> >>> > elapsed, does the tombstone live longer?
> >>>
> >>> Each delete is a new tombstone, which should answer your question.
> >>>
> >>> -ryan
> >>>
> >>> > In other words, if I have the following scenario:
> >>> >
> >>> > GCGraceSeconds = 10 days
> >>> > On day 1 I delete a row
> >>> > On day 5 I delete the row again
> >>> >
> >>> > Will the tombstone be removed on day 10 or day 15?
> >>> >
> >>
> >
> >
>


Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread David Boxenhorn
Thanks, Aaron, but I'm not 100% clear.

My situation is this: My use case spins off rows (not columns) that I no
longer need and want to delete. It is possible that these rows were never
created in the first place, or were already deleted. This is a very large
cleanup task that normally deletes a lot of rows, and the last thing that I
want to do is create tombstones for rows that didn't exist in the first
place, or lengthen the life on disk of tombstones of rows that are already
deleted.

So the question is: before I delete, do I have to retrieve the row to see if
it exists in the first place?
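
If the answer is yes, the cleanup would look something like this sketch, with
hypothetical helpers standing in for the actual client calls:

// Only delete rows that actually exist, to avoid writing tombstones for rows
// that were never created or were already deleted.
public class CleanupTask {
    public void deleteIfPresent(String key) {
        if (rowExists(key)) {   // e.g. a slice with count=1 on the row
            deleteRow(key);     // issue the row deletion (writes a tombstone)
        }
        // If the row is absent, skip the delete so no new tombstone is created.
    }

    private boolean rowExists(String key) { /* client read, e.g. at QUORUM */ return false; }
    private void deleteRow(String key)    { /* client remove, e.g. at QUORUM */ }
}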



On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton wrote:

> AFAIK that's not necessary, there is no need to worry about previous
> deletes. You can delete stuff that does not even exist, neither batch_mutate
> or remove are going to throw an error.
>
> All the columns that were (roughly speaking) present at your first deletion
> will be available for GC at the end of the first tombstones life. Same for
> the second.
>
> Say you were to write a col between the two deletes with the same name as
> one present at the start. The first version of the col is avail for GC after
> tombstone 1, and the second after tombstone 2.
>
> Hope that helps
> Aaron
>
> On 18/01/2011, at 9:37 PM, David Boxenhorn  wrote:
>
> Thanks. In other words, before I delete something, I should check to see
> whether it exists as a live row in the first place.
>
> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King < 
> r...@twitter.com> wrote:
>
>> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn < 
>> da...@lookin2.com> wrote:
>> > If I delete a row, and later on delete it again, before GCGraceSeconds
>> has
>> > elapsed, does the tombstone live longer?
>>
>> Each delete is a new tombstone, which should answer your question.
>>
>> -ryan
>>
>> > In other words, if I have the following scenario:
>> >
>> > GCGraceSeconds = 10 days
>> > On day 1 I delete a row
>> > On day 5 I delete the row again
>> >
>> > Will the tombstone be removed on day 10 or day 15?
>> >
>>
>
>


Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread David Boxenhorn
Thanks. In other words, before I delete something, I should check to see
whether it exists as a live row in the first place.

On Tue, Jan 18, 2011 at 9:24 AM, Ryan King  wrote:

> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn 
> wrote:
> > If I delete a row, and later on delete it again, before GCGraceSeconds
> has
> > elapsed, does the tombstone live longer?
>
> Each delete is a new tombstone, which should answer your question.
>
> -ryan
>
> > In other words, if I have the following scenario:
> >
> > GCGraceSeconds = 10 days
> > On day 1 I delete a row
> > On day 5 I delete the row again
> >
> > Will the tombstone be removed on day 10 or day 15?
> >
>


Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread David Boxenhorn
I think you should just tell everybody that if you want to use QUORUM you
need RF >= 3 for it to be meaningful.

No one would use QUORUM with RF < 3 except in error.
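
The arithmetic is simple enough to spell out (quorum = RF/2 + 1, integer
division):

// Quorum size as a function of replication factor.
// RF=2 -> quorum=2, so losing a single replica blocks QUORUM operations.
// RF=3 -> quorum=2, so one replica can be down and QUORUM still succeeds.
public class QuorumMath {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            int q = quorum(rf);
            System.out.println("RF=" + rf + " quorum=" + q
                    + " tolerates " + (rf - q) + " replica(s) down");
        }
    }
}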

On Mon, Jan 17, 2011 at 6:08 PM, Jonathan Ellis  wrote:

> On Mon, Jan 17, 2011 at 9:55 AM, Samuel Benz 
> wrote:
> > We have a cluster with 4 nodes. ReplicationFactor is 2, ReplicaPlacment
> > is the RackAwareStrategy and the EndpointSnitch is the
> > PropertyFileEndpointSnitch (with two data center and two racks each).
> >
> > My understanding is, that with this parameter of the cluster, it should
> > be possible to update with consistency level quorum while one data
> > center (two nodes) are shutdown completely.
>
> No.  Quorum of 2 is 2.
>
> > Case1:
> > If 'TEST' was previous stored on Node1, Node2, Node3 -> The update will
> > succeed.
> >
> > Case2:
> > If 'TEST' was previous stored on Node2, Node3, Node4 -> The update will
> > not work.
>
> If you have RF=2 then it will be stored on 2 nodes, not 3.  I think
> this is the source of the confusion.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Tombstone lifespan after multiple deletions

2011-01-16 Thread David Boxenhorn
If I delete a row, and later on delete it again, before GCGraceSeconds has
elapsed, does the tombstone live longer?

In other words, if I have the following scenario:

GCGraceSeconds = 10 days
On day 1 I delete a row
On day 5 I delete the row again

Will the tombstone be removed on day 10 or day 15?


Re: Usage Pattern : "unique" value of a key.

2011-01-13 Thread David Boxenhorn
"It is unlikely that both racing threads will have exactly the same
microsecond timestamp at the moment of creating a new user - so if data you
read
have exactly the same timestamp you used to write data - this is your data."

I think this would have to be combined with CL=QUORUM for both write and
read.
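
A sketch of the check Oleg describes, with QUORUM on both sides - the helper
methods are hypothetical stand-ins for the client calls, and the timestamp is
the client-supplied microsecond write timestamp:

// Write, then read back and compare timestamps to detect whether this thread
// won the race for the username.
public class UsernameClaim {
    public boolean claim(String username, long myTimestampMicros) {
        writeAtQuorum(username, myTimestampMicros);       // QUORUM write
        long winning = readTimestampAtQuorum(username);   // QUORUM read
        // If the stored column still carries our timestamp, our write won;
        // otherwise another thread's write took over.
        return winning == myTimestampMicros;
    }

    private void writeAtQuorum(String username, long timestampMicros) { /* client insert */ }
    private long readTimestampAtQuorum(String username) { /* client read of the column timestamp */ return 0L; }
}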

On Thu, Jan 13, 2011 at 9:57 AM, Oleg Anastasyev  wrote:

> Benoit Perroud  noisette.ch> writes:
>
> >
> > My idea to solve such use case is to have both thread writing the
> > username, but with a colum like "lock-", and then read
> > the row, and find out if the first lock column appearing belong to the
> > thread. If this is the case, it can continue the process, otherwise it
> > has been preempted by another thread.
>
> This looks ok for this task. As an alternative you can avoid creating extra
> \lock-random value' column and compare timestamps of new user data you just
> written. It is unlikely that both racing threads will have exactly the same
> microsecond timestamp at the moment of creating a new user - so if data you
> read
> have exactly the same timestamp you used to write data - this is your data.
>
> Another possible way is to use some external lock coordinator, eg
> zookeeper.
> Although for this task it looks a bit overkill, but this can become even
> more
> valuable, if you have more data concurrency issues to solve and can bear
> extra
> 5-10ms update operations latency.
>
>


Re: Why my posts are marked as spam?

2011-01-12 Thread David Boxenhorn
What's wrong with topposting?

This email is non-plain and topposted...

On Wed, Jan 12, 2011 at 4:32 PM, zGreenfelder wrote:

> >
> > On 12 January 2011 05:28, Oleg Tsvinev  wrote:
> > > Whatever I do, it happens :(
> >On Wed, Jan 12, 2011 at 1:53 AM, Arijit Mukherjee 
> wrote:
> >
> > I think this happens for RTF. Some of the mails in the post are RTF,
> > and the reply button creates an RTF reply - that's when it happens.
> > Wonder how the mail to which I replied was in RTF...
> >
> > Arijit
> >
> >
> > --
> > "And when the night is cloudy,
> > There is still a light that shines on me,
> > Shine on until tomorrow, let it be."
>
> I think it happens for any non-plain text.. be it RTF, HTML, or
> whatever.   at least that's been my limited experience with mailing
> lists.
>
> and for what it's worth (I just had to correct myself, so don't take
> this as huge criticism), many people are also opposed to topposting ..
> or adding a reply to the top of an email.   FWIW.
>
> --
> Even the Magic 8 ball has an opinion on email clients: Outlook not so good.
>


Re: Reclaim deleted rows space

2011-01-12 Thread David Boxenhorn
I think that if SSTs are partitioned within the node using RP, so that each
partition is small and can be compacted independently of all other
partitions, you can implement an algorithm that will spread out the work of
compaction over time so that it never takes a node out of commission, as it
does now.

I have left a comment to that effect here:

https://issues.apache.org/jira/browse/CASSANDRA-1608?focusedCommentId=12980654&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12980654

On Mon, Jan 10, 2011 at 10:56 PM, Jonathan Ellis  wrote:

> I'd suggest describing your approach on
> https://issues.apache.org/jira/browse/CASSANDRA-1608, and if it's
> attractive, porting it to 0.8.  It's too late for us to make deep
> changes in 0.6 and probably even 0.7 for the sake of stability.
>
> On Mon, Jan 10, 2011 at 8:00 AM, shimi  wrote:
> > I modified the code to limit the size of the SSTables.
> > I will be glad if someone can take a look at it
> > https://github.com/Shimi/cassandra/tree/cassandra-0.6
> > Shimi
> >
> > On Fri, Jan 7, 2011 at 2:04 AM, Jonathan Shook  wrote:
> >>
> >> I believe the following condition within submitMinorIfNeeded(...)
> >> determines whether to continue, so it's not a hard loop.
> >>
> >> // if (sstables.size() >= minThreshold) ...
> >>
> >>
> >>
> >> On Thu, Jan 6, 2011 at 2:51 AM, shimi  wrote:
> >> > According to the code it make sense.
> >> > submitMinorIfNeeded() calls doCompaction() which
> >> > calls submitMinorIfNeeded().
> >> > With minimumCompactionThreshold = 1 submitMinorIfNeeded() will always
> >> > run
> >> > compaction.
> >> >
> >> > Shimi
> >> > On Thu, Jan 6, 2011 at 10:26 AM, shimi  wrote:
> >> >>
> >> >>
> >> >> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis 
> >> >> wrote:
> >> >>>
> >> >>> Pretty sure there's logic in there that says "don't bother
> compacting
> >> >>> a single sstable."
> >> >>
> >> >> No. You can do it.
> >> >> Based on the log I have a feeling that it triggers an infinite
> >> >> compaction
> >> >> loop.
> >> >>
> >> >>>
> >> >>> On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
> >> >>> > How does minor compaction is triggered? Is it triggered Only when
> a
> >> >>> > new
> >> >>> > SStable is added?
> >> >>> >
> >> >>> > I was wondering if triggering a compaction
> >> >>> > with minimumCompactionThreshold
> >> >>> > set to 1 would be useful. If this can happen I assume it will do
> >> >>> > compaction
> >> >>> > on files with similar size and remove deleted rows on the rest.
> >> >>> > Shimi
> >> >>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller
> >> >>> > 
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> > I don't have a problem with disk space. I have a problem with
> the
> >> >>> >> > data
> >> >>> >> > size.
> >> >>> >>
> >> >>> >> [snip]
> >> >>> >>
> >> >>> >> > Bottom line is that I want to reduce the number of requests
> that
> >> >>> >> > goes to
> >> >>> >> > disk. Since there is enough data that is no longer valid I can
> do
> >> >>> >> > it
> >> >>> >> > by
> >> >>> >> > reclaiming the space. The only way to do it is by running Major
> >> >>> >> > compaction.
> >> >>> >> > I can wait and let Cassandra do it for me but then the data
> size
> >> >>> >> > will
> >> >>> >> > get
> >> >>> >> > even bigger and the response time will be worst. I can do it
> >> >>> >> > manually
> >> >>> >> > but I
> >> >>> >> > prefer it to happen in the background with less impact on the
> >> >>> >> > system
> >> >>> >>
> >> >>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
> >> >>> >>
> >> >>> >> So essentially, for workloads that are teetering on the edge of
> >> >>> >> cache
> >> >>> >> warmness and is subject to significant overwrites or removals, it
> >> >>> >> may
> >> >>> >> be beneficial to perform much more aggressive background
> compaction
> >> >>> >> even though it might waste lots of CPU, to keep the in-memory
> >> >>> >> working
> >> >>> >> set down.
> >> >>> >>
> >> >>> >> There was talk (I think in the compaction redesign ticket) about
> >> >>> >> potentially improving the use of bloom filters such that obsolete
> >> >>> >> data
> >> >>> >> in sstables could be eliminated from the read set without
> >> >>> >> necessitating actual compaction; that might help address cases
> like
> >> >>> >> these too.
> >> >>> >>
> >> >>> >> I don't think there's a pre-existing silver bullet in a current
> >> >>> >> release; you probably have to live with the need for
> >> >>> >> greater-than-theoretically-optimal memory requirements to keep
> the
> >> >>> >> working set in memory.
> >> >>> >>
> >> >>> >> --
> >> >>> >> / Peter Schuller
> >> >>> >
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Jonathan Ellis
> >> >>> Project Chair, Apache Cassandra
> >> >>> co-founder of Riptano, the source for professional Cassandra support
> >> >>> http://riptano.com
> >> >>
> >> >
> >> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

Re: The CLI sometimes gets 100 results even though there are more, and sometimes gets more than 100

2011-01-05 Thread David Boxenhorn
I know that there's a limit, and I just assumed that the CLI set it to 100,
until I saw more than 100 results.

On Wed, Jan 5, 2011 at 6:56 PM, Peter Schuller
wrote:

> > The CLI sometimes gets only 100 results (even though there are more) -
> and
> > sometimes gets all the results, even when there are more than 100!
> >
> > What is going on here? Is there some logic that says if there are too
> many
> > results return 100, even though "too many" can be more than 100?
>
> API calls have a limit since streaming is not supported and you could
> > potentially have almost arbitrarily large result sets. I believe
> cassandra-cli will allow you to set the limit if you look at the
> 'help' output and look for the word 'limit'.
>
> The way to iterate over large amounts of data is to do paging, with
> multiple queries.
>
> --
> / Peter Schuller
>
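
With a client like Hector the limit is explicit per query; a minimal sketch,
assuming a keyspace "Keyspace1", a column family "Standard1" and a local node
(all names are placeholders):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class ExplicitLimit {
    public static void main(String[] args) {
        StringSerializer s = StringSerializer.get();
        Keyspace ks = HFactory.createKeyspace("Keyspace1",
                HFactory.getOrCreateCluster("Test Cluster", "localhost:9160"));

        // Ask for up to 1000 rows explicitly instead of relying on a default.
        RangeSlicesQuery<String, String, String> q =
                HFactory.createRangeSlicesQuery(ks, s, s, s)
                        .setColumnFamily("Standard1")
                        .setKeys("", "")
                        .setRange("", "", false, 10)   // up to 10 columns per row
                        .setRowCount(1000);

        OrderedRows<String, String, String> rows = q.execute().get();
        System.out.println("got " + rows.getCount() + " rows");
    }
}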


Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn
If "seed list should be the same across the cluster" that means that nodes
*should* have themselves as a seed. If that doesn't work for Ran, then that
is the first problem, no?
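
For reference, the relevant pieces of a 0.6-era storage-conf.xml look roughly
like this (addresses are placeholders); the open question in this thread is
whether the joining node's own address belongs in its Seeds block:

<!-- excerpt from storage-conf.xml on the joining node; addresses are placeholders -->
<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <Seeds>
      <Seed>10.0.0.1</Seed>
      <Seed>10.0.0.2</Seed>
      <!-- whether to also list this node's own address here is the question -->
  </Seeds>
  <ListenAddress>10.0.0.3</ListenAddress>
  <ThriftAddress>10.0.0.3</ThriftAddress>
</Storage>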


On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani  wrote:

> Well your ring issues don't make sense to me, seed list should be the same
> across the cluster.
> I'm just thinking of other things to try, non-bootstrapped nodes should join
> the ring instantly but reads will fail if you aren't using quorum.
>
>
> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
>
>> I haven't tried repair.  Should I?
>> On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>> > Have you tried not bootstrapping but setting the token and manually
>> calling
>> > repair?
>> >
>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory  wrote:
>> >
>> >> My conclusion is lame: I tried this on several hosts and saw the same
>> >> behavior, the only way I was able to join new nodes was to first start
>> them
>> >> when they are *not in* their own seeds list and after they
>> >> finish transferring the data, then restart them with themselves *in*
>> their
>> >> own seeds list. After doing that the node would join the ring.
>> >> This is either my misunderstanding or a bug, but the only place I found
>> it
>> >> documented stated that the new node should not be in its own seeds
>> list.
>> >> Version 0.6.6.
>> >>
>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn wrote:
>> >>
>> >>> My nodes all have themselves in their list of seeds - always did - and
>> >>> everything works. (You may ask why I did this. I don't know, I must
>> have
>> >>> copied it from an example somewhere.)
>> >>>
>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>> >>>
>> >>>> I was able to make the node join the ring but I'm confused.
>> >>>> What I did is, first when adding the node, this node was not in the
>> seeds
>> >>>> list of itself. AFAIK this is how it's supposed to be. So it was able
>> to
>> >>>> transfer all data to itself from other nodes but then it stayed in
>> the
>> >>>> bootstrapping state.
>> >>>> So what I did (and I don't know why it works), is add this node to
>> the
>> >>>> seeds list in its own storage-conf.xml file. Then restart the server
>> and
>> >>>> then I finally see it in the ring...
>> >>>> If I had added the node to the seeds list of itself when first
>> joining
>> >>>> it, it would not join the ring but if I do it in two phases it did
>> work.
>> >>>> So it's either my misunderstanding or a bug...
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>> >>>>
>> >>>>> The new node does not see itself as part of the ring, it sees all
>> others
>> >>>>> but itself, so from that perspective the view is consistent.
>> >>>>> The only problem is that the node never finishes to bootstrap. It
>> stays
>> >>>>> in this state for hours (It's been 20 hours now...)
>> >>>>>
>> >>>>>
>> >>>>> $ bin/nodetool -p 9004 -h localhost streams
>> >>>>>> Mode: Bootstrapping
>> >>>>>> Not sending any streams.
>> >>>>>> Not receiving any streams.
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
>> wrote:
>> >>>>>
>> >>>>>> Does the new node have itself in the list of seeds per chance? This
>> >>>>>> could cause some issues if so.
>> >>>>>>
>> >>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
>> wrote:
>> >>>>>> > I'm still at a loss. I haven't been able to resolve this. I tried
>> >>>>>> > adding another node at a different location on the ring but this
>> node
>> >>>>>> > too remains stuck in the bootstrapping state for many hours
>> without
>> >>>>>> > any of the other nodes being busy with anti compaction or
>> anything
>> >>>>>> > else.

The CLI sometimes gets 100 results even though there are more, and sometimes gets more than 100

2011-01-05 Thread David Boxenhorn
The CLI sometimes gets only 100 results (even though there are more) - and
sometimes gets all the results, even when there are more than 100!

What is going on here? Is there some logic that says if there are too many
results return 100, even though "too many" can be more than 100?


Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn
I started all my nodes the first time with seeds in their own lists, and it
worked. I think I started them in 0.6.1, but I'm not sure. (I'm now using
0.6.8).


On Wed, Jan 5, 2011 at 2:07 PM, Ran Tavory  wrote:

> My conclusion is lame: I tried this on several hosts and saw the same
> behavior, the only way I was able to join new nodes was to first start them
> when they are *not in* their own seeds list and after they
> finish transferring the data, then restart them with themselves *in* their
> own seeds list. After doing that the node would join the ring.
> This is either my misunderstanding or a bug, but the only place I found it
> documented stated that the new node should not be in its own seeds list.
> Version 0.6.6.
>
> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn wrote:
>
>> My nodes all have themselves in their list of seeds - always did - and
>> everything works. (You may ask why I did this. I don't know, I must have
>> copied it from an example somewhere.)
>>
>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>>
>>> I was able to make the node join the ring but I'm confused.
>>> What I did is, first when adding the node, this node was not in the seeds
>>> list of itself. AFAIK this is how it's supposed to be. So it was able to
>>> transfer all data to itself from other nodes but then it stayed in the
>>> bootstrapping state.
>>> So what I did (and I don't know why it works), is add this node to the
>>> seeds list in its own storage-conf.xml file. Then restart the server and
>>> then I finally see it in the ring...
>>> If I had added the node to the seeds list of itself when first joining
>>> it, it would not join the ring but if I do it in two phases it did work.
>>> So it's either my misunderstanding or a bug...
>>>
>>>
>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>>>
>>>> The new node does not see itself as part of the ring, it sees all others
>>>> but itself, so from that perspective the view is consistent.
>>>> The only problem is that the node never finishes to bootstrap. It stays
>>>> in this state for hours (It's been 20 hours now...)
>>>>
>>>>
>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>
>>>>
>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:
>>>>
>>>>> Does the new node have itself in the list of seeds per chance? This
>>>>> could cause some issues if so.
>>>>>
>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
>>>>> > I'm still at a loss.   I haven't been able to resolve this. I tried
>>>>> > adding another node at a different location on the ring but this node
>>>>> > too remains stuck in the bootstrapping state for many hours without
>>>>> > any of the other nodes being busy with anti compaction or anything
>>>>> > else. I don't know what's keeping it from finishing the bootstrap,no
>>>>> > CPU, no io, files were already streamed so what is it waiting for?
>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
>>>>> > be anything addressing a similar issue so I figured there was no
>>>>> point
>>>>> > in upgrading. But let me know if you think there is.
>>>>> > Or any other advice...
>>>>> >
>>>>> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
>>>>> >> Thanks Jake, but unfortunately the streams directory is empty so I
>>>>> don't think that any of the nodes is anti-compacting data right now or had
>>>>> been in the past 5 hours. It seems that all the data was already 
>>>>> transferred
>>>>> to the joining host but the joining node, after having received the data
>>>>> would still remain in bootstrapping mode and not join the cluster. I'm not
>>>>> sure that *all* data was transferred (perhaps other nodes need to transfer
>>>>> more data) but nothing is actually happening so I assume all has been 
>>>>> moved.
>>>>> >> Perhaps it's a configuration error on my part. Should I use
>>>>> AutoBootstrap=true ? Anything else I should look out for in the
>>>>> configuration file or something else?

Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn
My nodes all have themselves in their list of seeds - always did - and
everything works. (You may ask why I did this. I don't know, I must have
copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:

> I was able to make the node join the ring but I'm confused.
> What I did is, first when adding the node, this node was not in the seeds
> list of itself. AFAIK this is how it's supposed to be. So it was able to
> transfer all data to itself from other nodes but then it stayed in the
> bootstrapping state.
> So what I did (and I don't know why it works), is add this node to the
> seeds list in its own storage-conf.xml file. Then restart the server and
> then I finally see it in the ring...
> If I had added the node to the seeds list of itself when first joining it,
> it would not join the ring but if I do it in two phases it did work.
> So it's either my misunderstanding or a bug...
>
>
> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>
>> The new node does not see itself as part of the ring, it sees all others
>> but itself, so from that perspective the view is consistent.
>> The only problem is that the node never finishes to bootstrap. It stays in
>> this state for hours (It's been 20 hours now...)
>>
>>
>> $ bin/nodetool -p 9004 -h localhost streams
>>> Mode: Bootstrapping
>>> Not sending any streams.
>>> Not receiving any streams.
>>
>>
>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:
>>
>>> Does the new node have itself in the list of seeds per chance? This
>>> could cause some issues if so.
>>>
>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
>>> > I'm still at a loss.   I haven't been able to resolve this. I tried
>>> > adding another node at a different location on the ring but this node
>>> > too remains stuck in the bootstrapping state for many hours without
>>> > any of the other nodes being busy with anti compaction or anything
>>> > else. I don't know what's keeping it from finishing the bootstrap,no
>>> > CPU, no io, files were already streamed so what is it waiting for?
>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
>>> > be anything addressing a similar issue so I figured there was no point
>>> > in upgrading. But let me know if you think there is.
>>> > Or any other advice...
>>> >
>>> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
>>> >> Thanks Jake, but unfortunately the streams directory is empty so I
>>> don't think that any of the nodes is anti-compacting data right now or had
>>> been in the past 5 hours. It seems that all the data was already transferred
>>> to the joining host but the joining node, after having received the data
>>> would still remain in bootstrapping mode and not join the cluster. I'm not
>>> sure that *all* data was transferred (perhaps other nodes need to transfer
>>> more data) but nothing is actually happening so I assume all has been moved.
>>> >> Perhaps it's a configuration error on my part. Should I use
>>> AutoBootstrap=true ? Anything else I should look out for in the
>>> configuration file or something else?
>>> >>
>>> >>
>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani 
>>> wrote:
>>> >>
>>> >> In 0.6, locate the node doing anti-compaction and look in the
>>> "streams" subdirectory in the keyspace data dir to monitor the
>>> anti-compaction progress (it puts new SSTables for bootstrapping node in
>>> there)
>>> >>
>>> >>
>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
>>> >>
>>> >>
>>> >> Running nodetool decommission didn't help. Actually the node refused
>>> to decommission itself (b/c it wasn't part of the ring). So I simply stopped
>>> the process, deleted all the data directories and started it again. It
>>> worked in the sense of the node bootstrapped again but as before, after it
>>> had finished moving the data nothing happened for a long time (I'm still
>>> waiting, but nothing seems to be happening).
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Any hints on how to analyze a "stuck" bootstrapping node?? Thanks
>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
>>> nodes from the same DC but to my understanding it has already ended. A few
>>> hour ago...
>>> >>
>>> >>
>>> >>
>>> >> I plenty of log messages such as [1] which ended a couple of hours
>>> ago, and I've seen the new node streaming and accepting the data from the
>>> node which performed the anticompaction and so far it was normal so it
>>> seemed that data is at its right place. But now the new node seems sort of
>>> stuck. None of the other nodes is anticompacting right now or had been
>>> anticompacting since then.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> The new node's CPU is close to zero, it's iostats are almost zero so I
>>> can't find another bottleneck that would keep it hanging.
>>> >> On the IRC someone suggested I'd maybe retry to join this node,
>>> e.g. decommission and rejoin it again. I'll try it now...
>>> >>

Re: iterate over all the rows with RP

2010-12-13 Thread David Boxenhorn
Shimi, I am using Hector to do exactly what you want to do, with no
problems.

(In fact, the question didn't even occur to me...)
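
For what it's worth, the paging idiom looks roughly like this with Hector (a
sketch assuming string keys, a column family named "Standard1" and a page size
of 100; with RP the keys come back in token order, and each page after the
first starts from the last key already seen, so that key is skipped):

import java.util.List;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class AllRowsIterator {
    private static final int PAGE_SIZE = 100;

    public static void main(String[] args) {
        StringSerializer s = StringSerializer.get();
        Keyspace ks = HFactory.createKeyspace("Keyspace1",
                HFactory.getOrCreateCluster("Test Cluster", "localhost:9160"));

        String startKey = "";
        boolean firstPage = true;
        while (true) {
            RangeSlicesQuery<String, String, String> q =
                    HFactory.createRangeSlicesQuery(ks, s, s, s)
                            .setColumnFamily("Standard1")
                            .setRange("", "", false, 10)
                            .setKeys(startKey, "")
                            .setRowCount(PAGE_SIZE);
            OrderedRows<String, String, String> page = q.execute().get();
            List<Row<String, String, String>> rows = page.getList();

            for (Row<String, String, String> row : rows) {
                // skip the overlapping first key on every page after the first
                if (!firstPage && row.getKey().equals(startKey)) {
                    continue;
                }
                // ... process row here ...
            }

            if (rows.size() < PAGE_SIZE) {
                break;                                 // last page
            }
            startKey = page.peekLast().getKey();       // next page starts at the last key seen
            firstPage = false;
        }
    }
}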

On Sun, Dec 12, 2010 at 9:03 PM, Ran Tavory  wrote:

> This should be the case, yes, semantics isn't affected by the
> connection and state isn't kept. What might happen is that if you read/write
> with low consistency levels, then when you hit a different host on the
> ring it might have an inconsistent state in case of a partition.
>
> On Sunday, December 12, 2010, shimi  wrote:
> > So if I use a different connection (thrift via Hector), will I get
> the same results? It makes sense when you use OPP and I assume it is the
> same with RP. I just wanted to make sure this is the case and there is no
> state which is kept.
> >
> > Shimi
> >
> > On Sun, Dec 12, 2010 at 8:14 PM, Peter Schuller <
> peter.schul...@infidyne.com> wrote:
> >
> >> Is the same connection required when iterating over all the rows with
> >> Random Partitioner or is it possible to use a different connection for
> each
> >> iteration?
> >
> > In general, the choice of RPC connection (I assume you mean the
> > underlying thrift connection) does not affect the semantics of the RPC
> > calls.
> >
> > --
> > / Peter Schuller
> >
> >
> >
>
> --
> /Ran
>


Re: N to N relationships

2010-12-12 Thread David Boxenhorn
You want to store every value twice? That would be a pain to maintain, and
possibly lead to inconsistent data.

On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey  wrote:

> I would also recommend two column families. Storing the key as NxN would
> require you to hit multiple machines to query for an entire row or column
> with RandomPartitioner. Even with OPP you would need to pick row or columns
> to order by and the other would require hitting multiple machines.  Two
> column families avoids this and avoids any problems with choosing OPP.
>
>
> On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton wrote:
>
>> Am assuming you have one matrix and you know the dimensions. Also as you
>> say the most important queries are to get an entire column or an entire row.
>>
>> I would consider using a standard CF for the Columns and one for the Rows.
>>  The key for each would be the col / row number, each cassandra column name
>> would be the id of the other dimension and the value whatever you want.
>>
>> - when storing the data update both the Column and Row CF
>> - reading a whole row/col would be simply reading from the appropriate CF.
>> - reading an intersection is a get_slice to either col or row CF using the
>> column_names field to identify the other dimension.
>>
>> You would not need secondary indexes to serve these queries.
>>
>> Hope that helps.
>> Aaron
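
A minimal sketch of the double-write Aaron describes, assuming Hector and two
hypothetical column families "MatrixByRow" and "MatrixByCol":

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class MatrixWriter {
    private static final StringSerializer S = StringSerializer.get();

    // Write one cell of the matrix into both orientations in a single batch,
    // so a whole row or a whole column can later be read from one CF each.
    static void putCell(Keyspace ks, String row, String col, String value) {
        Mutator<String> m = HFactory.createMutator(ks, S);
        m.addInsertion(row, "MatrixByRow",
                HFactory.createStringColumn(col, value));   // key = row id, column = col id
        m.addInsertion(col, "MatrixByCol",
                HFactory.createStringColumn(row, value));   // key = col id, column = row id
        m.execute();
    }
}

Both inserts travel in one batch, but that alone does not make them atomic
across the two rows, which is the maintenance concern raised at the top of this
mail.
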
>>
>> On 10 Dec, 2010,at 07:02 AM, Sébastien Druon  wrote:
>>
>> I mean if I have secondary indexes. Apparently they are calculated in the
>> background...
>>
>> On 9 December 2010 18:33, David Boxenhorn  wrote:
>>
>>> What do you mean by indexing?
>>>
>>>
>>> On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon wrote:
>>>
>>>> Thanks a lot for the answer
>>>>
>>>> What about the indexing when adding a new element? Is it incremental?
>>>>
>>>> Thanks again
>>>>
>>>>
>>>>
>>>> On 9 December 2010 14:38, David Boxenhorn  wrote:
>>>>
>>>>> How about a regular CF where keys are n...@n ?
>>>>>
>>>>> Then, getting a matrix row would be the same cost as getting a matrix
>>>>> column (N gets), and it would be very easy to add element N+1.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> For a specific case, we are thinking about representing a N to N
>>>>>> relationship with a NxN Matrix in Cassandra.
>>>>>> The relations will be only between a subset of elements, so the Matrix
>>>>>> will mostly contain empty elements.
>>>>>>
>>>>>> We have a set of questions concerning this:
>>>>>> - what is the best way to represent this matrix? what would have the
>>>>>> best performance in reading? in writing?
>>>>>>   . a super column family with n column families, with n columns each
>>>>>>   . a column family with n columns and n lines
>>>>>>
>>>>>> In the second case, we would need to extract 2 kinds of information:
>>>>>> - all the relations for a line: this should be no specific problem;
>>>>>> - all the relations for a column: in that case we would need an index
>>>>>> for the columns, right? and then get all the lines where the value of the
>>>>>> column in question is not null... is it the correct way to do?
>>>>>> When using indexes, say we want to add another element N+1. What
>>>>>> impact in terms of time would it have on the indexation job?
>>>>>>
>>>>>> Thanks a lot for the answers,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Sébastien Druon
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
What do you mean by, "The included secondary indexes still aren't good at
finding keys for ranges of indexed values, such as " name > 'b' and name <
'c' "."?

Do you mean that secondary indexes don't support range queries at all?

Besides supporting range queries, I see the importance of secondary indexes
as solving the problem of really big indexes, which are almost (if not
completely) impossible to write by hand on the client.
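
For reference, the hand-built alternative is usually a wide index row, where a
value range becomes a column slice; a minimal sketch assuming Hector, a
hypothetical index CF "UsersByName", and names unique enough to serve as column
names:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;

public class NameIndex {
    private static final StringSerializer S = StringSerializer.get();
    private static final String INDEX_ROW = "users_by_name";   // one wide index row

    // Maintain the index by hand on every write of a user.
    static void indexUser(Keyspace ks, String name, String userKey) {
        HFactory.createMutator(ks, S).insert(INDEX_ROW, "UsersByName",
                HFactory.createStringColumn(name, userKey));
    }

    // Range query over the indexed value: names between 'b' and 'c'.
    static ColumnSlice<String, String> namesBetween(Keyspace ks, String from, String to) {
        return HFactory.createSliceQuery(ks, S, S, S)
                .setColumnFamily("UsersByName")
                .setKey(INDEX_ROW)
                .setRange(from, to, false, 1000)
                .execute().get();
    }
}

This is exactly the by-hand work the built-in secondary indexes remove, but the
slice gives the value-range query they don't.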


On Thu, Dec 9, 2010 at 7:41 PM, Tyler Hobbs  wrote:

> OPP is not yet obsolete.
>
> The included secondary indexes still aren't good at finding keys for ranges
> of indexed values, such as " name > 'b' and name < 'c' ".  This is something
> that an OPP index would be good at.  Of course, you can do something similar
> with one or more rows, so it's not that big of an advantage for OPP.
>
> If you can make primary indexes useful, you might as well -- no reason to
> throw that away.
>
> The main thing that the secondary index support does is relieve you from
> having to write all of the indexing code and CFs by hand.
>
> - Tyler
>
>
> On Thu, Dec 9, 2010 at 8:23 AM, David Boxenhorn  wrote:
>
>> - OPP becomes obsolete (OOP is not obsolete!)
>> - primary indexes become obsolete if you ever want to do a range query
>> (which you probably will...), better to assign a random row id
>>
>> Taken together, it's likely that very little will remain of your old
>> database schema...
>>
>> Am I right?
>>
>>
>


Re: N to N relationships

2010-12-09 Thread David Boxenhorn
What do you mean by indexing?

On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon  wrote:

> Thanks a lot for the answer
>
> What about the indexing when adding a new element? Is it incremental?
>
> Thanks again
>
>
> On 9 December 2010 14:38, David Boxenhorn  wrote:
>
>> How about a regular CF where keys are n...@n ?
>>
>> Then, getting a matrix row would be the same cost as getting a matrix
>> column (N gets), and it would be very easy to add element N+1.
>>
>>
>> On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon wrote:
>>
>>> Hello,
>>>
>>> For a specific case, we are thinking about representing a N to N
>>> relationship with a NxN Matrix in Cassandra.
>>> The relations will be only between a subset of elements, so the Matrix
>>> will mostly contain empty elements.
>>>
>>> We have a set of questions concerning this:
>>> - what is the best way to represent this matrix? what would have the best
>>> performance in reading? in writing?
>>>   . a super column family with n column families, with n columns each
>>>   . a column family with n columns and n lines
>>>
>>> In the second case, we would need to extract 2 kinds of information:
>>> - all the relations for a line: this should be no specific problem;
>>> - all the relations for a column: in that case we would need an index for
>>> the columns, right? and then get all the lines where the value of the column
>>> in question is not null... is it the correct way to do?
>>> When using indexes, say we want to add another element N+1. What impact
>>> in terms of time would it have on the indexation job?
>>>
>>> Thanks a lot for the answers,
>>>
>>> Best regards,
>>>
>>> Sébastien Druon
>>>
>>
>>
>


Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
If that is what you want, use CL=ONE

On Thu, Dec 9, 2010 at 6:43 PM, Timo Nentwig wrote:

>
> On Dec 9, 2010, at 17:39, David Boxenhorn wrote:
>
> > In other words, if you want to use QUORUM, you need to set RF>=3.
> >
> > (I know because I had exactly the same problem.)
>
> I naively assume that if I kill either node that holds N1 (i.e. node 1 or
> 3), N1 will still remain on another node. Only if both fail, I actually lose
> data. But apparently this is not how it works...
>
> > On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne 
> wrote:
> > It's 2 out of the number of replicas, not the number of nodes. At RF=2,
> you have
> > 2 replicas. And since quorum is also 2 with that replication factor,
> > you cannot lose
> > a node, otherwise some query will end up as UnavailableException.
> >
> > Again, this is not related to the total number of nodes. Even with 200
> > nodes, if
> > you use RF=2, you will have some query that fail (altough much less that
> what
> > you are probably seeing).
> >
> > On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig 
> wrote:
> > >
> > > On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
> > >
> > >> Quorum is really only useful when RF > 2, since for a quorum to
> > >> succeed RF/2+1 replicas must be available.
> > >
> > > 2/2+1==2 and I killed 1 of 3, so... don't get it.
> > >
> > >> This means for RF = 2, consistency levels QUORUM and ALL yield the
> same result.
> > >>
> > >> /d
> > >>
> > >> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <
> timo.nent...@toptarif.de> wrote:
> > >>> Hi!
> > >>>
> > >>> I've 3 servers running (0.7rc1) with a replication_factor of 2 and
> use quorum for writes. But when I shut down one of them
> UnavailableExceptions are thrown. Why is that? Isn't that the sense of
> quorum and a fault-tolerant DB that it continues with the remaining 2 nodes
> and redistributes the data to the broken one as soon as it's up again?
> > >>>
> > >>> What may I be doing wrong?
> > >>>
> > >>> thx
> > >>> tcn
> > >
> > >
> >
>
>


Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
In other words, if you want to use QUORUM, you need to set RF>=3.

(I know because I had exactly the same problem.)
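
The arithmetic behind it:

quorum = floor(RF / 2) + 1 replicas must respond

RF = 2  ->  quorum = 2  (every replica; one dead node fails QUORUM)
RF = 3  ->  quorum = 2  (one node can be down and QUORUM still succeeds)
RF = 4  ->  quorum = 3  (still only one node can be down)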

On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne  wrote:

> It's 2 out of the number of replicas, not the number of nodes. At RF=2, you
> have
> 2 replicas. And since quorum is also 2 with that replication factor,
> you cannot lose
> a node, otherwise some query will end up as UnavailableException.
>
> Again, this is not related to the total number of nodes. Even with 200
> nodes, if
> you use RF=2, you will have some queries that fail (although much less than
> what
> you are probably seeing).
>
> On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig 
> wrote:
> >
> > On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
> >
> >> Quorum is really only useful when RF > 2, since for a quorum to
> >> succeed RF/2+1 replicas must be available.
> >
> > 2/2+1==2 and I killed 1 of 3, so... don't get it.
> >
> >> This means for RF = 2, consistency levels QUORUM and ALL yield the same
> result.
> >>
> >> /d
> >>
> >> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig 
> wrote:
> >>> Hi!
> >>>
> >>> I've 3 servers running (0.7rc1) with a replication_factor of 2 and use
> quorum for writes. But when I shut down one of them UnavailableExceptions
> are thrown. Why is that? Isn't that the sense of quorum and a fault-tolerant
> DB that it continues with the remaining 2 nodes and redistributes the data
> to the broken one as soon as it's up again?
> >>>
> >>> What may I be doing wrong?
> >>>
> >>> thx
> >>> tcn
> >
> >
>

