Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-18 Thread Maxim Potekhin

I must have accidentally deleted all messages in this thread save this one.

At face value, we are talking about saving 2 bytes per column. I 
know it can add up with many columns, but relative to the size of the 
column -- is it THAT significant?


I made an effort to minimize my CF footprint by replacing the "natural" 
column keys with integers (and translating back and forth when writing 
and reading). It's easy to see that in my case I achieve almost 50% 
storage savings in the best case, and at least 30% in others. But if the column in question contains 
more than 20 bytes -- what's up with trying to save 2?
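
As a minimal sketch of what that back-and-forth translation can look like (illustrative
only, not the actual code used here), a static dictionary shared by writers and readers
is enough; every class and name below is hypothetical:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: dictionary-encode verbose column names as small integer ids.
    // The mapping must be identical on every writer and reader, e.g. a shared enum or
    // a static table deployed with the application.
    public class ColumnNameDictionary {
        private final Map<String, Integer> toId = new HashMap<String, Integer>();
        private final Map<Integer, String> toName = new HashMap<Integer, String>();

        public ColumnNameDictionary(String... names) {
            for (int i = 0; i < names.length; i++) {
                toId.put(names[i], i);
                toName.put(i, names[i]);
            }
        }

        // On write: "lastLoginTimestamp" -> e.g. 7, stored as a 4-byte int
        // (or even 1-2 bytes with a variable-length encoding).
        public int encode(String columnName) {
            return toId.get(columnName);
        }

        // On read: turn the stored id back into the human-readable name.
        public String decode(int id) {
            return toName.get(id);
        }
    }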


Cheers

Maxim


On 1/18/2012 11:49 PM, Ertio Lew wrote:

I believe the timestamps *on a per-column basis* are only required until
compaction time; after that it may also work if the timestamp range is
specified globally on a per-SSTable basis. Thus the per-column timestamps
before compaction only need to measure the time from the initialization of
the new memtable to the point the column is written to that memtable. That
time can easily fit in 4 bytes, which I believe would save at least 4 bytes
of overhead for each column.

Is anything related to these overheads under consideration or planned
in the roadmap?
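
To make the arithmetic behind that idea concrete, here is a purely illustrative
sketch (not Cassandra's actual storage format) of encoding a column timestamp as a
4-byte offset from a per-memtable base value:

    // Illustrative sketch only -- not Cassandra's on-disk format.
    // Idea from the thread: keep one 8-byte base timestamp per memtable/SSTable
    // and store only a 4-byte offset per column, reconstructing the full value on read.
    public class TimestampOffsetCodec {
        private final long base; // e.g. timestamp taken when the memtable was created

        public TimestampOffsetCodec(long baseTimestamp) {
            this.base = baseTimestamp;
        }

        // Encode a full 8-byte timestamp as a 4-byte offset from the base.
        // With millisecond timestamps an unsigned 32-bit offset covers ~49 days,
        // with microseconds only ~71 minutes -- usually enough for a memtable's
        // lifetime, but the overflow case still has to be handled (e.g. fall
        // back to a full 8-byte timestamp).
        public int encode(long timestamp) {
            long delta = timestamp - base;
            if (delta < 0 || delta > 0xFFFFFFFFL) {
                throw new IllegalArgumentException("offset does not fit in 4 bytes: " + delta);
            }
            return (int) delta; // reinterpreted as unsigned on decode
        }

        // Decode back to the full 8-byte timestamp.
        public long decode(int offset) {
            return base + (offset & 0xFFFFFFFFL);
        }
    }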



On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev  wrote:

>> I have a patch for trunk which I just have to get time to test a bit before I
>> submit.
>> It is for super columns and will use the super columns timestamp as the base
>> and only store variant encoded offsets in the underlying columns.
>
> Could you please measure how much real benefit it brings (in real RAM
> consumption by JVM)? It is hard to tell whether it will give noticeable results or not.
> AFAIK memory structures used for memtable consume much more memory. And a 64-bit
> JVM allocates memory aligned to a 64-bit word boundary. So a 37% memory
> consumption reduction looks doubtful.






Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-18 Thread Ertio Lew
I believe the timestamps *on a per-column basis* are only required until
compaction time; after that it may also work if the timestamp range is
specified globally on a per-SSTable basis. Thus the per-column timestamps
before compaction only need to measure the time from the initialization of
the new memtable to the point the column is written to that memtable. That
time can easily fit in 4 bytes, which I believe would save at least 4 bytes
of overhead for each column.

Is anything related to these overheads under consideration or planned
in the roadmap?



On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev  wrote:
>
>>
>> I have a patch for trunk which I just have to get time to test a bit before I
> submit.
>> It is for super columns and will use the super columns timestamp as the base
> and only store variant encoded offsets in the underlying columns.
>>
>
> Could you please measure how much real benefit it brings (in real RAM
> consumption by JVM)? It is hard to tell whether it will give noticeable results or
> not.
> AFAIK memory structures used for memtable consume much more memory. And a 64-bit
> JVM allocates memory aligned to a 64-bit word boundary. So a 37% memory
> consumption reduction looks doubtful.
>
>


RE: Incremental backups

2012-01-18 Thread Michael Vaknine
I am on 1.0.3 release and it looks like very old files that remained from
the upgrade process.

How can I verify that?

 

Michael

 

From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Thursday, January 19, 2012 2:22 AM
To: user@cassandra.apache.org
Subject: Re: Incremental backups

 

Looks like you are on a 0.7.X release, which one exactly? It would be a
really good idea to at least be on 0.8.X, preferably 1.0.

 

Pre 1.0 compacted SSTables were removed during JVM GC, but compacted
SSTables  have a .Compacted file created so we know they are no longer
needed. 

 

These SSTables look like secondary index files. It may be a bug if they are
not included in the incremental backups. 

 

Cheers 

 

-

Aaron Morton

Freelance Developer

@aaronmorton

http://www.thelastpickle.com

 

On 19/01/2012, at 12:13 AM, Michael Vaknine wrote:





Hi,

Thank you for response.

I did restart for all the nodes and now I can see files in backup folders so
It seems like it is working.

During this process I have noticed something very strange

 

In data/City folder there are files that are not created in the snapshot
folder (it looks like old orphaned files)

Is there any process of cassandra that will delete unneeded files? I tried to
run nodetool cleanup but it did not help.

 

These are the files:

-rw-r--r-- 1 cassandra cassandra 230281 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Data.db

-rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Filter.db

-rw-r--r-- 1 cassandra cassandra 27 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Statistics.db

-rw-r--r-- 1 cassandra cassandra   1321 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Data.db

-rw-r--r-- 1 cassandra cassandra 16 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Filter.db

-rw-r--r-- 1 cassandra cassandra 27 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Statistics.db

-rw-r--r-- 1 cassandra cassandra 2627100 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Data.db

-rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Filter.db

-rw-r--r-- 1 cassandra cassandra 20 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Statistics.db

-rw-r--r-- 1 cassandra cassandra 2238358 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Data.db

-rw-r--r-- 1 cassandra cassandra 16 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Filter.db

-rw-r--r-- 1 cassandra cassandra 20 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Statistics.db

-rw-r--r-- 1 cassandra cassandra 92 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Data.db

-rw-r--r-- 1 cassandra cassandra 16 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Filter.db

-rw-r--r-- 1 cassandra cassandra 20 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Statistics.db

-rw-r--r-- 1 cassandra cassandra  44799 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Data.db

-rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Filter.db

-rw-r--r-- 1 cassandra cassandra    196 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Statistics.db

-rw-r--r-- 1 cassandra cassandra   7647 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Data.db

-rw-r--r-- 1 cassandra cassandra 24 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Filter.db

-rw-r--r-- 1 cassandra cassandra 96 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Statistics.db

 

 

Thanks

Michael

 

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
Sent: Wednesday, January 18, 2012 10:40 AM
To: user@cassandra.apache.org
Subject: Re: Incremental backups

 

As this option is in the cassandra.yaml file, you might need to perform a
restart of your entire cluster (a rolling restart should work).

 

Hope this will help.

 

Alain

2012/1/18 Michael Vaknine 

Hi,

I am configured to

Re: poor Memtable performance on column slices?

2012-01-18 Thread Josep Blanquer
On Wed, Jan 18, 2012 at 12:44 PM, Jonathan Ellis  wrote:

> On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer
>  wrote:
> > If I do a slice without a start (i.e., get me the first column)...it
> seems
> > to fly. GET("K", :count => 1 )
>
> Yep, that's a totally different code path (SimpleSliceReader instead
> of IndexedSliceReader) that we've done to optimize this common case.
>
>
Thanks Jonathan, yup, that makes sense. It was surprising to me that
"avoiding the seek" was that much faster..but I guess if it's a completely
different code path, there might be many other things in play.


> > The same starting at the last one.  GET("K",:start
> > => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' , :count => 1 )
> > -- 6.489683  -> Much faster than any other slice ... although not quite
> as
> > fast as not using a start column
>
> That's not a special code path, but I'd guess that the last column is
> more likely to be still in memory instead of on disk.
>
>
Well, no need to prolong the thread, but my tests are exclusively in
Memtable reads (data has not flushed)...so there's no SSTable read involved
here...which is exactly why is felt a bit funny to have that case be
considerably faster. I just wanted to bring it up to you guys, in case you
can think of some cause and/or potential issue.

Thanks for the responses!

Josep M.


Re: Max records per node for a given secondary index value

2012-01-18 Thread Mohit Anchlia
You need to shard your rows
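
For anyone wondering what "shard your rows" can look like in practice, here is a
rough sketch of the usual pattern for a hand-rolled index column family (the
built-in secondary index does not expose this knob); all names below are
illustrative, not Cassandra APIs:

    // Sketch: split one very wide index row into a fixed number of shards so no
    // single row approaches the 2 billion column limit. Row key = indexed value
    // plus a shard id; a read for one value fans out over all shard rows.
    public class ShardedIndexKeys {
        private static final int NUM_SHARDS = 16; // keeps each index row comfortably small

        // Pick the shard deterministically from the indexed record's key so the
        // same record always lands in the same index row.
        public static String indexRowKey(String indexedValue, String recordKey) {
            int shard = Math.abs(recordKey.hashCode() % NUM_SHARDS);
            return indexedValue + ":" + shard;
        }

        // To read everything indexed under one value, slice all shard rows:
        //   for (String k : allShardKeys("productA")) { /* slice row k */ }
        public static String[] allShardKeys(String indexedValue) {
            String[] keys = new String[NUM_SHARDS];
            for (int s = 0; s < NUM_SHARDS; s++) {
                keys[s] = indexedValue + ":" + s;
            }
            return keys;
        }
    }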

On Wed, Jan 18, 2012 at 5:46 PM, Kamal Bahadur  wrote:
> Anyone?
>
>
> On Wed, Jan 18, 2012 at 9:53 AM, Kamal Bahadur 
> wrote:
>>
>> Hi All,
>>
>> It is great to know that Cassandra column family can accommodate 2 billion
>> columns per row! I was reading about how Cassandra stores the secondary
>> index info internally. I now understand that the index related data are
>> stored in hidden CF and each node is responsible to store the keys of data
>> that reside on that node only.
>>
>> I have been using a secondary index for a low-cardinality column called
>> "product". There can only be 3 possible values for this column. I have a
>> four-node cluster and process about 5000 records per second with an RF of 2.
>>
>> My question here is, what happens after the number of columns in hidden
>> index CF exceeds 2 billion? How does Cassandra handle this situation? I
>> guess, one way to handle this is to add more nodes to the cluster. I am
>> interested in knowing if any other solution exists.
>>
>> Thanks,
>> Kamal
>
>


Re: Max records per node for a given secondary index value

2012-01-18 Thread Kamal Bahadur
Anyone?

On Wed, Jan 18, 2012 at 9:53 AM, Kamal Bahadur wrote:

> Hi All,
>
> It is great to know that Cassandra column family can accommodate 2 billion
> columns per row! I was reading about how Cassandra stores the secondary
> index info internally. I now understand that the index related data are
> stored in hidden CF and each node is responsible to store the keys of data
> that reside on that node only.
>
> I have been using a secondary index for a low-cardinality column called
> "product". There can only be 3 possible values for this column. I have a
> four-node cluster and process about 5000 records per second with an RF of 2.
>
> My question here is, what happens after the number of columns in hidden
> index CF exceeds 2 billion? How does Cassandra handle this situation? I
> guess, one way to handle this is to add more nodes to the cluster. I am
> interested in knowing if any other solution exists.
>
> Thanks,
> Kamal
>


Re: Incremental backups

2012-01-18 Thread aaron morton
Looks like you are on a 0.7.X release, which one exactly? It would be a really 
good idea to at least be on 0.8.X, preferably 1.0.

Pre 1.0 compacted SSTables were removed during JVM GC, but compacted SSTables  
have a .Compacted file created so we know they are no longer needed. 

These SSTables look like secondary index files. It may be a bug if they are not 
included in the incremental backups. 

Cheers 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/01/2012, at 12:13 AM, Michael Vaknine wrote:

> Hi,
> Thank you for response.
> I did restart for all the nodes and now I can see files in backup folders so 
> It seems like it is working.
> During this process I have noticed something very strange
>  
> In data/City folder there are files that are not created in the snapshot 
> folder (it looks like old orphaned files)
> Is there any process of cassandra that will delete unneeded files? I tried to 
> run nodetool cleanup but it did not help.
>  
> These are the files:
> -rw-r--r-- 1 cassandra cassandra 230281 2011-12-06 00:57 
> AttractionCheckins.3039706172746974696f6e-f-157-Data.db
> -rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 00:57 
> AttractionCheckins.3039706172746974696f6e-f-157-Filter.db
> -rw-r--r-- 1 cassandra cassandra 27 2011-12-06 00:57 
> AttractionCheckins.3039706172746974696f6e-f-157-Index.db
> -rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 00:57 
> AttractionCheckins.3039706172746974696f6e-f-157-Statistics.db
> -rw-r--r-- 1 cassandra cassandra   1321 2011-12-06 00:58 
> AttractionCheckins.3039706172746974696f6e-f-158-Data.db
> -rw-r--r-- 1 cassandra cassandra 16 2011-12-06 00:58 
> AttractionCheckins.3039706172746974696f6e-f-158-Filter.db
> -rw-r--r-- 1 cassandra cassandra 27 2011-12-06 00:58 
> AttractionCheckins.3039706172746974696f6e-f-158-Index.db
> -rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 00:58 
> AttractionCheckins.3039706172746974696f6e-f-158-Statistics.db
> -rw-r--r-- 1 cassandra cassandra 2627100 2011-12-06 06:55 
> Attractions.3039706172746974696f6e-f-1156-Data.db
> -rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 06:55 
> Attractions.3039706172746974696f6e-f-1156-Filter.db
> -rw-r--r-- 1 cassandra cassandra 20 2011-12-06 06:55 
> Attractions.3039706172746974696f6e-f-1156-Index.db
> -rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 06:55 
> Attractions.3039706172746974696f6e-f-1156-Statistics.db
> -rw-r--r-- 1 cassandra cassandra 2238358 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1157-Data.db
> -rw-r--r-- 1 cassandra cassandra 16 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1157-Filter.db
> -rw-r--r-- 1 cassandra cassandra 20 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1157-Index.db
> -rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1157-Statistics.db
> -rw-r--r-- 1 cassandra cassandra 92 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1158-Data.db
> -rw-r--r-- 1 cassandra cassandra 16 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1158-Filter.db
> -rw-r--r-- 1 cassandra cassandra 20 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1158-Index.db
> -rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50 
> Attractions.3039706172746974696f6e-f-1158-Statistics.db
> -rw-r--r-- 1 cassandra cassandra  44799 2011-12-06 01:25 
> CityResources.3039706172746974696f6e-f-365-Data.db
> -rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 01:25 
> CityResources.3039706172746974696f6e-f-365-Filter.db
> -rw-r--r-- 1 cassandra cassandra    196 2011-12-06 01:25 
> CityResources.3039706172746974696f6e-f-365-Index.db
> -rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 01:25 
> CityResources.3039706172746974696f6e-f-365-Statistics.db
> -rw-r--r-- 1 cassandra cassandra   7647 2011-12-06 07:50 
> CityResources.3039706172746974696f6e-f-366-Data.db
> -rw-r--r-- 1 cassandra cassandra 24 2011-12-06 07:50 
> CityResources.3039706172746974696f6e-f-366-Filter.db
> -rw-r--r-- 1 cassandra cassandra 96 2011-12-06 07:50 
> CityResources.3039706172746974696f6e-f-366-Index.db
> -rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50 
> CityResources.3039706172746974696f6e-f-366-Statistics.db
>  
>  
> Thanks
> Michael
>  
> From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
> Sent: Wednesday, January 18, 2012 10:40 AM
> To: user@cassandra.apache.org
> Subject: Re: Incremental backups
>  
> As this option is in the cassandra.yaml file, you might need to perform a 
> restart of your entire cluster (a rolling restart should work).
>  
> Hope this will help.
>  
> Alain
> 
> 2012/1/18 Michael Vaknine 
> Hi,
> I am configured to do incremental backups on all my nodes on the cluster but 
> it is not working.
> In cassandra.yaml : incremental_backups: true
> When I check data folder the

Re: nodetool ring question

2012-01-18 Thread aaron morton
Michael, Robin

Let us know if the reported live load is increasing and diverging from 
the on disk size.

If it is, can you check nodetool cfstats and find an example of a 
particular CF where Space Used Live has diverged from the on-disk size? Then 
provide the schema for the CF and any other info that may be handy. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/01/2012, at 10:58 PM, Michael Vaknine wrote:

> I did restart the cluster and now it is normal 5GB.
>  
> From: R. Verlangen [mailto:ro...@us2.nl] 
> Sent: Wednesday, January 18, 2012 11:32 AM
> To: user@cassandra.apache.org
> Subject: Re: nodetool ring question
>  
> I also have this problem. My data on nodes grows to roughly 30GB. After a 
> restart only 5GB remains. Is a factor 6 common for Cassandra?
> 
> 2012/1/18 aaron morton 
> Good idea Jeremiah, are you using compression Michael ? 
>  
> Scanning through the CF stats this jumps out…
>  
> Column Family: Attractions
> SSTable count: 3
> Space used (live): 27542876685
> Space used (total): 1213220387
> That's 25GB of live data but only 1.3GB total. 
>  
> Otherwise want to see if a restart fixes it :) Would be interesting to know 
> if it's wrong from the start or drifts during streaming or compaction. 
>  
> Cheers
>  
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 18/01/2012, at 12:04 PM, Jeremiah Jordan wrote:
> 
> 
> There were some nodetool ring load reporting issues with early versions of 
> 1.0.X, I don't remember when they were fixed, but that could be your issue. Are 
> you using compressed column families? A lot of the issues were with those.
> It might be worth updating to 1.0.7.
> 
> -Jeremiah
> 
> On 01/16/2012 04:04 AM, Michael Vaknine wrote:
> Hi,
>  
> I have a 4 nodes cluster 1.0.3 version
>  
> This is what I get when I run nodetool ring
>  
> Address       DC          Rack   Status  State   Load      Owns    Token
>                                                                    127605887595351923798765477786913079296
> 10.8.193.87   datacenter1 rack1  Up      Normal  46.47 GB  25.00%  0
> 10.5.7.76     datacenter1 rack1  Up      Normal  48.01 GB  25.00%  42535295865117307932921825928971026432
> 10.8.189.197  datacenter1 rack1  Up      Normal  53.7 GB   25.00%  85070591730234615865843651857942052864
> 10.5.3.17     datacenter1 rack1  Up      Normal  43.49 GB  25.00%  127605887595351923798765477786913079296
>  
> I have finished running repair on all 4 nodes.
>  
> I have less than 10 GB on the /var/lib/cassandra/data/ folders
>  
> My question is Why nodetool reports almost 50 GB on each node?
>  
> Thanks
> Michael
>  



Re: Unbalanced cluster with RandomPartitioner

2012-01-18 Thread aaron morton
If you have performed any token moves the data will not be deleted until you 
run nodetool cleanup. 

To get a baseline I would run nodetool compact to do a major compaction and purge 
any tombstones, as others have said. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/01/2012, at 2:19 PM, Maki Watanabe wrote:

> Are there any significant difference of number of sstables on each nodes?
> 
> 2012/1/18 Marcel Steinbach :
>> We are running regular repairs, so I don't think that's the problem.
>> And the data dir sizes match approx. the load from the nodetool.
>> Thanks for the advise, though.
>> 
>> Our keys are digits only, and all contain a few zeros at the same
>> offsets. I'm not that familiar with the md5 algorithm, but I doubt that it
>> would generate 'hotspots' for those kind of keys, right?
>> 
>> On 17.01.2012, at 17:34, Mohit Anchlia wrote:
>> 
>> Have you tried running repair first on each node? Also, verify using
>> df -h on the data dirs
>> 
>> On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
>>  wrote:
>> 
>> Hi,
>> 
>> 
>> we're using RP and have each node assigned the same amount of the token
>> space. The cluster looks like that:
>> 
>> 
>> Address  Status  State   Load       Owns    Token
>>                                              205648943402372032879374446248852460236
>> 1        Up      Normal  310.83 GB  12.50%  56775407874461455114148055497453867724
>> 2        Up      Normal  470.24 GB  12.50%  78043055807020109080608968461939380940
>> 3        Up      Normal  271.57 GB  12.50%  99310703739578763047069881426424894156
>> 4        Up      Normal  282.61 GB  12.50%  120578351672137417013530794390910407372
>> 5        Up      Normal  248.76 GB  12.50%  141845999604696070979991707355395920588
>> 6        Up      Normal  164.12 GB  12.50%  163113647537254724946452620319881433804
>> 7        Up      Normal  76.23 GB   12.50%  184381295469813378912913533284366947020
>> 8        Up      Normal  19.79 GB   12.50%  205648943402372032879374446248852460236
>> 
>> 
>> I was under the impression that the RP would distribute the load more evenly.
>> 
>> Our row sizes are 0.5-1 KB, hence, we don't store huge rows on a single
>> node. Should we just move the nodes so that the load is more evenly
>> distributed, or is there something off that needs to be fixed first?
>> 
>> 
>> Thanks
>> 
>> Marcel
>> 
>> 
>> 
>> chors GmbH
>> 
>> 
>> 
>> specialists in digital and direct marketing solutions
>> 
>> Haid-und-Neu-Straße 7
>> 
>> 76131 Karlsruhe, Germany
>> 
>> www.chors.com
>> 
>> 
>> 
> 
> 
> 
> -- 
> w3m



Re: poor Memtable performance on column slices?

2012-01-18 Thread Jonathan Ellis
On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer
 wrote:
> If I do a slice without a start (i.e., get me the first column)...it seems
> to fly. GET("K", :count => 1 )

Yep, that's a totally different code path (SimpleSliceReader instead
of IndexedSliceReader) that we've done to optimize this common case.

> The same starting at the last one.  GET("K",:start
> => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' , :count => 1 )
> -- 6.489683  -> Much faster than any other slice ... although not quite as
> fast as not using a start column

That's not a special code path, but I'd guess that the last column is
more likely to be still in memory instead of on disk.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: poor Memtable performance on column slices?

2012-01-18 Thread Josep Blanquer
Excellent Sylvain! Yes, that seems to remove the linear scan component of
slice read times.

FYI, I still see some interesting difference in some aspects though.

If I do a slice without a start (i.e., get me the first column)...it seems
to fly. GET("K", :count => 1 )
-- 4.832877  -->> very fast, and actually in this case I see the reading
client being the bottleneck, not cassandra (which it is at about 20% CPU
only)

If I do the same, but actually specifying the start column with the first
existing value...GET("K",:start => '144abe16-416c-11e1-9e23-2cbae9ddfe8b' ,
:count => 1 )
-- 11.084275 -->> half as fast, and using twice the CPU...hovering about
50% or more. (again Cassandra is not the bottleneck, but the significant
data is that the initial seeking seems to be doubling the time/cpu)

If I do the same, starting by the middle.  GET("K",:start
=> '9c13c644-416c-11e1-81dd-4ba530dc83d0' , :count => 1 )
-- 11.038187  --> as expensive as starting from the beginning

The same starting at the last one.  GET("K",:start
=> '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' , :count => 1 )
-- 6.489683  -> Much faster than any other slice ... although not quite as
fast as not using a start column

I could see that not having to seek into whatever backing "map/structure"
is obviously faster...although I'm surprised that seeking to an initial
value makes reads half as fast. Wouldn't this mostly imply following
some links/pointers in memory to start reading ordered columns? What is the
backing store used for Memtables when column slices are performed?

I am not sure why starting at the end (without reversing or anything)
yields much better performance.
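
For reference, this is the cost model one would expect if the row's columns sat in
a sorted map with log-time seeks, e.g. a ConcurrentSkipListMap; a toy illustration
only, not Cassandra's actual memtable code:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Toy model of a memtable row: columns kept in a sorted map keyed by column name
    // (the exact comparator is beside the point here). A slice starting at 'start'
    // should be a log-time seek (tailMap) plus a short iteration, i.e. roughly the
    // same cost wherever 'start' falls in the row.
    public class SortedColumnsDemo {
        private final ConcurrentSkipListMap<UUID, byte[]> columns =
                new ConcurrentSkipListMap<UUID, byte[]>();

        public void put(UUID name, byte[] value) {
            columns.put(name, value);
        }

        // Return up to 'count' column names starting at 'start' (inclusive).
        public List<UUID> slice(UUID start, int count) {
            List<UUID> result = new ArrayList<UUID>(count);
            for (UUID name : columns.tailMap(start, true).keySet()) {
                result.add(name);
                if (result.size() >= count) break;
            }
            return result;
        }
    }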

 Cheers,

Josep M.

On Wed, Jan 18, 2012 at 12:57 AM, Sylvain Lebresne wrote:

> On Wed, Jan 18, 2012 at 2:44 AM, Josep Blanquer 
> wrote:
> > Hi,
> >
> >  I've been doing some tests using wide rows recently, and I've seen some
> odd
> > performance problems that I'd like to understand.
> >
> > In particular, I've seen that the time it takes for Cassandra to perform
> a
> > column slice of a single key, solely in a Memtable, seems to be very
> > expensive, but most importantly proportional to the ordered position
> where
> > the start column of the slice lives.
> >
> > In other words:
> >  1- if I start Cassandra fresh (with an empty ColumnFamily with TimeUUID
> > comparator)
> >  2- I create a single Row with Key "K"
> >  3- Then add 200K TimeUUID columns to key "K"
> >  4- (and make sure nothing is flushed to SSTables...so it's all in the
> > Memtable)
> >
> > ...I observe the following timings (seconds to perform 1000 reads) while
> > performing multiget slices on it:  (pardon the pseudo-code, but you'll
> get
> > the gist)
> >
> > a) simply a get of the first column:  GET("K",:count=>1)
> >   --  2.351226
> >
> > b) doing a slice get, starting from the first column:  GET("K",:start =>
> > '144abe16-416c-11e1-9e23-2cbae9ddfe8b' , :count => 1 )
> >   -- 2.189224   <<- so with or without "start" doesn't seem to make much
> of
> > a difference
> >
> > c) doing a slice get, starting from the middle of the ordered
> > columns..approx starting at item number 100K:   GET("K",:start =>
> > '9c13c644-416c-11e1-81dd-4ba530dc83d0' , :count => 1 )
> >  -- 11.849326  <<- 5 times more expensive if the start of the slice is
> 100K
> > positions away
> >
> > d) doing a slice get, starting from the last of the ordered
> columns..approx
> > position 200K:   GET("K",:start
> => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' ,
> > :count => 1 )
> >   -- 19.889741   <<- Almost twice as expensive than starting the slice at
> > position 100K, and 10 times more expensive than starting from the first
> one
> >
> > This behavior leads me to believe that there's a clear Memtable column
> scan
> > for the columns of the key.
> > If one tries a column name read on those positions (i.e., not a slice),
> the
> > performance is constant. I.e., GET("K",
> > '144abe16-416c-11e1-9e23-2cbae9ddfe8b') . Retrieving the first, middle or
> > last timeUUID is done in the same amount of time.
> >
> > Having increasingly worse performance for column slices in Memtables
> seems
> > to be a bit of a problem...aren't Memtables backed by a structure that
> has
> > some sort of column name indexing?...so that landing on the start column
> can
> > be efficient? I'm definitely observing very high CPU utilization on those
> > scans...By the way, with wide columns like this, slicing SSTables is
> quite
> > faster than slicing Memtables...I'm attributing that to the sampled
> index of
> > the SSTables, hence that's why I'm wondering if the Memtables do not have
> > such column indexing builtin and resort to linked lists of sort
> >
> > Note, that the actual timings shown are not important, it's in my laptop
> and
> > I have a small amount of debugging enabled...what is important is the
> > difference between them.
> >
> > I'm using Cassandra trunk as of Dec 1st, but I believe I've done
> experiments
> > with 0.8 series too, 

Re: specifying initial cassandra schema

2012-01-18 Thread Ramesh Natarajan
Thanks and appreciate the responses. Will look into this.

thanks
Ramesh

On Wed, Jan 18, 2012 at 2:27 AM, aaron morton  wrote:
> check the command line help for cassandra-cli, you can pass it a file name.
>
> e.g. cassandra-cli --host localhost --file schema.txt
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/01/2012, at 9:35 AM, Carlos Pérez Miguel wrote:
>
> Hi Ramesh
>
> You can use the schematool command. I am using it for the same
> purposes in Cassandra 0.7.9.
>
> I use the following line in my cassandra startup script:
>
> $CASSANDRA_HOME/bin/schematool HOSTNAME 8080 import
>
> where HOSTNAME is the hostname of your test machine. It will import
> the schema from your cassandra.yaml file.
> If you execute it and there is already a schema in the cassandra
> cluster, you'll get an exception from schematool but no impact on the
> cluster.
>
> Bye
>
> Carlos Pérez Miguel
>
>
>
> 2012/1/17 Ramesh Natarajan :
>
> I usually start cassandra and then use cassandra-cli to import a
>
> schema.   Is there any  automated way to load a fixed schema when
>
> cassandra starts automatically?
>
>
> I have a test setup where I run cassandra on a single node. I have an
>
> OS image packaged with cassandra and it automatically starts cassandra
>
> as a part of OS boot up.
>
>
> I saw some old references to specify schema in cassandra.yaml.  Is
>
> this still supported in Cassandra 1.x?  Are there any examples?
>
>
> thanks
>
> Ramesh
>
>


Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-18 Thread Rustam Aliyev

Hi Andrei,

As you know, we are using Whirr for ElasticInbox 
(https://github.com/elasticinbox/whirr-elasticinbox). While testing we 
encountered a few minor problems which I think could be improved. Note 
that we were using 0.6 (there were some strange bug in 0.7, maybe fixed 
already).


Although initial_token is pre-calculated to form a balanced cluster, our 
test cluster (4 nodes) was always unbalanced. There was no 
initial_token specified (just the default).


The second note is AWS specific - for performance reasons it's better to 
store data files on the ephemeral drive. Currently data is stored under the default 
location (/var/...).


Thanks for the great work!

--
Rustam.

On 18/01/2012 13:00, Andrei Savu wrote:

Hi guys,

I just want to let you know that Apache Whirr trunk (the upcoming 
0.7.1 release) can deploy Cassandra 1.0.7 on AWS EC2 & Rackspace Cloud.


You can give it a try by running the following commands:
https://gist.github.com/1632893

And one last thing: we would appreciate any suggestions on improving 
the deployment scripts or on improving Whirr.


Thanks,

-- Andrei Savu / andreisavu.ro 



Max records per node for a given secondary index value

2012-01-18 Thread Kamal Bahadur
Hi All,

It is great to know that Cassandra column family can accommodate 2 billion
columns per row! I was reading about how Cassandra stores the secondary
index info internally. I now understand that the index related data are
stored in hidden CF and each node is responsible to store the keys of data
that reside on that node only.

I have been using a secondary index for a low-cardinality column called
"product". There can only be 3 possible values for this column. I have a
four-node cluster and process about 5000 records per second with an RF of 2.

My question here is, what happens after the number of columns in hidden
index CF exceeds 2 billion? How does Cassandra handle this situation? I
guess, one way to handle this is to add more nodes to the cluster. I am
interested in knowing if any other solution exists.

Thanks,
Kamal


Re: How to store unique visitors in cassandra

2012-01-18 Thread Lucas de Souza Santos
Why not http://www.countandra.org/


Lucas de Souza Santos (ldss)


On Wed, Jan 18, 2012 at 3:23 PM, Alain RODRIGUEZ  wrote:

> I'm wondering how to model my CFs to store the number of unique
> visitors in a time period in order to be able to request it fast.
>
> I thought of sharding them by day (row = 20120118, column = visitor_id,
> value = '') and perform a getcount. This would work to get unique visitors
> per day, per week or per month but it wouldn't work if I want to get unique
> visitors between 2 specific dates because 2 rows can share the same
> visitors (same columns). I can have 1500 unique visitors today, 1000 unique
> visitors yesterday but only 2000 new visitors when aggregating these days.
>
> I could get all the columns for these 2 rows and perform an intersection with
> my client language but performance won't be good with big data.
>
> Has someone already thought about this kind of modeling?
>
> Thanks for your help ;)
>
> Alain
>


How to store unique visitors in cassandra

2012-01-18 Thread Alain RODRIGUEZ
I'm wondering how to model my CFs to store the number of unique visitors
in a time period in order to be able to request it fast.

I thought of sharding them by day (row = 20120118, column = visitor_id,
value = '') and perform a getcount. This would work to get unique visitors
per day, per week or per month but it wouldn't work if I want to get unique
visitors between 2 specific dates because 2 rows can share the same
visitors (same columns). I can have 1500 unique visitors today, 1000 unique
visitors yesterday but only 2000 new visitors when aggregating these days.

I could get all the columns for these 2 rows and perform an intersection with
my client language but performance won't be good with big data.

Has someone already thought about this kind of modeling?

Thanks for your help ;)

Alain
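
One way to answer the date-range case with this model is to union the visitor ids
of each day row on the client and count the resulting set. A rough sketch follows;
fetchVisitorIdsForDay() is a hypothetical helper standing in for whatever client
call returns the column names of one day row:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Sketch of counting unique visitors over a date range with one row per day
    // (row key = yyyyMMdd, column name = visitor_id, value = '').
    public class UniqueVisitorCounter {

        public int countUnique(List<String> dayRowKeys) {
            Set<String> visitors = new HashSet<String>();
            for (String day : dayRowKeys) {
                // The set union deduplicates visitors seen on several days,
                // which is why per-day counts cannot simply be summed.
                visitors.addAll(fetchVisitorIdsForDay(day));
            }
            return visitors.size();
        }

        // Hypothetical: slice all column names of the row for this day using
        // whatever client (Hector, pycassa, ...) is in use.
        protected List<String> fetchVisitorIdsForDay(String dayRowKey) {
            throw new UnsupportedOperationException("wire up to your Cassandra client");
        }
    }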


Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-18 Thread Jake Luciani
Thanks Andrei!

On Wed, Jan 18, 2012 at 8:00 AM, Andrei Savu  wrote:

> Hi guys,
>
> I just want to let you know that Apache Whirr trunk (the upcoming
> 0.7.1 release) can deploy Cassandra 1.0.7 on AWS EC2 & Rackspace Cloud.
>
> You can give it a try by running the following commands:
> https://gist.github.com/1632893
>
> And one last thing: we would appreciate any suggestions on improving the
> deployment scripts or on improving Whirr.
>
> Thanks,
>
> -- Andrei Savu / andreisavu.ro
>
>


-- 
http://twitter.com/tjake


RE: JMX BulkLoad weirdness

2012-01-18 Thread Scott Fines
I'm running 1.0.6 on both clusters.

After running a nodetool repair on all machines, everything seems to be 
behaving correctly, and AFAIK, no data has been lost.

If what you say is true and the exception was preventing a file from being 
used, then I imagine that the nodetool repair corrected that data from replicas.

Unfortunately, the only steps I have are the ones outlined below.

I suspect it had something to do with that particular data set, however. When I 
did the exact same steps for a different data set, the error did not appear, 
and the streaming proceeded as normal. Perhaps a particular SSTable in the set 
was corrupted?

Scott

From: aaron morton [aa...@thelastpickle.com]
Sent: Wednesday, January 18, 2012 1:52 AM
To: user@cassandra.apache.org
Subject: Re: JMX BulkLoad weirdness

I'd need the version number to be sure, but it looks like that error will stop 
the node from actually using the data that has been streamed to it. The file has 
been received, the aux files (bloom etc.) are created, and the file is opened, but 
the exception stops the file from being used.

I've not looked at the JMX bulk load for a while. If you google around you may 
find some examples.
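
For reference, a minimal sketch of driving the bulk load over JMX; it assumes the
StorageService MBean exposes a bulkLoad(String directory) operation (as it does in
the 1.0 line) and that JMX is listening on the default 7199 port. Host and paths
below are illustrative:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Hedged sketch of triggering the JMX bulk load programmatically. The directory
    // must already contain the SSTables for the target keyspace/CF, laid out the way
    // the node expects.
    public class JmxBulkLoad {
        public static void main(String[] args) throws Exception {
            String host = args[0];        // e.g. a development C* node
            String sstableDir = args[1];  // directory with the copied SSTables
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName storageService =
                        new ObjectName("org.apache.cassandra.db:type=StorageService");
                mbs.invoke(storageService, "bulkLoad",
                        new Object[] { sstableDir },
                        new String[] { "java.lang.String" });
            } finally {
                connector.close();
            }
        }
    }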

If you have some more steps to repo we may be able to look into it.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/01/2012, at 2:42 AM, Scott Fines wrote:

Unfortunately, I'm not doing a 1-1 migration; I'm moving data from a 15-node to 
a 6-node cluster. In this case, that means an excessive amount of time spent 
repairing data put on to the wrong machines.

Also, the bulkloader's requirement of having either a different IP address or a 
different machine is something that I don't really want to bother with, if I 
can activate it through JMX.

It seems like the JMX bulkloader works perfectly fine, however, except for the 
error that I mentioned below. So I suppose I'll ask again, is that error 
something to be concerned about?

Thanks,

Scott

From: aaron morton [aa...@thelastpickle.com]
Sent: Sunday, January 15, 2012 12:07 PM
To: user@cassandra.apache.org
Subject: Re: JMX BulkLoad weirdness

If you are doing a straight one-to-one copy from one cluster to another try…

1) nodetool snapshot on each prod node for the system and application key 
spaces.
2) rsync system and app key space snapshots
3) update the yaml files on the new cluster to have the correct initial_tokens. 
This is not necessary as they are stored in the system KS, but it limits 
surprises later.
4) Start the new cluster.

For bulk load you will want to use the sstableloader 
http://www.datastax.com/dev/blog/bulk-loading


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/01/2012, at 3:32 AM, Scott Fines wrote:

Hi all,

I'm trying to copy a column family from our production cluster to our 
development one for testing purposes, so I thought I would try the bulkload 
API. Since I'm lazy, I'm using the Cassandra bulkLoad JMX call from one of the 
development machines. Here are the steps I followed:

1. (on production C* node): nodetool flush  
2. rsync SSTables from production C* node to development C* node
3. bulkLoad SSTables through JMX

But when I do that, on one of the development C* nodes, I keep getting this 
exception:

java.lang.NullPointerException
at org.apache.cassandra.io.sstable.SSTable.getMinimalKey(SSTable.java:156)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:334)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
at 
org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:156)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:88)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)

After which, the node itself seems to stream data successfully (I'm in the 
middle of checking that right now).

Is this an error that I should be concerned about?

Thanks,

Scott



Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-18 Thread Andrei Savu
Hi guys,

I just want to let you know that Apache Whirr trunk (the upcoming
0.7.1 release) can deploy Cassandra 1.0.7 on AWS EC2 & Rackspace Cloud.

You can give it a try by running the following commands:
https://gist.github.com/1632893

And one last thing: we would appreciate any suggestions on improving the
deployment scripts or on improving Whirr.

Thanks,

-- Andrei Savu / andreisavu.ro


RE: Incremental backups

2012-01-18 Thread Michael Vaknine
Hi,

Thank you for response.

I did restart for all the nodes and now I can see files in backup folders so
It seems like it is working.

During this process I have noticed something very strange

 

In data/City folder there are files that are not created in the snapshot
folder (it looks like old orphaned files)

Is there any process of cassandra that will delete unneeded files? I tried to
run nodetool cleanup but it did not help.

 

These are the files:

-rw-r--r-- 1 cassandra cassandra 230281 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Data.db

-rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Filter.db

-rw-r--r-- 1 cassandra cassandra 27 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 00:57
AttractionCheckins.3039706172746974696f6e-f-157-Statistics.db

-rw-r--r-- 1 cassandra cassandra   1321 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Data.db

-rw-r--r-- 1 cassandra cassandra 16 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Filter.db

-rw-r--r-- 1 cassandra cassandra 27 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 00:58
AttractionCheckins.3039706172746974696f6e-f-158-Statistics.db

-rw-r--r-- 1 cassandra cassandra 2627100 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Data.db

-rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Filter.db

-rw-r--r-- 1 cassandra cassandra 20 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 06:55
Attractions.3039706172746974696f6e-f-1156-Statistics.db

-rw-r--r-- 1 cassandra cassandra 2238358 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Data.db

-rw-r--r-- 1 cassandra cassandra 16 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Filter.db

-rw-r--r-- 1 cassandra cassandra 20 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1157-Statistics.db

-rw-r--r-- 1 cassandra cassandra 92 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Data.db

-rw-r--r-- 1 cassandra cassandra 16 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Filter.db

-rw-r--r-- 1 cassandra cassandra 20 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50
Attractions.3039706172746974696f6e-f-1158-Statistics.db

-rw-r--r-- 1 cassandra cassandra  44799 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Data.db

-rw-r--r-- 1 cassandra cassandra   1936 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Filter.db

-rw-r--r-- 1 cassandra cassandra    196 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 01:25
CityResources.3039706172746974696f6e-f-365-Statistics.db

-rw-r--r-- 1 cassandra cassandra   7647 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Data.db

-rw-r--r-- 1 cassandra cassandra 24 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Filter.db

-rw-r--r-- 1 cassandra cassandra 96 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Index.db

-rw-r--r-- 1 cassandra cassandra   4264 2011-12-06 07:50
CityResources.3039706172746974696f6e-f-366-Statistics.db

 

 

Thanks

Michael

 

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
Sent: Wednesday, January 18, 2012 10:40 AM
To: user@cassandra.apache.org
Subject: Re: Incremental backups

 

As this option is in the cassandra.yaml file, you might need to perform a
restart of your entire cluster (a rolling restart should work).

 

Hope this will help.

 

Alain

2012/1/18 Michael Vaknine 

Hi,

I am configured to do incremental backups on all my nodes on the cluster but
it is not working.

In cassandra.yaml : incremental_backups: true

When I check the data folder there are some keyspaces that have a backups folder,
but it is empty, and I suspect this is a folder created in the past when I had
the 0.7.6 version.

In a newly created keyspace the folder does not exist.

Does someone know if I need to configure anything besides cassandra.yaml
for this to work?

 

Thanks

Michael

 



Re: cassandra hit a wall: Too many open files (98567!)

2012-01-18 Thread Janne Jalkanen

1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?

https://issues.apache.org/jira/browse/CASSANDRA-3616

/Janne

On Jan 18, 2012, at 03:52 , dir dir wrote:

> Very interesting. Why do you open so many files? What kind of
> system have you built that opens so many files? Would you tell us?
> Thanks...
> 
> 
> On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken  
> wrote:
> I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
> 
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:2918,1,main] java.io.IOError:
> java.io.FileNotFoundException:
> /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
> open files in system)
> 
> After that it stopped working and just sat there with this error
> (understandable). I did an lsof and saw that it had 98567 open files,
> yikes! An ls in the data directory shows 234011 files. After restarting
> it spent about 5 hours compacting, then quieted down. About 173k files
> left in the data directory. I'm using leveled compaction (with compression). I
> looked into the json of the two large CFs and gen 0 is empty, most
> sstables are gen 3 & 4. I have a total of about 150GB of data
> (compressed). Almost all the SStables are around 3MB in size. Aren't
> they supposed to get 10x bigger at higher gen's?
> 
> This situation can't be healthy, can it? Suggestions?
> 



RE: nodetool ring question

2012-01-18 Thread Michael Vaknine
I did restart the cluster and now it is normal 5GB.

 

From: R. Verlangen [mailto:ro...@us2.nl] 
Sent: Wednesday, January 18, 2012 11:32 AM
To: user@cassandra.apache.org
Subject: Re: nodetool ring question

 

I also have this problem. My data on nodes grows to roughly 30GB. After a
restart only 5GB remains. Is a factor 6 common for Cassandra?

2012/1/18 aaron morton 

Good idea Jeremiah, are you using compression Michael ? 

 

Scanning through the CF stats this jumps out.

 

Column Family: Attractions

SSTable count: 3

Space used (live): 27542876685

Space used (total): 1213220387

That's 25GB of live data but only 1.3GB total. 

 

Otherwise want to see if a restart fixes it :) Would be interesting to know
if it's wrong from the start or drifts during streaming or compaction. 

 

Cheers

 

-

Aaron Morton

Freelance Developer

@aaronmorton

http://www.thelastpickle.com

 

On 18/01/2012, at 12:04 PM, Jeremiah Jordan wrote:





There were some nodetool ring load reporting issues with early versions of
1.0.X, I don't remember when they were fixed, but that could be your issue.
Are you using compressed column families? A lot of the issues were with
those.
It might be worth updating to 1.0.7.

-Jeremiah

On 01/16/2012 04:04 AM, Michael Vaknine wrote: 

Hi,

 

I have a 4 nodes cluster 1.0.3 version

 

This is what I get when I run nodetool ring

 

Address       DC          Rack   Status  State   Load      Owns    Token
                                                                   127605887595351923798765477786913079296
10.8.193.87   datacenter1 rack1  Up      Normal  46.47 GB  25.00%  0
10.5.7.76     datacenter1 rack1  Up      Normal  48.01 GB  25.00%  42535295865117307932921825928971026432
10.8.189.197  datacenter1 rack1  Up      Normal  53.7 GB   25.00%  85070591730234615865843651857942052864
10.5.3.17     datacenter1 rack1  Up      Normal  43.49 GB  25.00%  127605887595351923798765477786913079296

 

I have finished running repair on all 4 nodes.

 

I have less than 10 GB on the /var/lib/cassandra/data/ folders

 

My question is Why nodetool reports almost 50 GB on each node?

 

Thanks

Michael

 

 



Re: Hector + Range query problem

2012-01-18 Thread Philippe
Hi aaron

Nope: I'm using BOP...forgot to mention it in my original message.

I changed it to a multiget and it works, but I think the range would be more
efficient, so I'd really like to solve this.
Thanks
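
For completeness, the multiget workaround can look roughly like this in Hector; a
sketch only, with generic parameters and serializers mirroring the quoted range
query rather than Philippe's actual code:

    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.Serializer;
    import me.prettyprint.hector.api.beans.Rows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSliceQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    // Sketch: instead of a key range, ask for an explicit list of keys, which works
    // regardless of partitioner. Adjust serializers/types to the actual schema.
    public class MultigetExample {
        public static <K, N> Rows<K, N, byte[]> fetch(Keyspace keyspace,
                                                      String columnFamily,
                                                      Serializer<K> keySerializer,
                                                      Serializer<N> nameSerializer,
                                                      K[] keys,
                                                      N columnName) {
            MultigetSliceQuery<K, N, byte[]> query = HFactory.createMultigetSliceQuery(
                    keyspace, keySerializer, nameSerializer, BytesArraySerializer.get());
            query.setColumnFamily(columnFamily);
            query.setKeys(keys);              // explicit keys instead of a start/end range
            query.setColumnNames(columnName); // e.g. the single column name (byte) 2
            QueryResult<Rows<K, N, byte[]>> result = query.execute();
            return result.get();
        }
    }
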
Le 18 janv. 2012 09:18, "aaron morton"  a écrit :

> Does this help ?
> http://wiki.apache.org/cassandra/FAQ#range_rp
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/01/2012, at 10:58 AM, Philippe wrote:
>
> Hello,
> I've been trying to retrieve rows based on key range but every single time
> I test, Hector retrieves ALL the rows, no matter the range I give it.
> What can I possibly be doing wrong ? Thanks.
>
> I'm doing a test on a single-node RF=1 cluster (c* 1.0.5) with one column
> family (I've added & truncated the CF quite a few times during my tests).
> Each row has a single column whose name is the byte value "2". The keys
> are 0,1,2,3 (shifted by a number of bits). The values are 0,1,2,3.
> list in the CLI gives me
>
> Using default limit of 100
> ---
> RowKey: 02
> => (column=02, value=00, timestamp=1326750723079000)
> ---
> RowKey: 010002
> => (column=02, value=01, timestamp=1326750723239000)
> ---
> RowKey: 020002
> => (column=02, value=02, timestamp=1326750723329000)
> ---
> RowKey: 030002
> => (column=02, value=03, timestamp=1326750723416000)
>
> 4 Rows Returned.
>
>
>
> Hector code:
>
>> RangeSlicesQuery query =
>> HFactory.createRangeSlicesQuery(keyspace, keySerializer,
>> columnNameSerializer, BytesArraySerializer
>> .get());
>> query.setColumnFamily(overlay).setKeys(keyStart, keyEnd).setColumnNames((
>> byte)2);
>
> query.execute();
>
>
> The execution log shows
>
> 1359 [main] INFO  com.sensorly.heatmap.drawing.cassandra.CassandraTileDao
>>  - Range query from TileKey [overlayName=UNSET, tilex=0, tiley=0, zoom=2]
>> to TileKey [overlayName=UNSET, tilex=1, tiley=0, zoom=2] => morton codes =
>> [02,010002]
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=0,
>> zoom=2] with 1 columns, morton = 02
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=0,
>> zoom=2] with 1 columns, morton = 010002
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=1,
>> zoom=2] with 1 columns, morton = 020002
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=1,
>> zoom=2] with 1 columns, morton = 030002
>
> => ALL rows are returned when I really expect it to only return the 1st
> one.
>
>
>
>
>
>


Re: nodetool ring question

2012-01-18 Thread R. Verlangen
I also have this problem. My data on nodes grows to roughly 30GB. After a
restart only 5GB remains. Is a factor 6 common for Cassandra?

2012/1/18 aaron morton 

> Good idea Jeremiah, are you using compression Michael ?
>
> Scanning through the CF stats this jumps out…
>
> Column Family: Attractions
> SSTable count: 3
> Space used (live): 27542876685
> Space used (total): 1213220387
> That's 25GB of live data but only 1.3GB total.
>
> Otherwise want to see if a restart fixes it :) Would be interesting to
> know if it's wrong from the start or drifts during streaming or compaction.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/01/2012, at 12:04 PM, Jeremiah Jordan wrote:
>
> There were some nodetool ring load reporting issues with early versions of
> 1.0.X, I don't remember when they were fixed, but that could be your issue.
> Are you using compressed column families? A lot of the issues were with
> those.
> It might be worth updating to 1.0.7.
>
> -Jeremiah
>
> On 01/16/2012 04:04 AM, Michael Vaknine wrote:
>
> Hi,
>
> I have a 4 nodes cluster 1.0.3 version
>
> This is what I get when I run nodetool ring
>
> Address       DC          Rack   Status  State   Load      Owns    Token
>                                                                    127605887595351923798765477786913079296
> 10.8.193.87   datacenter1 rack1  Up      Normal  46.47 GB  25.00%  0
> 10.5.7.76     datacenter1 rack1  Up      Normal  48.01 GB  25.00%  42535295865117307932921825928971026432
> 10.8.189.197  datacenter1 rack1  Up      Normal  53.7 GB   25.00%  85070591730234615865843651857942052864
> 10.5.3.17     datacenter1 rack1  Up      Normal  43.49 GB  25.00%  127605887595351923798765477786913079296
>
> I have finished running repair on all 4 nodes.
>
> I have less than 10 GB on the /var/lib/cassandra/data/ folders
>
> My question is: why does nodetool report almost 50 GB on each node?
>
> Thanks
>
> Michael
>


Re: cassandra hit a wall: Too many open files (98567!)

2012-01-18 Thread Sylvain Lebresne
On Fri, Jan 13, 2012 at 8:01 PM, Thorsten von Eicken  
wrote:
> I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
>
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:2918,1,main] java.io.IOError:
> java.io.FileNotFoundException:
> /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
> open files in system)
>
> After that it stopped working and just sat there with this error
> (understandable). I did an lsof and saw that it had 98567 open files,
> yikes! An ls in the data directory shows 234011 files. After restarting
> it spent about 5 hours compacting, then quieted down. About 173k files
> left in the data directory. I'm using leveled compaction (with compression). I
> looked into the json of the two large CFs and gen 0 is empty, most
> sstables are gen 3 & 4. I have a total of about 150GB of data
> (compressed). Almost all the SStables are around 3MB in size. Aren't
> they supposed to get 10x bigger at higher gen's?

No, with leveled compaction, the (max) size of sstables is fixed
whatever the generation is (the default is 5MB, but it's 5MB of
uncompressed data (we may change that though) so 3MB sounds about
right).
What changes between generations is the number of sstables it can
contain. Gen 1 can have 10 sstables (it can have more but only
temporarily), Gen 2 can have 100, Gen 3 can have 1000 etc.. So again,
that most sstables are in gen 3 and 4 is expected too.
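
A quick back-of-the-envelope sketch of that capacity rule (illustrative figures
only, using the 5MB default target size) shows why a data set of this size
necessarily means tens of thousands of small sstables:

    // Sketch of the capacity rule described above: each level can hold ten times
    // as many fixed-size sstables as the previous one, so the file count grows
    // with data volume. All numbers are rough illustrations.
    public class LeveledCapacitySketch {
        public static void main(String[] args) {
            double sstableMb = 5.0;   // default leveled sstable target (uncompressed)
            long maxSstables = 10;
            for (int level = 1; level <= 5; level++) {
                System.out.printf("L%d: up to %,d sstables (~%,.0f MB)%n",
                        level, maxSstables, maxSstables * sstableMb);
                maxSstables *= 10;
            }
            // Roughly: L1 ~50MB, L2 ~500MB, L3 ~5GB, L4 ~50GB, L5 ~500GB -- so a
            // 150GB data set implies on the order of 30,000 sstables, and each
            // sstable brings several component files (Data, Index, Filter, ...).
        }
    }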

> This situation can't be healthy, can it? Suggestions?

Leveled compaction uses lots of files (the number is proportional to
the amount of data). It is not necessarily a big problem as modern OSes
deal with large numbers of open files fairly well (as far as I know at
least). I would just up the file descriptor ulimit and not worry too
much about it, unless you have reasons to believe that it's an actual
descriptor leak (but given the number of files you have, the number of
open ones doesn't seem off so I don't think there is one here) or that
this has performance impacts.

--
Sylvain
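
As an aside, the JVM can report its own file descriptor usage over JMX, which makes
it easy to keep an eye on the limit without running lsof; a small sketch, relying on
the com.sun.management extension (HotSpot/OpenJDK on Unix only):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import com.sun.management.UnixOperatingSystemMXBean;

    // Sketch: read the process's open vs. maximum file descriptor counts from the
    // JVM itself. The same attributes are reachable remotely over JMX under the
    // java.lang:type=OperatingSystem MBean, e.g. against a running Cassandra node.
    public class FdUsage {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
                System.out.println("open fds: " + unixOs.getOpenFileDescriptorCount()
                        + " / max: " + unixOs.getMaxFileDescriptorCount());
            } else {
                System.out.println("file descriptor counts not available on this JVM/OS");
            }
        }
    }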


Re: poor Memtable performance on column slices?

2012-01-18 Thread Sylvain Lebresne
On Wed, Jan 18, 2012 at 2:44 AM, Josep Blanquer  wrote:
> Hi,
>
>  I've been doing some tests using wide rows recently, and I've seen some odd
> performance problems that I'd like to understand.
>
> In particular, I've seen that the time it takes for Cassandra to perform a
> column slice of a single key, solely in a Memtable, seems to be very
> expensive, but most importantly proportional to the ordered position where
> the start column of the slice lives.
>
> In other words:
>  1- if I start Cassandra fresh (with an empty ColumnFamily with TimeUUID
> comparator)
>  2- I create a single Row with Key "K"
>  3- Then add 200K TimeUUID columns to key "K"
>  4- (and make sure nothing is flushed to SSTables...so it's all in the
> Memtable)
>
> ...I observe the following timings (seconds to perform 1000 reads) while
> performing multiget slices on it:  (pardon the pseudo-code, but you'll get
> the gist)
>
> a) simply a get of the first column:  GET("K",:count=>1)
>   --  2.351226
>
> b) doing a slice get, starting from the first column:  GET("K",:start =>
> '144abe16-416c-11e1-9e23-2cbae9ddfe8b' , :count => 1 )
>   -- 2.189224   <<- so with or without "start" doesn't seem to make much of
> a difference
>
> c) doing a slice get, starting from the middle of the ordered
> columns..approx starting at item number 100K:   GET("K",:start =>
> '9c13c644-416c-11e1-81dd-4ba530dc83d0' , :count => 1 )
>  -- 11.849326  <<- 5 times more expensive if the start of the slice is 100K
> positions away
>
> d) doing a slice get, starting from the last of the ordered columns..approx
> position 200K:   GET("K",:start => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' ,
> :count => 1 )
>   -- 19.889741   <<- Almost twice as expensive than starting the slice at
> position 100K, and 10 times more expensive than starting from the first one
>
> This behavior leads me to believe that there's a clear Memtable column scan
> for the columns of the key.
> If one tries a column name read on those positions (i.e., not a slice), the
> performance is constant. I.e., GET("K",
> '144abe16-416c-11e1-9e23-2cbae9ddfe8b') . Retrieving the first, middle or
> last timeUUID is done in the same amount of time.
>
> Having increasingly worse performance for column slices in Memtables seems
> to be a bit of a problem...aren't Memtables backed by a structure that has
> some sort of column name indexing?...so that landing on the start column can
> be efficient? I'm definitely observing very high CPU utilization on those
> scans... By the way, with wide columns like this, slicing SSTables is quite a
> bit faster than slicing Memtables. I'm attributing that to the sampled index of
> the SSTables, which is why I'm wondering whether the Memtables lack such column
> indexing and resort to linked lists of sorts.
>
> Note that the actual timings shown are not important (this is on my laptop and
> I have a small amount of debugging enabled); what is important is the
> difference between them.
>
> I'm using Cassandra trunk as of Dec 1st, but I believe I've done experiments
> with 0.8 series too, leading to the same issue.

You may want to retry your experiments on current trunk. We did have an
inefficiency in our memtable search that was fixed by:
https://issues.apache.org/jira/browse/CASSANDRA-3545
(the name of the ticket doesn't make it clear that it's related but it is)

The issue was committed on December 8.
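
For intuition only (this is a sketch, not Cassandra's actual memtable code): a
row's columns are kept in a sorted map, and the difference is essentially
between walking the columns from the beginning versus asking the map to seek to
the slice start, which is the cost pattern the original post measured:

import java.util.SortedMap;
import java.util.TreeMap;

public class SliceSeekSketch {
    // Naive: walk every column until we reach 'start'. Cost grows with the
    // position of 'start' in the sort order.
    static String firstAtOrAfterLinear(SortedMap<String, byte[]> columns, String start) {
        for (String name : columns.keySet()) {
            if (name.compareTo(start) >= 0) return name;
        }
        return null;
    }

    // Seek: use the map's ordered view. Finding the slice start is O(log n)
    // regardless of where 'start' falls.
    static String firstAtOrAfterSeek(SortedMap<String, byte[]> columns, String start) {
        SortedMap<String, byte[]> tail = columns.tailMap(start); // keys >= start
        return tail.isEmpty() ? null : tail.firstKey();
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> columns = new TreeMap<String, byte[]>();
        for (int i = 0; i < 200000; i++) {
            columns.put(String.format("col-%06d", i), new byte[0]);
        }
        // Both calls return "col-199999", but only the first one pays a cost
        // proportional to the column's position in the row.
        System.out.println(firstAtOrAfterLinear(columns, "col-199999"));
        System.out.println(firstAtOrAfterSeek(columns, "col-199999"));
    }
}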

--
Sylvain

>
>  Cheers,
>
> Josep M.


Re: nodetool ring question

2012-01-18 Thread aaron morton
Good idea Jeremiah. Are you using compression, Michael?

Scanning through the CF stats this jumps out…

Column Family: Attractions
SSTable count: 3
Space used (live): 27542876685
Space used (total): 1213220387
That's about 25.7 GB of live data but only about 1.1 GB total.

Otherwise I'd want to see if a restart fixes it :) It would be interesting to know
whether it's wrong from the start or drifts during streaming or compaction.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/01/2012, at 12:04 PM, Jeremiah Jordan wrote:

> There were some nodetool ring load reporting issues with early versions of 
> 1.0.X; I don't remember when they were fixed, but that could be your issue. Are 
> you using compressed column families? A lot of the issues were with those.
> You might want to update to 1.0.7.
> 
> -Jeremiah
> 
> On 01/16/2012 04:04 AM, Michael Vaknine wrote:
>> 
>> Hi,
>>  
>> I have a 4-node cluster running version 1.0.3
>>  
>> This is what I get when I run nodetool ring
>>  
>> Address       DC           Rack   Status  State    Load      Owns     Token
>>                                                                       127605887595351923798765477786913079296
>> 10.8.193.87   datacenter1  rack1  Up      Normal   46.47 GB  25.00%   0
>> 10.5.7.76     datacenter1  rack1  Up      Normal   48.01 GB  25.00%   42535295865117307932921825928971026432
>> 10.8.189.197  datacenter1  rack1  Up      Normal   53.7 GB   25.00%   85070591730234615865843651857942052864
>> 10.5.3.17     datacenter1  rack1  Up      Normal   43.49 GB  25.00%   127605887595351923798765477786913079296
>>  
>> I have finished running repair on all 4 nodes.
>>  
>> I have less than 10 GB on the /var/lib/cassandra/data/ folders
>>  
>> My question is: why does nodetool report almost 50 GB on each node?
>>  
>> Thanks
>> Michael



Re: Incremental backups

2012-01-18 Thread Alain RODRIGUEZ
As this option is set in the cassandra.yaml file, you will need to restart your
cluster for it to take effect (a rolling restart should work).

Hope this helps.

Alain

2012/1/18 Michael Vaknine 

> Hi,
>
> I have configured incremental backups on all the nodes in the cluster,
> but it is not working.
>
> In cassandra.yaml : incremental_backups: true
>
> When I check the data folder, some keyspaces have a backups folder, but it
> is empty, and I suspect it was created in the past when I had version 0.7.6.
>
> For a newly created keyspace the folder does not exist.
>
> Does anyone know if I need to configure anything besides cassandra.yaml
> for this to work?
>
>
> Thanks
>
> Michael
>


Re: specifying initial cassandra schema

2012-01-18 Thread aaron morton
Check the command line help for cassandra-cli; you can pass it a file name, e.g.

cassandra-cli --host localhost --file schema.txt
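
A minimal schema.txt could look like the following (the keyspace and column
family names are just placeholders; run "help create keyspace;" in the CLI to
see the exact options your version supports):

create keyspace Demo;
use Demo;
create column family Users with comparator = UTF8Type;

cassandra-cli executes the statements in order and then exits, so it works well
from an init/boot script once the node is up.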

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/01/2012, at 9:35 AM, Carlos Pérez Miguel wrote:

> Hi Ramesh
> 
> You can use the schematool command. I am using it for the same
> purposes in Cassandra 0.7.9.
> 
> I use the following line in my cassandra startup script:
> 
> $CASSANDRA_HOME/bin/schematool HOSTNAME 8080 import
> 
> where HOSTNAME is the hostname of your test machine. It will import
> the schema from your cassandra.yaml file.
> If you execute it and there is already a schema in the Cassandra
> cluster, you'll get an exception from schematool, but there is no impact
> on the cluster.
> 
> Bye
> 
> Carlos Pérez Miguel
> 
> 
> 
> 2012/1/17 Ramesh Natarajan :
>> I usually start cassandra and then use cassandra-cli to import a
>> schema.   Is there any  automated way to load a fixed schema when
>> cassandra starts automatically?
>> 
>> I have a test setup where I run Cassandra on a single node. I have an
>> OS image packaged with Cassandra, and it automatically starts Cassandra
>> as part of OS boot up.
>> 
>> I saw some old references to specify schema in cassandra.yaml.  Is
>> this still supported in Cassandra 1.x?  Are there any examples?
>> 
>> thanks
>> Ramesh



Re: Hector + Range query problem

2012-01-18 Thread aaron morton
Does this help?
http://wiki.apache.org/cassandra/FAQ#range_rp
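
The short version: with RandomPartitioner, rows are stored and ranges are
interpreted in token order, not key order, so setKeys(start, end) will not give
you "the keys between start and end". Key ranges are really only useful for
paging over all rows. Below is a sketch of that paging idiom (Hector 1.0-style
API; the keyspace and column family are assumed to be set up as in your code):

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class RangePagingSketch {

    // Visits every row of 'columnFamily', pageSize keys at a time, in token order.
    static void pageAllRows(Keyspace keyspace, String columnFamily, int pageSize) {
        byte[] startKey = new byte[0];                    // empty key = start of the ring
        boolean firstPage = true;
        while (true) {
            RangeSlicesQuery<byte[], byte[], byte[]> query = HFactory.createRangeSlicesQuery(
                    keyspace, BytesArraySerializer.get(),
                    BytesArraySerializer.get(), BytesArraySerializer.get());
            query.setColumnFamily(columnFamily)
                 .setKeys(startKey, new byte[0])          // empty end key = no upper bound
                 .setRowCount(pageSize + 1);              // +1: the start key is returned again

            OrderedRows<byte[], byte[], byte[]> rows = query.execute().get();
            boolean skipFirst = !firstPage;
            for (Row<byte[], byte[], byte[]> row : rows) {
                if (skipFirst) { skipFirst = false; continue; }
                // process the row; rows arrive in token order, so filter on
                // row.getKey() client side if you only want certain keys
            }

            if (rows.getCount() <= pageSize) break;       // fewer than a full page left: done
            startKey = rows.peekLast().getKey();          // last key of this page starts the next
            firstPage = false;
        }
    }
}

If you genuinely need key-ordered range queries, you would have to use an
order-preserving partitioner (with its load-balancing drawbacks) or model the
keys as columns inside a wide row instead.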

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/01/2012, at 10:58 AM, Philippe wrote:

> Hello,
> I've been trying to retrieve rows based on key range but every single time I 
> test, Hector retrieves ALL the rows, no matter the range I give it.
> What can I possibly be doing wrong? Thanks.
> 
> I'm doing a test on a single-node RF=1 cluster (c* 1.0.5) with one column 
> family (I've added & truncated the CF quite a few times during my tests).
> Each row has a single column whose name is the byte value "2". The keys are 
> 0,1,2,3 (shifted by a number of bits). The values are 0,1,2,3.
> list in the CLI gives me
> 
> Using default limit of 100
> ---
> RowKey: 02
> => (column=02, value=00, timestamp=1326750723079000)
> ---
> RowKey: 010002
> => (column=02, value=01, timestamp=1326750723239000)
> ---
> RowKey: 020002
> => (column=02, value=02, timestamp=1326750723329000)
> ---
> RowKey: 030002
> => (column=02, value=03, timestamp=1326750723416000)
> 
> 4 Rows Returned.
> 
> 
> 
> Hector code:
> RangeSlicesQuery query = HFactory.createRangeSlicesQuery(
>         keyspace, keySerializer, columnNameSerializer, BytesArraySerializer.get());
> query.setColumnFamily(overlay)
>      .setKeys(keyStart, keyEnd)
>      .setColumnNames((byte) 2);
> query.execute();
> 
> 
> The execution log shows
> 
> 
> 1359 [main] INFO  com.sensorly.heatmap.drawing.cassandra.CassandraTileDao  - 
> Range query from TileKey [overlayName=UNSET, tilex=0, tiley=0, zoom=2] to 
> TileKey [overlayName=UNSET, tilex=1, tiley=0, zoom=2] => morton codes = 
> [02,010002]
> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=0, 
> zoom=2] with 1 columns, morton = 02
> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=0, 
> zoom=2] with 1 columns, morton = 010002
> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=1, 
> zoom=2] with 1 columns, morton = 020002
> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=1, 
> zoom=2] with 1 columns, morton = 030002
> => ALL rows are returned when I really expect it to only return the 1st one.
> 
> 
> 
> 
> 



Incremental backups

2012-01-18 Thread Michael Vaknine
Hi,

I have configured incremental backups on all the nodes in the cluster, but
it is not working.

In cassandra.yaml : incremental_backups: true

When I check the data folder, some keyspaces have a backups folder, but it is
empty, and I suspect it was created in the past when I had version 0.7.6.

For a newly created keyspace the folder does not exist.

Does anyone know if I need to configure anything besides cassandra.yaml
for this to work?

 

Thanks

Michael



Re: Brisk with standard C* cluster

2012-01-18 Thread aaron morton
Yes, you can add nodes in a second DC that run both Cassandra and Brisk. This will 
keep the analytics load off the original nodes. There is some documentation here 
http://www.datastax.com/docs/0.8/brisk/index
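
Note that for the Brisk nodes to see the data, the keyspaces you want to analyse
need replicas placed in the analytics DC, roughly along these lines from the CLI.
The DC names below are assumptions (they depend on the snitch Brisk configures,
so check what nodetool ring reports), and "help update keyspace;" will show the
exact strategy_options syntax for your CLI version:

update keyspace MyKeyspace
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {Cassandra : 2, Brisk : 1};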

You may have better luck with the Brisk user group 
http://groups.google.com/group/brisk-users or the DataStax forums 
http://www.datastax.com/support-forums/

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/01/2012, at 9:07 AM, Mohit Anchlia wrote:

> Is it possible to add Brisk-only nodes to a standard C* cluster? So if
> we have nodes A, B, C with standard C*, then add Brisk nodes D, E, F for
> analytics?