RE: Secondary indexes and cardinality

2012-02-13 Thread Tiwari, Dushyant
Perfect, Aaron. Thanks a lot



From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, February 14, 2012 12:54 AM
To: user@cassandra.apache.org
Subject: Re: Secondary indexes and cardinality

Heard that indexing a field with high cardinality is not good.
http://www.datastax.com/docs/0.7/data_model/secondary_indexes

Will there be any performance improvement? Is this the way secondary indexes 
are maintained?
Updating secondary indexes requires a read and a write.

Also this makes me think - Will there be any loss if we have many rows in a CF 
say 10 million?
http://www.datastax.com/docs/0.7/data_model/cfs_as_indexes#indexes

Having 10 million columns in a row is not a problem by itself. It depends on 
how you want to read things.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/02/2012, at 1:11 AM, Tiwari, Dushyant wrote:


Hi Cassandra Users,

Heard that indexing a field with high cardinality is not good. If we create a 
CF to store the index information like indexed field as key and the keys of 
original CF as cols in the row. Will there be any performance improvement? Is 
this the way secondary indexes are maintained?

Also this makes me think - Will there be any loss if we have many rows in a CF 
say 10 million?

Thanks,
Dushyant



NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or 
views contained herein are not intended to be, and do not constitute, advice 
within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and 
Consumer Protection Act. If you have received this communication in error, 
please destroy all electronic and paper copies and notify the sender 
immediately. Mistransmission is not intended to waive confidentiality or 
privilege. Morgan Stanley reserves the right, to the extent permitted under 
applicable law, to monitor electronic communications. This message is subject 
to terms available at the following link: 
http://www.morganstanley.com/disclaimers. If you cannot access these links, 
please notify us by reply message and we will send the contents to you. By 
messaging with Morgan Stanley you consent to the foregoing.




Got fatal exception after upgrade to 1.0.7 from 1.0.6

2012-02-13 Thread Roshan
Hi

I got the below exception in the system.log after upgrading to 1.0.7 from
1.0.6. I am using the same configuration files that I used with the 1.0.6
version.

2012-02-14 10:48:12,379 ERROR [AbstractCassandraDaemon] Fatal exception in
thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at
org.cliffc.high_scale_lib.NonBlockingHashMap.hash(NonBlockingHashMap.java:113)
at
org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:553)
at
org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
at
org.cliffc.high_scale_lib.NonBlockingHashMap.putIfAbsent(NonBlockingHashMap.java:319)
at
org.cliffc.high_scale_lib.NonBlockingHashSet.add(NonBlockingHashSet.java:32)
at
org.apache.cassandra.db.HintedHandOffManager.scheduleHintDelivery(HintedHandOffManager.java:409)
at
org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:394)
at
org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:84)
at
org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:119)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Could someone please help me on this? Thanks.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Got-fatal-exception-after-upgrade-to-1-0-7-from-1-0-6-tp7282462p7282462.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: active/pending queue lengths

2012-02-13 Thread Franc Carter
On Tue, Feb 14, 2012 at 6:06 AM, aaron morton wrote:

> What CL are you reading at ?
>

Quorum


>
> Write ops go to RF number of nodes, read ops go to RF number of nodes 10%
> (the default probability that Read Repair will be running) of the time and
> CL number of nodes 90% of the time. With 2 nodes and RF 2 the QUORUM is 2,
> every request will involve all nodes.
>

Yep, the thing that confuses me is the different behaviour for reading from one
node versus two


>
> As to why the pending list gets longer, do you have some more info ? What
> process are you using to measure ? It's hard to guess why. In this setup
> every node will have the data and should be able to do a local read and
> then a digest read on the other node.
>

I have four pycassa clients, two making requests to one server and two
making requests to the other (or all four making requests to the same
server). The requested keys don't overlap and I would expect/assume the
keys are in the keycache

I am looking at the output of nodetool -h tpstats

cheers


> Cheers
>
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/02/2012, at 12:47 AM, Franc Carter wrote:
>
>
> Hi,
>
> I've been looking at tpstats as various test queries run and I noticed
> something I don't understand.
>
> I have a two node cluster with RF=2 on which I run 4 parallel queries,
> each job goes through a list of keys doing a multiget for 2 keys at a time.
> If two of the queries go to one node and the other two go to a different
> node then the pending queue on the node gets much longer than if they all
> go to the one node.
>
> I'm clearly missing something here as I would have expected the opposite
>
> cheers
>
> --
> *Franc Carter* | Systems architect | Sirca Ltd
>  
> franc.car...@sirca.org.au | www.sirca.org.au
> Tel: +61 2 9236 9118
>  Level 9, 80 Clarence St, Sydney NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215
>
>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Querying all keys in a column family

2012-02-13 Thread Martin Arrowsmith
Hi Experts,

My program is such that it queries all keys in Cassandra. I want to do this
as quickly as possible, in order to get as close to real-time as possible.

One solution I heard was to use the sstables2json tool, and read the data
in as JSON. I understand that reading each row from Cassandra directly might
take longer.

Are there any other ideas for doing this ? Or can you confirm that
sstables2json is the way to go.

Querying 100 rows in Cassandra the normal way is fast enough. I'd like to
query a million rows, do some calculations on them, and spit out the result
like it's real time.

Thanks for any help you can give,

Martin
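For what it's worth, the usual alternative to dumping SSTables is to page through the whole key range with get_range_slices (pycassa's get_range wraps this): fetch a page of rows, then restart the next page at the last key seen, dropping the duplicate first result. A minimal self-contained sketch of that paging pattern, with a plain dict standing in for the cluster (`fetch_page` is a stand-in for the RPC call, not a real API):

```python
# Sketch of the range-paging pattern clients like pycassa use internally:
# fetch a page of keys, then restart the next page at the last key seen
# (dropping the duplicate first result). A sorted dict stands in for the
# cluster; in real code each page would be one get_range_slices call.

def fetch_page(rows, start_key, count):
    """Pretend RPC: return up to `count` (key, value) pairs with key >= start_key."""
    keys = sorted(k for k in rows if k >= start_key)
    return [(k, rows[k]) for k in keys[:count]]

def iterate_all_rows(rows, page_size=2):
    start = ""
    first_page = True
    while True:
        page = fetch_page(rows, start, page_size)
        if not first_page and page:
            page = page[1:]          # drop the key we already yielded
        if not page:
            return
        for key, value in page:
            yield key, value
        start = key                  # next page restarts at the last key
        first_page = False

rows = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
print(list(iterate_all_rows(rows)))
# -> [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
```

Note that under RandomPartitioner the rows come back in token order, not key order, but the restart-at-last-key trick works the same way.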


London meetup - upcoming events

2012-02-13 Thread Dave Gardner
Hi all,

Those in the UK might be interested in the next Cassandra London events:

Monday 20th February

Two talks: "Cassandra as an email storage system" and "CQL - then and now"
http://www.meetup.com/Cassandra-London/events/29569461/


Tuesday 6th March

How Netflix uses Cassandra with Adrian Cockcroft
http://www.meetup.com/Cassandra-London/events/50558912/


Dave


Querying for rows without a particular column

2012-02-13 Thread Asankha C. Perera

Hi All

I am using expiring columns in my column family, and need to search for 
the rows where a particular column has expired (and no longer exists). I am 
using the Hector client. How can I make a query to find the rows of my interest?


thanks
asankha

--
Asankha C. Perera
AdroitLogic, http://adroitlogic.org

http://esbmagic.blogspot.com






Re: SSTable symlinks

2012-02-13 Thread Dan Retzlaff
Too easy. Does anybody have a more difficult approach? :) Just kidding.
Thanks, Aaron.

On Mon, Feb 13, 2012 at 11:43 AM, aaron morton wrote:

> I am nursing an overloaded 0.6 cluster
>
> Shine on you crazy diamond.
>
> If you have some additional storage available I would:
>
> 1) Allocate a data directory for each node, stop the node and add the
> directory to the config for each node in the DataFileDirectory
>   <DataFileDirectories>
>   <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
>   </DataFileDirectories>
>
> 2) The nodes will now create SSTables in the new directory. Bring it back
> up and compact.
>
> 3) Once you have compacted I would recommend stopping the node, moving the
> SSTables back to the local node and removing the additional data file
> directory.
>
> Hope that helps.
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/02/2012, at 7:10 AM, Dan Retzlaff wrote:
>
> Hi all,
>
> I am nursing an overloaded 0.6 cluster through compaction to get its disk
> usage under 50%. Many rows' content has been replaced so that after
> compaction there will be plenty of room, but a couple of nodes are
> currently at 95%.
>
> One strategy I considered is temporarily moving a couple of the larger
> SSTables to an NFS mount and putting symlinks in the data directory.
> However, Jonathan says that Cassandra is not okay with symlinked SSTables
> [1]. Can someone elaborate on why this won't work?
>
> If a hack like this is not possible, then I am at a loss for options other
> than ungracefully dropping the node from the cluster and reconstructing its
> data from other replicas. If anyone has recovered from a similar situation,
> I would appreciate your advice.
>
> Regards,
> Dan
>
> [1]
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3CBANLkTina%2BtZf9BhQwzW0Fnv-KAL7%2BKZArQ%40mail.gmail.com%3E
>
>
>
>


Re: problem with sliceQuery with composite column

2012-02-13 Thread aaron morton
If you want to get all the ticks between two integers, yes. 

A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/02/2012, at 8:36 AM, Dave Brosius wrote:

> if the composite column was rearranged as 
> 
> ticks:111
> 
> wouldn't the result be as desired?
> 
> 
> - Original Message -
> From: "aaron morton"  
> Sent: Mon, February 13, 2012 13:41
> Subject: Re: problem with sliceQuery with composite column
> 
> My understanding is you expected to see 
>  
> 111:ticks
> 222:ticks
> 333:ticks
> 444:ticks
>  
> But instead you are getting 
>  
> 111:ticks
> 111:quote
> 222:ticks
> 222:quote
> 333:ticks
> 333:quote
> 444:ticks
>  
> If that is the case things are working as expected. 
>  
> The slice operation gets a column range. So if you start at 111:ticks and end 
> at 444:ticks you are asking for all the columns in between. 
>  
> It is not possible to filter at each level of a composite column. 
>  
> Hope that helps.
>  
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
>  
> 
> On 10/02/2012, at 10:58 PM, Deno Vichas wrote:
> 
>> 
>>  
>> all,
>> 
>> could somebody clue me to why the below code doesn't work.  my schema is;
>> 
>> create column family StockHistory
>> with comparator = 'CompositeType(LongType,UTF8Type)'
>> and default_validation_class = 'UTF8Type'
>> and key_validation_class = 'UTF8Type';
>> 
>> 
>>  the time part works but I'm getting other columns with the second half not 
>> equaling the value set. It's like it's ignoring the string part of the 
>> composite.
>> 
>> Composite start = new Composite();
>> Composite end = new Composite();
>> start.addComponent(0, 
>> startDate.toDateTimeAtStartOfDay().toDate().getTime(), 
>> Composite.ComponentEquality.EQUAL);
>> end.addComponent(0, endDate.toDateMidnight().toDate().getTime(), 
>> Composite.ComponentEquality.EQUAL);
>> 
>> start.addComponent(1, "ticks", Composite.ComponentEquality.EQUAL);
>> end.addComponent(1, "ticks", 
>> Composite.ComponentEquality.GREATER_THAN_EQUAL);
>> 
>> SliceQuery<String, Composite, String> sliceQuery =
>> HFactory.createSliceQuery(_keyspace, _stringSerializer, new 
>> CompositeSerializer(), _stringSerializer);
>> sliceQuery.setColumnFamily(CF_STOCK_HISTORY).setKey(symbol);
>> sliceQuery.setRange(start, end, false, 10);
>> 
>> QueryResult<ColumnSlice<Composite, String>> result = 
>> sliceQuery.execute();
>> ColumnSlice<Composite, String> cs = result.get();
>> SortedSet<String> historyJSON = new TreeSet<String>();
>> for ( HColumn<Composite, String> col : cs.getColumns() ) {
>> System.out.println(col.getName().get(0, _longSerializer) + "|" + 
>> col.getName().get(1, StringSerializer.get()));
>> }
>> 
>> 
>> this outputs the following;
>> 
>> 132703560|ticks
>> 132704640|quote
>> 132729480|ticks
>> 132730560|quote
>> 132738120|ticks
>> 132739200|quote
>> 132746760|ticks
>> 132747840|quote
>> 132755400|ticks
>> 132756480|quote
>> 
>> thanks,
>> deno
>> 
>>  



Re: problem with sliceQuery with composite column

2012-02-13 Thread Dave Brosius
if the composite column was rearranged as ticks:111, wouldn't the result be as 
desired?

- Original Message -
From: "aaron morton" <aa...@thelastpickle.com>

Re: Hector and batch mutation

2012-02-13 Thread aaron morton
> Is the execution of the batch sequential? (in the order data is added).
No, it runs in parallel; see concurrent_writes in cassandra.yaml

> Also say there are 10 operations in a batch and 3rd fails will it try the 
> remaining 7?


http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
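For anyone skimming the FAQ link: batch_mutate is not atomic across rows, so a failure partway through a batch can leave the earlier mutations applied while later ones are lost; because writes are idempotent, the standard remedy is simply to replay the whole batch. A toy model of that failure mode (plain Python, not the Hector API):

```python
# Toy model of batch_mutate's failure semantics: the batch is applied
# row by row, so a mid-batch failure leaves earlier mutations in place.
# Not the Hector API -- just the behaviour the FAQ entry describes.

def apply_batch(store, mutations, fail_at=None):
    """Apply (key, value) mutations in order; simulate a node error at index fail_at."""
    for i, (key, value) in enumerate(mutations):
        if i == fail_at:
            raise IOError("simulated failure on mutation %d" % i)
        store[key] = value

store = {}
batch = [("k%d" % i, i) for i in range(10)]
try:
    apply_batch(store, batch, fail_at=2)   # the 3rd operation fails
except IOError:
    pass
assert len(store) == 2                      # the first two writes already landed
apply_batch(store, batch)                   # writes are idempotent: replay the batch
assert len(store) == 10
```

In practice the client's retry policy may already replay failed batches; the point is that a partially applied batch is harmless as long as the mutations are idempotent.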

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/02/2012, at 3:50 AM, Tiwari, Dushyant wrote:

> Hi Guys,
>  
> A very trivial question on batch mutation provided by Hector. Is the 
> execution of the batch sequential? (in the order data is added).
> Also say there are 10 operations in a batch and 3rd fails will it try the 
> remaining 7?
> Is execution of batch mutator multi threaded ?
>  
>  
> Regards,
> Dushyant



Re: How to bring cluster to consistency

2012-02-13 Thread aaron morton
> Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly 
> tells me that my message looks like spam…
Send as text. 

What version are you using ? 
It looks like you are using the ByteOrderedPartitioner , is that correct ? 

I would try to get the repair done first, what was the error ? 

> I believe it is affected by data size. At least some estimation about how 
> much time and memory it could take would be of use.
You can try to reduce the memory requirements of repair by reducing the 
in_memory_compaction_limit. How much memory do you have allocated to cassandra ?

BUT… 1.5T of data is a lot for a single node. Most people work with around 200G 
to 400G per node. Otherwise things like repair and move take a very long time. 
There are also a number of structures (bloom filters, index samples) maintained 
in memory that vary with respect to the data load. 

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/02/2012, at 2:35 AM, Nikolay Kоvshov wrote:

> Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly 
> tells me that my message looks like spam...
> 
>> 2/ both of your nodes seem to be using the same token? The output indicates 
>> that 100% of your key range is assigned to 10.111.1.141 (and therefore 
>> 10.111.1.142 holds replicas only)
> 
> Well, I didn't assign anything. I just filled nodes with data, that's 
> Cassandra itself who assigned that. I am trying to perform nodetool move now. 
> Still I didn't understand from wiki what that means (keys assigned to 
> servers). When both servers are up I can write to 1 and read from 2, or I can 
> write to 2 and read from 1 and all works perfect.
> 
>> 3/ maybe repair is being affected by above, but in my experience it can be 
>> sensitive
> 
> I believe it is affected by data size. At least some estimation about how 
> much time and memory it could take would be of use.
> 
> 13.02.2012, 17:19, "Dominic Williams" :
>> Hi Nikolay,Some points that may be useful:
>> 1/ auto_bootstrap = true is used for telling a new node to join the ring 
>> (the cluster). It has nothing to do with hinted handoff
>> 2/ both of your nodes seem to be using the same token? The output indicates 
>> that 100% of your key range is assigned to 10.111.1.141 (and therefore 
>> 10.111.1.142 holds replicas only)
>> 3/ maybe repair is being affected by above, but in my experience it can be 
>> sensitive
>> 
>> On 13 February 2012 13:06, Nikolay Kоvshov  wrote:
>>> Hello everybody
>>> 
>>> I have a very simple cluster containing 2 servers. Replication_factor = 2, 
>>> Consistency_level of reads and writes = 1
>>> 
>>> 10.111.1.141datacenter1 rack1   Up Normal  1.5 TB  
>>> 100.00% vjpigMzv4KkX3x7z
>>> 10.111.1.142datacenter1 rack1   Up Normal  1.41 TB 
>>> 0.00%   聶jpigMzv4KkX3x7z
>>> 
>>> Note the size please.
>>> 
>>> Say, server1 cassandra dies and I restart it later. Hinted_handoff = 
>>> enabled, auto_bootstrap = true
>>> 
>>> During that time server2 received reads and writes. I want changes to be 
>>> copied to server1 when it joins the cluster. As I have replication_factor 
>>> 2, I suppose each data piece should be stored on both servers. 
>>> Auto_bootstrapping doesn't seem to work that way - changed data doesn't 
>>> migrate.
>>> 
>>> I run nodetool repair and it is always killed by OOM. What else can I do to 
>>> bring cluster to consistency?
>>> 
>>> Thank you in advance



Re: Secondary indexes and cardinality

2012-02-13 Thread aaron morton
> Heard that indexing a field with high cardinality is not good. 
http://www.datastax.com/docs/0.7/data_model/secondary_indexes

> Will there be any performance improvement? Is this the way secondary indexes 
> are maintained?
Updating secondary indexes requires a read and a write. 

> Also this makes me think – Will there be any loss if we have many rows in a 
> CF say 10 million?
http://www.datastax.com/docs/0.7/data_model/cfs_as_indexes#indexes

Having 10 million columns in a row is not a problem by itself. It depends on 
how you want to read things.
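The read-before-write cost applies to a hand-rolled index CF too: changing an indexed value means reading the old value, deleting the stale index entry, and writing the new one, or the index accumulates garbage. A sketch of that maintenance logic, with plain dicts standing in for the two column families (the `city` field and all names are made up for illustration):

```python
data_cf = {}    # row key -> {"city": value, ...}   (the original CF)
index_cf = {}   # indexed value -> set of row keys  (keys of original CF as columns)

def update_city(row_key, new_city):
    old_city = data_cf.get(row_key, {}).get("city")    # the extra read
    if old_city is not None:
        index_cf[old_city].discard(row_key)            # remove the stale index entry
    data_cf.setdefault(row_key, {})["city"] = new_city
    index_cf.setdefault(new_city, set()).add(row_key)  # write the new index entry

update_city("user1", "sydney")
update_city("user2", "sydney")
update_city("user1", "london")
print(index_cf)
# -> {'sydney': {'user2'}, 'london': {'user1'}}
```

Cassandra's built-in secondary indexes do essentially the same read-then-write dance, which is the cost Aaron refers to.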

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/02/2012, at 1:11 AM, Tiwari, Dushyant wrote:

> Hi Cassandra Users,
>  
> Heard that indexing a field with high cardinality is not good. If we create a 
> CF to store the index information like indexed field as key and the keys of 
> original CF as cols in the row. Will there be any performance improvement? Is 
> this the way secondary indexes are maintained?
>  
> Also this makes me think – Will there be any loss if we have many rows in a 
> CF say 10 million?
>  
> Thanks,
> Dushyant
>  
>



Re: active/pending queue lengths

2012-02-13 Thread aaron morton
What CL are you reading at ? 

Write ops go to RF number of nodes, read ops go to RF number of nodes 10% (the 
default probability that Read Repair will be running) of the time and CL number 
of nodes 90% of the time. With 2 nodes and RF 2 the QUORUM is 2, every request 
will involve all nodes. 

As to why the pending list gets longer, do you have some more info ? What 
process are you using to measure ? It's hard to guess why. In this setup every 
node will have the data and should be able to do a local read and then a digest 
read on the other node. 
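A quick sanity check on the arithmetic: QUORUM is floor(RF/2) + 1, so with RF 2 a quorum is 2 replicas and a QUORUM read can never skip a node, while RF 3 is the smallest setup where one replica can sit out. A sketch:

```python
def quorum(rf):
    # QUORUM consistency level: a strict majority of the RF replicas
    return rf // 2 + 1

for rf in (1, 2, 3, 5):
    print("RF=%d -> quorum of %d" % (rf, quorum(rf)))
# RF=2 -> quorum of 2 : every request involves all replicas
# RF=3 -> quorum of 2 : one replica can be down without blocking reads
```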

Cheers 

 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/02/2012, at 12:47 AM, Franc Carter wrote:

> 
> Hi,
> 
> I've been looking at tpstats as various test queries run and I noticed 
> something I don't understand.
> 
> I have a two node cluster with RF=2 on which I run 4 parallel queries, each 
> job goes through a list of keys doing a multiget for 2 keys at a time. If two 
> of the queries go to one node and the other two go to a different node then 
> the pending queue on the node gets much longer than if they all go to the one 
> node.
> 
> I'm clearly missing something here as I would have expected the opposite
> 
> cheers
> 
> -- 
> Franc Carter | Systems architect | Sirca Ltd
> franc.car...@sirca.org.au | www.sirca.org.au
> Tel: +61 2 9236 9118 
> Level 9, 80 Clarence St, Sydney NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215
> 



Re: problem with sliceQuery with composite column

2012-02-13 Thread aaron morton
My understanding is you expected to see 

111:ticks
222:ticks
333:ticks
444:ticks

But instead you are getting 

111:ticks
111:quote
222:ticks
222:quote
333:ticks
333:quote
444:ticks

If that is the case things are working as expected. 

The slice operation gets a column range. So if you start at 111:ticks and end 
at 444:ticks you are asking for all the columns in between. 

It is not possible to filter at each level of a composite column. 

Hope that helps.
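The behaviour falls straight out of the comparator: with CompositeType(LongType, UTF8Type) columns sort by the long first and the string second, and a slice returns one contiguous run of that ordering. Python tuples sort the same way, so the effect can be sketched like this:

```python
# Column names as (long, string) tuples, sorted the way
# CompositeType(LongType, UTF8Type) sorts them.
columns = sorted([
    (111, "ticks"), (111, "quote"),
    (222, "ticks"), (222, "quote"),
    (333, "ticks"), (333, "quote"),
    (444, "ticks"), (444, "quote"),
])

start, end = (111, "ticks"), (444, "ticks")
result = [c for c in columns if start <= c <= end]   # a slice is one contiguous range
print(result)
# every column between the endpoints is returned, the "quote" columns
# included -- the comparator cannot filter the second component independently
```

This is also why Dave's suggested rearrangement works: with the string component first, the run from ('ticks', start) to ('ticks', end) contains only tick columns.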
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/02/2012, at 10:58 PM, Deno Vichas wrote:

> all,
> 
> could somebody clue me to why the below code doesn't work.  my schema is;
> 
> create column family StockHistory
> with comparator = 'CompositeType(LongType,UTF8Type)'
> and default_validation_class = 'UTF8Type'
> and key_validation_class = 'UTF8Type';
> 
> 
> the time part works but I'm getting other columns with the second half not 
> equaling the value set. It's like it's ignoring the string part of the 
> composite.
> 
> Composite start = new Composite();
> Composite end = new Composite();
> start.addComponent(0, 
> startDate.toDateTimeAtStartOfDay().toDate().getTime(), 
> Composite.ComponentEquality.EQUAL);
> end.addComponent(0, endDate.toDateMidnight().toDate().getTime(), 
> Composite.ComponentEquality.EQUAL);
> 
> start.addComponent(1, "ticks", Composite.ComponentEquality.EQUAL);
> end.addComponent(1, "ticks", 
> Composite.ComponentEquality.GREATER_THAN_EQUAL);
> 
> SliceQuery<String, Composite, String> sliceQuery =
> HFactory.createSliceQuery(_keyspace, _stringSerializer, new 
> CompositeSerializer(), _stringSerializer);
> sliceQuery.setColumnFamily(CF_STOCK_HISTORY).setKey(symbol);
> sliceQuery.setRange(start, end, false, 10);
> 
> QueryResult<ColumnSlice<Composite, String>> result = 
> sliceQuery.execute();
> ColumnSlice<Composite, String> cs = result.get();
> SortedSet<String> historyJSON = new TreeSet<String>();
> for ( HColumn<Composite, String> col : cs.getColumns() ) {
> System.out.println(col.getName().get(0, _longSerializer) + "|" + 
> col.getName().get(1, StringSerializer.get()));
> }
> 
> 
> this outputs the following;
> 
> 132703560|ticks
> 132704640|quote
> 132729480|ticks
> 132730560|quote
> 132738120|ticks
> 132739200|quote
> 132746760|ticks
> 132747840|quote
> 132755400|ticks
> 132756480|quote
> 
> thanks,
> deno



SSTable symlinks

2012-02-13 Thread Dan Retzlaff
Hi all,

I am nursing an overloaded 0.6 cluster through compaction to get its disk
usage under 50%. Many rows' content has been replaced so that after
compaction there will be plenty of room, but a couple of nodes are
currently at 95%.

One strategy I considered is temporarily moving a couple of the larger
SSTables to an NFS mount and putting symlinks in the data directory.
However, Jonathan says that Cassandra is not okay with symlinked SSTables
[1]. Can someone elaborate on why this won't work?

If a hack like this is not possible, then I am at a loss for options other
than ungracefully dropping the node from the cluster and reconstructing its
data from other replicas. If anyone has recovered from a similar situation,
I would appreciate your advice.

Regards,
Dan

[1]
http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3CBANLkTina%2BtZf9BhQwzW0Fnv-KAL7%2BKZArQ%40mail.gmail.com%3E


Hector and batch mutation

2012-02-13 Thread Tiwari, Dushyant
Hi Guys,

A very trivial question on batch mutation provided by Hector. Is the execution 
of the batch sequential? (in the order data is added).
Also say there are 10 operations in a batch and 3rd fails will it try the 
remaining 7?
Is execution of batch mutator multi threaded ?


Regards,
Dushyant



Re: How to bring cluster to consistency

2012-02-13 Thread Nikolay Kоvshov
Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly 
tells me that my message looks like spam...

> 2/ both of your nodes seem to be using the same token? The output indicates 
> that 100% of your key range is assigned to 10.111.1.141 (and 
> therefore 10.111.1.142 holds replicas only)

Well, I didn't assign anything. I just filled the nodes with data; it was 
Cassandra itself that assigned it. I am trying to perform nodetool move now. Still I 
didn't understand from wiki what that means (keys assigned to servers). When 
both servers are up I can write to 1 and read from 2, or I can write to 2 and 
read from 1 and all works perfect.

> 3/ maybe repair is being affected by above, but in my experience it can be 
> sensitive

I believe it is affected by data size. At least some estimation about how much 
time and memory it could take would be of use.

13.02.2012, 17:19, "Dominic Williams" :
> Hi Nikolay,Some points that may be useful:
> 1/ auto_bootstrap = true is used for telling a new node to join the ring (the 
> cluster). It has nothing to do with hinted handoff
> 2/ both of your nodes seem to be using the same token? The output indicates 
> that 100% of your key range is assigned to 10.111.1.141 (and 
> therefore 10.111.1.142 holds replicas only)
> 3/ maybe repair is being affected by above, but in my experience it can be 
> sensitive
>
> On 13 February 2012 13:06, Nikolay Kоvshov  wrote:
>> Hello everybody
>>
>> I have a very simple cluster containing 2 servers. Replication_factor = 2, 
>> Consistency_level of reads and writes = 1
>>
>> 10.111.1.141    datacenter1 rack1       Up     Normal  1.5 TB          
>> 100.00% vjpigMzv4KkX3x7z
>> 10.111.1.142    datacenter1 rack1       Up     Normal  1.41 TB         0.00% 
>>   聶jpigMzv4KkX3x7z
>>
>> Note the size please.
>>
>> Say, server1 cassandra dies and I restart it later. Hinted_handoff = 
>> enabled, auto_bootstrap = true
>>
>> During that time server2 received reads and writes. I want changes to be 
>> copied to server1 when it joins the cluster. As I have replication_factor 2, 
>> I suppose each data piece should be stored on both servers. 
>> Auto_bootstrapping doesn't seem to work that way - changed data doesn't 
>> migrate.
>>
>> I run nodetool repair and it is always killed by OOM. What else can I do to 
>> bring cluster to consistency?
>>
>> Thank you in advance


Re: How to bring cluster to consistency

2012-02-13 Thread Dominic Williams
Hi Nikolay,

Some points that may be useful:

1/ auto_bootstrap = true is used for telling a new node to join the ring
(the cluster). It has nothing to do with hinted handoff

2/ both of your nodes seem to be using the same token? The output indicates
that 100% of your key range is assigned to 10.111.1.141 (and
therefore 10.111.1.142 holds replicas only)

3/ maybe repair is being affected by above, but in my experience it can be
sensitive

On 13 February 2012 13:06, Nikolay Kоvshov  wrote:

> Hello everybody
>
> I have a very simple cluster containing 2 servers. Replication_factor = 2,
> Consistency_level of reads and writes = 1
>
> 10.111.1.141datacenter1 rack1   Up Normal  1.5 TB
>  100.00% vjpigMzv4KkX3x7z
> 10.111.1.142datacenter1 rack1   Up Normal  1.41 TB
> 0.00%   聶jpigMzv4KkX3x7z
>
> Note the size please.
>
> Say, server1 cassandra dies and I restart it later. Hinted_handoff =
> enabled, auto_bootstrap = true
>
> During that time server2 received reads and writes. I want changes to be
> copied to server1 when it joins the cluster. As I have replication_factor
> 2, I suppose each data piece should be stored on both servers.
> Auto_bootstrapping doesn't seem to work that way - changed data doesn't
> migrate.
>
> I run nodetool repair and it is always killed by OOM. What else can I do
> to bring cluster to consistency?
>
> Thank you in advance
>


How to bring cluster to consistency

2012-02-13 Thread Nikolay Kоvshov
Hello everybody

I have a very simple cluster containing 2 servers. Replication_factor = 2, 
Consistency_level of reads and writes = 1

10.111.1.141datacenter1 rack1   Up Normal  1.5 TB  100.00% 
vjpigMzv4KkX3x7z
10.111.1.142datacenter1 rack1   Up Normal  1.41 TB 0.00%   
聶jpigMzv4KkX3x7z

Note the size please.

Say, server1 cassandra dies and I restart it later. Hinted_handoff = enabled, 
auto_bootstrap = true

During that time server2 received reads and writes. I want changes to be copied 
to server1 when it joins the cluster. As I have replication_factor 2, I suppose 
each data piece should be stored on both servers. Auto_bootstrapping doesn't 
seem to work that way - changed data doesn't migrate.

I run nodetool repair and it is always killed by OOM. What else can I do to 
bring cluster to consistency? 

Thank you in advance


Secondary indexes and cardinality

2012-02-13 Thread Tiwari, Dushyant
Hi Cassandra Users,

Heard that indexing a field with high cardinality is not good. Suppose we
create a CF to store the index information, with the indexed field as the row
key and the keys of the original CF as columns in that row. Will there be any
performance improvement? Is this the way secondary indexes are maintained?
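The scheme described above - a separate CF keyed by the indexed value, with the base CF's keys as columns - can be modelled with plain dicts. This is an illustration only, not real Cassandra client code; it shows why maintaining such an index costs a read plus a write on every update:

```python
# Toy model of a hand-rolled index CF: row key = indexed value,
# column names = keys of the base CF. Updating it needs a read of the
# old value first - the same read-then-write cost that built-in
# secondary indexes pay.
base_cf = {}    # key -> {field: value}
index_cf = {}   # indexed value -> set of base CF keys

def upsert(key, field, value):
    old = base_cf.get(key, {}).get(field)
    if old is not None:                        # read-before-write
        index_cf.get(old, set()).discard(key)  # remove stale index entry
    base_cf.setdefault(key, {})[field] = value
    index_cf.setdefault(value, set()).add(key)

upsert("user1", "city", "Sydney")
upsert("user2", "city", "Sydney")
upsert("user1", "city", "London")
print(sorted(index_cf["Sydney"]))  # ['user2'] - user1 was re-indexed
```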

Also this makes me think - will there be any loss if we have many rows in a
CF, say 10 million?

Thanks,
Dushyant





active/pending queue lengths

2012-02-13 Thread Franc Carter
Hi,

I've been looking at tpstats as various test queries run and I noticed
something I don't understand.

I have a two node cluster with RF=2 on which I run 4 parallel queries, each
job goes through a list of keys doing a multiget for 2 keys at a time. If
two of the queries go to one node and the other two go to a different node
then the pending queue on the node gets much longer than if they all go to
the one node.

I'm clearly missing something here as I would have expected the opposite

cheers

-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


[RELEASE] Apache Cassandra 0.8.10 released

2012-02-13 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 0.8.10.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] for the 0.8 branch. Please
pay attention to the release notes[2] before upgrading, and let us know[3] if
you encounter any problems.

Have fun!


[1]: http://goo.gl/V1M1q (CHANGES.txt)
[2]: http://goo.gl/AojHc (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: murmurhash partitioner

2012-02-13 Thread Sylvain Lebresne
https://issues.apache.org/jira/browse/CASSANDRA-3772

2012/2/13 Radim Kolar :
> Are there plans to write a partitioner based on a faster hash algorithm
> instead of MD5? I did Cassandra profiling and a lot of time is spent inside
> the MD5 function.


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:15 PM, Peter Schuller  wrote:

> > 2 Node cluster, 7.9GB of ram (ec2 m1.large)
> > RF=2
> > 11GB per node
> > Quorum reads
> > 122 million keys
> > heap size is 1867M (default from the AMI I am running)
> > I'm reading about 900k keys
>
> Ok, so basically a very significant portion of the data fits in page
> cache, but not all.
>

yep


>
> > As I was just going through cfstats - I noticed something I don't
> understand
> >
> > Key cache capacity: 906897
> > Key cache size: 906897
> >
> > I set the key cache to 2million, it's somehow got to a rather odd number
>
> You're on 1.0 +?


yep 1.07


> Nowadays there is code to actively make caches
> smaller if Cassandra detects that you seem to be running low on heap.
> Watch cassandra.log for messages to that effect (don't remember the
> exact message right now).
>
>
I just grep'd the logs and couldn't see anything that looked like that


>  --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


murmurhash partitioner

2012-02-13 Thread Radim Kolar
Are there plans to write a partitioner based on a faster hash algorithm
instead of MD5? I did Cassandra profiling and a lot of time is spent inside
the MD5 function.
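A rough way to see the cost being described: RandomPartitioner derives a 128-bit token from the MD5 of every key, so hashing sits on every read and write path. A small timing sketch (illustrative only; absolute numbers depend entirely on the machine):

```python
# Rough illustration of the per-key MD5 cost RandomPartitioner pays.
import hashlib
import timeit

key = b"somerowkey-0001"

def md5_token(k):
    # RandomPartitioner derives a 128-bit token from the MD5 of the key
    return int.from_bytes(hashlib.md5(k).digest(), "big")

n = 100_000
secs = timeit.timeit(lambda: md5_token(key), number=n)
print(f"{secs / n * 1e9:.0f} ns per key")
```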


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:09 PM, Peter Schuller  wrote:

> > the servers spending >50% of the time in io-wait
>
> Note that I/O wait is not necessarily a good indicator, depending on
> situation. In particular if you have multiple drives, I/O wait can
> mostly be ignored. Similarly if you have non-trivial CPU usage in
> addition to disk I/O, it is also not a good indicator. I/O wait is
> essentially giving you the amount of time CPU:s spend doing nothing
> because the only processes that would otherwise be runnable are
> waiting on disk I/O. But even a single process waiting on disk I/O ->
> lots of I/O wait even if you have 24 drives.
>

Yep - user space cpu is <20%, or much worse when the io-wait goes into the
90s - looks a great deal like an IO bottleneck


>
> The per-disk % utilization is generally a much better indicator
> (assuming no hardware raid device, and assuming no SSD), along with
> the average queue size.
>

I doubt that figure is available sensibly in an ec2 instance


>
> >> In general, if you have queries that come in at some rate that
> >> is determined by outside sources (rather than by the time the last
> >> query took to execute),
> >
> > That's an interesting approach - is that likely to give close to optimal
> > performance ?
>
> I just mean that it all depends on the situation. If you have, for
> example, some N number of clients that are doing work as fast as they
> can, bottlenecking only on Cassandra, you're essentially saturating
> the Cassandra cluster no matter what (until the client/network becomes
> a bottleneck). Under such conditions (saturation) you generally never
> should expect good latencies.
>
> For most non-batch job production use-cases, you tend to have incoming
> requests driven by something external such as user behavior or
> automated systems not related to the Cassandra cluster. In these cases,
> you tend to have a certain amount of incoming requests at any given
> time that you must serve within a reasonable time frame, and that's
> where the question comes in of how much I/O you're doing in relation
> to maximum. For good latencies, you always want to be significantly
> below maximum - particularly when platter based disk I/O is involved.
>
> > That may well explain it - I'll have to think about what that means for
> our
> > use case as load will be extremely bursty
>
> To be clear though, even your typical un-bursty load is still bursty
> once you look at it at sufficient resolution, unless you have
> something specifically ensuring that it is entirely smooth. A
> completely random distribution over time for example would look very
> even on almost any graph you can imagine unless you have sub-second
> resolution, but would still exhibit un-evenness and have an effect on
> latency.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> 2 Node cluster, 7.9GB of ram (ec2 m1.large)
> RF=2
> 11GB per node
> Quorum reads
> 122 million keys
> heap size is 1867M (default from the AMI I am running)
> I'm reading about 900k keys

Ok, so basically a very significant portion of the data fits in page
cache, but not all.

> As I was just going through cfstats - I noticed something I don't understand
>
>                 Key cache capacity: 906897
>                 Key cache size: 906897
>
> I set the key cache to 2million, it's somehow got to a rather odd number

You're on 1.0 +? Nowadays there is code to actively make caches
smaller if Cassandra detects that you seem to be running low on heap.
Watch cassandra.log for messages to that effect (don't remember the
exact message right now).

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:00 PM, Peter Schuller  wrote:

> What is your total data size (nodetool info/nodetool ring) per node,
> your heap size, and the amount of memory on the system?
>

2 Node cluster, 7.9GB of ram (ec2 m1.large)
RF=2
11GB per node
Quorum reads
122 million keys
heap size is 1867M (default from the AMI I am running)
I'm reading about 900k keys

As I was just going through cfstats - I noticed something I don't understand

Key cache capacity: 906897
Key cache size: 906897

I set the key cache to 2million, it's somehow got to a rather odd number



>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> Yep, the readstage is backlogging consistently - but the thing I am trying
> to explain is why it is good sometimes in an environment that is pretty well
> controlled - other than being on ec2

So pending is constantly > 0? What are the clients? Is it batch jobs
or something similar where there is a feedback mechanism implicit in
that the higher latencies of the cluster are slowing down the clients,
thus reaching an equilibrium? Or are you just teetering on the edge,
dropping requests constantly?

Under typical live-traffic conditions, you never want to be running
with read stage pending backing up constantly. If on the other hand
these are batch jobs where throughput is the concern, it's not
relevant.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> the servers spending >50% of the time in io-wait

Note that I/O wait is not necessarily a good indicator, depending on
situation. In particular if you have multiple drives, I/O wait can
mostly be ignored. Similarly if you have non-trivial CPU usage in
addition to disk I/O, it is also not a good indicator. I/O wait is
essentially giving you the amount of time CPU:s spend doing nothing
because the only processes that would otherwise be runnable are
waiting on disk I/O. But even a single process waiting on disk I/O ->
lots of I/O wait even if you have 24 drives.

The per-disk % utilization is generally a much better indicator
(assuming no hardware raid device, and assuming no SSD), along with
the average queue size.
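For reference, the %util column iostat reports is derived from a kernel counter of milliseconds the device had I/O in flight, sampled over an interval. A sketch of the arithmetic, with made-up counter values:

```python
# Sketch of what iostat's %util is computed from: the io_ticks counter
# (ms the device had at least one I/O in flight, from /proc/diskstats
# on Linux), sampled twice. The counter values below are made up.
def utilization(io_ticks_ms_t0, io_ticks_ms_t1, interval_ms):
    """Fraction of the sample interval the disk was busy with I/O."""
    return (io_ticks_ms_t1 - io_ticks_ms_t0) / interval_ms

# e.g. the counter advanced 950 ms during a 1000 ms sample window
print(f"{utilization(120_000, 120_950, 1000):.0%}")  # 95%
```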

>> In general, if you have queries that come in at some rate that
>> is determined by outside sources (rather than by the time the last
>> query took to execute),
>
> That's an interesting approach - is that likely to give close to optimal
> performance ?

I just mean that it all depends on the situation. If you have, for
example, some N number of clients that are doing work as fast as they
can, bottlenecking only on Cassandra, you're essentially saturating
the Cassandra cluster no matter what (until the client/network becomes
a bottleneck). Under such conditions (saturation) you generally never
should expect good latencies.

For most non-batch job production use-cases, you tend to have incoming
requests driven by something external such as user behavior or
automated systems not related to the Cassandra cluster. In these cases,
you tend to have a certain amount of incoming requests at any given
time that you must serve within a reasonable time frame, and that's
where the question comes in of how much I/O you're doing in relation
to maximum. For good latencies, you always want to be significantly
below maximum - particularly when platter based disk I/O is involved.

> That may well explain it - I'll have to think about what that means for our
> use case as load will be extremely bursty

To be clear though, even your typical un-bursty load is still bursty
once you look at it at sufficient resolution, unless you have
something specifically ensuring that it is entirely smooth. A
completely random distribution over time for example would look very
even on almost any graph you can imagine unless you have sub-second
resolution, but would still exhibit un-evenness and have an effect on
latency.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:51 PM, Peter Schuller  wrote:

> For one thing, what does ReadStage's pending look like if you
> repeatedly run "nodetool tpstats" on these nodes? If you're simply
> bottlenecking on I/O on reads, that is the most easy and direct way to
> observe this empirically. If you're saturated, you'll see active close
> to maximum at all times, and pending racking up consistently. If
> you're just close, you'll likely see spikes sometimes.
>

Yep, the readstage is backlogging consistently - but the thing I am trying
to explain is why it is good sometimes in an environment that is pretty well
controlled - other than being on ec2




>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:48 PM, Peter Schuller  wrote:

> > Yep - I've been looking at these - I don't see anything in iostat/dstat
> etc
> > that points strongly to a problem. There is quite a bit of I/O load, but
> it
> > looks roughly uniform on slow and fast instances of the queries. The last
> > compaction ran 4 days ago - which was before I started seeing variable
> > performance
>
> [snip]
>
> > I know why it is slow - it's clearly I/O bound. I am trying to hunt down
> why
> > it is sometimes much faster even though I have (tried) to replicate  the
> > same conditions
>
> What does clearly I/O bound mean, and what is "quite a bit" of I/O
> load?


the servers spending >50% of the time in io-wait


> In general, if you have queries that come in at some rate that
> is determined by outside sources (rather than by the time the last
> query took to execute),


That's an interesting approach - is that likely to give close to optimal
performance ?


> you will typically either get more queries
> than your cluster can take, or fewer. If fewer, there is a
> non-trivially sized grey area where overall I/O throughput needed is
> lower than that available, but the closer you are to capacity the more
> often requests have to wait for other I/O to complete, for purely
> statistical reasons.
>
> If you're running close to maximum capacity, it would be expected that
> the variation in query latency is high.
>

That may well explain it - I'll have to think about what that means for our
use case as load will be extremely bursty


>
> That said, if you're seeing consistently bad latencies for a while
> where you sometimes see consistently good latencies, that sounds
> different but would hopefully be observable somehow.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
What is your total data size (nodetool info/nodetool ring) per node,
your heap size, and the amount of memory on the system?


-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:49 PM, Peter Schuller  wrote:

> > I'm making an assumption . . .  I don't yet know enough about cassandra
> to
> > prove they are in the cache. I have my keycache set to 2 million, and am
> > only querying ~900,000 keys. so after the first time I'm assuming they
> are
> > in the cache.
>
> Note that the key cache only caches the index positions in the data
> file, and not the actual data. The key cache will only ever eliminate
> the I/O that would have been required to lookup the index entry; it
> doesn't help to eliminate seeking to get the data (but as usual, it
> may still be in the operating system page cache).
>

Yep - I haven't enabled row caches, my calculations at the moment indicate
that the hit-ratio won't be great - but I'll be testing that later


>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
For one thing, what does ReadStage's pending look like if you
repeatedly run "nodetool tpstats" on these nodes? If you're simply
bottlenecking on I/O on reads, that is the most easy and direct way to
observe this empirically. If you're saturated, you'll see active close
to maximum at all times, and pending racking up consistently. If
you're just close, you'll likely see spikes sometimes.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> I'm making an assumption . . .  I don't yet know enough about cassandra to
> prove they are in the cache. I have my keycache set to 2 million, and am
> only querying ~900,000 keys. so after the first time I'm assuming they are
> in the cache.

Note that the key cache only caches the index positions in the data
file, and not the actual data. The key cache will only ever eliminate
the I/O that would have been required to lookup the index entry; it
doesn't help to eliminate seeking to get the data (but as usual, it
may still be in the operating system page cache).
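A toy model of the distinction being made here - the key cache saves the index lookup, not the data read (dict-based illustration, not Cassandra internals):

```python
# Toy model of what the key cache saves: the lookup of a row key's
# position in the data file, not the data read itself.
index_on_disk = {"user1": 0, "user2": 4096}  # row key -> data file offset
key_cache = {}

def locate(key):
    """Return (data-file offset, number of simulated index-file reads)."""
    if key in key_cache:
        return key_cache[key], 0   # cache hit: index I/O avoided
    offset = index_on_disk[key]    # simulated seek into the index file
    key_cache[key] = offset
    return offset, 1

print(locate("user2"))  # (4096, 1) - first lookup hits the index file
print(locate("user2"))  # (4096, 0) - cached; the data seek is still needed
```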

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> Yep - I've been looking at these - I don't see anything in iostat/dstat etc
> that points strongly to a problem. There is quite a bit of I/O load, but it
> looks roughly uniform on slow and fast instances of the queries. The last
> compaction ran 4 days ago - which was before I started seeing variable
> performance

[snip]

> I know why it is slow - it's clearly I/O bound. I am trying to hunt down why
> it is sometimes much faster even though I have (tried) to replicate the
> same conditions

What does clearly I/O bound mean, and what is "quite a bit" of I/O
load? In general, if you have queries that come in at some rate that
is determined by outside sources (rather than by the time the last
query took to execute), you will typically either get more queries
than your cluster can take, or fewer. If fewer, there is a
non-trivially sized grey area where overall I/O throughput needed is
lower than that available, but the closer you are to capacity the more
often requests have to wait for other I/O to complete, for purely
statistical reasons.
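This grey-area effect matches textbook queueing behaviour: in an M/M/1 model the mean response time is 1/(mu - lambda), which grows sharply as utilization approaches 1, long before the disk is "100% busy". A sketch with an assumed service rate:

```python
# Illustration of latency near capacity in an M/M/1 queue: mean
# response time is 1 / (mu - lam), blowing up as rho = lam / mu -> 1.
def mean_response_time(lam, mu):
    """Mean time in system for an M/M/1 queue (lam arrivals/s, mu served/s)."""
    assert lam < mu, "offered load must stay below capacity"
    return 1.0 / (mu - lam)

mu = 100.0  # assume a disk that can serve 100 random IOs per second
for rho in (0.5, 0.9, 0.99):
    t_ms = mean_response_time(rho * mu, mu) * 1000
    print(f"rho={rho:.2f}: {t_ms:.0f} ms mean response")
```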

If you're running close to maximum capacity, it would be expected that
the variation in query latency is high.

That said, if you're seeing consistently bad latencies for a while
where you sometimes see consistently good latencies, that sounds
different but would hopefully be observable somehow.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
2012/2/13 R. Verlangen 

> I also noticed that Cassandra appears to perform better under a continuous
> load.
>
> Are you sure the rows you're querying are actually in the cache?
>

I'm making an assumption . . .  I don't yet know enough about cassandra to
prove they are in the cache. I have my keycache set to 2 million, and am
only querying ~900,000 keys. so after the first time I'm assuming they are
in the cache.

cheers


>
>
> 2012/2/13 Franc Carter 
>
>> 2012/2/13 R. Verlangen 
>>
>>> This is because of the "warm up" of Cassandra as it starts. On a start
>>> it will start fetching the rows that were cached: this will have to be
>>> loaded from the disk, as there is nothing in the cache yet. You can read
>>> more about this at
>>> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>>
>>
>> I actually have the opposite 'problem'. I have a pair of servers that have
>> been static since mid last week, but have seen performance vary
>> significantly (x10) for exactly the same query. I hypothesised it was
>> various caches so I shut down Cassandra, flushed the O/S buffer cache and
>> then brought it back up. The performance wasn't significantly different to
>> the pre-flush performance
>>
>> cheers
>>
>>
>>>
>>>
>>> 2012/2/13 Franc Carter 
>>>
 On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng wrote:

> **
>
> I think the keycaches and rowcaches are both persisted to disk when
> shutdown, and restored from disk on restart, which then improves the
> performance.
>

 Thanks - that would explain at least some of what I am seeing

 cheers


>
> 2012-02-13
> --
>  zhangcheng
> --
> *From:* Franc Carter
> *Sent:* 2012-02-13  13:53:56
> *To:* user
> *Cc:*
> *Subject:* keycache persisted to disk ?
>
> Hi,
>
> I am testing Cassandra on Amazon and finding performance can vary
> fairly wildly. I'm leaning towards it being an artifact of the AWS I/O
> system but have one other possibility.
>
> Are keycaches persisted to disk and restored on a clean shutdown and
> restart ?
>
> cheers
>
> --
>
> *Franc Carter* | Systems architect | Sirca Ltd
> 
>
> franc.car...@sirca.org.au | www.sirca.org.au
>
> Tel: +61 2 9236 9118
>
> Level 9, 80 Clarence St, Sydney NSW 2000
>
> PO Box H58, Australia Square, Sydney NSW 1215
>
>


 --

 *Franc Carter* | Systems architect | Sirca Ltd
  

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215


>>>
>>
>>
>> --
>>
>> *Franc Carter* | Systems architect | Sirca Ltd
>>  
>>
>> franc.car...@sirca.org.au | www.sirca.org.au
>>
>> Tel: +61 2 9236 9118
>>
>> Level 9, 80 Clarence St, Sydney NSW 2000
>>
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:21 PM, Peter Schuller  wrote:

> > I actually have the opposite 'problem'. I have a pair of servers that have
> > been static since mid last week, but have seen performance vary
> > significantly (x10) for exactly the same query. I hypothesised it was
> > various caches so I shut down Cassandra, flushed the O/S buffer cache and
> > then brought it back up. The performance wasn't significantly different to
> > the pre-flush performance
>
> I don't get this thread at all :)
>
> Why would restarting with clean caches be expected to *improve*
> performance?


I was expecting it to reduce performance due to cleaning of keycache and
O/S buffer cache - performance stayed roughly the same


> And why is key cache loading involved other than to delay
> start-up and hopefully pre-populating caches for better (not worse)
> performance?
>
> If you want to figure out why queries seem to be slow relative to
> normal, you'll need to monitor the behavior of the nodes. Look at disk
> I/O statistics primarily (everyone reading this running Cassandra who
> aren't intimately familiar with "iostat -x -k 1" should go and read up
> on it right away; make sure you understand the utilization and avg
> queue size columns), CPU usage, whether compaction is happening, etc.
>

Yep - I've been looking at these - I don't see anything in iostat/dstat etc
that points strongly to a problem. There is quite a bit of I/O load, but it
looks roughly uniform on slow and fast instances of the queries. The last
compaction ran 4 days ago - which was before I started seeing variable
performance



> One easy way to see sudden bursts of poor behavior is to be heavily
> reliant on cache, and then have sudden decreases in performance due to
> compaction evicting data from page cache while also generating more
> I/O.
>

Unlikely to be a cache issue - In one case an immediate second run of
exactly the same query performed significantly worse.


>
> But that's total speculation. It is also the case that you cannot
> expect consistent performance on EC2 and that might be it.
>

Variable performance from ec2 is my lead theory at the moment.


>
> But my #1 advice: Log into the node while it is being slow, and
> observe. Figure out what the bottleneck is. iostat, top, nodetool
> tpstats, nodetool netstats, nodetool compactionstats.
>

I know why it is slow - it's clearly I/O bound. I am trying to hunt down why
it is sometimes much faster even though I have (tried) to replicate the
same conditions


>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> I actually have the opposite 'problem'. I have a pair of servers that have
> been static since mid last week, but have seen performance vary
> significantly (x10) for exactly the same query. I hypothesised it was
> various caches so I shut down Cassandra, flushed the O/S buffer cache and
> then brought it back up. The performance wasn't significantly different to
> the pre-flush performance

I don't get this thread at all :)

Why would restarting with clean caches be expected to *improve*
performance? And why is key cache loading involved other than to delay
start-up and hopefully pre-populating caches for better (not worse)
performance?

If you want to figure out why queries seem to be slow relative to
normal, you'll need to monitor the behavior of the nodes. Look at disk
I/O statistics primarily (everyone reading this running Cassandra who
aren't intimately familiar with "iostat -x -k 1" should go and read up
on it right away; make sure you understand the utilization and avg
queue size columns), CPU usage, whether compaction is happening, etc.

One easy way to see sudden bursts of poor behavior is to be heavily
reliant on cache, and then have sudden decreases in performance due to
compaction evicting data from page cache while also generating more
I/O.

But that's total speculation. It is also the case that you cannot
expect consistent performance on EC2 and that might be it.

But my #1 advice: Log into the node while it is being slow, and
observe. Figure out what the bottleneck is. iostat, top, nodetool
tpstats, nodetool netstats, nodetool compactionstats.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
I also noticed that Cassandra appears to perform better under a continuous
load.

Are you sure the rows you're querying are actually in the cache?

2012/2/13 Franc Carter 

> 2012/2/13 R. Verlangen 
>
>> This is because of the "warm up" of Cassandra as it starts. On a start it
>> will start fetching the rows that were cached: this will have to be loaded
>> from the disk, as there is nothing in the cache yet. You can read more
>> about this at
>> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>
>
> I actually have the opposite 'problem'. I have a pair of servers that have
> been static since mid last week, but have seen performance vary
> significantly (x10) for exactly the same query. I hypothesised it was
> various caches so I shut down Cassandra, flushed the O/S buffer cache and
> then brought it back up. The performance wasn't significantly different to
> the pre-flush performance
>
> cheers
>
>
>>
>>
>> 2012/2/13 Franc Carter 
>>
>>> On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng  wrote:
>>>
 **

 I think the keycaches and rowcaches are both persisted to disk when
 shutdown, and restored from disk on restart, which then improves the
 performance.

>>>
>>> Thanks - that would explain at least some of what I am seeing
>>>
>>> cheers
>>>
>>>

 2012-02-13
 --
  zhangcheng
 --
 *From:* Franc Carter
 *Sent:* 2012-02-13  13:53:56
 *To:* user
 *Cc:*
 *Subject:* keycache persisted to disk ?

 Hi,

 I am testing Cassandra on Amazon and finding performance can vary
 fairly wildly. I'm leaning towards it being an artifact of the AWS I/O
 system but have one other possibility.

 Are keycaches persisted to disk and restored on a clean shutdown and
 restart ?

 cheers

 --

 *Franc Carter* | Systems architect | Sirca Ltd
 

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215


>>>
>>>
>>> --
>>>
>>> *Franc Carter* | Systems architect | Sirca Ltd
>>>  
>>>
>>> franc.car...@sirca.org.au | www.sirca.org.au
>>>
>>> Tel: +61 2 9236 9118
>>>
>>> Level 9, 80 Clarence St, Sydney NSW 2000
>>>
>>> PO Box H58, Australia Square, Sydney NSW 1215
>>>
>>>
>>
>
>
> --
>
> *Franc Carter* | Systems architect | Sirca Ltd
>  
>
> franc.car...@sirca.org.au | www.sirca.org.au
>
> Tel: +61 2 9236 9118
>
> Level 9, 80 Clarence St, Sydney NSW 2000
>
> PO Box H58, Australia Square, Sydney NSW 1215
>
>


Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
2012/2/13 R. Verlangen 

> This is because of the "warm up" of Cassandra as it starts. On a start it
> will start fetching the rows that were cached: this will have to be loaded
> from the disk, as there is nothing in the cache yet. You can read more
> about this at  http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>


I actually have the opposite 'problem'. I have a pair of servers that have
been static since mid last week, but have seen performance vary
significantly (x10) for exactly the same query. I hypothesised it was
various caches so I shut down Cassandra, flushed the O/S buffer cache and
then brought it back up. The performance wasn't significantly different to
the pre-flush performance

cheers


>
>
> 2012/2/13 Franc Carter 
>
>> On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng  wrote:
>>
>>> **
>>>
>>> I think the keycaches and rowcaches are both persisted to disk when
>>> shutdown, and restored from disk on restart, which then improves the performance.
>>>
>>
>> Thanks - that would explain at least some of what I am seeing
>>
>> cheers
>>
>>
>>>
>>> 2012-02-13
>>> --
>>>  zhangcheng
>>> --
>>> *From:* Franc Carter
>>> *Sent:* 2012-02-13  13:53:56
>>> *To:* user
>>> *Cc:*
>>> *Subject:* keycache persisted to disk ?
>>>
>>> Hi,
>>>
>>> I am testing Cassandra on Amazon and finding performance can vary fairly
>>> wildly. I'm leaning towards it being an artifact of the AWS I/O system but
>>> have one other possibility.
>>>
>>> Are keycaches persisted to disk and restored on a clean shutdown and
>>> restart ?
>>>
>>> cheers
>>>
>>
>>
>




Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
This is because of the "warm-up" of Cassandra as it starts. On startup it
will begin fetching the rows that were cached: these have to be loaded
from disk, as there is nothing in the cache yet. You can read more
about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations
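As a pointer for checking this on a node: in the 0.7/1.0-era releases the
saved caches are written under `saved_caches_directory` from cassandra.yaml,
and the save frequency is a per-column-family attribute. The paths below are
assumptions for a package install; verify the names against your version's
docs.

```shell
# Inspect where and how often caches are saved (paths assumed; verify locally).
grep saved_caches_directory /etc/cassandra/cassandra.yaml
ls -lh /var/lib/cassandra/saved_caches/      # *-KeyCache / *-RowCache files
# Per-CF save periods (cassandra-cli attribute names from the 0.7/1.0 docs):
#   key_cache_save_period_in_seconds, row_cache_save_period_in_seconds
```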
