Re: get_slice slow

2010-08-24 Thread Artie Copeland
Have you tried using a super column?  it seems that having a row with over
100K columns and growing would be a lot for cassandra to deserialize.  what
is iostat and jmeter telling you?  it would be interesting to see that data.
 also, what are you using for your key or row caching?  do you need to use
quorum consistency?  that can slow down reads as well; can you use a lower
consistency level?

Artie
On Tue, Aug 24, 2010 at 9:14 PM, B. Todd Burruss  wrote:

> i am using get_slice to pull columns from a row to emulate a queue.  column
> names are TimeUUID and the values are small, < 32 bytes.  simple
> ColumnFamily.
>
> i am using SlicePredicate like this to pull the first ("oldest") column in
> the row:
>
>    SlicePredicate predicate = new SlicePredicate();
>    predicate.setSlice_range(new SliceRange(new byte[] {}, new byte[] {}, false, 1));
>
>    get_slice(rowKey, colParent, predicate, QUORUM);
>
> once i get the column i remove it.  so there are a lot of gets and mutates,
> leaving lots of deleted columns.
>
> get_slice starts off performing just fine, but then falls off dramatically
> as the number of columns grows.  at its peak there are 100,000 columns and
> get_slice is taking over 100ms to return.
>
> i am running a single instance of cassandra 0.7 on localhost, default
> config.  i've done some googling and can't find any tweaks or tuning
> suggestions specific to get_slice.  i already know about separating
> commitlog and data, watching iostat, GC, etc.
>
> any low hanging tuning fruit anyone can think of?  in 0.6 i recall an index
> for columns, maybe that is what i need?
>
> thx
>
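The degradation described above can be modeled with a toy sketch. This is not Cassandra internals, just an illustration of the queue pattern's cost: consumed columns become tombstones that a slice must still scan past until compaction removes them.

```python
# Toy model of a wide row with deletions -- NOT Cassandra code, just an
# illustration of why get_slice(count=1) slows down as consumed "queue"
# columns pile up as tombstones that must still be scanned past.
import bisect

class ToyRow:
    def __init__(self):
        self.cols = []  # (name, deleted) pairs kept sorted by column name

    def insert(self, name):
        bisect.insort(self.cols, (name, False))

    def delete(self, name):
        i = bisect.bisect_left(self.cols, (name, False))
        if i < len(self.cols) and self.cols[i][0] == name:
            self.cols[i] = (name, True)  # tombstone: mark, don't remove

    def first_live_column(self):
        """Like a slice with count=1: walk from the start, skipping tombstones."""
        scanned = 0
        for name, deleted in self.cols:
            scanned += 1
            if not deleted:
                return name, scanned
        return None, scanned

row = ToyRow()
for i in range(1000):
    row.insert(i)
for i in range(900):          # consume the oldest 900 queue entries
    row.delete(i)
name, scanned = row.first_live_column()
# the first live column is 900, but 901 entries had to be scanned to find it
```

The scan cost grows with the number of accumulated tombstones, which matches the observed falloff as gets and deletes pile up between compactions.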



-- 
http://yeslinux.org
http://yestech.org


script to manually generate tokens

2010-08-22 Thread Artie Copeland
i created a simple python script that asks for a cluster size and then
generates tokens for each node:

http://github.com/yestech/yessandbox/blob/master/cassandra-gen-tokens.py

it is derived from ben black's cassandra talk:

http://www.slideshare.net/benjaminblack/cassandra-summit-2010-operations-troubleshooting-intro

i can add it to the wiki if people think it is valuable.
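The idea (from the talk) is just to space initial tokens evenly over RandomPartitioner's 0..2**127 token range, something like:

```python
# Evenly spaced initial tokens for an N-node cluster, assuming
# RandomPartitioner (token space 0 .. 2**127).
def generate_tokens(node_count):
    return [i * (2 ** 127 // node_count) for i in range(node_count)]

for node, token in enumerate(generate_tokens(4)):
    print("node %d: initial_token = %d" % (node, token))
```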

-- 
http://yeslinux.org
http://yestech.org


cache sizes using percentages

2010-08-17 Thread Artie Copeland
if i set a key cache size of 100%, the way i understand it works is:

- the cache is not write-through, but read-through
- a key gets added to the cache on the first read if not already present
- the size of the cache will always increase for every item read, so if you
have 100mil items your key cache will grow to 100mil entries

Here are my questions:

if that is the case, then what happens if you only have enough mem to store
10mil items in your key cache?
do you lose the other 90%?  how is it determined what is removed?
will the server keep adding till it gets an OOM?
if you add a row cache as well, how does that affect your percentage?
is there a priority between the caches?  or are they independent, so both
will try to be satisfied, which would result in an OOM?
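For illustration, the behavior i'd hope for from a bounded cache is classic LRU-style eviction. This is a sketch of the general idea only, not necessarily cassandra's actual eviction policy:

```python
from collections import OrderedDict

class BoundedCache:
    """Toy LRU cache: once full, the least-recently-used entry is evicted
    instead of the cache growing until the process OOMs."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None              # miss: caller reads from disk, then put()s
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LRU entry

cache = BoundedCache(3)
for k in "abcd":
    cache.put(k, k.upper())
# capacity is 3, so "a" (the least recently used key) has been evicted
```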

thanx,
artie

-- 
http://yeslinux.org
http://yestech.org


move data between clusters

2010-08-17 Thread Artie Copeland
what is the best way to move data between clusters?  we currently have a 4
node prod cluster with 80G of data and want to move it to a dev env with 3
nodes.  we have plenty of disk.  we were looking into nodetool snapshot, but
it looks like that won't work because of the system tables.  sstable2json
doesn't look like it would work either, as it would miss the index files.  am
i missing something?  have others tried to do the same and been successful?

thanx
artie

-- 
http://yeslinux.org
http://yestech.org


Re: backport of pre cache load

2010-08-09 Thread Artie Copeland
No, we aren't caching 100%; we cache 20 - 30 million rows, which only starts
to get a high hit rate over time, so building a useful cache can take over a
week of running.  We would love to store the complete CF in memory but don't
know of a server that can hold that much data in memory while still being
commodity.  Our data set is currently over 100GB.

On Fri, Aug 6, 2010 at 5:54 PM, Jonathan Ellis  wrote:

> are you caching 100% of the CF?
>
> if not this is not super useful.
>
> On Fri, Aug 6, 2010 at 7:10 PM, Artie Copeland 
> wrote:
> > would it be possible to backport the 0.7 feature, the ability to save and
> > preload row caches after a restart?  i think that is a very nice and
> > important feature that would help users with very large caches that take
> > a long time to build the proper hot set.  for example, we can get pretty
> > good row cache hit rates if we run the servers for a month or more, as
> > the data tends to settle down.
> > --
> > http://yeslinux.org
> > http://yestech.org
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>



-- 
http://yeslinux.org
http://yestech.org


Re: row cache during bootstrap

2010-08-09 Thread Artie Copeland
On Sun, Aug 8, 2010 at 5:24 AM, aaron morton wrote:

> Not sure how feasible it is or if it's planned. But it would probably
> require that the nodes are able to share the state of their row cache so as
> to know which parts to warm. Otherwise it sounds like you're assuming the
> node can hold the entire data set in memory.
>
I'm not assuming the node can hold the entire data set in cassandra in
memory, if that's what you meant. I was thinking of sharing the state of the
row cache, but only those keys that are being moved for the token.  the
other keys can stay hidden from the node.


> If you know in your application when you would like data to be in the
> cache, you can send a query like get_range_slices to the cluster and ask for
> 0 columns. That will warm the row cache for the keys it hits.
>

This is a tough one, as our row cache is over 20 million and takes a while
to get a large hit ratio, so while we try to preload it, it is taking
requests.  If it were possible to bring up a node that doesn't announce its
availability to the cluster, that would help us manually warm the cache.  I
know this feature is in the issue tracker currently, but it didn't look like
it would come out anytime before 0.8.
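Aaron's zero-column warming trick could be scripted roughly like this. It is only a sketch: `fetch_page` stands in for whatever your client exposes (e.g. a get_range_slices call asking for 0 columns per row), so the call names here are illustrative, not an exact API.

```python
# Sketch of cache warming by paging through the key range with 0 columns per
# row: the read path pulls each row into the row cache while almost nothing
# crosses the wire.  `fetch_page` is a stand-in for the real client call.
def warm_row_cache(fetch_page, page_size=1000):
    """fetch_page(start_key, count) -> ordered row keys; ranges are start-inclusive."""
    warmed = 0
    start_key = ""
    while True:
        keys = fetch_page(start_key, page_size)
        if keys and keys[0] == start_key:
            keys = keys[1:]  # skip the repeated boundary key on later pages
        if not keys:
            break
        warmed += len(keys)
        start_key = keys[-1]  # next page resumes at the last key seen
    return warmed

# demo against a fake in-memory key source standing in for the cluster
ALL_KEYS = ["key%03d" % i for i in range(10)]

def fake_fetch(start_key, count):
    i = ALL_KEYS.index(start_key) if start_key else 0
    return ALL_KEYS[i:i + count]

warmed = warm_row_cache(fake_fetch, page_size=4)
```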

>
> I have heard it mentioned that the coordinator node will take action to
> when one node is considered to be running slow. So it may be able to work
> around the new node until it gets warmed up.
>

That is interesting, i haven't heard that one.  I think with the parallel
reads that are happening it makes sense that it would be possible.  That is,
unless the data is local.  I believe in that case it always prefers to read
locally vs over the network, so if the local machine is the slow node that
wouldn't help.

>
> Are you adding nodes often?
>
Currently not that often.  The main issue is we have very stringent latency
requirements, and for anything that would affect those we have to understand
the worst-case cost to see if we can avoid it.

>
> Aaron
>
> On 7 Aug 2010, at 11:17, Artie Copeland wrote:
>
> the way i understand how row caches work is that each node has an
> independent cache, in that they do not share their cache contents with
> other nodes.  if that's the case, is it also true that when a new node is
> added to the cluster it has to build up its own cache?  if that's the case,
> i see that as a possible performance bottleneck once the node starts to
> accept requests, since there is no way i know of to warm the cache without
> adding the node to the cluster.  would it be infeasible to have the
> bootstrap process not only stream data from nodes but also the cached rows
> associated with those same keys?  that would allow the new nodes to provide
> the best performance once the bootstrap process finishes.
>
> --
> http://yeslinux.org
> http://yestech.org
>
>
>


-- 
http://yeslinux.org
http://yestech.org


row cache during bootstrap

2010-08-06 Thread Artie Copeland
the way i understand how row caches work is that each node has an
independent cache, in that they do not share their cache contents with other
nodes.  if that's the case, is it also true that when a new node is added to
the cluster it has to build up its own cache?  if that's the case, i see that
as a possible performance bottleneck once the node starts to accept
requests, since there is no way i know of to warm the cache without adding
the node to the cluster.  would it be infeasible to have the bootstrap
process not only stream data from nodes but also the cached rows associated
with those same keys?  that would allow the new nodes to provide the best
performance once the bootstrap process finishes.

-- 
http://yeslinux.org
http://yestech.org


backport of pre cache load

2010-08-06 Thread Artie Copeland
would it be possible to backport the 0.7 feature, the ability to save and
preload row caches after a restart?  i think that is a very nice and
important feature that would help users with very large caches that take a
long time to build the proper hot set.  for example, we can get pretty good
row cache hit rates if we run the servers for a month or more, as the data
tends to settle down.

-- 
http://yeslinux.org
http://yestech.org


Re: when should new nodes be added to a cluster

2010-08-03 Thread Artie Copeland
Thanx for the insight

On Mon, Aug 2, 2010 at 4:48 PM, Benjamin Black  wrote:

> you have insufficient i/o bandwidth and are seeing reads suffer due to
> competition from memtable flushes and compaction.  adding additional
> nodes will help some, but i recommend increasing the disk i/o
> bandwidth, regardless.
>
>
> b
>
> On Mon, Aug 2, 2010 at 11:47 AM, Artie Copeland 
> wrote:
> > i have a question on what are the signs from cassandra that new nodes
> > should be added to the cluster.  We are currently seeing long read times
> > from the one node that has about 70GB of data with 60GB in one column
> > family.  we are using a replication factor of 3.  I have tracked down the
> > slowness to occur when either row-read-stage or message-deserializer-pool
> > is high, like at least 4000.  my systems are 16-core, 3 TB, 48GB mem
> > servers.  we would like to be able to use more of the server than just
> > 70GB.
> > The system is a realtime system that needs to scale quite large.  Our
> > current heap size is 25GB and we are getting at least 50% row cache hit
> > rates.  Does it seem strange that cassandra is not able to handle the
> > work load?  We perform multislice gets when reading, similar to what
> > twissandra does.  this is to cut down on the network ops.  Looking at
> > iostat it doesn't appear to have a lot of queued reads.
> > What are others seeing when they have to add new nodes?  What data sizes
> > are they seeing?  This is needed so we can plan our growth and server
> > purchase strategy.
> > thanx
> > Artie
> >
> > --
> > http://yeslinux.org
> > http://yestech.org
> >
>



-- 
http://yeslinux.org
http://yestech.org


Re: when should new nodes be added to a cluster

2010-08-02 Thread Artie Copeland
On Mon, Aug 2, 2010 at 2:39 PM, Aaron Morton wrote:

> You may need to provide some more information on how many reads your
> sending to the cluster. Also...
>
> How many nodes do you have in the cluster ?
>

We have a cluster of 4 nodes.


> When you are seeing high response times on one node, what's the load like
> on the others ?
>

They are low.  Until recently, removing that node would increase
performance, but as of today the problem appeared to just move to another
node once the original faulty node was removed.


> Is the data load evenly distributed around the cluster ?
>

No it is not, it looks like this:

Address       Status  Load      Range                                        Ring
                                153186065170709351569621555259205461067
10.4.45.22    Up      60.6 GB   23543694856340775179323589033850348191      |<--|
10.4.45.21    Up      58.67 GB  64044280785277646901574566535858757214      |   |
10.4.44.22    Down    76.27 GB  145455238521487150744455174232451506694     |   |
10.4.44.21    Up      67.45 GB  153186065170709351569621555259205461067     |-->|

the down node is the original culprit.  then once that was down, the problem
moved to 10.4.44.21.  Our setup is using rackaware with 2 nodes in each
switch.  we tried to use 2 nics, one for thrift and one for gossip, but
couldn't get that working, so now we just use one nic for all traffic.
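For what it's worth, ownership can be computed from those tokens. This is a back-of-envelope calculation assuming RandomPartitioner's 0..2**127 token space:

```python
# Ownership per node from the tokens in the ring output above (assuming
# RandomPartitioner, token space 0 .. 2**127): each node owns the arc of the
# ring ending at its own token.
RING = 2 ** 127
TOKENS = {
    "10.4.45.22": 23543694856340775179323589033850348191,
    "10.4.45.21": 64044280785277646901574566535858757214,
    "10.4.44.22": 145455238521487150744455174232451506694,
    "10.4.44.21": 153186065170709351569621555259205461067,
}

shares = {}
ordered = sorted(TOKENS.items(), key=lambda kv: kv[1])
prev = ordered[-1][1] - RING  # wrap around from the highest token
for node, token in ordered:
    shares[node] = (token - prev) / RING
    prev = token

for node, share in shares.items():
    print("%s owns %.1f%% of the ring" % (node, share * 100))
```

10.4.44.22 works out to nearly half the ring, which lines up with it carrying the most data, while 10.4.44.21 owns only a few percent.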

> Are your clients connecting to different nodes in the cluster ?
>

Yes, we use Pelops and have all 4 nodes in the pool.


> Perhaps that node is somehow out of sync with the others...
>

I don't understand what you mean?


> Anything odd happened in the cluster recently, such as one node going down
> ?
>
Yes, the node is a test server, so it has gone down to update the jvm and
storage-conf settings, but only for short amounts of time.


> When was the last time you ran repair?
>
We just ran it today and it didn't make any difference.  Almost immediately
the row-read-stage goes over 4000.

here's what iostat looks like:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.34    0.00    1.00   33.93    0.00   61.73

Device:  rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    0.00   0.00   0.00     0.00     0.00      0.00      0.00    0.00   0.00   0.00
sda1       0.00    0.00   0.00   0.00     0.00     0.00      0.00      0.00    0.00   0.00   0.00
sda2       0.00    0.00   0.00   0.00     0.00     0.00      0.00      0.00    0.00   0.00   0.00
sdb      335.50    0.00  70.50   0.00  3248.00     0.00     46.07      0.78   11.01   7.40  52.20
sdc      330.00    0.00  70.00   0.50  3180.00     4.00     45.16      2.10   29.65  13.09  92.30
sdd      310.00    0.00  93.50   0.50  3040.00     8.00     32.43     33.78  350.13  10.64 100.05
dm-0       0.00    0.00  97.00   0.50  9712.00     4.00     99.65     38.28  384.30  10.26 100.05

Drives are 3 x 1TB (sdb - sdd) striped using lvm raid 0, and 1TB (sda) for
the commit log.

Anything else i can provide that might help diagnose?

Thanx
Artie

>
> Aaron
>
>
> On 03 Aug, 2010,at 06:47 AM, Artie Copeland 
> wrote:
>
> i have a question on what are the signs from cassandra that new nodes
> should be added to the cluster.  We are currently seeing long read times
> from the one node that has about 70GB of data with 60GB in one column
> family.  we are using a replication factor of 3.  I have tracked down the
> slowness to occur when either row-read-stage or message-deserializer-pool
> is high, like at least 4000.  my systems are 16-core, 3 TB, 48GB mem
> servers.  we would like to be able to use more of the server than just
> 70GB.
>
> The system is a realtime system that needs to scale quite large.  Our
> current heap size is 25GB and we are getting at least 50% row cache hit
> rates.  Does it seem strange that cassandra is not able to handle the work
> load?  We perform multislice gets when reading, similar to what twissandra
> does.  this is to cut down on the network ops.  Looking at iostat it
> doesn't appear to have a lot of queued reads.
>
> What are others seeing when they have to add new nodes?  What data sizes
> are they seeing?  This is needed so we can plan our growth and server
> purchase strategy.
>
> thanx
> Artie
>
> --
> http://yeslinux.org
> http://yestech.org
>
>


-- 
http://yeslinux.org
http://yestech.org


when should new nodes be added to a cluster

2010-08-02 Thread Artie Copeland
i have a question on what are the signs from cassandra that new nodes should
be added to the cluster.  We are currently seeing long read times from the
one node that has about 70GB of data with 60GB in one column family.  we are
using a replication factor of 3.  I have tracked down the slowness to occur
when either row-read-stage or message-deserializer-pool is high, like at
least 4000.  my systems are 16-core, 3 TB, 48GB mem servers.  we would like
to be able to use more of the server than just 70GB.

The system is a realtime system that needs to scale quite large.  Our
current heap size is 25GB and we are getting at least 50% row cache hit
rates.  Does it seem strange that cassandra is not able to handle the work
load?  We perform multislice gets when reading, similar to what twissandra
does.  this is to cut down on the network ops.  Looking at iostat it doesn't
appear to have a lot of queued reads.

What are others seeing when they have to add new nodes?  What data sizes are
they seeing?  This is needed so we can plan our growth and server purchase
strategy.

thanx
Artie

-- 
http://yeslinux.org
http://yestech.org


Re: live nodes list in ring

2010-07-13 Thread Artie Copeland
Benjamin,

Yes, i have seen this when adding a new node into the cluster.  the new node
doesn't see the complete ring through nodetool, but the strange part is that
looking at the ring through jconsole shows the complete ring.  it is as if
there is a bug in nodetool publishing the actual ring.  has anyone seen that
scenario?  while in this situation it does appear that the cluster is
functioning correctly with replicating data, you just can't trust nodetool's
ring information.


Artie

2010/6/30 Benjamin Black 

> Does this happen after you have changed the ring topology, especially
> adding nodes?
>
> 2010/6/30 Stephen Hamer :
> > When this happens to me I have to do a full cluster restart. Even doing a
> > rolling restart across the cluster doesn't seem to fix them, all of the
> > nodes need to be stopped at the same time. After bringing everything back
> up
> > the ring is correct.
> >
> >
> >
> > Does anyone know how a cluster gets into this state?
> >
> >
> >
> > Stephen
> >
> >
> >
> > From: aaron morton [mailto:aa...@thelastpickle.com]
> > Sent: Wednesday, June 30, 2010 1:42 PM
> > To: user@cassandra.apache.org
> > Cc: 'huzhonghua'; 'GongJianTao(宫建涛)'
> > Subject: Re: live nodes list in ring
> >
> >
> >
> > At start up do you see log lines like this
> >
> >
> >
> > Gossiper.java (line 576) Node /192.168.34.30 is now part of the cluster
> >
> >
> >
> > Are all the nodes listed?
> >
> >
> >
> > aaron
> >
> > On 30 Jun 2010, at 22:50, 王一锋 wrote:
> >
> > Hi,
> >
> >
> >
> > In a cassandra cluster, when issuing the ring command on every node, some
> > can show all nodes in the cluster but some can only show some other
> > nodes.
> >
> > All nodes share the same seed list.
> >
> > And even some of the nodes in the seed list have this problem.
> >
> > Restarting the problematic nodes won't solve it.
> >
> > Tried closing the firewalls with the following command:
> >
> >
> >
> > service iptables stop
> >
> >
> >
> > Still won't work.
> >
> >
> >
> > Anyone got a clue?
> >
> >
> >
> > Thanks very much.
> >
> >
> >
> > Yifeng
> >
> >
>