Re: get_slice slow
Have you tried using a super column? A row with over 100K columns, and growing, seems like a lot for cassandra to deserialize. What are iostat and jmeter telling you? It would be interesting to see that data. Also, what are you using for your key or row caching? Do you need to use QUORUM consistency? That can slow down reads as well; can you use a lower consistency level?

Artie

On Tue, Aug 24, 2010 at 9:14 PM, B. Todd Burruss wrote:
> i am using get_slice to pull columns from a row to emulate a queue. column
> names are TimeUUID and the values are small, < 32 bytes. simple
> ColumnFamily.
>
> i am using SlicePredicate like this to pull the first ("oldest") column in
> the row:
>
>     SlicePredicate predicate = new SlicePredicate();
>     predicate.setSlice_range(new SliceRange(new byte[] {}, new byte[] {}, false, 1));
>
>     get_slice(rowKey, colParent, predicate, QUORUM);
>
> once i get the column i remove it. so there are a lot of gets and mutates,
> leaving lots of deleted columns.
>
> get_slice starts off performing just fine, but then falls off dramatically
> as the number of columns grows. at its peak there are 100,000 columns and
> get_slice is taking over 100ms to return.
>
> i am running a single instance of cassandra 0.7 on localhost, default
> config. i've done some googling and can't find any tweaks or tuning
> suggestions specific to get_slice. i already know about separating
> commitlog and data, watching iostat, GC, etc.
>
> any low hanging tuning fruit anyone can think of? in 0.6 i recall an index
> for columns, maybe that is what i need?
>
> thx
--
http://yeslinux.org
http://yestech.org
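The slowdown described above is consistent with each slice having to walk over the tombstones left by the deletes before it reaches a live column. A toy model of that cost (illustrative only, not Cassandra's storage code):

```python
import bisect

class QueueRow:
    """Toy model of a row used as a queue: sorted column names, where
    'removing' a column leaves a tombstone that slices must still scan past."""
    def __init__(self):
        self.columns = []  # sorted list of (name, is_tombstone)

    def insert(self, name):
        bisect.insort(self.columns, (name, False))

    def remove_first(self):
        # deleting the oldest live column just writes a tombstone
        for i, (name, dead) in enumerate(self.columns):
            if not dead:
                self.columns[i] = (name, True)
                return name

    def get_slice(self, count=1):
        """Return the first `count` live columns, counting cells scanned."""
        live, scanned = [], 0
        for name, dead in self.columns:
            scanned += 1
            if not dead:
                live.append(name)
                if len(live) == count:
                    break
        return live, scanned

row = QueueRow()
for i in range(1000):
    row.insert(i)
for _ in range(999):
    row.remove_first()

live, scanned = row.get_slice(1)
print(live, scanned)  # [999] 1000 -- one live column behind 999 tombstones
```

The scan cost grows with the number of deleted-but-uncompacted columns, which matches get_slice starting fast and degrading as the queue churns; only compaction (after GCGraceSeconds) reclaims the tombstones.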
script to manually generate tokens
i created a simple python script that asks for a cluster size and then generates tokens for each node:

http://github.com/yestech/yessandbox/blob/master/cassandra-gen-tokens.py

it is derived from ben black's cassandra talk: http://www.slideshare.net/benjaminblack/cassandra-summit-2010-operations-troubleshooting-intro

i can add it to the wiki if people think it is valuable.
--
http://yeslinux.org
http://yestech.org
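For the RandomPartitioner, balanced initial tokens are just evenly spaced points on the 0..2^127 ring; a minimal sketch of the calculation such a script performs:

```python
def generate_tokens(node_count):
    """Evenly spaced initial tokens for RandomPartitioner (ring size 2**127)."""
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]

# e.g. a 4-node cluster
for i, token in enumerate(generate_tokens(4)):
    print("node %d: initial_token = %d" % (i, token))
```

Each node is then started with its assigned initial_token so the key space is split into equal ranges.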
cache sizes using percentages
if i set a key cache size of 100%, the way i understand how that works is:

- the cache is not write-through, but read-through
- a key gets added to the cache on the first read if not already cached
- the size of the cache will keep increasing for every item read, so if you have 100mil items your key cache will grow to 100mil entries

Here are my questions:

- if that is the case, then what happens if you only have enough mem to store 10mil items in your key cache? do you lose the other 90%? how is it determined what is removed?
- will the server keep adding til it gets an OOM?
- if you add a row cache as well, how does that affect your percentage? is there a priority between the caches? or are they independent, so both will try to be satisfied, which would result in an OOM?

thanx,
artie
--
http://yeslinux.org
http://yestech.org
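The eviction question can be made concrete with a small read-through LRU cache. This is only an illustrative sketch of the general pattern, not Cassandra's actual cache implementation (the capacity and loader here are hypothetical):

```python
from collections import OrderedDict

class ReadThroughLRU:
    """Toy read-through cache: fills on a miss, and at capacity evicts the
    least-recently-used entry instead of growing until OOM."""
    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader           # called on a cache miss
        self.entries = OrderedDict()   # insertion/recency ordered

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)    # mark as recently used
            return self.entries[key]
        value = self.loader(key)             # miss: read through to storage
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False) # evict coldest entry
        return value

# usage sketch: capacity 2, so reading a third key evicts the oldest
cache = ReadThroughLRU(capacity=2, loader=lambda k: "row-%s" % k)
cache.get("a"); cache.get("b"); cache.get("c")
print(list(cache.entries))  # ['b', 'c'] -- 'a' was evicted
```

With a scheme like this, a configured capacity smaller than the hot set simply means cold keys cycle out; it is only when the configured capacity exceeds what the heap can actually hold that an OOM becomes the failure mode.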
move data between clusters
what is the best way to move data between clusters? we currently have a 4 node prod cluster with 80G of data and want to move it to a dev env with 3 nodes, and we have plenty of disk. we were looking into nodetool snapshot, but it looks like that won't work because of the system tables. sstable2json does look like it would work, since it would skip the index files. am i missing something? have others tried to do the same and been successful?

thanx
artie
--
http://yeslinux.org
http://yestech.org
Re: backport of pre cache load
No, we aren't caching 100%; we cache 20-30 million rows, and that only reaches a high hit rate over time, so building a useful cache can take over a week of running. We would love to store the complete CF in memory but don't know of a server that can hold that much data in memory while still being commodity hardware. Our data set is currently over 100GB.

On Fri, Aug 6, 2010 at 5:54 PM, Jonathan Ellis wrote:
> are you caching 100% of the CF?
>
> if not this is not super useful.
>
> On Fri, Aug 6, 2010 at 7:10 PM, Artie Copeland wrote:
> > would it be possible to backport the 0.7 feature, the ability to save and
> > preload row caches after a restart. i think that is a very nice and
> > important feature that would help users with very large caches, that take
> > a long time to get the proper hot set. for example we can get pretty good
> > row cache hit rates if we run the servers for a month or more as the data
> > tends to settle down.
> > --
> > http://yeslinux.org
> > http://yestech.org
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
--
http://yeslinux.org
http://yestech.org
Re: row cache during bootstrap
On Sun, Aug 8, 2010 at 5:24 AM, aaron morton wrote:
> Not sure how feasible it is or if it's planned. But it would probably
> require that the nodes are able to share the state of their row cache so as
> to know which parts to warm. Otherwise it sounds like you're assuming the
> node can hold the entire data set in memory.

I'm not assuming the node can hold the entire data set in memory, if that's what you meant. I was thinking of sharing the state of the row cache, but only those keys that are being moved for the token; the other keys can stay hidden from the node.

> If you know in your application when you would like data to be in the
> cache, you can send a query like get_range_slices to the cluster and ask for
> 0 columns. That will warm the row cache for the keys it hits.

This is a tough one, as our row cache holds over 20 million rows and takes a while to reach a large hit ratio, so while we try to preload it the node is already taking requests. If it were possible to bring up a node that doesn't announce its availability to the cluster, that would help us manually warm the cache. I know this feature is in the issue tracker, but it didn't look like it would come out any time before 0.8.

> I have heard it mentioned that the coordinator node will take action
> when one node is considered to be running slow. So it may be able to work
> around the new node until it gets warmed up.

That is interesting; i haven't heard that one. I think with the parallel reads that are happening it makes sense that it would be possible. That is, unless the data is local. I believe in that case it always prefers to read locally vs over the network, so if the local machine is the slow node that wouldn't help.

> Are you adding nodes often?

Currently not that often. The main issue is we have very stringent latency requirements, and for anything that would affect those we have to understand the worst-case cost to see if we can avoid it.
> Aaron
>
> On 7 Aug 2010, at 11:17, Artie Copeland wrote:
>
> the way i understand how row caches work is that each node has an
> independent cache, in that they do not share their cache contents with
> other nodes. if that's the case, is it also true that when a new node is
> added to the cluster it has to build up its own cache? if so, i see that
> as a possible performance bottleneck once the node starts to accept
> requests, since there is no way i know of to warm the cache without adding
> the node to the cluster. would it be infeasible to have part of the
> bootstrap process not only stream data from nodes but also the cached rows
> that are associated with those same keys? that would allow the new nodes
> to provide the best performance once the bootstrap process finishes.
>
> --
> http://yeslinux.org
> http://yestech.org
--
http://yeslinux.org
http://yestech.org
row cache during bootstrap
the way i understand how row caches work is that each node has an independent cache, in that they do not share their cache contents with other nodes. if that's the case, is it also true that when a new node is added to the cluster it has to build up its own cache? if so, i see that as a possible performance bottleneck once the node starts to accept requests, since there is no way i know of to warm the cache without adding the node to the cluster. would it be infeasible to have part of the bootstrap process not only stream data from nodes but also the cached rows that are associated with those same keys? that would allow the new nodes to provide the best performance once the bootstrap process finishes.
--
http://yeslinux.org
http://yestech.org
backport of pre cache load
would it be possible to backport the 0.7 feature that can save and preload row caches after a restart? i think that is a very nice and important feature that would help users with very large caches that take a long time to build the proper hot set. for example, we can get pretty good row cache hit rates if we run the servers for a month or more, as the data tends to settle down.
--
http://yeslinux.org
http://yestech.org
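The idea behind the 0.7 saved caches is straightforward: periodically persist the set of cached keys (not the values), then on restart re-read those rows to rebuild the hot set before it drifts cold. A minimal sketch of that pattern — not Cassandra's actual implementation; `read_row`, the file path, and the JSON format are assumptions for illustration:

```python
import json
import os
import tempfile

def save_cache_keys(cache, path):
    """Persist only the cached keys, so a restart can re-warm cheaply."""
    with open(path, "w") as f:
        json.dump(list(cache.keys()), f)

def preload_cache(path, read_row):
    """On startup, re-read every previously hot key to rebuild the cache."""
    cache = {}
    if os.path.exists(path):
        with open(path) as f:
            for key in json.load(f):
                cache[key] = read_row(key)  # hypothetical storage read
    return cache

# usage sketch
path = os.path.join(tempfile.gettempdir(), "saved_row_cache.json")
save_cache_keys({"user:1": "...", "user:2": "..."}, path)
warmed = preload_cache(path, read_row=lambda k: "row-for-%s" % k)
print(sorted(warmed))  # ['user:1', 'user:2']
```

Persisting keys rather than values keeps the saved file small and always yields fresh data on reload, at the cost of re-reading every hot row from disk during startup.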
Re: when should new nodes be added to a cluster
Thanx for the insight.

On Mon, Aug 2, 2010 at 4:48 PM, Benjamin Black wrote:
> you have insufficient i/o bandwidth and are seeing reads suffer due to
> competition from memtable flushes and compaction. adding additional
> nodes will help some, but i recommend increasing the disk i/o
> bandwidth regardless.
>
> b
>
> On Mon, Aug 2, 2010 at 11:47 AM, Artie Copeland wrote:
> > i have a question on what the signs are from cassandra that new nodes
> > should be added to the cluster. We are currently seeing long read times
> > from the one node that has about 70GB of data, with 60GB in one column
> > family. we are using a replication factor of 3. I have tracked down the
> > slowness to occur when either row-read-stage or message-deserializer-pool
> > is high, like at least 4000. my systems are 16-core, 3TB, 48GB-mem
> > servers. we would like to be able to use more of the server than just 70GB.
> > The system is a realtime system that needs to scale quite large. Our
> > current heap size is 25GB and we are getting at least 50% row cache hit
> > rates. Does it seem strange that cassandra is not able to handle the work
> > load? We perform multislice gets when reading, similar to what twissandra
> > does; this is to cut down on the network ops. Looking at iostat it doesn't
> > appear to have a lot of queued reads.
> > What are others seeing when they have to add new nodes? What data sizes
> > are they seeing? This is needed so we can plan our growth and server
> > purchase strategy.
> > thanx
> > Artie
> >
> > --
> > http://yeslinux.org
> > http://yestech.org
--
http://yeslinux.org
http://yestech.org
Re: when should new nodes be added to a cluster
On Mon, Aug 2, 2010 at 2:39 PM, Aaron Morton wrote:
> You may need to provide some more information on how many reads you're
> sending to the cluster. Also...
>
> How many nodes do you have in the cluster?

We have a cluster of 4 nodes.

> When you are seeing high response times on one node, what's the load like
> on the others?

They are low, and until recently removing that node would improve performance, but as of today the problem appeared to just move to another node once the original faulty node was removed.

> Is the data load evenly distributed around the cluster?

No it is not; it looks like this:

Address      Status  Load      Range                                       Ring
                               153186065170709351569621555259205461067
10.4.45.22   Up      60.6 GB   23543694856340775179323589033850348191     |<--|
10.4.45.21   Up      58.67 GB  64044280785277646901574566535858757214     |   |
10.4.44.22   Down    76.27 GB  145455238521487150744455174232451506694    |   |
10.4.44.21   Up      67.45 GB  153186065170709351569621555259205461067    |-->|

the down node is the original culprit; once that was down, the problem moved to 10.4.44.21. our setup uses rack-aware placement with 2 nodes in each switch. we tried to use 2 nics, one for thrift and one for gossip, but couldn't get that working, so now we just use one nic for all traffic.

> Are your clients connecting to different nodes in the cluster?

Yes, we use Pelops and have all 4 nodes in the pool.

> Perhaps that node is somehow out of sync with the others...

Don't understand what you mean?

> Anything odd happened in the cluster recently, such as one node going down?

Yes, the node is a test server so it has gone down to update the jvm and storage-conf settings, but only for a short amount of time.

> When was the last time you ran repair?

We just ran it today and it didn't make any difference.
Almost immediately the row-read-stage goes over 4000. here's what iostat looks like:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           3.34   0.00     1.00    33.93    0.00  61.73

Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sda        0.00    0.00   0.00  0.00     0.00    0.00      0.00      0.00    0.00   0.00    0.00
sda1       0.00    0.00   0.00  0.00     0.00    0.00      0.00      0.00    0.00   0.00    0.00
sda2       0.00    0.00   0.00  0.00     0.00    0.00      0.00      0.00    0.00   0.00    0.00
sdb      335.50    0.00  70.50  0.00  3248.00    0.00     46.07      0.78   11.01   7.40   52.20
sdc      330.00    0.00  70.00  0.50  3180.00    4.00     45.16      2.10   29.65  13.09   92.30
sdd      310.00    0.00  93.50  0.50  3040.00    8.00     32.43     33.78  350.13  10.64  100.05
dm-0       0.00    0.00  97.00  0.50  9712.00    4.00     99.65     38.28  384.30  10.26  100.05

Drives: 3 x 1TB (sdb-sdd) striped using lvm raid 0 for data, and 1TB (sda) for the commit log.

Anything else i can provide that might help diagnose?

Thanx
Artie

> Aaron
>
> On 03 Aug, 2010, at 06:47 AM, Artie Copeland wrote:
>
> i have a question on what the signs are from cassandra that new nodes
> should be added to the cluster. We are currently seeing long read times
> from the one node that has about 70GB of data, with 60GB in one column
> family. we are using a replication factor of 3. I have tracked down the
> slowness to occur when either row-read-stage or message-deserializer-pool
> is high, like at least 4000. my systems are 16-core, 3TB, 48GB-mem
> servers. we would like to be able to use more of the server than just 70GB.
>
> The system is a realtime system that needs to scale quite large. Our
> current heap size is 25GB and we are getting at least 50% row cache hit
> rates. Does it seem strange that cassandra is not able to handle the work
> load? We perform multislice gets when reading, similar to what twissandra
> does; this is to cut down on the network ops. Looking at iostat it doesn't
> appear to have a lot of queued reads.
>
> What are others seeing when they have to add new nodes? What data sizes
> are they seeing?
> This is needed so we can plan our growth and server purchase strategy.
>
> thanx
> Artie
>
> --
> http://yeslinux.org
> http://yestech.org

--
http://yeslinux.org
http://yestech.org
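The imbalance in the ring output above can be quantified directly from the tokens: with the RandomPartitioner, each node owns the span from the previous node's token up to its own, out of a 2^127-sized ring. A quick sketch using the four tokens from the ring listing:

```python
RING = 2 ** 127  # RandomPartitioner token space

def ring_shares(tokens):
    """Fraction of the token ring owned by each node, in token order."""
    tokens = sorted(tokens)
    # node i owns (tokens[i-1], tokens[i]]; index -1 wraps around the ring
    return [((t - tokens[i - 1]) % RING) / RING
            for i, t in enumerate(tokens)]

tokens = [
    23543694856340775179323589033850348191,   # 10.4.45.22
    64044280785277646901574566535858757214,   # 10.4.45.21
    145455238521487150744455174232451506694,  # 10.4.44.22
    153186065170709351569621555259205461067,  # 10.4.44.21
]
for share in ring_shares(tokens):
    print("%.1f%%" % (100 * share))
```

This puts nearly half the ring on 10.4.44.22 (the down node, which also shows the largest Load) and only a few percent on 10.4.44.21, which matches the uneven Load column; moving nodes to evenly spaced tokens would even this out.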
when should new nodes be added to a cluster
i have a question on what the signs are from cassandra that new nodes should be added to the cluster. We are currently seeing long read times from the one node that has about 70GB of data, with 60GB in one column family. we are using a replication factor of 3. I have tracked down the slowness to occur when either row-read-stage or message-deserializer-pool is high, like at least 4000. my systems are 16-core, 3TB, 48GB-mem servers. we would like to be able to use more of the server than just 70GB.

The system is a realtime system that needs to scale quite large. Our current heap size is 25GB and we are getting at least 50% row cache hit rates. Does it seem strange that cassandra is not able to handle the work load? We perform multislice gets when reading, similar to what twissandra does; this is to cut down on the network ops. Looking at iostat it doesn't appear to have a lot of queued reads.

What are others seeing when they have to add new nodes? What data sizes are they seeing? This is needed so we can plan our growth and server purchase strategy.

thanx
Artie
--
http://yeslinux.org
http://yestech.org
Re: live nodes list in ring
Benjamin,

Yes, i have seen this when adding a new node into the cluster. the new node doesn't see the complete ring through nodetool, but the strange part is that looking at the ring through jconsole shows the complete ring. it is as if there is a bug in nodetool publishing the actual ring. has anyone seen that scenario? while in this situation it does appear that the cluster is functioning correctly with replicating data; we just can't trust nodetool's ring information.

Artie

2010/6/30 Benjamin Black:
> Does this happen after you have changed the ring topology, especially
> adding nodes?
>
> 2010/6/30 Stephen Hamer:
> > When this happens to me I have to do a full cluster restart. Even doing a
> > rolling restart across the cluster doesn't seem to fix them; all of the
> > nodes need to be stopped at the same time. After bringing everything back
> > up the ring is correct.
> >
> > Does anyone know how a cluster gets into this state?
> >
> > Stephen
> >
> > From: aaron morton [mailto:aa...@thelastpickle.com]
> > Sent: Wednesday, June 30, 2010 1:42 PM
> > To: user@cassandra.apache.org
> > Cc: 'huzhonghua'; 'GongJianTao(宫建涛)'
> > Subject: Re: live nodes list in ring
> >
> > At start up do you see log lines like this?
> >
> > Gossiper.java (line 576) Node /192.168.34.30 is now part of the cluster
> >
> > Are all the nodes listed?
> >
> > aaron
> >
> > On 30 Jun 2010, at 22:50, 王一锋 wrote:
> >
> > Hi,
> >
> > In a cassandra cluster, when issuing the ring command on every node, some
> > can show all nodes in the cluster but some can only show some other nodes.
> >
> > All nodes share the same seed list.
> >
> > And even some of the nodes in the seed list have this problem.
> >
> > Restarting the problematic nodes won't solve it.
> >
> > We tried closing firewalls with the following command:
> >
> > service iptables stop
> >
> > Still won't work.
> >
> > Anyone got a clue?
> >
> > Thanks very much.
> >
> > Yifeng