Knowing when there is a *real* need to add nodes
Hi, I'm trying to predict when my cluster will soon need new nodes. I want a continuous graph of my cluster's health, backed by numeric measurements, so that when I see the cluster getting busier I know ahead of time that I need to start purchasing more machines and adding them to the cluster. Below is what I came up with after some research online. I would highly appreciate any additional gauge measurements and ranges for testing my cluster's health and knowing beforehand when I'm going to need more nodes. Although I'm writing down green, yellow and red gauges, I'm also trying to find a continuous graph that shows where our cluster stands (as far as possible).

Also, my recommendation is always, before adding new nodes:
1. Make sure all nodes are balanced, and if not, balance them.
2. Separate the commit log drive from the data (SSTables) drive.
3. Use mmap for the index only, not auto.
4. Increase disk IO if possible.
5. Avoid swapping as much as possible.
As for my gauge tests for when to add new nodes:

test: nodetool tpstats -h cassandra_host
green gauge: No pending column with number higher
yellow gauge: pending columns 100-2000
red gauge: larger than 3000

test: iostat -x -n -p -z 5 10 and iostat -xcn 5
green gauge: kw/s + kr/s is below 25% of the disk IO capacity
yellow gauge: 20%-50%
red gauge: 50%+

test: iostat -x -n -p -z 5 10 and check the %b column
green gauge: less than 10%
yellow gauge: 10%-80%
red gauge: 90%+

test: nodetool cfstats --host localhost
green gauge: the "SSTable count" item does not continually grow over time
yellow gauge:
red gauge: the "SSTable count" item continually grows over time

test: ./nodetool cfstats --host localhost | grep -i pending
green gauge: 0-2
yellow gauge: 3-100
red gauge: 101+

I would highly appreciate any additional gauge measurements and ranges in order to test my cluster health and to know ***beforehand*** when I'm going to need more nodes.
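The green/yellow/red ranges above can be turned into numbers on a graph with a small classifier. The following is only a minimal sketch: the names `Gauge` and `ThresholdGauge` are hypothetical, and where the post leaves a range unassigned (for example 80%-90% on the %b test) the sketch assumes it counts as yellow.

```java
/** Three-level gauge matching the green/yellow/red ranges from the post. */
enum Gauge { GREEN, YELLOW, RED }

/** Hypothetical helper: maps a measured value onto a gauge using two cut-offs. */
final class ThresholdGauge {
    private final double yellowFrom; // first value counted as yellow
    private final double redFrom;    // first value counted as red

    ThresholdGauge(double yellowFrom, double redFrom) {
        if (yellowFrom > redFrom) {
            throw new IllegalArgumentException("yellowFrom must not exceed redFrom");
        }
        this.yellowFrom = yellowFrom;
        this.redFrom = redFrom;
    }

    Gauge classify(double value) {
        if (value >= redFrom) return Gauge.RED;
        if (value >= yellowFrom) return Gauge.YELLOW;
        return Gauge.GREEN;
    }

    /** cfstats "pending" ranges from the post: 0-2 green, 3-100 yellow, 101+ red. */
    static ThresholdGauge cfstatsPending() {
        return new ThresholdGauge(3, 101);
    }

    /**
     * iostat %b ranges from the post: below 10 green, 10-80 yellow, 90+ red.
     * Assumption: the unassigned 80-90 band is treated as yellow.
     */
    static ThresholdGauge diskBusyPercent() {
        return new ThresholdGauge(10, 90);
    }
}
```

Feeding each sampled value through `classify` and plotting the raw number next to the resulting colour would give both the continuous graph and the gauge the post asks for.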
Re: network topology issue
On Thu, May 12, 2011 at 1:58 AM, Anurag Gujral anurag.guj...@gmail.com wrote:

Hi All, I am testing the network topology strategy in Cassandra. I am using two nodes, one in each of two data centers. Since the nodes are in different DCs, I assigned token 0 to both nodes. I added both nodes as seeds in cassandra.yaml, and I am using PropertyFileSnitch as the endpoint snitch, where I have specified the colo details. I started the first node; then, when I restarted the second node, I got an error that token 0 is already being used. Why am I getting this error?

You cannot have two nodes with the same token, so you'll have to use 0 and 1 for instance. It's true that with NTS you have to think of each datacenter as a separate ring, but there is still this restriction that each token must be different across the whole cluster.

Second question: I already have Cassandra running in two different data centers. I want to add a new keyspace which uses the network topology strategy. In light of the above errors, how can I accomplish this?

Thanks, Anurag
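The rule in the reply (treat each data center as its own ring, but keep every token unique across the whole cluster) can be sketched as follows. This is an illustration only: it assumes RandomPartitioner, whose token space is 0 to 2^127, and the class `NtsTokenPlanner` and the per-data-center offset scheme are hypothetical, not something the thread prescribes.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical planner: evenly spaced tokens per data center, shifted to stay unique. */
final class NtsTokenPlanner {
    // Assumption: RandomPartitioner, whose tokens fall in [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /**
     * Tokens for one data center: spaced evenly around the ring, then shifted
     * by a small per-DC offset (0 for the first DC, 1 for the second, ...).
     */
    static List<BigInteger> tokensFor(int nodesInDc, int dcOffset) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodesInDc; i++) {
            BigInteger evenlySpaced = RING_SIZE
                    .multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodesInDc));
            tokens.add(evenlySpaced.add(BigInteger.valueOf(dcOffset)).mod(RING_SIZE));
        }
        return tokens;
    }

    /** Checks the cluster-wide restriction: no token may appear twice. */
    static boolean allUnique(List<List<BigInteger>> tokensPerDc) {
        Set<BigInteger> seen = new HashSet<>();
        for (List<BigInteger> dcTokens : tokensPerDc) {
            for (BigInteger token : dcTokens) {
                if (!seen.add(token)) return false;
            }
        }
        return true;
    }
}
```

For the two-node case in the question, `tokensFor(1, 0)` and `tokensFor(1, 1)` yield tokens 0 and 1, which matches the "use 0 and 1" suggestion in the reply.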
Import/Export of Schema Migrations
My use case is like this: I have a development cluster, a staging cluster and a production cluster. When I finish a set of migrations (i.e. changes) on the development cluster, I want to apply them to the staging cluster, and eventually the production cluster. I don't want to do it by hand, because it's a painful and error-prone process. What I would like to do is export the last N migrations from the development cluster as a text file, with exactly the same format as the original text commands, and import them to the staging and production clusters. I think the best place to do this might be the CLI, since you would probably want to view your migrations before exporting them. Something like this:

show migrations N;
Shows the last N migrations.

export migrations N fileName;
Exports the last N migrations to file fileName.

import migrations fileName;
Imports migrations from fileName.

The import process would apply the migrations one at a time, giving you feedback like:

applying migration: update column family

If a migration fails, the process should give an appropriate message and stop. Is anyone else interested in this? I have created a Jira ticket for it here: https://issues.apache.org/jira/browse/CASSANDRA-2636
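The proposed import behaviour (apply migrations one at a time, report each one, stop at the first failure) could look roughly like the sketch below. Everything here is hypothetical: `MigrationImporter` and the `applier` callback stand in for whatever the CLI would actually do, since the feature is only a proposal in the ticket.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the proposed "import migrations" loop. */
final class MigrationImporter {
    /**
     * Applies migrations in order and stops at the first failure.
     *
     * @param migrations migration statements, in the order they were exported
     * @param applier    callback that applies one statement and returns true on success
     * @return the number of migrations applied successfully
     */
    static int applyAll(List<String> migrations, Predicate<String> applier) {
        int applied = 0;
        for (String migration : migrations) {
            System.out.println("applying migration: " + migration);
            if (!applier.test(migration)) {
                System.out.println("migration failed, stopping: " + migration);
                break;
            }
            applied++;
        }
        return applied;
    }
}
```

Returning the count of applied migrations lets the caller report exactly where the import stopped, so the remaining statements can be retried after the problem is fixed.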
Re: Excessive allocation during hinted handoff
Just out of curiosity, is this on the receiver or sender side? I have been wondering a bit whether the hint playback could need some adjustment. There are potentially quite big differences in how much is sent per throttle delay time, depending on what your data looks like. Early 0.7 releases also built up hints very easily under load, because nodes quickly got marked as down due to gossip sharing the same thread as many other operations.

Terje

On Thu, May 12, 2011 at 1:28 PM, Jonathan Ellis jbel...@gmail.com wrote:
Doesn't really look abnormal to me for a heavy write load situation, which is what receiving hints is.

On Wed, May 11, 2011 at 1:55 PM, Gabriel Tataranu gabr...@wajam.com wrote:
Greetings, I'm experiencing some issues with 2 nodes (out of more than 10). Right after startup (Listening for thrift clients...) the nodes will create objects at a high rate using all available CPU cores:

INFO 18:13:15,350 GC for PS Scavenge: 292 ms, 494902976 reclaimed leaving 2024909864 used; max is 6658457600
INFO 18:13:20,393 GC for PS Scavenge: 252 ms, 478691280 reclaimed leaving 2184252600 used; max is 6658457600
INFO 18:15:23,909 GC for PS Scavenge: 283 ms, 452943472 reclaimed leaving 5523891120 used; max is 6658457600
INFO 18:15:24,912 GC for PS Scavenge: 273 ms, 466157568 reclaimed leaving 5594606128 used; max is 6658457600

This will eventually trigger old-gen GC and then the process repeats until hinted handoff finishes. The build version was updated from 0.7.2 to 0.7.5 but the behavior was exactly the same. Thank you.

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
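To follow how quickly the heap fills in GC lines like the ones quoted in this message, the numbers can be pulled out with a regular expression. This is a minimal sketch that assumes the exact line layout shown in the quoted log; the class name `GcLogLine` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "GC for <collector>: N ms, X reclaimed leaving Y used; max is Z". */
final class GcLogLine {
    // Reclaimed bytes may be negative in some log lines, hence the optional minus sign.
    private static final Pattern GC_LINE = Pattern.compile(
            "GC for (.+?): (\\d+) ms, (-?\\d+) reclaimed leaving (\\d+) used; max is (\\d+)");

    final String collector;
    final long pauseMillis;
    final long reclaimedBytes;
    final long usedBytes;
    final long maxBytes;

    private GcLogLine(String collector, long pauseMillis, long reclaimedBytes,
                      long usedBytes, long maxBytes) {
        this.collector = collector;
        this.pauseMillis = pauseMillis;
        this.reclaimedBytes = reclaimedBytes;
        this.usedBytes = usedBytes;
        this.maxBytes = maxBytes;
    }

    /** Returns the parsed line, or null if the text is not a GC report. */
    static GcLogLine parse(String line) {
        Matcher m = GC_LINE.matcher(line);
        if (!m.find()) return null;
        return new GcLogLine(m.group(1),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Long.parseLong(m.group(5)));
    }

    /** Heap still in use after the collection, as a percentage of the maximum. */
    double heapUsedPercent() {
        return 100.0 * usedBytes / maxBytes;
    }
}
```

Applied to the four quoted lines, the used-after-GC figure climbs from about 2.0 GB to about 5.6 GB of a 6.66 GB maximum within roughly two minutes, which is the build-up toward old-gen GC that the original post describes.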
Re: Excessive allocation during hinted handoff
And if you have 10 nodes, do all of them happen to send hints to the two with GC?

Terje

On Thu, May 12, 2011 at 6:10 PM, Terje Marthinussen tmarthinus...@gmail.com wrote:
Just out of curiosity is this on the receiver or sender side? I have been wondering a bit if the hint playback could need some adjustment. There is potentially quite big differences on how much is sent per throttle delay time depending on what your data looks like. Early 0.7 releases also built up hints very easily under load due to nodes quickly getting marked as down due to gossip sharing the same thread as many other operations. Terje

On Thu, May 12, 2011 at 1:28 PM, Jonathan Ellis jbel...@gmail.com wrote:
Doesn't really look abnormal to me for a heavy write load situation which is what receiving hints is.

On Wed, May 11, 2011 at 1:55 PM, Gabriel Tataranu gabr...@wajam.com wrote:
Greetings, I'm experiencing some issues with 2 nodes (out of more than 10). Right after startup (Listening for thrift clients...) the nodes will create objects at high rate using all available CPU cores:

INFO 18:13:15,350 GC for PS Scavenge: 292 ms, 494902976 reclaimed leaving 2024909864 used; max is 6658457600
INFO 18:13:20,393 GC for PS Scavenge: 252 ms, 478691280 reclaimed leaving 2184252600 used; max is 6658457600
INFO 18:15:23,909 GC for PS Scavenge: 283 ms, 452943472 reclaimed leaving 5523891120 used; max is 6658457600
INFO 18:15:24,912 GC for PS Scavenge: 273 ms, 466157568 reclaimed leaving 5594606128 used; max is 6658457600

This will eventually trigger old-gen GC and then the process repeats until hinted handoff finishes. The build version was updated from 0.7.2 to 0.7.5 but the behavior was exactly the same. Thank you.

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Cassandra causing very high load on CPU's 0.6.3
Hi all, I am experiencing a problem with my two Cassandra nodes (RF=2). CPU usage on both nodes is very high (load average: 9.47, 5.72, 5.11), and this is causing my application to time out. Each node is a Xeon with 8 processors and 16 GB of RAM, with an LVM setup for Cassandra. How can I trace the main cause of the load? I have no swap, and swappiness is also set to 0. I am running CentOS 5.5 64-bit.

tpstats
--
Pool Name                    Active   Pending   Completed
STREAM-STAGE                      0         0           0
RESPONSE-STAGE                    0         0       18439
ROW-READ-STAGE                    0         0       16235
LB-OPERATIONS                     0         0           0
MESSAGE-DESERIALIZER-POOL         0         0       63673
GMFD                              0         0         632
LB-TARGET                         0         0           0
CONSISTENCY-MANAGER               0         0        4008
ROW-MUTATION-STAGE                1         1       70414
MESSAGE-STREAMING-POOL            0         0           0
LOAD-BALANCER-STAGE               0         0           0
FLUSH-SORTER-POOL                 0         0           0
MEMTABLE-POST-FLUSHER             0         0          32
FLUSH-WRITER-POOL                 0         0          32
AE-SERVICE-STAGE                  0         0           0
HINTED-HANDOFF-POOL               1         1           0

-- S.Ali Ahsan
CounterColumn increments gone after restart
Hi guys, I have strange problem with 0.8.0-rc1. I'm not quite sure if this is the way it should be but: - I create a ColumnFamily named Counters - do a few increments on a column. - kill cassandra - start cassandra When I look at the counter column, the value is 1. See the following pastebin please: http://pastebin.com/9jYdDiRY
Re: Excessive allocation during hinted handoff
Greetings,

Just out of curiosity is this on the receiver or sender side?

Looks like sender side, although the 2 nodes were replicating to each other so it's hard to tell.

I have been wondering a bit if the hint playback could need some adjustment. There is potentially quite big differences on how much is sent per throttle delay time depending on what your data looks like. Early 0.7 releases also built up hints very easily under load due to nodes quickly getting marked as down due to gossip sharing the same thread as many other operations.

Like I said, cassandra was updated to 0.7.5 (latest build as of today) following the advice on IRC. There was no change in behavior.

Best, Gabriel
Re: Excessive allocation during hinted handoff
An if you have 10 nodes, do all of them happen to send hints to the two with GC? The 2 nodes are adjacent in token range. They are replicating to each other. Other nodes have no data to replicate so there's no proof one way or another. Best, Gabriel
Re: CounterColumn increments gone after restart
see the ticket https://issues.apache.org/jira/browse/CASSANDRA-2642 please On Thu, May 12, 2011 at 3:28 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi guys, I have strange problem with 0.8.0-rc1. I'm not quite sure if this is the way it should be but: - I create a ColumnFamily named Counters - do a few increments on a column. - kill cassandra - start cassandra When I look at the counter column, the value is 1. See the following pastebin please: http://pastebin.com/9jYdDiRY
Re: Excessive allocation during hinted handoff
Greetings, Doesn't really look abnormal to me for a heavy write load situation which is what receiving hints is. I would agree with you but this raises some questions about write performance. Plus I've only seen this kind of behavior recently and only on 2 adjacent nodes. So I have good reason to believe this is the exception and not the rule. Best, Gabriel
Re: Excessive allocation during hinted handoff
I'm assuming the two nodes are the ones receiving the HH after they were down.

Adjacent, so yes.

Are there a lot of hints collected while they are down? You can check the HintedHandOffManager MBean in JConsole.

There wasn't any downtime - that's something else that's weird.

What does the TPStats look like on the nodes under pressure? And how many nodes are delivering hints to the nodes when they restart?

TPStats do show activity on the HH. I'll have some examples later if the nodes decide to do this again.

Finally, hinted_handoff_throttle_delay_in_ms in conf/cassandra.yaml will let you slow down the delivery rate if HH is indeed the problem.

That's useful info. Thanks.

Best, Gabriel
Re: Excessive allocation during hinted handoff
What does the TPStats look like on the nodes under pressure? And how many nodes are delivering hints to the nodes when they restart?

$nodetool -h 127.0.0.1 tpstats
Pool Name Active Pending Completed
ReadStage 1 11992475
RequestResponseStage 0 0 2247486
MutationStage 0 0 1631349
ReadRepairStage 0 0 583432
GossipStage 0 0 241324
AntiEntropyStage 0 0 0
MigrationStage 0 0 0
MemtablePostFlusher 0 0 46
StreamStage 0 0 0
FlushWriter 0 0 46
MiscStage 0 0 0
FlushSorter 0 0 0
InternalResponseStage 0 0 0
HintedHandoff 1 5152

dstat -cmdln during the event:

total-cpu-usage --memory-usage- ---load-avg--- -dsk/total- -net/total-
usr sys idl wai hiq siq| used buff cach free| 1m 5m 15m | read writ| recv send
87 6 6 0 0 1|6890M 32.1M 1001M 42.8M|2.36 2.87 1.73| 0 0 | 75k 144k
88 10 2 0 0 0|6889M 32.2M 1002M 41.6M|3.05 3.00 1.78| 0 0 | 60k 102k
89 9 2 0 0 0|6890M 32.2M 1003M 41.0M|3.05 3.00 1.78| 0 0 | 38k 70k
89 10 1 0 0 0|6890M 32.2M 1003M 40.7M|3.05 3.00 1.78| 0 0 | 26k 24k
93 6 2 0 0 0|6890M 32.2M 1003M 40.9M|3.05 3.00 1.78| 0 0 | 37k 31k
90 8 2 0 0 0|6890M 32.2M 1003M 39.9M|3.05 3.00 1.78| 0 0 | 67k 69k
87 8 4 0 0 1|6890M 32.2M 1004M 38.7M|4.09 3.22 1.85| 0 0 | 123k 262k
83 13 2 0 0 2|6890M 32.2M 1004M 38.3M|4.09 3.22 1.85| 0 0 | 445k 18M
90 6 3 0 0 0|6890M 32.2M 1005M 38.2M|4.09 3.22 1.85| 0 0 | 72k 91k
40 7 25 27 0 0|6890M 32.2M 1005M 37.8M|4.09 3.22 1.85| 0 0 | 246k 8034k
0 0 59 41 0 0|6890M 32.2M 1005M 37.7M|4.09 3.22 1.85| 0 0 | 19k 6490B
1 2 45 52 0 0|6891M 32.2M 999M 43.1M|4.00 3.21 1.86| 0 0 | 29k 18k
72 8 15 3 0 1|6892M 32.2M 999M 41.6M|4.00 3.21 1.86| 0 0 | 431k 11M
88 9 2 0 0 1|6907M 32.0M 985M 41.1M|4.00 3.21 1.86| 0 0 | 99k 77k
88 10 1 0 0 1|6913M 31.9M 977M 44.1M|4.00 3.21 1.86| 0 0 | 112k 619k
89 9 1 0 0 1|6892M 31.9M 977M 64.4M|4.00 3.21 1.86| 0 0 | 109k 369k
90 8 1 0 0 0|6892M 31.9M 979M 62.5M|4.80 3.39 1.92| 0 0 | 130k 97k
83 13 1 0 0 3|6893M 32.0M 981M 59.8M|4.80 3.39 1.92| 0 0 | 503k 18M
78 11 10 0 0 0|6893M 32.0M 981M 59.5M|4.80 3.39 1.92| 0 0 | 102k 110k

The low cpu periods are due to major GC (JVM frozen).

TPStats do show activity on the HH. I'll have some examples later if the nodes decide to do this again.

Finally hinted_handoff_throttle_delay_in_ms in conf/cassandra.yaml will let you slow down the delivery rate if HH is indeed the problem.

Best, Gabriel
Re: Cassandra causing very high load on CPU's 0.6.3
On 05/12/2011 04:08 PM, Ali Ahsan wrote:
Hi all, I am experiencing a problem with my two Cassandra nodes (RF=2). CPU usage on both nodes is very high (load average: 9.47, 5.72, 5.11), and this is causing my application to time out. Each node is a Xeon with 8 processors and 16 GB of RAM, with an LVM setup for Cassandra. How can I trace the main cause of the load? I have no swap, and swappiness is also set to 0. I am running CentOS 5.5 64-bit.

To add to this: I am using OpenJDK, not the Sun JDK. Will this be an issue?
Re: Cassandra causing very high load on CPU's 0.6.3
On Thu, May 12, 2011 at 6:04 PM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
On 05/12/2011 04:08 PM, Ali Ahsan wrote:
Hi all, I am experiencing a problem with my two Cassandra nodes (RF=2). CPU usage on both nodes is very high (load average: 9.47, 5.72, 5.11), and this is causing my application to time out. Each node is a Xeon with 8 processors and 16 GB of RAM, with an LVM setup for Cassandra. How can I trace the main cause of the load? I have no swap, and swappiness is also set to 0. I am running CentOS 5.5 64-bit.

To add to this: I am using OpenJDK, not the Sun JDK. Will this be an issue?

It is indeed advised to use the Sun JDK, as OpenJDK is a bit behind as far as bug fixes are concerned. Moreover, 0.6.3 is pretty old now and we have fixed a number of issues related to load spikes, so before investigating further the best advice I can give you is to upgrade (either to 0.6.13 if you really feel like staying on 0.6, or to 0.7.5).

-- Sylvain
Monitoring bytes read per cf
Hi all, I've got a question for folks with some code insight again. To better understand where our IO load is coming from, we want to monitor the number of bytes read from disk per CF. (We love stats.) What I have done is wrap the FileDataInput in SSTableReader to sum the bytes read in CFS. This will only record data file access, but that would be good enough for us. It seems to work fine, but maybe someone here knows that this is not a good idea.

Cheers, Daniel

Some code:

SSTableReader:

    // Enabled with -Dcassandra.keepIOStats=true
    private static final boolean KEEP_IO_STATISICS = Boolean.getBoolean("cassandra.keepIOStats");

    public FileDataInput getFileDataInput(DecoratedKey decoratedKey, int bufferSize)
    {
        long position = getPosition(decoratedKey, Operator.EQ);
        if (position < 0)
            return null;
        FileDataInput segment = dfile.getSegment(position, bufferSize);
        // Wrap the segment only when statistics are switched on
        return (KEEP_IO_STATISICS) ? new MonitoringFileDataIInput(metadata, segment) : segment;
    }

with MonitoringFileDataIInput:

    public class MonitoringFileDataIInput implements FileDataInput, Closeable
    {
        private final FileDataInput fileDataInput;
        private final ColumnFamilyStore columnFamilyStore;

        public MonitoringFileDataIInput(CFMetaData cfMetaData, FileDataInput fileDataInput)
        {
            columnFamilyStore = Table.open(cfMetaData.tableName).getColumnFamilyStore(cfMetaData.cfId);
            this.fileDataInput = fileDataInput;
        }

        @Override
        public boolean readBoolean() throws IOException
        {
            // Count the byte, then delegate to the wrapped input
            columnFamilyStore.addBytesRead(1);
            return fileDataInput.readBoolean();
        }

        // ... etc
    }

and ColumnFamilyStore:

    private final AtomicLong bytesRead = new AtomicLong(0L);

    @Override // ColumnFamilyStoreMBean
    public long getBytesRead()
    {
        return bytesRead.get();
    }

    public void addBytesRead(int num)
    {
        bytesRead.addAndGet(num);
    }
Re: Cassandra causing very high load on CPU's 0.6.3
https://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.6.13/CHANGES.txt

On Thu, May 12, 2011 at 11:56 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:

It is indeed advised to use sunjdk as openjdk is a bit behind as far as bug fixes are concerned. Moreover, 0.6.3 is pretty old now and we do have fixed a number of issue related to load spikes, so before investigating further the best advice I can give you is to upgrade (either to 0.6.13 if you really feel like staying on 0.6, or to 0.7.5).

Thanks, let me discuss that with my team. How many changes are there, and where do we need them?

-- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: network topology issue
Thanks everyone for your responses.

On Thu, May 12, 2011 at 1:18 AM, Sylvain Lebresne sylv...@datastax.com wrote:

On Thu, May 12, 2011 at 1:58 AM, Anurag Gujral anurag.guj...@gmail.com wrote:

Hi All, I am testing the network topology strategy in Cassandra. I am using two nodes, one in each of two data centers. Since the nodes are in different DCs, I assigned token 0 to both nodes. I added both nodes as seeds in cassandra.yaml, and I am using PropertyFileSnitch as the endpoint snitch, where I have specified the colo details. I started the first node; then, when I restarted the second node, I got an error that token 0 is already being used. Why am I getting this error?

You cannot have two nodes with the same token, so you'll have to use 0 and 1 for instance. It's true that with NTS you have to think of each datacenter as a separate ring, but there is still this restriction that each token must be different across the whole cluster.

Second question: I already have Cassandra running in two different data centers. I want to add a new keyspace which uses the network topology strategy. In light of the above errors, how can I accomplish this?

Thanks, Anurag
Commitlog Disk Full
Hey guys, I have an EC2 Debian cluster consisting of several nodes running 0.7.5 on ephemeral disks. These are fresh installs and not upgrades. The commitlog is on the smaller of the disks, which is around 10G in size, and the datadir is on the bigger disk. The config file is basically the same as the one supplied by the default installation. Our applications write to the cluster. After about a day of writing we started noticing the commitlog disk filling up. Soon we went over the disk limit and writes started failing. At this point we stopped the cluster. Over the course of the day we inserted around 25G of data. Our column values are pretty small. I understand that cassandra periodically cleans up the commitlog directories by generating sstables in datadir. Is there any way to speed up this movement from commitlog to datadir? Thanks!
Re: Commitlog Disk Full
I understand that cassandra periodically cleans up the commitlog directories by generating sstables in datadir. Is there any way to speed up this movement from commitlog to datadir?

commitlog_rotation_threshold_in_mb could cause problems if it was set very, very high, but with the default of 128mb it should not be an issue. I suspect the most likely reason is that you have a column family whose memtable flush settings are extreme. A commit log segment cannot be removed until the corresponding data has been flushed to an sstable. For high-throughput memtables where you flush regularly this should happen often. For idle or almost idle memtables you may be waiting on the timeout criteria to trigger. So in general, a memtable with a long expiry time has the potential to generate commit logs of whatever size is implied by the write traffic during that period.

The memtable setting in question is memtable_flush_after. Do you have that set to something very high on one of your column families? You can use "describe keyspace name_of_keyspace" in cassandra-cli to check the current settings.

-- / Peter Schuller
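The last point of the reply (commit logs can grow to whatever the write traffic implies during one flush-timeout window) can be put into rough numbers. This is a back-of-the-envelope sketch only: it assumes a steady write rate and that a single rarely flushed column family is what pins the segments; the class `CommitLogEstimate` and the example timeout values are hypothetical.

```java
/** Hypothetical estimate of commit log held back by a long memtable flush timeout. */
final class CommitLogEstimate {
    private static final long MINUTES_PER_DAY = 24L * 60L;

    /**
     * Rough upper bound on commit log bytes that can accumulate while waiting
     * for an idle memtable's flush timeout, assuming a steady write rate.
     *
     * @param bytesWrittenPerDay total write volume per day across the node
     * @param flushAfterMinutes  the flush timeout of the slowest-flushing column family
     */
    static long retainedBytes(long bytesWrittenPerDay, long flushAfterMinutes) {
        return bytesWrittenPerDay * flushAfterMinutes / MINUTES_PER_DAY;
    }
}
```

With the roughly 25G per day from the question, a one-hour timeout would hold back about 1G of commit log under these assumptions, while a timeout of a full day would hold back the whole 25G, more than the 10G commitlog disk described in the question.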
Hinted Handoff
Hi All, I have two questions:

a) Is there a way to turn hinted handoff on and off per keyspace, rather than for all keyspaces at once?

b) It looks like Cassandra stores hinted handoff data in one row. Is that true? Does having one row for hinted handoff imply that, if nodes are down for a longer period of time, not all the data which needs to be replicated will be on the node which is alive?

Thanks, Anurag
running TPC-C on cassandra clusters
Hi all, my partner and I are currently using a Cassandra cluster to run TPC-C. We first use 2 EC2 nodes to load 20 warehouses. One (the client node) has 8 cores, the other (the worker node) has 4 cores. During loading, either the client node or the worker node will randomly go down (it cannot be detected) and then come up again a short time later. If both nodes go down, the load fails. If only one of them goes down, we can continue to load data. The problem is that if we use multiple threads (we wrote multiprocess code), say 4 client threads, some of them may stop at the point where one of the nodes first goes down, and the dead threads never come back. This not only increases our loading time, but also affects the amount of data we can load. So we need to figure out why the nodes keep going up and down, and fix this problem. Thanks for any help!

Best, Xiaowei
Re: Hinted Handoff
I'm not sure about your first question. I believe the internal system keyspace holds the hinted handoff information. In 0.6 and earlier, HintedHandoffManager.sendMessage used to read the entire row into memory and then send the row back to the client in a single message. As of 0.7, Cassandra pages within a single hinted row instead (which improves performance for wide rows).

On Thu, May 12, 2011 at 11:48 AM, Anurag Gujral anurag.guj...@gmail.com wrote:

Hi All, I have two questions: a) Is there a way to turn on and off hinted handoff per keyspace rather than for multiple keyspaces. b) It looks like cassandra stores hinted handoff data in one row. Is it true? Does having one row for hinted handoff implies if nodes are down for longer period of time not all the data which needs to be replicated will be on the node which is alive. Thanks Anurag
Re: running TPC-C on cassandra clusters
I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6) On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Hi all, My partner and I currently using cassandra cluster to run TPC-C. We first use 2 ec2 nodes to load 20 warehouses. One(client node) has 8 cores, the other(worker node) has 4 cores. During the loading time, either the client node or the worker node will down(cannot be detected) randomly and then up again in a short time. If the two nodes both down, we failed in loading. If only one of them down, we can continue to load data. The problem is if we use multiple threads(we write multiprocess code), say 4 clients threads, some of them might be stop at the point one of the nodes first down, and the dead threads will never come back This will not only enlarge our loading time, but also effect the amount of data we can load. So we need to figure out why the nodes continue to be up and down and fix this problem. Thanks for any help! Best, Xiaowei -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Crash when uploading large data sets
I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB unique data), to a cluster of 10 servers. I'm using batch_mutate, and breaking the data up into chunks of about 10k records. Each record is about 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data set, everything works fine. When I upload the 20 GB data set, servers will occasionally crash. Currently I have my client code automatically detect this and restart the server, but that is less than ideal. I'm not sure what information to gather to determine what's going on here. Here is a sample of a log file from when a crash occurred. The crash was immediately after the log entry tagged 2011-05-12 19:02:19,377. Any idea what's going on here? Any other info I can gather to try to debug this? INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464 INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations) INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations) INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 7774142464 INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 7774142464 INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 
165) Completed flushing /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 bytes) INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) Discarding obsolete commit log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log) INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 7774142464 INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 operations) INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations) INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61 INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) Logging initialized INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) Heap size: 7634681856/7635730432 INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. Native methods will be disabled. 
INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading settings from file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1 INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2 INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1 INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2 INFO [main] 2011-05-12 19:02:21,342 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-2 INFO [main] 2011-05-12 19:02:21,344 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-1 INFO
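The batch arithmetic from the question (about 10k records of about 5KB each per batch_mutate call) can be checked with a few lines of code, together with a hypothetical helper that caps batches by a byte budget. Nothing in the thread establishes batch size as the cause of the crash; the class `BatchPlanner` and the idea of a byte budget are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for sizing client-side batches by bytes instead of record count. */
final class BatchPlanner {
    /** Approximate payload of one batch, ignoring protocol overhead. */
    static long batchBytes(int recordsPerBatch, int bytesPerRecord) {
        return (long) recordsPerBatch * bytesPerRecord;
    }

    /**
     * Splits records into consecutive batches whose estimated payload stays
     * within maxBatchBytes. Every batch holds at least one record.
     */
    static <T> List<List<T>> chunk(List<T> records, int bytesPerRecord, long maxBatchBytes) {
        int perBatch = (int) Math.max(1, maxBatchBytes / bytesPerRecord);
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += perBatch) {
            int end = Math.min(records.size(), start + perBatch);
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }
}
```

`batchBytes(10000, 5 * 1024)` gives 51,200,000 bytes, in line with the "about 50MB per batch" figure in the question.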
Re: Crash when uploading large data sets
The key JVM options for Cassandra are in cassandra.in.sh. What is your min and max heap size? The default setting of max heap size is 1GB. How much RAM do your nodes have? You may want to increase this setting. You can also set the -Xmx and -Xms options to the same value to keep Java from having to manage heap growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a lot more on 64-bit. Try messing with some of the other settings in the cassandra.in.sh file.

You may not have DEBUG mode turned on for Cassandra and therefore may not be getting the full details of what's going on when the server crashes. In the cassandra-home/conf/log4j-server.properties file, set this line from the default of INFO to DEBUG:

log4j.rootLogger=INFO,stdout,R

Also, you haven't configured JNA on this server. Here's some info about it and how to configure it: JNA provides Java programs easy access to native shared libraries without writing anything but Java code.

Note from Cassandra developers for why JNA is needed:

Linux aggressively swaps out infrequently used memory to make more room for its file system buffer cache. Unfortunately, modern generational garbage collectors like the JVM's leave parts of its heap un-touched for relatively large amounts of time, leading Linux to swap it out. When the JVM finally goes to use or GC that memory, swap hell ensues. Setting swappiness to zero can mitigate this behavior but does not eliminate it entirely. Turning off swap entirely is effective. But to avoid surprising people who don't know about this behavior, the best solution is to tell Linux not to swap out the JVM, and that is what we do now with mlockall via JNA. Because of licensing issues, we can't distribute JNA with Cassandra, so you must manually add it to the Cassandra lib/ directory or otherwise place it on the classpath. If the JNA jar is not present, Cassandra will continue as before.

Get JNA with:

cd ~
wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb

To install:

techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
(Reading database ... 44334 files and directories currently installed.)
Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
Unpacking replacement libjna-java ...
Setting up libjna-java (3.2.7-0~nmu.2) ...

The deb package will install the JNA jar file to /usr/share/java/jna.jar, but Cassandra only loads it if it's on the classpath. The easy way to do this is to create a symlink in your Cassandra lib directory (note: replace /home/techlabs with your home dir location):

ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib

Research: http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/

- Sameer

On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote:

I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB unique data), to a cluster of 10 servers. I'm using batch_mutate, and breaking the data up into chunks of about 10k records. Each record is about 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data set, everything works fine. When I upload the 20 GB data set, servers will occasionally crash. Currently I have my client code automatically detect this and restart the server, but that is less than ideal. I'm not sure what information to gather to determine what's going on here. Here is a sample of a log file from when a crash occurred. The crash was immediately after the log entry tagged 2011-05-12 19:02:19,377. Any idea what's going on here? Any other info I can gather to try to debug this?
INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464 INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations) INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations) INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 7774142464 INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 7774142464 INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) Completed flushing
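The GCInspector lines in the log above already report heap usage and the heap maximum, so they can help answer the heap-size questions in this message. Below is a minimal Python sketch, assuming the log lines keep exactly the format quoted in this thread; the function name `heap_samples` is hypothetical and is not part of Cassandra.

```python
import re

# Matches GCInspector lines in the format quoted in this thread, e.g.
# "GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464"
# The reclaimed figure can be negative in these logs, hence the optional minus sign.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>-?\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


def heap_samples(log_text):
    """Return (collector, pause_ms, used_fraction) for every GC line found."""
    samples = []
    for match in GC_LINE.finditer(log_text):
        used = int(match.group("used"))
        max_heap = int(match.group("max"))
        samples.append(
            (match.group("collector"), int(match.group("ms")), used / max_heap)
        )
    return samples
```

Calling `heap_samples(open("system.log").read())` (the file name is an assumption) gives one tuple per collection. A used fraction that stays close to 1.0 right before a crash would point towards heap exhaustion, while a low fraction, as in the last ParNew entries quoted in this thread, would point elsewhere.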
Re: Unable to add columns to empty row in Column family: Cassandra
Can you share the code? On Mon, May 2, 2011 at 11:34 PM, anuya joshi anu...@gmail.com wrote: Hello, I am using Cassandra for my application. My Cassandra client uses the Thrift APIs directly. The problem I am currently facing is as follows: 1) I added a row and its columns dynamically via the Thrift API client. 2) Next, I used the command-line client to delete the row, which actually deleted all the columns in it, leaving an empty row with the original row ID. 3) Now I am trying to add columns dynamically from the client program into this empty row, using the same row key. However, the columns are not being inserted. When I tried the same thing from the command-line client, it worked correctly. Any pointers on this would be very helpful. Thanks in advance, Regards, Anuya -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: Crash when uploading large data sets
It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my physical memory. These are 15GB VMs, so that's 7.5GB for Cassandra. I would have expected that to work, but I will override to 13 GB just to see what happens. I've also got the JNA thing set up. Do you think this would cause the crashes, or is it just a performance improvement? On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote: The key JVM options for Cassandra are in cassandra.in.sh. What is your min and max heap size? The default setting of max heap size is 1GB. How much RAM do your nodes have? You may want to increase this setting. You can also set the -Xmx and -Xms options to the same value to keep Java from having to manage heap growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a lot more on 64-bit. Try messing with some of the other settings in the cassandra.in.sh file. You may not have DEBUG mode turned on for Cassandra and therefore may not be getting the full details of what's going on when the server crashes. In the cassandra-home/conf/log4j-server.properties file, set this line from the default of INFO to DEBUG: log4j.rootLogger=INFO,stdout,R Also, you haven't configured JNA on this server. Here's some info about it and how to configure it: JNA provides Java programs easy access to native shared libraries without writing anything but Java code. Note from Cassandra developers for why JNA is needed: Linux aggressively swaps out infrequently used memory to make more room for its file system buffer cache. Unfortunately, modern generational garbage collectors like the JVM's leave parts of its heap un-touched for relatively large amounts of time, leading Linux to swap it out. When the JVM finally goes to use or GC that memory, swap hell ensues. Setting swappiness to zero can mitigate this behavior but does not eliminate it entirely. Turning off swap entirely is effective. 
But to avoid surprising people who don't know about this behavior, the best solution is to tell Linux not to swap out the JVM, and that is what we do now with mlockall via JNA. Because of licensing issues, we can't distribute JNA with Cassandra, so you must manually add it to the Cassandra lib/ directory or otherwise place it on the classpath. If the JNA jar is not present, Cassandra will continue as before. Get JNA with: cd ~ wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb To install: techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb (Reading database ... 44334 files and directories currently installed.) Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ... Unpacking replacement libjna-java ... Setting up libjna-java (3.2.7-0~nmu.2) ... The deb package will install the JNA jar file to /usr/share/java/jna.jar, but Cassandra only loads it if its in the class path. The easy way to do this is just create a symlink into your Cassandra lib directory (note: replace /home/techlabs with your home dir location): ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib Research: http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/ - Sameer On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote: I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB unique data), to a cluster of 10 servers. I'm using batch_mutate, and breaking the data up into chunks of about 10k records. Each record is about 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data set, everything works fine. When I upload the 20 GB data set, servers will occasionally crash. Currently I have my client code automatically detect this and restart the server, but that is less than ideal. I'm not sure what information to gather to determine what's going on here. Here is a sample of a log file from when a crash occurred. 
The crash was immediately after the log entry tagged 2011-05-12 19:02:19,377. Any idea what's going on here? Any other info I can gather to try to debug this? INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464 INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations) INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158)
Re: Crash when uploading large data sets
Oh, forgot this detail: I have no swap configured, so swapping is not the cause of the crash. Could it be that I'm running out of memory on a 15GB machine? That seems unlikely. I grepped dmesg for oom and didn't see anything from the oom killer, and I used the instructions from the following web page and didn't see that the oom killer had killed anything. http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case killed process jcipar@172-19-149-62:~$ Also, this is pretty subjective, so I can't say for sure until it finishes, but this seems to be running *much* slower after setting the heap size and setting up JNA. On May 12, 2011, at 7:52 PM, James Cipar wrote: It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my physical memory. These are 15GB VMs, so that's 7.5GB for Cassandra. I would have expected that to work, but I will override to 13 GB just to see what happens. I've also got the JNA thing set up. Do you think this would cause the crashes, or is it just a performance improvement? On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote: The key JVM options for Cassandra are in cassandra.in.sh. What is your min and max heap size? The default setting of max heap size is 1GB. How much RAM do your nodes have? You may want to increase this setting. You can also set the -Xmx and -Xms options to the same value to keep Java from having to manage heap growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a lot more on 64-bit. Try messing with some of the other settings in the cassandra.in.sh file. You may not have DEBUG mode turned on for Cassandra and therefore may not be getting the full details of what's going on when the server crashes. 
In the cassandra-home/conf/log4j-server.properties file, set this line from the default of INFO to DEBUG: log4j.rootLogger=INFO,stdout,R Also, you haven't configured JNA on this server. Here's some info about it and how to configure it: JNA provides Java programs easy access to native shared libraries without writing anything but Java code. Note from Cassandra developers for why JNA is needed: Linux aggressively swaps out infrequently used memory to make more room for its file system buffer cache. Unfortunately, modern generational garbage collectors like the JVM's leave parts of its heap un-touched for relatively large amounts of time, leading Linux to swap it out. When the JVM finally goes to use or GC that memory, swap hell ensues. Setting swappiness to zero can mitigate this behavior but does not eliminate it entirely. Turning off swap entirely is effective. But to avoid surprising people who don't know about this behavior, the best solution is to tell Linux not to swap out the JVM, and that is what we do now with mlockall via JNA. Because of licensing issues, we can't distribute JNA with Cassandra, so you must manually add it to the Cassandra lib/ directory or otherwise place it on the classpath. If the JNA jar is not present, Cassandra will continue as before. Get JNA with: cd ~ wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb To install: techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb (Reading database ... 44334 files and directories currently installed.) Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ... Unpacking replacement libjna-java ... Setting up libjna-java (3.2.7-0~nmu.2) ... The deb package will install the JNA jar file to /usr/share/java/jna.jar, but Cassandra only loads it if its in the class path. 
The easy way to do this is just create a symlink into your Cassandra lib directory (note: replace /home/techlabs with your home dir location): ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib Research: http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/ - Sameer On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote: I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB unique data), to a cluster of 10 servers. I'm using batch_mutate, and breaking the data up into chunks of about 10k records. Each record is about 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data set, everything works fine. When I upload the 20 GB data set, servers will occasionally crash. Currently I have my client code automatically detect this and restart the server, but that is less than ideal. I'm not sure what information to gather to determine what's going on here. Here is a sample of a log file from when a crash occurred. The crash was immediately after the log entry tagged 2011-05-12 19:02:19,377. Any idea what's going on here? Any other info I can
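A note on the OOM-killer check quoted in this message: as written, `grep --ignore-case killed process` passes `killed` as the pattern and `process` as a file name, so the phrase probably needs quoting to search for what was intended. The same search can be sketched in Python as below; the function name `oom_kill_lines` is hypothetical, and the log path is whatever file your distribution writes kernel messages to.

```python
import re

# The kernel's OOM-killer message contains the phrase "Killed process";
# the exact surrounding wording varies by kernel version, so only the
# phrase itself is matched, case-insensitively.
OOM_PATTERN = re.compile(r"killed process", re.IGNORECASE)


def oom_kill_lines(log_path):
    """Return every line of the given log file that mentions a killed process."""
    with open(log_path, errors="replace") as log_file:
        return [line.rstrip("\n") for line in log_file if OOM_PATTERN.search(line)]
```

For example, `oom_kill_lines("/var/log/messages")` returns an empty list when the OOM killer left no trace in that file, which is the result the message above was trying to confirm.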
Re: Crash when uploading large data sets
If it's a jvm crash there should be a hs_err_pid.log file left around in the directory you started Cassandra from. On Thu, May 12, 2011 at 6:15 PM, James Cipar jci...@cmu.edu wrote: I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB unique data), to a cluster of 10 servers. I'm using batch_mutate, and breaking the data up into chunks of about 10k records. Each record is about 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data set, everything works fine. When I upload the 20 GB data set, servers will occasionally crash. Currently I have my client code automatically detect this and restart the server, but that is less than ideal. I'm not sure what information to gather to determine what's going on here. Here is a sample of a log file from when a crash occurred. The crash was immediately after the log entry tagged 2011-05-12 19:02:19,377. Any idea what's going on here? Any other info I can gather to try to debug this? INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464 INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations) INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations) INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 7774142464 INFO [ScheduledTasks:1] 2011-05-12 
19:02:15,253 GCInspector.java (line 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 7774142464 INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) Completed flushing /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 bytes) INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) Discarding obsolete commit log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log) INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 7774142464 INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 operations) INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations) INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61 INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) Logging initialized INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) Heap size: 7634681856/7635730432 INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. Native methods will be disabled. 
INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading settings from file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1 INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2 INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1 INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2 INFO [main]
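To follow up on the hs_err suggestion above: HotSpot names its fatal error log `hs_err_pid<pid>.log`, so a directory can be checked for such files with a few lines of Python. This is a minimal sketch; the function name `find_jvm_crash_logs` is hypothetical, and the directory to search is assumed to be the one Cassandra was started from.

```python
import glob
import os


def find_jvm_crash_logs(start_dir):
    """Return hs_err_pid*.log files found in start_dir, newest first."""
    pattern = os.path.join(start_dir, "hs_err_pid*.log")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
```

An empty result from the start directory would suggest that the JVM did not crash in the way that produces a fatal error log, for example because the process was killed from outside.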
Re: Crash when uploading large data sets
Is this a 64-bit VM? A 32-bit Java VM with default C-heap settings can only actually use about 2 GB of Java heap. On Thu, May 12, 2011 at 8:08 PM, James Cipar jci...@cmu.edu wrote: Oh, forgot this detail: I have no swap configured, so swapping is not the cause of the crash. Could it be that I'm running out of memory on a 15GB machine? That seems unlikely. I grepped dmesg for oom and didn't see anything from the oom killer, and I used the instructions from the following web page and didn't see that the oom killer had killed anything. http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case killed process jcipar@172-19-149-62:~$ Also, this is pretty subjective, so I can't say for sure until it finishes, but this seems to be running *much* slower after setting the heap size and setting up JNA. On May 12, 2011, at 7:52 PM, James Cipar wrote: It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my physical memory. These are 15GB VMs, so that's 7.5GB for Cassandra. I would have expected that to work, but I will override to 13 GB just to see what happens. I've also got the JNA thing set up. Do you think this would cause the crashes, or is it just a performance improvement? On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote: The key JVM options for Cassandra are in cassandra.in.sh. What is your min and max heap size? The default setting of max heap size is 1GB. How much RAM do your nodes have? You may want to increase this setting. You can also set the -Xmx and -Xms options to the same value to keep Java from having to manage heap growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a lot more on 64-bit. Try messing with some of the other settings in the cassandra.in.sh file. You may not have DEBUG mode turned on for Cassandra and therefore may not be getting the full details of what's going on when the server crashes. 
In the cassandra-home/conf/log4j-server.properties file, set this line from the default of INFO to DEBUG: log4j.rootLogger=INFO,stdout,R Also, you haven't configured JNA on this server. Here's some info about it and how to configure it: JNA provides Java programs easy access to native shared libraries without writing anything but Java code. Note from Cassandra developers for why JNA is needed: Linux aggressively swaps out infrequently used memory to make more room for its file system buffer cache. Unfortunately, modern generational garbage collectors like the JVM's leave parts of its heap un-touched for relatively large amounts of time, leading Linux to swap it out. When the JVM finally goes to use or GC that memory, swap hell ensues. Setting swappiness to zero can mitigate this behavior but does not eliminate it entirely. Turning off swap entirely is effective. But to avoid surprising people who don't know about this behavior, the best solution is to tell Linux not to swap out the JVM, and that is what we do now with mlockall via JNA. Because of licensing issues, we can't distribute JNA with Cassandra, so you must manually add it to the Cassandra lib/ directory or otherwise place it on the classpath. If the JNA jar is not present, Cassandra will continue as before. Get JNA with: cd ~ wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb To install: techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb (Reading database ... 44334 files and directories currently installed.) Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ... Unpacking replacement libjna-java ... Setting up libjna-java (3.2.7-0~nmu.2) ... The deb package will install the JNA jar file to /usr/share/java/jna.jar, but Cassandra only loads it if its in the class path. 
The easy way to do this is just create a symlink into your Cassandra lib directory (note: replace /home/techlabs with your home dir location): ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib Research: http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/ - Sameer On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote: I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB unique data), to a cluster of 10 servers. I'm using batch_mutate, and breaking the data up into chunks of about 10k records. Each record is about 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data set, everything works fine. When I upload the 20 GB data set, servers will occasionally crash. Currently I have my client code automatically detect this and restart the server, but that is less than ideal. I'm not sure what information to gather to determine what's going on here. Here is a sample of a log
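The 32-bit versus 64-bit question can be sanity-checked against the startup line quoted earlier in this thread, "Heap size: 7634681856/7635730432". A 32-bit process cannot address more than 2^32 bytes (4 GiB), so a heap of that size suggests the JVM is already 64-bit. The arithmetic, as a small Python sketch with hypothetical helper names:

```python
GIB = 2 ** 30  # bytes in one gibibyte


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


def fits_in_32bit_address_space(heap_bytes):
    """True if a heap of this size could be addressed by a 32-bit process."""
    return heap_bytes < 2 ** 32


# Value taken from the "Heap size" line in the log quoted in this thread.
reported_heap = 7634681856
```

Here `to_gib(reported_heap)` is a little over 7 GiB and `fits_in_32bit_address_space(reported_heap)` is false, which is consistent with the 7.5GB figure James mentions for a 15GB VM.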
Re: running TPC-C on cassandra clusters
Thanks Jonathan, but can you provide some links about 0.7 svn branch? 2011/5/12 Jonathan Ellis jbel...@gmail.com I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6) On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Hi all, My partner and I currently using cassandra cluster to run TPC-C. We first use 2 ec2 nodes to load 20 warehouses. One(client node) has 8 cores, the other(worker node) has 4 cores. During the loading time, either the client node or the worker node will down(cannot be detected) randomly and then up again in a short time. If the two nodes both down, we failed in loading. If only one of them down, we can continue to load data. The problem is if we use multiple threads(we write multiprocess code), say 4 clients threads, some of them might be stop at the point one of the nodes first down, and the dead threads will never come back This will not only enlarge our loading time, but also effect the amount of data we can load. So we need to figure out why the nodes continue to be up and down and fix this problem. Thanks for any help! Best, Xiaowei -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: running TPC-C on cassandra clusters
https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7 On Thu, May 12, 2011 at 8:33 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Thanks Jonathan, but can you provide some links about 0.7 svn branch? 2011/5/12 Jonathan Ellis jbel...@gmail.com I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6) On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Hi all, My partner and I currently using cassandra cluster to run TPC-C. We first use 2 ec2 nodes to load 20 warehouses. One(client node) has 8 cores, the other(worker node) has 4 cores. During the loading time, either the client node or the worker node will down(cannot be detected) randomly and then up again in a short time. If the two nodes both down, we failed in loading. If only one of them down, we can continue to load data. The problem is if we use multiple threads(we write multiprocess code), say 4 clients threads, some of them might be stop at the point one of the nodes first down, and the dead threads will never come back This will not only enlarge our loading time, but also effect the amount of data we can load. So we need to figure out why the nodes continue to be up and down and fix this problem. Thanks for any help! Best, Xiaowei -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: running TPC-C on cassandra clusters
Oh sorry, we use cassandra-0.7.4 already. Is the version fine? 2011/5/12 Jonathan Ellis jbel...@gmail.com https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7 On Thu, May 12, 2011 at 8:33 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Thanks Jonathan, but can you provide some links about 0.7 svn branch? 2011/5/12 Jonathan Ellis jbel...@gmail.com I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6) On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Hi all, My partner and I currently using cassandra cluster to run TPC-C. We first use 2 ec2 nodes to load 20 warehouses. One(client node) has 8 cores, the other(worker node) has 4 cores. During the loading time, either the client node or the worker node will down(cannot be detected) randomly and then up again in a short time. If the two nodes both down, we failed in loading. If only one of them down, we can continue to load data. The problem is if we use multiple threads(we write multiprocess code), say 4 clients threads, some of them might be stop at the point one of the nodes first down, and the dead threads will never come back This will not only enlarge our loading time, but also effect the amount of data we can load. So we need to figure out why the nodes continue to be up and down and fix this problem. Thanks for any help! Best, Xiaowei -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
assertion error in cassandra when doing nodetool move
Hi All, I ran the following command on one of my nodes to move its token from 0 to 2:
/usr/cassandra/cassandra/bin/nodetool -h 10.170.195.204 -p 8080 move 2
It fails with the assertion error below, and I don't understand why this is happening:
Exception in thread main java.lang.AssertionError
at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:389)
at org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:414)
at org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:94)
at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:929)
at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:895)
at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1595)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1733)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1708)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Thanks Anurag
Re: running TPC-C on cassandra clusters
Not if you want the pausing/marking down fixes that were done more recently. :) On Thu, May 12, 2011 at 8:39 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Oh sorry, we use cassandra-0.7.4 already. Is the version fine? 2011/5/12 Jonathan Ellis jbel...@gmail.com https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7 On Thu, May 12, 2011 at 8:33 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Thanks Jonathan, but can you provide some links about 0.7 svn branch? 2011/5/12 Jonathan Ellis jbel...@gmail.com I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6) On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com wrote: Hi all, My partner and I currently using cassandra cluster to run TPC-C. We first use 2 ec2 nodes to load 20 warehouses. One(client node) has 8 cores, the other(worker node) has 4 cores. During the loading time, either the client node or the worker node will down(cannot be detected) randomly and then up again in a short time. If the two nodes both down, we failed in loading. If only one of them down, we can continue to load data. The problem is if we use multiple threads(we write multiprocess code), say 4 clients threads, some of them might be stop at the point one of the nodes first down, and the dead threads will never come back This will not only enlarge our loading time, but also effect the amount of data we can load. So we need to figure out why the nodes continue to be up and down and fix this problem. Thanks for any help! Best, Xiaowei -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: running TPC-C on cassandra clusters
Thanks!

2011/5/12 Jonathan Ellis jbel...@gmail.com

Not if you want the pausing/marking down fixes that were done more recently. :)
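
Separately from the upgrade suggested in this thread, the loader threads that die when a node is briefly marked down could probably be kept alive with a client-side retry. The following is only a minimal sketch in Python, not something anyone in the thread proposed or tested: `write_fn` is a hypothetical zero-argument callable standing in for whatever client call the TPC-C loader makes, and the attempt count and delays are arbitrary assumptions.

```python
import time


def retry_with_backoff(write_fn, max_attempts=5, base_delay=0.5,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call write_fn() until it succeeds or max_attempts is exhausted.

    write_fn is a hypothetical callable wrapping one client write; it is
    an assumption for illustration, not part of any Cassandra client API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A loader worker would wrap each write in `retry_with_backoff(...)` instead of calling the client directly, so a short outage delays the worker rather than ending it. This does not address why the nodes go up and down in the first place.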