Where are the Cassandra Debian packages?
Hello. It looks like http://www.apache.org/dist/cassandra/debian is missing (HTTP 404). Has Cassandra perhaps moved to another Debian repository?
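For context, these packages were normally consumed through an APT source rather than browsed directly; a typical setup of that era looked like the fragment below (the series name 11x is an assumption taken from the directory layout discussed in this thread; pick the series matching your Cassandra version):

```
# /etc/apt/sources.list.d/cassandra.list
deb http://www.apache.org/dist/cassandra/debian 11x main
deb-src http://www.apache.org/dist/cassandra/debian 11x main
```

After adding the source (and its signing key), `apt-get update && apt-get install cassandra` fetches the packages from whichever mirror the hostname resolves to, which also explains why a broken regional mirror can 404 while others work.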
Re: Where are the Cassandra Debian packages?
No, I got a 404 error.

2012/8/24 Romain HARDOUIN romain.hardo...@urssaf.fr: Hi, the URL you mentioned is OK, e.g. http://www.apache.org/dist/cassandra/debian/dists/11x/

ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11: Hello. It looks like http://www.apache.org/dist/cassandra/debian is missing (HTTP 404). Has Cassandra perhaps moved to another Debian repository?
Re: Where are the Cassandra Debian packages?
Hm, from European servers the Cassandra packages are present, but from Russian mirrors they are absent.

2012/8/24 Michal Michalski mich...@opera.com: Well, works for me.

On 24.08.2012 11:43, ruslan usifov wrote: No, I got a 404 error.

2012/8/24 Romain HARDOUIN romain.hardo...@urssaf.fr: Hi, the URL you mentioned is OK, e.g. http://www.apache.org/dist/cassandra/debian/dists/11x/

ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11: Hello. It looks like http://www.apache.org/dist/cassandra/debian is missing (HTTP 404). Has Cassandra perhaps moved to another Debian repository?
Offtopic: ksoftirqd takes more CPU after a DDoS; as a result Cassandra latency is very high
Hello. We were under a DDoS attack, and as a result we saw high ksoftirqd activity, and Cassandra began answering very slowly. But after the DDoS ended, the high ksoftirqd activity persisted; it disappears when I stop the Cassandra daemon and reappears when I start it again. The only full resolution of the problem is a full reboot of the server. What could this be (why does ksoftirqd work so intensively while Cassandra is running? We disabled all working traffic to the cluster, but that didn't help, so it can't be due to heavy load), and how can we solve it? PS: OS Ubuntu 10.04 (kernel 2.6.32-41), Cassandra 1.0.10, Java 1.6.0_32 (from Oracle)
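When chasing this kind of problem it helps to see which softirq classes ksoftirqd is actually busy with; the kernel exposes per-CPU counters in /proc/softirqs. A minimal read-only check (Linux only; the column layout varies by kernel and CPU count):

```shell
# Sample the NET_RX softirq counters (network receive processing) twice,
# one second apart; fast-growing counters while Cassandra is running point
# at where the softirq load is coming from.
grep NET_RX /proc/softirqs
sleep 1
grep NET_RX /proc/softirqs
```

The same two-sample trick on TIMER or SCHED rows distinguishes network-driven softirq load from timer-driven load (the latter is what the leap-second bug discussed in the reply produces).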
Re: Offtopic: ksoftirqd takes more CPU after a DDoS; as a result Cassandra latency is very high
2012/7/1 David Daeschler david.daesch...@gmail.com: Good afternoon. This again looks like it could be the leap-second issue: it looks like the problem a bunch of us were having yesterday that isn't cleared without a reboot or a date command. It seems to be related to the leap second that was added between June 30th and July 1st. See the mailing-list thread with subject "High CPU usage as of 8pm eastern time". If you are still seeing high CPU usage and a stall after restarting Cassandra, and you are on Linux, try: date; date `date +%m%d%H%M%C%y.%S`; date; in a terminal and see if everything starts working again. I hope this helps. Please spread the word if you see others having issues with unresponsive kernels/high CPU.

Hello, this really helps. In our case two problems crossed each other :-( and we had not assumed that it might be a kernel problem. On one data cluster we simply rebooted; on the second we applied the date solution, and everything is fine. Thanks.
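The workaround quoted above reportedly works because setting the clock, even to the time it already shows, clears the stuck timer state left behind by the leap second. A commented version of the same command; since the set step needs root, this sketch only builds and prints the value that would be applied:

```shell
# Build a timestamp in the MMDDhhmmCCYY.SS form that `date` accepts for
# setting the system clock (month, day, hour, minute, century, year, seconds).
ts=$(date +%m%d%H%M%C%y.%S)
# Applying it requires root, e.g.:  sudo date "$ts"
echo "workaround would run: date $ts"
```

Note the value is read from the current clock, so applying it does not change the wall time; it only forces the kernel to re-arm its timers.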
Cassandra 1.0.x and Java 1.7
Hello! Is it safe to use Java 1.7 with Cassandra 1.0.x? The reason I want to do that is that Java 1.7 introduces options for rotating the GC log: http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=ff824681055961e1f62393b68deb5?bug_id=6941923
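For reference, the rotation options tracked by that bug are plain HotSpot flags; in a cassandra-env.sh style setup (matching the JVM_OPTS convention used elsewhere in this thread) they would look roughly like this. The log path and sizes are illustrative assumptions, not values from the thread:

```shell
# HotSpot GC log rotation (JDK 7; later backported to late JDK 6 updates).
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"
JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=10M"
echo "$JVM_OPTS"
```

This keeps at most ten gc.log.N files of 10 MB each instead of one unbounded log.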
Re: kswapd0 causing read timeouts
Upgrade Java (version 1.6.0_21 has memory leaks) to the latest 1.6.0_32. It's abnormal that for 80 GB of data you have 15 GB resident. vfs_cache_pressure is used for inodes and dentries. Also, to check whether you have memory leaks, use the drop_caches sysctl.

2012/6/14 Gurpreet Singh gurpreet.si...@gmail.com: JNA is installed. swappiness was 0. vfs_cache_pressure was 100. Two questions on this: 1. Is there a way to find out if mlockall really worked, other than just the "mlockall successful" log message? 2. Does Cassandra mlock only the JVM heap, or also the mmapped memory? I disabled mmap completely, and things look so much better; latency is surprisingly half of what I see when mmap is enabled. It's funny that I keep reading tall claims about mmap, but in practice a lot of people have problems with it, especially when it uses up all the memory. We had tried mmap for different purposes in our company before, and finally ended up disabling it, because it just doesn't handle things right when memory is low. Maybe /proc/sys/vm needs to be configured right, but that's not the easiest of configurations to get right. Right now I am handling only 80 gigs of data. Kernel version is 2.6.26, Java version is 1.6.0_21. /G

On Wed, Jun 13, 2012 at 8:42 PM, Al Tobey a...@ooyala.com wrote: I would check /etc/sysctl.conf and get the values of /proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure. If you don't have JNA enabled (which Cassandra uses to fadvise) and swappiness is at its default of 60, the Linux kernel will happily swap out your heap for cache space. Set swappiness to 1 or 'swapoff -a' and kswapd shouldn't be doing much unless you have a too-large heap or some other app using up memory on the system.

On Wed, Jun 13, 2012 at 11:30 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hm, it's very strange. What is the amount of your data? Your Linux kernel version? Java version?

PS: I can suggest switching disk_access_mode to standard in your case. PS PS: also upgrade your Linux to the latest, and Java HotSpot to 1.6.0_32 (from the Oracle site).

2012/6/13 Gurpreet Singh gurpreet.si...@gmail.com: Alright, here it goes again... Even with mmap_index_only, once the RES memory hit 15 gigs, the read latency went berserk. This happens in 12 hours if disk_access_mode is mmap, about 48 hours if it's mmap_index_only. Only reads happening, at 50 reads/second. Row cache size: 730 MB, row cache hit ratio: 0.75. Key cache size: 400 MB, key cache hit ratio: 0.4. Heap size (max 8 gigs): used 6.1-6.9 gigs. No messages about reducing cache sizes in the logs. Stats: vmstat 1: no swapping here, however high sys CPU utilization. iostat (looks great): avgqu-sz = 8, avg await = 7 ms, svctime = 0.6, util = 15-30%. top: VIRT 19.8g, SHR 6.1g, RES 15g, high CPU, buffers 2 MB. cfstats: 70-100 ms; this number used to be 20-30 ms. The value of SHR keeps increasing (owing to mmap, I guess), while at the same time buffers keep decreasing: buffers start as high as 50 MB and go down to 2 MB. This is very easily reproducible for me. Every time the RES memory hits about 15 gigs, the client starts getting timeouts from Cassandra and the sys CPU jumps a lot. All this even though my row cache hit ratio is almost 0.75. Other than just turning off mmap completely, is there any other solution or setting to avoid a Cassandra restart every couple of days? Something to keep the RES memory from hitting such a high number. I have been constantly monitoring the RES, and was not seeing issues when RES was at 14 gigs. /G

On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote: Aaron, Ruslan, I changed the disk access mode to mmap_index_only, and it has been stable ever since, well at least for the past 20 hours. Previously, in about 10-12 hours, as soon as the resident memory was full, the client would start timing out on all its reads. It looks fine for now; I am going to let it continue, to see how long it lasts and whether the problem comes again. Aaron, yes, I had turned swap off. The total CPU utilization was at roughly 700%. It looked like kswapd0 was using just one CPU, but Cassandra (jsvc) CPU utilization increased quite a bit. top was reporting high system CPU and low user CPU. vmstat was not showing swapping. Max Java heap size is 8 gigs, while only 4 gigs was in use, so the Java heap was doing great; no GC in the logs. iostat was doing OK from what I remember; I will have to reproduce the issue for the exact numbers. cfstats latency had gone very high, but that is partly due to high CPU usage. One thing was clear: SHR was inching higher (due to the mmap) while the buffer cache, which started at about 20-25 MB, reduced to 2 MB by the end, which probably means that the page cache was being evicted by kswapd0. Is there a way to fix the size of the buffer cache and not let the system evict it in favour
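The checks Al suggests are quick to script; a read-only sketch using the same /proc paths named in the thread (Linux only):

```shell
# Current VM tunables that decide whether the kernel evicts page cache
# or swaps out application memory under pressure.
echo "swappiness:         $(cat /proc/sys/vm/swappiness)"
echo "vfs_cache_pressure: $(cat /proc/sys/vm/vfs_cache_pressure)"

# To drop clean caches while hunting a suspected leak (needs root; harmless
# to data but briefly slows I/O while caches refill):
#   sync && echo 3 > /proc/sys/vm/drop_caches
```

If resident memory stays high even after dropping caches, the memory is held by the process itself (heap or mlocked/mmapped pages), not by the page cache.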
Re: kswapd0 causing read timeouts
2012/6/14 Gurpreet Singh gurpreet.si...@gmail.com: JNA is installed. swappiness was 0. vfs_cache_pressure was 100. Two questions on this: 1. Is there a way to find out if mlockall really worked, other than just the "mlockall successful" log message?

Yes, you must see something like this (from our test server): INFO [main] 2012-06-14 02:03:14,745 DatabaseDescriptor.java (line 233) Global memtable threshold is enabled at 512MB

2. Does Cassandra mlock only the JVM heap, or also the mmapped memory?

Cassandra mlocks only the heap; it does not mlock the mmapped sstables.

I disabled mmap completely, and things look so much better; latency is surprisingly half of what I see when mmap is enabled. It's funny that I keep reading tall claims about mmap, but in practice a lot of people have problems with it, especially when it uses up all the memory. We had tried mmap for different purposes in our company before, and finally ended up disabling it, because it just doesn't handle things right when memory is low. Maybe /proc/sys/vm needs to be configured right, but that's not the easiest of configurations to get right. Right now I am handling only 80 gigs of data. Kernel version is 2.6.26, Java version is 1.6.0_21. /G

On Wed, Jun 13, 2012 at 8:42 PM, Al Tobey a...@ooyala.com wrote: I would check /etc/sysctl.conf and get the values of /proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure. If you don't have JNA enabled (which Cassandra uses to fadvise) and swappiness is at its default of 60, the Linux kernel will happily swap out your heap for cache space. Set swappiness to 1 or 'swapoff -a' and kswapd shouldn't be doing much unless you have a too-large heap or some other app using up memory on the system.

On Wed, Jun 13, 2012 at 11:30 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hm, it's very strange. What is the amount of your data? Your Linux kernel version? Java version?
Re: kswapd0 causing read timeouts
Sorry, I was mistaken; here is the right string: INFO [main] 2012-06-14 02:03:14,520 CLibrary.java (line 109) JNA mlockall successful

2012/6/15 ruslan usifov ruslan.usi...@gmail.com: 2012/6/14 Gurpreet Singh gurpreet.si...@gmail.com: JNA is installed. swappiness was 0. vfs_cache_pressure was 100. Two questions on this: 1. Is there a way to find out if mlockall really worked, other than just the "mlockall successful" log message?

Yes, you must see something like this (from our test server): INFO [main] 2012-06-14 02:03:14,745 DatabaseDescriptor.java (line 233) Global memtable threshold is enabled at 512MB

2. Does Cassandra mlock only the JVM heap, or also the mmapped memory?

Cassandra mlocks only the heap; it does not mlock the mmapped sstables.
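A quick way to verify the line Ruslan quotes is to grep the Cassandra system log. The sketch below runs that grep against a sample line taken from this thread, since the real log path (commonly /var/log/cassandra/system.log, but install-dependent) may differ:

```shell
# Sample log line from the thread; on a live node you would instead run:
#   grep "mlockall successful" /var/log/cassandra/system.log
sample='INFO [main] 2012-06-14 02:03:14,520 CLibrary.java (line 109) JNA mlockall successful'
printf '%s\n' "$sample" | grep -o 'mlockall successful'
```

If the grep against the real log is empty, JNA either is not on the classpath or mlockall failed (e.g. insufficient `ulimit -l`).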
Re: kswapd0 causing read timeouts
Hm, it's very strange. What is the amount of your data? Your Linux kernel version? Java version?

PS: I can suggest switching disk_access_mode to standard in your case. PS PS: also upgrade your Linux to the latest, and Java HotSpot to 1.6.0_32 (from the Oracle site).

2012/6/13 Gurpreet Singh gurpreet.si...@gmail.com: Alright, here it goes again... Even with mmap_index_only, once the RES memory hit 15 gigs, the read latency went berserk. This happens in 12 hours if disk_access_mode is mmap, about 48 hours if it's mmap_index_only. Only reads happening, at 50 reads/second. Row cache size: 730 MB, row cache hit ratio: 0.75. Key cache size: 400 MB, key cache hit ratio: 0.4. Heap size (max 8 gigs): used 6.1-6.9 gigs. No messages about reducing cache sizes in the logs. Stats: vmstat 1: no swapping here, however high sys CPU utilization. iostat (looks great): avgqu-sz = 8, avg await = 7 ms, svctime = 0.6, util = 15-30%. top: VIRT 19.8g, SHR 6.1g, RES 15g, high CPU, buffers 2 MB. cfstats: 70-100 ms; this number used to be 20-30 ms. The value of SHR keeps increasing (owing to mmap, I guess), while at the same time buffers keep decreasing: buffers start as high as 50 MB and go down to 2 MB. This is very easily reproducible for me. Every time the RES memory hits about 15 gigs, the client starts getting timeouts from Cassandra and the sys CPU jumps a lot. All this even though my row cache hit ratio is almost 0.75. Other than just turning off mmap completely, is there any other solution or setting to avoid a Cassandra restart every couple of days? Something to keep the RES memory from hitting such a high number. I have been constantly monitoring the RES, and was not seeing issues when RES was at 14 gigs. /G

On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote: Aaron, Ruslan, I changed the disk access mode to mmap_index_only, and it has been stable ever since, well at least for the past 20 hours.
Previously, in about 10-12 hours, as soon as the resident memory was full, the client would start timing out on all its reads. It looks fine for now; I am going to let it continue, to see how long it lasts and whether the problem comes again. Aaron, yes, I had turned swap off. The total CPU utilization was at roughly 700%. It looked like kswapd0 was using just one CPU, but Cassandra (jsvc) CPU utilization increased quite a bit. top was reporting high system CPU and low user CPU. vmstat was not showing swapping. Max Java heap size is 8 gigs, while only 4 gigs was in use, so the Java heap was doing great; no GC in the logs. iostat was doing OK from what I remember; I will have to reproduce the issue for the exact numbers. cfstats latency had gone very high, but that is partly due to high CPU usage. One thing was clear: SHR was inching higher (due to the mmap) while the buffer cache, which started at about 20-25 MB, reduced to 2 MB by the end, which probably means that the page cache was being evicted by kswapd0. Is there a way to fix the size of the buffer cache and not let the system evict it in favour of mmap? Also, mmapping data files would basically cause not only the data asked for to be read into main memory, but also a bunch of extra pages (readahead), which would not be very useful, right? The same thing for the index would actually be more useful, as there would be more index entries in the readahead part, and the index files, being small, wouldn't cause the memory pressure that evicts the page cache. Mmapping the data files would make sense if the data size, or at least the hot data set, is smaller than RAM; otherwise just the index would probably be a better thing to mmap, no? In my case the data size is 85 gigs, while available RAM is 16 gigs (only 8 gigs after heap). /G

On Fri, Jun 8, 2012 at 11:44 AM, aaron morton aa...@thelastpickle.com wrote: Ruslan, why did you suggest changing the disk_access_mode?
Gurpreet, I would leave the disk_access_mode with the default until you have a reason to change it. 8 core, 16 gb ram, 6 data disks raid0, no swap configured is swap disabled ? Gradually, the system cpu becomes high almost 70%, and the client starts getting continuous timeouts 70% of one core or 70% of all cores ? Check the server logs, is there GC activity ? check nodetool cfstats to see the read latency for the cf. Take a look at vmstat to see if you are swapping, and look at iostats to see if io is the problem http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote: Thanks Ruslan. I will try the mmap_index_only. Is there any guideline as to when to leave it to auto and when to use mmap_index_only? /G On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov ruslan.usi...@gmail.com wrote: disk_access_mode: mmap?? set to disk_access_mode: mmap_index_only
Re: kswapd0 causing read timeouts
disk_access_mode: mmap? Set it to disk_access_mode: mmap_index_only in cassandra.yaml.

2012/6/8 Gurpreet Singh gurpreet.si...@gmail.com: Hi, I am testing Cassandra 1.1 on a 1-node cluster: 8 core, 16 GB RAM, 6 data disks RAID0, no swap configured, cassandra 1.1.1, heap size 8 gigs, key cache size in MB: 800 (used only 200 MB till now), memtable_total_space_in_mb: 2048. I am running a read workload, about 30 reads/second, no writes at all. The system runs fine for roughly 12 hours. jconsole shows that my heap size has hardly touched 4 gigs. top shows: SHR increasing slowly from 100 MB to 6.6 gigs in these 12 hours; RES increasing slowly from 6 gigs all the way to 15 gigs; buffers at a healthy 25 MB at some point, going down to 2 MB in these 12 hours; VIRT staying at 85 gigs. I understand that SHR goes up because of mmap, and RES goes up because it includes the SHR value as well. After around 10-12 hours, the CPU utilization of the system starts increasing, and I notice that the kswapd0 process becomes more active. Gradually, the system CPU becomes high, almost 70%, and the client starts getting continuous timeouts. The fact that the buffers went down from 20 MB to 2 MB suggests that kswapd0 is probably evicting the page cache. Is there a way to stop kswapd0 from doing this even when there is no swap configured? This is very easily reproducible for me, and I would like a way out of this situation. Do I need to adjust VM memory-management settings like pagecache or vfs_cache_pressure, things like that? Just some extra information: JNA is installed, mlockall is successful, and there is no compaction running. I would appreciate any help on this. Thanks, Gurpreet
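The setting Ruslan refers to lives in cassandra.yaml; a sketch of the relevant fragment for Cassandra 1.x (the option is not always present in the shipped yaml, so add the line if missing; the comments summarize the modes as discussed in this thread):

```yaml
# cassandra.yaml: how SSTables are read.
#   auto            - mmap data and index files (the default)
#   mmap_index_only - mmap only the index files
#   standard        - plain buffered I/O, no mmap at all
disk_access_mode: mmap_index_only
```

A node restart is needed for the change to take effect.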
Re: kswapd0 causing read timeouts
2012/6/8 aaron morton aa...@thelastpickle.com: Ruslan, why did you suggest changing the disk_access_mode?

Because it brings problems out of the blue; in any case, mmap brought a similar problem for me, and I haven't found any solution to resolve it other than changing disk_access_mode :-(. It will also be interesting to hear the results from the author of this thread.

Gurpreet, I would leave the disk_access_mode at the default until you have a reason to change it. 8 core, 16 GB RAM, 6 data disks RAID0, no swap configured. Is swap disabled? Gradually, the system CPU becomes high, almost 70%, and the client starts getting continuous timeouts. 70% of one core or 70% of all cores? Check the server logs: is there GC activity? Check nodetool cfstats to see the read latency for the CF. Take a look at vmstat to see if you are swapping, and look at iostat to see if I/O is the problem: http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote: Thanks Ruslan. I will try the mmap_index_only. Is there any guideline as to when to leave it at auto and when to use mmap_index_only? /G

On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov ruslan.usi...@gmail.com wrote: disk_access_mode: mmap? Set it to disk_access_mode: mmap_index_only in cassandra.yaml.

2012/6/8 Gurpreet Singh gurpreet.si...@gmail.com: Hi, I am testing Cassandra 1.1 on a 1-node cluster: 8 core, 16 GB RAM, 6 data disks RAID0, no swap configured, cassandra 1.1.1, heap size 8 gigs, key cache size in MB: 800 (used only 200 MB till now), memtable_total_space_in_mb: 2048. I am running a read workload, about 30 reads/second, no writes at all. The system runs fine for roughly 12 hours. jconsole shows that my heap size has hardly touched 4 gigs.
Re: nodetool repair -- should I schedule a weekly one ?
Yes. With CL ONE you can get inconsistent reads when one of your nodes dies and the dynamic snitch doesn't do its job.

2012/6/7 Oleg Dulin oleg.du...@gmail.com: We have a 3-node cluster. We use RF of 3 and CL of ONE for both reads and writes… Is there a reason I should schedule a regular nodetool repair job? Thanks, Oleg
Re: nodetool repair -- should I schedule a weekly one ?
Sorry, not the dynamic snitch but hinted handoff. Remember, Cassandra is eventually consistent.

2012/6/8 ruslan usifov ruslan.usi...@gmail.com: Yes. With CL ONE you can get inconsistent reads when one of your nodes dies and the dynamic snitch doesn't do its job.

2012/6/7 Oleg Dulin oleg.du...@gmail.com: We have a 3-node cluster. We use RF of 3 and CL of ONE for both reads and writes… Is there a reason I should schedule a regular nodetool repair job? Thanks, Oleg
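The underlying rule here is the usual quorum-overlap condition: a read is guaranteed to see the latest write only when read CL + write CL > RF. A tiny sketch with the numbers from this thread (RF=3, reads and writes at ONE):

```shell
RF=3   # replication factor
R=1    # read consistency level ONE
W=1    # write consistency level ONE
if [ $((R + W)) -gt "$RF" ]; then
  echo "overlap guaranteed: reads always see the latest write"
else
  echo "no overlap: stale reads possible, so regular repair matters"
fi
```

With ONE/ONE and RF 3 the test fails (1 + 1 is not greater than 3), which is exactly why hinted handoff, read repair, and scheduled nodetool repair are the mechanisms that converge the replicas.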
Re: row_cache_provider = 'SerializingCacheProvider'
I have set up a 5 GB Java heap with the following tuning:

MAX_HEAP_SIZE=5G
HEAP_NEWSIZE=800M
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=5"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:CMSFullGCsBeforeCompaction=1"

Also I set up 2 GB for memtables (memtable_total_space_in_mb: 2048). My avg heap usage (nodetool -h localhost info) is 3 GB. Based on nodetool -h localhost cfhistograms I calculated an avg row size of 70 KB. I set up the row cache for only one CF, with the following settings: update column family building with rows_cached=1 and row_cache_provider='SerializingCacheProvider'; When I set up the row cache I got a promotion failure in GC (with a stop-the-world pause of about 30 seconds) with the heap almost filled. I am very confused by this behavior. PS: I use Cassandra 1.0.10, with JNA 3.4.0, on Ubuntu Lucid (kernel 2.6.32-41).

2012/6/4 aaron morton aa...@thelastpickle.com: Yes, SerializingCacheProvider is the off-heap caching provider. Can you do some more digging into what is using the heap? Cheers, A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 1/06/2012, at 9:52 PM, ruslan usifov wrote: Hello. I began using SerializingCacheProvider for row caching and got extreme Java heap growth. But I thought that this cache provider doesn't use the Java heap.
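One plausible explanation, an assumption rather than something confirmed in the thread: the serializing cache stores rows off-heap, but every cache hit deserializes the row back onto the heap, so large rows plus a decent hit rate still generate heavy short-lived heap garbage. A back-of-the-envelope sketch with the thread's 70 KB average row size and an assumed read rate of 500 reads/s (illustrative only):

```shell
row_kb=70        # avg row size from cfhistograms (from the thread)
reads_per_s=500  # assumed read rate, for illustration only
churn_mb_per_s=$(( row_kb * reads_per_s / 1024 ))
echo "~${churn_mb_per_s} MB/s of short-lived heap garbage from deserialization"
```

With an 800 MB new generation, tens of MB/s of allocation fills survivor space quickly; large objects then get promoted and can trigger exactly the CMS promotion failures described above.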
Re: row_cache_provider = 'SerializingCacheProvider'
I think SerializingCacheProvider has a bigger Java heap footprint than I expected.

2012/6/4 ruslan usifov ruslan.usi...@gmail.com: I have set up a 5 GB Java heap with the following tuning:

MAX_HEAP_SIZE=5G
HEAP_NEWSIZE=800M
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=5"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:CMSFullGCsBeforeCompaction=1"

Also I set up 2 GB for memtables (memtable_total_space_in_mb: 2048). My avg heap usage (nodetool -h localhost info) is 3 GB. Based on nodetool -h localhost cfhistograms I calculated an avg row size of 70 KB. I set up the row cache for only one CF, with the following settings: update column family building with rows_cached=1 and row_cache_provider='SerializingCacheProvider'; When I set up the row cache I got a promotion failure in GC (with a stop-the-world pause of about 30 seconds) with the heap almost filled. I am very confused by this behavior. PS: I use Cassandra 1.0.10, with JNA 3.4.0, on Ubuntu Lucid (kernel 2.6.32-41).

2012/6/4 aaron morton aa...@thelastpickle.com: Yes, SerializingCacheProvider is the off-heap caching provider. Can you do some more digging into what is using the heap? Cheers, A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 1/06/2012, at 9:52 PM, ruslan usifov wrote: Hello. I began using SerializingCacheProvider for row caching and got extreme Java heap growth. But I thought that this cache provider doesn't use the Java heap.
row_cache_provider = 'SerializingCacheProvider'
Hello. I began using SerializingCacheProvider for row caching and got extreme Java heap growth. But I thought that this cache provider doesn't use the Java heap.
Re: Exception when truncate
It looks very strange, but yes. Now I can't reproduce this.

2012/5/22 aaron morton aa...@thelastpickle.com: The first part of the name is the current system time in milliseconds. If you run it twice, do you get log messages about failing to create the same directory twice? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 21/05/2012, at 5:09 AM, ruslan usifov wrote: I think as you do, but this is not true; there is no permissions issue. And as I said before, Cassandra tries to create a snapshot directory that already exists.

2012/5/19 Jonathan Ellis jbel...@gmail.com: Sounds like you have a permissions problem. Cassandra creates a subdirectory for each snapshot.

On Thu, May 17, 2012 at 4:57 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello. I have the following situation on our test server: from cassandra-cli I tried truncate purchase_history; three times I got:

[default@township_6waves] truncate purchase_history;
null
UnavailableException()
at org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212)
at org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077)
at org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052)
at org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445)
at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)

So it looks like truncate runs very slowly and takes longer than rpc_timeout_in_ms: 1 (this can happen because we have a very slow disk on the test machine). But in the Cassandra system log I see the following exception:

ERROR [MutationStage:7022] 2012-05-17 12:19:14,356 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:7022,5,main]
java.io.IOError: java.io.IOException: unable to mkdirs
/home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433) at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462) at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657) at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: unable to mkdirs /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:140) at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:131) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1409) ... 7 more Also i see that in snapshort dir already exists 1337242754356-purchase_history directory, so i think that snapshort names that generate cassandra not uniquely. PS: We use cassandra 1.0.10 on Ubuntu 10.0.4-LTS -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
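Aaron's explanation above is the key: the snapshot directory name begins with the current system time in milliseconds, so two snapshot attempts landing in the same millisecond compute the same path, and the second mkdirs fails. A minimal sketch of that collision (plain Python for illustration, not Cassandra's actual naming code):

```python
import os
import tempfile

def snapshot_dir(base, millis, cf_name):
    """Build a snapshot path the way the log suggests: <millis>-<cf>."""
    return os.path.join(base, f"{millis}-{cf_name}")

base = tempfile.mkdtemp()
millis = 1337242754356  # the timestamp seen in the error message

first = snapshot_dir(base, millis, "purchase_history")
os.makedirs(first)  # first attempt succeeds

# A retried truncate in the same millisecond computes the identical path...
second = snapshot_dir(base, millis, "purchase_history")
assert os.path.exists(second)  # ...which already exists,

# ...so creating it again raises, analogous to the mkdirs failure in the log.
try:
    os.makedirs(second)
    collided = False
except FileExistsError:
    collided = True
assert collided
```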
Re: Exception when truncate
I think so too, but that is not it: there is no permissions issue. And as I said before, Cassandra tries to create a snapshot directory that already exists.

2012/5/19 Jonathan Ellis jbel...@gmail.com: Sounds like you have a permissions problem. Cassandra creates a subdirectory for each snapshot. [quoted original post and stack trace trimmed; identical to the "Exception when truncate" message below] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Exception when truncate
Hello, I have the following situation on our test server. From cassandra-cli I ran truncate purchase_history; three times and got:

[default@township_6waves] truncate purchase_history; null UnavailableException() at org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212) at org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077) at org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052) at org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445) at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272) at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220) at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)

So it looks like truncate runs very slowly, longer than rpc_timeout_in_ms: 1 (this can happen because we have a very slow disk on the test machine). But in the Cassandra system log I see the following exception:

ERROR [MutationStage:7022] 2012-05-17 12:19:14,356 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:7022,5,main] java.io.IOError: java.io.IOException: unable to mkdirs /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433) at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462) at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657) at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: unable to mkdirs /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:140) at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:131) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1409) ... 7 more

I also see that a 1337242754356-purchase_history directory already exists in the snapshots dir, so I think the snapshot names Cassandra generates are not unique. PS: We use Cassandra 1.0.10 on Ubuntu 10.04 LTS.
Re: Exception when truncate
Also, I don't understand why truncate loads the disk so heavily on an empty CF (no SSTables at all).

2012/5/17 ruslan usifov ruslan.usi...@gmail.com: [quoted original post and stack trace trimmed; identical to the "Exception when truncate" message above]
Re: Exception when truncate
Maybe something changed in the truncate mechanism in Cassandra 1.0.x, because in Cassandra 0.8 truncate ran much faster on the same data.

2012/5/17 Viktor Jevdokimov viktor.jevdoki...@adform.com: Truncate flushes all memtables to free up commit logs, and does so on all nodes, so it takes time. Discussed on this list not long ago. Watch for: https://issues.apache.org/jira/browse/CASSANDRA-3651 https://issues.apache.org/jira/browse/CASSANDRA-4006 Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

-Original Message- From: ruslan usifov Sent: Thursday, May 17, 2012 13:06 To: user@cassandra.apache.org Subject: Re: Exception when truncate [quoted messages and stack trace trimmed; identical to the messages above]
Re: Exception when truncate
It's our test machine, with one node in the cluster :-)

2012/5/17 Jeremy Hanna jeremy.hanna1...@gmail.com: When doing a truncate, it has to talk to all of the nodes in the ring to perform the operation. By the error, it looks like one of the nodes was unreachable for some reason. You might run nodetool ring, or do a 'describe cluster;' in the cli, and see if your ring is okay. So I think the operation is just as fast; it just looks like it times out (20 seconds or something) when trying to perform the command against all of the nodes in the cluster.

On May 17, 2012, at 9:36 AM, ruslan usifov wrote: Maybe something changed in the truncate mechanism in Cassandra 1.0.x, because in Cassandra 0.8 truncate ran much faster on the same data. [rest of quoted thread, corporate disclaimer, and stack trace trimmed; identical to the messages above]
get dynamic snitch info from PHP
Hello. I want to route requests from a PHP client to the least-loaded node, so I need dynamic snitch and gossip info. How can I get this info from PHP? Perhaps I need a daemon that can talk to Cassandra's gossip and expose this info to PHP (over a socket, for example)?
Re: get dynamic snitch info from PHP
Sorry for my bad English. The problem I want to solve is the following. Say we take one node down for maintenance for a long time (30 min). We currently use TSocketPool to pool connections to Cassandra, but I think this pool implementation is not very good. It has a setRetryInterval parameter for taking a broken node out of rotation (we set it to 10 sec), but that means every 10 sec the pool will try to connect to the down node (again, we shut the node down on purpose for maintenance), because the pool doesn't know whether the node is dead while the Cassandra cluster does know, so those connection attempts are pointless. Also, when a node is compacting it can be heavily loaded and serve client requests poorly (at such moments we see a small increase in average backend response time).

2012/5/14 Viktor Jevdokimov viktor.jevdoki...@adform.com: I'm not sure that selecting a node based on the dynamic snitch (DS) is a good idea. First of all, every node has values for every node, including itself, and its own DS values are always better than the others'. For example, 3 nodes, RF=2:

        N1     N2     N3
N1   0.5ms    2ms    2ms
N2     2ms  0.5ms    2ms
N3     2ms    2ms  0.5ms

We have monitored many Cassandra counters, including DS values for every node, and the graphs show that latency is not only about load. So the strategy should be based on the use case, node count, RF, replica placement strategy, read repair chance, and more. What do you want to achieve?

Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Follow us on Twitter: @adforminsider What is Adform: watch this short video http://vimeo.com/adform/display

[corporate disclaimer, image attachments, and quoted original message trimmed]
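One client-side workaround for the blind setRetryInterval reconnects is a pool that marks a host down and backs off exponentially, so a node taken out for a 30-minute maintenance window is probed only rarely. The sketch below is illustrative Python (the pool in question, TSocketPool, is PHP; the class name, host addresses, and timings here are invented):

```python
import time

class BackoffPool:
    """Round-robin over hosts, skipping ones marked down until their
    retry deadline (exponential backoff, capped) has passed."""

    def __init__(self, hosts, base_retry=10.0, max_retry=300.0):
        self.hosts = list(hosts)
        self.base_retry = base_retry
        self.max_retry = max_retry
        self.down = {}   # host -> (retry_at, current_backoff_seconds)
        self.i = 0

    def mark_down(self, host, now=None):
        now = time.time() if now is None else now
        # Double the backoff on each consecutive failure, up to max_retry.
        _, backoff = self.down.get(host, (0.0, self.base_retry / 2))
        backoff = min(backoff * 2, self.max_retry)
        self.down[host] = (now + backoff, backoff)

    def mark_up(self, host):
        self.down.pop(host, None)

    def next_host(self, now=None):
        now = time.time() if now is None else now
        for _ in range(len(self.hosts)):
            host = self.hosts[self.i % len(self.hosts)]
            self.i += 1
            entry = self.down.get(host)
            if entry is None or entry[0] <= now:
                return host
        raise RuntimeError("all hosts down")

pool = BackoffPool(["10.0.0.1", "10.0.0.2"])
pool.mark_down("10.0.0.1", now=0.0)       # first failure: retry after 10 s
assert pool.next_host(now=1.0) == "10.0.0.2"   # down host is skipped
pool.mark_down("10.0.0.1", now=12.0)      # fails again: backoff doubles to 20 s
assert pool.next_host(now=20.0) == "10.0.0.2"  # still skipped until 32.0
```

Under this scheme a node that stays down for 30 minutes quickly reaches the backoff cap and is probed every few minutes instead of every 10 seconds.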
Re: Thrift error occurred during processing of message
Looks like you used TBufferedTransport, but since 1.0.x Cassandra supports only the framed transport.

2011/12/19 Tamil selvan R.S tamil.3...@gmail.com: Hi, we are using phpcassa to connect to Cassandra 1.0.2. After we installed the Thrift extension we started noticing the following in the error logs (we didn't notice this when running the raw Thrift library without the extension): ERROR [pool-2-thread-5314] 2011-12-05 20:26:47,729 CustomTThreadPoolServer.java (line 201) Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client? at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Is there any issue with Thrift protocol compatibility? Regards, Tamil
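The framed transport simply length-prefixes every Thrift message with a 4-byte big-endian size, while a buffered transport sends the payload bytes raw; when client and server disagree on framing, the receiver misinterprets the first bytes of the stream, which can surface as errors like "Missing version in readMessageBegin". A rough sketch of the framing itself (illustrative Python, not a real Thrift client):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Client side of TFramedTransport: prefix the message with its
    4-byte big-endian length."""
    return struct.pack(">i", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    """Server side: read the length prefix, then exactly that many bytes."""
    (length,) = struct.unpack(">i", data[:4])
    payload = data[4:4 + length]
    assert len(payload) == length, "short read"
    return payload

msg = b"\x80\x01\x00\x01..."   # a TBinaryProtocol call begins with a version word
assert unframe(frame(msg)) == msg

# An unframed (buffered) client would send `msg` bytes directly; a server
# expecting frames would read those bytes as a frame length instead,
# and the two sides lose sync.
```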
MapReduce without HDFS
Hello to all! Is it possible to launch only the Hadoop MapReduce TaskTracker and JobTracker against a Cassandra cluster, without launching HDFS (using something else for shared storage)? Thanks
Re: swap grows
Thanks for the link. But I still have a question about free memory. Our cluster peaks at 200 IOPS, yet each server still has about 3GB of free memory (the cluster has 6 nodes, so 3*6 = 18GB of unused memory). I would expect the OS to fill all memory with page cache of the SSTables (we take backups through Direct I/O), but it doesn't, and I don't understand why. I can't find any sysctl that tunes page cache thresholds or ratios. Any suggestions?

2012/4/18 Jonathan Ellis jbel...@gmail.com: what-is-the-linux-kernel-parameter-vm-swappiness http://www.linuxvox.com/2009/10/what-is-the-linux-kernel-parameter-vm-swappiness
Re: swap grows
No, I'm not sure about this :-) I don't know Linux VM management very well. It looked very strange to me that swap grows but there is no swap activity (I monitor it with vmstat -s | grep 'pages swapped out' | awk '{ print $1 }' and vmstat -s | grep 'pages swapped in' | awk '{ print $1 }'). So it looks like you are right, and I have learned something :-)

2012/4/15 Віталій Тимчишин tiv...@gmail.com: BTW, are you sure the system is doing something wrong? The system may save some pages to swap without removing them from RAM, simply to be able to drop them quickly later if needed.

2012/4/14 ruslan usifov ruslan.usi...@gmail.com: Hello. We have a 6-node cluster (Cassandra 0.8.10). On one node I increased the Java heap size to 6GB, and now swap is growing on that node, although the system has about 3GB of free memory:

root@6wd003:~# free
             total       used       free     shared    buffers     cached
Mem:      24733664   21702812    3030852          0       6792   13794724
-/+ buffers/cache:    7901296   16832368
Swap:      1998840       2352    1996488

And swap space slowly grows, and I don't understand why. PS: We have JNA mlock, and vm.swappiness = 0. PS: OS Ubuntu 10.04 (2.6.32-40-generic). -- Best regards, Vitalii Tymchyshyn
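The two vmstat pipelines above read the kernel's cumulative swap-in/out page counters; comparing two samples over time shows whether any paging is actually happening, regardless of how large the swap usage number looks. A small Python equivalent of that check, run here against canned sample text:

```python
def swap_counters(vmstat_s_output: str):
    """Extract cumulative 'pages swapped in/out' from `vmstat -s` output."""
    swapped_in = swapped_out = None
    for line in vmstat_s_output.splitlines():
        line = line.strip()
        if line.endswith("pages swapped in"):
            swapped_in = int(line.split()[0])
        elif line.endswith("pages swapped out"):
            swapped_out = int(line.split()[0])
    return swapped_in, swapped_out

# Two samples taken some time apart (canned text standing in for real output).
sample_before = "  123 pages swapped in\n  456 pages swapped out\n"
sample_after = "  123 pages swapped in\n  456 pages swapped out\n"

si0, so0 = swap_counters(sample_before)
si1, so1 = swap_counters(sample_after)

# No delta between samples: swap *usage* may grow, but nothing is being
# actively paged in or out, which matches the observation in this thread.
assert (si1 - si0, so1 - so0) == (0, 0)
```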
Re: swap grows
I know :-) but this is not an answer :-( I found that the other nodes also have about 3GB free (the node with JAVA_HEAP=6GB has 3GB free too), even though they run with JAVA_HEAP=5G. So this looks like some sysctl ratio (/proc/sys/vm?) of about 10% (3 / 24 * 100); I don't know which one. Can anybody explain this situation?

2012/4/14 R. Verlangen ro...@us2.nl: It's recommended to disable swap entirely when you run Cassandra on a server.

2012/4/14 ruslan usifov ruslan.usi...@gmail.com: I forgot to say that the system has 24GB of physical memory. [quoted original "swap grows" message and free output trimmed; identical to the post below] -- With kind regards, Robin Verlangen www.robinverlangen.nl
swap grows
Hello. We have a 6-node cluster (Cassandra 0.8.10). On one node I increased the Java heap size to 6GB, and now swap is growing on that node, although the system has about 3GB of free memory:

root@6wd003:~# free
             total       used       free     shared    buffers     cached
Mem:      24733664   21702812    3030852          0       6792   13794724
-/+ buffers/cache:    7901296   16832368
Swap:      1998840       2352    1996488

And swap space slowly grows, and I don't understand why. PS: We have JNA mlock, and vm.swappiness = 0. PS: OS Ubuntu 10.04 (2.6.32-40-generic).
Re: swap grows
I forgot to say that the system has 24GB of physical memory.

2012/4/14 ruslan usifov ruslan.usi...@gmail.com: [quoted original "swap grows" message trimmed; identical to the post above]
need of regular nodetool repair
Hello. I have the following question: if we read and write to the Cassandra cluster at QUORUM consistency level, does that allow us to skip running nodetool repair regularly (i.e. every GCGraceSeconds)?
Re: need of regular nodetool repair
Sorry for my bad English. So does QUORUM let us skip running repair regularly? That does not follow from your answer.

2012/4/11 R. Verlangen ro...@us2.nl: Yes, I personally have configured it to perform a repair once a week, as GCGraceSeconds is at 10 days. This is also what's in the manual: http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data (point 2)

2012/4/11 ruslan usifov ruslan.usi...@gmail.com: [quoted original question trimmed; identical to the post above] -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: need of regular nodetool repair
HH, is that hinted handoff?

2012/4/11 Igor i...@4friends.od.ua: On 04/11/2012 11:49 AM, R. Verlangen wrote: Not everything, just HH :) I hope this works for me for the following reasons: I have quite a large RF (6 datacenters, each carrying one replica of the full dataset), read and write at CL ONE, a relatively small TTL of 10 days, no deletes, and servers almost never go down for an hour. So I expect that even if I lose some hints, some other replica will reply with the data. Is that correct? This works for me, but may not work for others.

R. Verlangen: Well, if everything works 100% at any time there should be nothing to repair; however, with a distributed cluster it would be pretty rare for that to occur. At least that is how I interpret it.

2012/4/11 Igor i...@4friends.od.ua: BTW, I heard that you don't need to run repair if all your data has a TTL, hinted handoff works, and you never delete data.

[rest of quoted thread trimmed; identical to the messages above]
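The arithmetic behind the QUORUM question earlier in this thread: with replication factor N, every read set of size R is guaranteed to overlap every write set of size W whenever W + R > N, which QUORUM/QUORUM always satisfies, while Igor's CL ONE setup does not. A quick check (note that overlap guarantees seeing the latest write, but repair is still needed for other reasons, e.g. so deletes are not resurrected after GCGraceSeconds):

```python
def quorum(n: int) -> int:
    """Quorum for replication factor n: floor(n/2) + 1."""
    return n // 2 + 1

def overlaps(n: int, w: int, r: int) -> bool:
    """True if every read set of size r must intersect every write set of size w
    drawn from the same n replicas."""
    return w + r > n

for rf in (1, 2, 3, 5, 6):
    q = quorum(rf)
    # QUORUM writes + QUORUM reads always intersect, for any RF...
    assert overlaps(rf, q, q)

# ...while CL ONE with 6 replicas (one per datacenter, as above) does not
# guarantee that the replica you read from is one that took the write:
assert not overlaps(6, 1, 1)
```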
Re: Resident size growth
mmap doesn't depend on JNA.

2012/4/9 Jeremiah Jordan jeremiah.jor...@morningstar.com: He says he disabled JNA. You can't mmap without JNA, can you?

On Apr 9, 2012, at 4:52 AM, aaron morton wrote: see http://wiki.apache.org/cassandra/FAQ#mmap Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 9/04/2012, at 5:09 AM, ruslan usifov wrote: mmap'd SSTables? That's normal.

2012/4/5 Omid Aladini omidalad...@gmail.com: Hi, I'm experiencing steady growth in the resident size of a JVM running Cassandra 1.0.7. I disabled JNA and the off-heap row cache, tested with and without mlockall disabling paging, and upgraded to JRE 1.6.0_31 to prevent this bug [1] from leaking memory. Still, the JVM's resident set size grows steadily. A process with Xmx=2048M has grown to 6GB resident size, and one with Xmx=8192M to 16GB, in a few hours and increasing. Has anyone experienced this? Any idea how to deal with this issue? Thanks, Omid [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129
Re: Resident size growth
Also, I suggest setting disk_access_mode: mmap_index_only.

2012/4/9 Omid Aladini omidalad...@gmail.com: Thanks. Yes, it's due to mmapped SSTable pages counting toward resident size. Jeremiah: mmap isn't through JNA; it's via java.nio.MappedByteBuffer, I think. -- Omid

[rest of quoted thread trimmed; identical to the messages above]
Re: Resident size growth
mmap'd SSTables? That's normal.

2012/4/5 Omid Aladini omidalad...@gmail.com: [quoted original question trimmed; identical to the message quoted above]
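Why mmapped SSTables make resident size look scary: file pages mapped into the process address space count toward RSS once they are touched, even though they are clean page cache the kernel can reclaim at any time, and they live entirely outside the Java heap. A small illustration using Python's mmap (which, like java.nio.MappedByteBuffer, needs no JNA); the file size here is arbitrary:

```python
import mmap
import os
import tempfile

# Create a 1 MiB file and map it read-only, the way Cassandra maps SSTables.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * (1 << 20))
os.close(fd)

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Touching one byte per page faults those pages in; they now count in the
    # process RSS, although they are file-backed page cache, not heap memory.
    touched = sum(mapped[i] for i in range(0, len(mapped), 4096))
    assert touched == 0            # the file is all zero bytes
    assert len(mapped) == 1 << 20
    mapped.close()

os.remove(path)
```

The same reasoning explains why a JVM with Xmx=2048M can show a 6GB resident size: the difference is largely mapped SSTable data, not a leak.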
upgrade from cassandra 0.8 to 1.0
Hello. It looks like Cassandra 1.0.x is stable and has interesting features like off-heap memtables and row caches, so we want to upgrade from 0.8 to 1.0. Is it possible to do this without cluster downtime (while we upgrade all the nodes)? I mean the following: when we begin the upgrade, at some point the working cluster will contain a mix of 0.8 nodes (not yet upgraded) and 1.0 nodes (already upgraded), so I am concerned about this situation, i.e. communication between nodes could break because of incompatibilities in the inter-node protocol version.
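For what it's worth, the usual answer is a rolling upgrade, one node at a time, so the mixed-version window exists but every node is drained cleanly before it is touched. A sketch of the per-node plan; the package/service command strings are illustrative assumptions, not exact commands for any particular distribution:

```python
# Ordered per-node steps for a rolling 0.8 -> 1.0 upgrade (command strings are
# illustrative assumptions; adapt them to your packaging and init system).
UPGRADE_STEPS = [
    "nodetool -h {host} drain",  # flush memtables, stop accepting writes
    "service cassandra stop",
    "# install the 1.0.x package, carrying over cassandra.yaml settings",
    "service cassandra start",
    "nodetool -h {host} ring",   # verify the node rejoined before moving on
]

def rolling_upgrade_plan(hosts):
    """One node at a time, never in parallel, so the cluster stays available."""
    return [step.format(host=host) for host in hosts for step in UPGRADE_STEPS]

plan = rolling_upgrade_plan(["10.0.0.1", "10.0.0.2"])
```

The key point is that reads and writes keep flowing through the remaining replicas while each node is down, which is why a replication factor above 1 is a prerequisite.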
Re: repair broke TTL based expiration
Do you run major compactions? 2012/3/19 Radim Kolar h...@filez.com: I suspect that running cluster-wide repair interferes with TTL-based expiration. I am running repair every 7 days and using a TTL expiration time of 7 days too. Data are never deleted. Stored data in Cassandra are always growing (I have been watching them for 3 months) but they should not. If I run a manual cleanup, some data are deleted, but only about 5%. Currently there are about 3-5 times more rows than I estimate. I suspect that running repair on data with TTL can cause: 1. the time check for expired records is ignored and these data are streamed to the other node, where they become alive again, or 2. streamed data are propagated with the full TTL. Let's say I have a TTL of 7 days: if data are stored for 5 days and then repaired, they should be sent to the other node with a TTL of 2 days, not 7. Can someone test this case? I cannot play with the production cluster too much.
Re: repair broke TTL based expiration
Cleanup doesn't make any sense in your case. You write that repair works for you, so you can stop the cassandra daemon, delete all data files from the folder that contains the problem data, start the daemon again, and run nodetool repair, but for this you must have a replication factor of 3 for the keyspace and use consistency level QUORUM for data manipulation. 2012/3/20 Radim Kolar h...@filez.com: On 19.3.2012 23:33, ruslan usifov wrote: Do you run major compactions? No, I do cleanups only. Major compactions kill my node with OOM.
Re: slow read
2012/3/5 Jeesoo Shin bsh...@gmail.com: Hi all. I have very SLOW READ here. :-( I made a cluster with three nodes (AWS xlarge, replication = 3). Cassandra version is 1.0.6. I have inserted 1,000,000 rows (standard columns). Each row has 200 columns. Each column has a 16-byte key and a 512-byte value. I used Hector's createSliceQuery to get one column from a row. This basic query (random row, fixed column) is issued from multiple threads against Cassandra. I only get up to 140 requests per second. Is this all I can get for reads? Or am I doing something wrong? Interestingly, when I request rows which don't exist, it goes up to 1600 per second. -- You must test read performance with a parallel test (i.e. multiple threads). The result being much faster for non-existent rows comes from the bloom filter. -- ANY insight you can share will be extremely helpful. Thank you. Regards, Jeesoo.
Re: slow read
And the sum of requests/sec across all threads is 160? 2012/3/5 Jeesoo Shin bsh...@gmail.com: Thank you for the reply. :) Yes, I did use multiple threads. 160 and 320 gave me the same result. On 3/5/12, ruslan usifov ruslan.usi...@gmail.com wrote: [...]
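The shape of such a parallel read test is just N threads hammering the same query in a loop and summing completed requests. A self-contained sketch; the placeholder lambda stands in for the real Hector slice query, which is an assumption here:

```python
import threading
import time

def benchmark(do_read, n_threads=4, duration=0.5):
    """Run do_read() in n_threads tight loops for `duration` seconds,
    then return the aggregate requests per second."""
    counts = [0] * n_threads  # one slot per thread, so no locking is needed
    deadline = time.monotonic() + duration

    def worker(i):
        while time.monotonic() < deadline:
            do_read()
            counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / duration

# placeholder ~1ms read; replace with the real single-column slice query
rps = benchmark(lambda: time.sleep(0.001))
```

If the aggregate rate stays flat as you add threads while the nodes sit idle, the bottleneck is usually the client side or connection pooling rather than Cassandra itself.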
Wrong version in debian repository
Hello. I think the http://www.apache.org/dist/cassandra/debian repo has an incorrect version for the 0.8 branch. There is 0.8.8, but the latest version is 0.8.9. Maybe this repository is abandoned?
Re: Disable Nagle algorithm in thrift, i.e. TCP_NODELAY
2012/1/26 Jeffrey Kesselman jef...@gmail.com: Most operating systems have a way to do this at the OS level. Could you please describe this way for Linux, for a particular application? Maybe some sysctl? On Thu, Jan 26, 2012 at 8:17 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello. Is it possible to set TCP_NODELAY on the thrift socket in Cassandra? -- It's always darkest just before you are eaten by a grue.
Re: Disable Nagle algorithm in thrift, i.e. TCP_NODELAY
Sorry, but you misunderstood me. I am asking whether Cassandra has any option to control the TCP_NODELAY behaviour, so we don't need to patch Cassandra or thrift code. I found this article: https://wiki.cs.columbia.edu:8443/pages/viewpage.action?pageId=12585536, where coreTransport.TcpClient.NoDelay is mentioned, but I don't understand what it is. 2012/1/26 Jeffrey Kesselman jef...@gmail.com: To set or get a TCP socket option, call getsockopt(2) to read or setsockopt(2) to write the option, with the option level argument set to SOL_TCP. In addition, most SOL_IP socket options are valid on TCP sockets. For more information see ip(7). ... TCP_NODELAY If set, disable the Nagle algorithm. This means that segments are always sent as soon as possible, even if there is only a small amount of data. When not set, data is buffered until there is a sufficient amount to send out, thereby avoiding the frequent sending of small packets, which results in poor utilization of the network. This option cannot be used at the same time as the option TCP_CORK. http://bit.ly/zpvLbP On Thu, Jan 26, 2012 at 12:10 PM, ruslan usifov ruslan.usi...@gmail.com wrote: [...] -- It's always darkest just before you are eaten by a grue.
Re: Disable Nagle algorithm in thrift, i.e. TCP_NODELAY
On 27 January 2012 at 1:19, aaron morton aa...@thelastpickle.com wrote: Outgoing TCP connections between nodes have TCP_NODELAY on, and so do server-side Thrift sockets. Thanks for the exhaustive answer. I would assume your client will be setting it as well. No, the PHP client doesn't have TCP_NODELAY, because PHP stream sockets don't allow setting socket options, i.e. there is no such API. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 27/01/2012, at 6:54 AM, sridhar basam wrote: There is no global setting in Linux to turn off Nagle. Sridhar 2012/1/26 Jeffrey Kesselman jef...@gmail.com: You know... there ought to be a command-line command to set it. There is in Solaris and Windows, but I'm having trouble finding it for Linux. 2012/1/26 ruslan usifov ruslan.usi...@gmail.com: [...]
Re: Disable Nagle algorithm in thrift, i.e. TCP_NODELAY
On 27 January 2012 at 2:44, sridhar basam s...@basam.org wrote: Which socket API? http://www.php.net/manual/en/function.socket-set-option.php It is possible to do the appropriate setsockopt call to disable Nagle. No, you are wrong: the PHP thrift implementation doesn't use the sockets extension, it uses PHP streams (http://ru.php.net/manual/en/book.stream.php), aka fsockopen, stream_socket_recvfrom, etc., but PHP streams don't allow setting any socket options :-(. Sridhar 2012/1/26 ruslan usifov ruslan.usi...@gmail.com: [...]
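To make the thread concrete: at the socket level TCP_NODELAY is a one-line setsockopt call, which is all any client or server has to do; the PHP streams limitation is simply that this call isn't reachable. A sketch in Python, purely for illustration:

```python
import socket

def open_nodelay_connection(host, port):
    """Open a TCP connection with the Nagle algorithm disabled (TCP_NODELAY)."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

# demo against a throwaway local listener instead of a real Cassandra node
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
client = open_nodelay_connection("127.0.0.1", server.getsockname()[1])
nodelay = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
client.close()
server.close()
```

With the option set, small writes (like individual Thrift calls) go out immediately instead of being coalesced by the kernel.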
Enable thrift logging
Hello. I am trying to log thrift messages (we need this to solve a communication problem between the Cassandra daemon and a PHP client), so in log4j-server.properties I wrote the following lines: log4j.logger.org.apache.thrift.transport=DEBUG,THRIFT log4j.appender.THRIFT=org.apache.log4j.RollingFileAppender log4j.appender.THRIFT.maxFileSize=20MB log4j.appender.THRIFT.maxBackupIndex=50 log4j.appender.THRIFT.layout=org.apache.log4j.PatternLayout log4j.appender.THRIFT.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n log4j.appender.THRIFT.File=/var/log/cassandra/8.0/thrift.log But no messages appear in this log (though they should, i.e. exception traces). If we enable DEBUG on the rootLogger, i.e.: log4j.rootLogger=DEBUG,stdout,R then thrift log messages appear in system.log as expected, but how can we split them out into a separate log? PS: cassandra 0.8.9
Re: Enable thrift logging
2012/1/25 aaron morton aa...@thelastpickle.com: Do you want to log from inside the thrift code or from the cassandra thrift classes? The exceptions happen inside thrift, so inside thrift :-))) If it's the latter, try log4j.logger.org.apache.thrift=DEBUG,THRIFT org.apache.thrift.transport is part of thrift proper. I tried this but without any result. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 25/01/2012, at 11:36 AM, R. Verlangen wrote: Pick a custom loglevel and redirect them with /etc/syslog.conf? 2012/1/24 ruslan usifov ruslan.usi...@gmail.com: [...]
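One detail that may be missing, assuming stock log4j semantics: a child logger forwards its events up to the root appenders unless additivity is disabled, and the logger level has to let DEBUG through. A sketch of the separated config:

```properties
# log4j-server.properties: route thrift transport logging to its own file
log4j.logger.org.apache.thrift.transport=DEBUG,THRIFT
# keep these messages out of system.log
log4j.additivity.org.apache.thrift.transport=false

log4j.appender.THRIFT=org.apache.log4j.RollingFileAppender
log4j.appender.THRIFT.File=/var/log/cassandra/8.0/thrift.log
log4j.appender.THRIFT.maxFileSize=20MB
log4j.appender.THRIFT.maxBackupIndex=50
log4j.appender.THRIFT.layout=org.apache.log4j.PatternLayout
log4j.appender.THRIFT.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
```

If messages still show up only when the root logger is at DEBUG, they are probably being emitted under a Cassandra logger name (e.g. something under org.apache.cassandra.thrift) rather than Thrift's own package, so that would be the logger name to target instead.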
Re: performance reaching plateau while the hardware is still idle
Use a parallel test :-))) 2011/12/15 Kent Tong freemant2...@yahoo.com: Hi, I am running a performance test for Cassandra 1.0.5. It can perform about 1500 business operations (one read + one write to the same row) per second. However, the CPU is still 85% idle (as shown by vmstat) and the IO utilization is less than a few percent (as shown by iostat). nodetool tpstats shows basically no active and pending threads. I can run several such test clients concurrently, each achieving the same operations per second without increasing the hardware utilization. So, why has the performance reached a plateau while there are still idle hardware resources? Thanks in advance for any idea!
Prevent create snapshot when truncate
Hello. Every time we do a truncate, Cassandra automatically creates a snapshot. How can we prevent this?
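If your version supports it, this behaviour is controlled by the auto_snapshot option in cassandra.yaml (added around the 1.0 line, if I recall correctly); the obvious caveat is that truncated or dropped data then cannot be recovered from a snapshot:

```yaml
# cassandra.yaml: skip the automatic snapshot taken before truncate and drop
auto_snapshot: false
```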
Re: Does anybody know why Twitter stop integrate Cassandra as Twitter store?
Big thanks for all your replies
Does anybody know why Twitter stop integrate Cassandra as Twitter store?
http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html As said in this post, Twitter stopped working on using Cassandra as a store for tweets, but it doesn't say why they made this decision. Does anybody have more information?
Re: Does anybody know why Twitter stop integrate Cassandra as Twitter store?
Hello 2011/10/4 Paul Loy ketera...@gmail.com: Did you read the article you posted? Yes. *We believe that this isn't the time to make large scale migration to a new technology*. We will focus our Cassandra work on new projects that we wouldn't be able to ship without a large-scale data store. There was a big buzz online that Twitter would migrate their tweets to Cassandra, but then they dropped those plans. This explanation sounds very vague. Why did they change their mind? I found only one article about this: http://highscalability.com/blog/2010/7/11/so-why-is-twitter-really-not-using-cassandra-to-store-tweets.html
Re: Problems using Thrift API in C
Do you have any error messages in the cassandra log? 2011/7/28 Aleksandrs Saveljevs aleksandrs.savelj...@zabbix.com: Dear all, We are considering using Cassandra for storing gathered data in Zabbix (see https://support.zabbix.com/browse/ZBXNEXT-844 for more details). Because Zabbix is written in C, we are considering using the Thrift API in C, too. However, we are running into problems trying to get even the basic code to work. Consider the attached source code. This is essentially a rewrite of the first part of the C++ example given at http://wiki.apache.org/cassandra/ThriftExamples#C.2B-.2B-. If we run it under strace, we see that it hangs on the call to recv() when setting the keyspace: $ strace -s 64 ./test ... socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 connect(3, {sa_family=AF_INET, sin_port=htons(9160), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 send(3, \0\0\0/\200\1\0\1\0\0\0\fset_keyspace\0\0\0\0\v\0\1\0\0\0\vmy_keyspace\0, 47, 0) = 47 recv(3, ^C unfinished ... If we run the C++ example, it passes this step successfully. Does anybody know where the problem is? We are using Thrift 0.6.1 and Cassandra 0.8.1. Also, what is the current state of the Thrift API in C? Can it be considered stable? Has anybody used it successfully? Any examples? Thanks, Aleksandrs
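A hedged observation from the strace: Cassandra 0.8 expects Thrift's framed transport, where every message is prefixed with a 4-byte big-endian length of the payload only. In the capture, 47 bytes are sent and the first four bytes (\0\0\0/) themselves declare a length of 47 (0x2f), which suggests the header is being counted in the frame length; the server would then wait for 4 more bytes that never arrive, matching the hang in recv(). A sketch of correct framing:

```python
import struct

def frame(payload: bytes) -> bytes:
    """Thrift framed transport: 4-byte big-endian payload length, then payload.
    The length must NOT include the 4 header bytes themselves."""
    return struct.pack(">i", len(payload)) + payload

# illustrative payload only: the TBinaryProtocol version header plus filler
framed = frame(b"\x80\x01\x00\x01" + b"rest-of-set_keyspace-call")
```

If the C client (or its transport setup) frames with the header included, or doesn't frame at all, the C++ example working while the C one hangs is exactly what you would see.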
Re: What will be the steps for adding new nodes
2011/4/16 Roni r...@similarweb.com: I have a 0.6.4 Cassandra cluster of two nodes in full replica (replication factor 2). I want to add two more nodes and balance the cluster (keeping replication factor 2). I want all of them to be seeds. What should be the simple steps: 1. add <AutoBootstrap>true</AutoBootstrap> to all the nodes or only the new ones? -- You must add this option only on the new nodes. 2. add <Seed>[new_node]</Seed> to the config file of the old nodes before adding the new ones? -- If you do that, bootstrap will not work. And this step is not needed: I think a few seed nodes are enough for fault tolerance. 3. do the old nodes need to be restarted (if no change is needed in their config files)? -- No, that is not needed.
Re: Cassandra constantly removes a node which no longer exists
2011/4/12 aaron morton aa...@thelastpickle.com: In JConsole go to o.a.c.db.HintedHandoffManager and try the deleteHintsForEndpoint operation. This is also called when a token is removed from the ring, or when a node is decommissioned. What process did you use to reconfigure the cluster? I decommissioned the node, then step by step restarted all nodes in the cluster. After I repeated the restart operation twice, this log entry disappeared.
Cassandra constantly removes a node which no longer exists
Hello. I use cassandra 0.7.4. After reconfiguring the cluster, on one node I constantly see the following log: INFO [GossipStage:1] 2011-04-11 17:14:13,514 StorageService.java (line 865) Removing token 56713727820156410577229101238628035242 for /10.32.59.202 INFO [ScheduledTasks:1] 2011-04-11 17:14:13,514 HintedHandOffManager.java (line 210) Deleting any stored hints for 10.32.59.202 But node 10.32.59.202 doesn't exist anymore. How can I prevent this?
Re: Flush / Snapshot Triggering Full GCs, Leaving Ring
2011/4/7 Jonathan Ellis jbel...@gmail.com: Hypothesis: it's probably the flush causing the CMS, not the snapshot linking. Confirmation possibility #1: Add a logger.warn to CLibrary.createHardLinkWithExec -- with JNA enabled it shouldn't be called, but let's rule it out. Confirmation possibility #2: Force some flushes w/o snapshot. Either way: concurrent mode failure is the easy GC problem. Hopefully you really are seeing mostly that -- this means the JVM didn't start CMS early enough, so it ran out of space before it could finish the concurrent collection, so it falls back to stop-the-world. The fix is a combination of reducing XX:CMSInitiatingOccupancyFraction and (possibly) increasing heap capacity if your heap is simply too full too much of the time. You can also mitigate it by increasing the phi threshold for the failure detector, so the node doing the GC doesn't mark everyone else as dead. (Eventually your heap will fragment and you will see STW collections due to promotion failed, but you should see that much less frequently. GC tuning to reduce fragmentation may be possible based on your workload, but that's out of scope here and in any case the real fix for that is https://issues.apache.org/jira/browse/CASSANDRA-2252.) Jonathan, do you have plans to backport this to the 0.7 branch? (It's very hard to tune CMS, and for people who are new to Java this task is much harder.)
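For reference, the two knobs Jonathan mentions would look roughly like this in a 1.0-era setup; the values are illustrative starting points, not recommendations:

```shell
# conf/cassandra-env.sh: start CMS earlier so the concurrent cycle can finish
# before the old generation fills up
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```

The failure-detector side lives in cassandra.yaml as phi_convict_threshold (default 8); raising it makes the cluster more tolerant of a node pausing for GC before peers mark it down.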
Re: ParNew (promotion failed)
Also, after all these messages, in stdout.log I see the following: [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor3] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor2] [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1] [Unloading class sun.reflect.GeneratedConstructorAccessor3] As written here: http://anshuiitk.blogspot.com/2010/11/excessive-full-garbage-collection.html this may be a PermGen size problem, but the line "CMS Perm : 20073K->19913K(33420K)" doesn't suggest that, does it?
Re: who to contact?
This bug was fixed in thrift php trunk. 2011/3/30 William Oberman ober...@civicscience.com: I think I found a bug in the cassandra PHP client. I'm using phpcassa, but the bug is in thrift itself, which I think that library (phpcassa) just wraps. In any case, I was trying to test on my local machine, which has limited RAM, so I reduced the JVM heap size. Of course I immediately had an OOM causing my local cassandra server to crash, but that caused my unit tests to stall at 100% CPU, which seemed weird to me. I had to figure out why. It seems that TSocket doesn't test for EOF (it's only checking for a socket timeout), causing a tight infinite loop when the connection disappears. Checking for EOF in an else if seems like an easy fix, but given how deep this code is in the library I'll leave it to the experts. My diff of the file: @@ -255,6 +255,9 @@ if (true === $md['timed_out'] && false === $md['blocked']) { throw new TTransportException('TSocket: timed out reading '.$len.' bytes from '. $this->host_.':'.$this->port_); +} else if (feof($this->handle_)) { + throw new TTransportException('TSocket: EOF reading '.$len.' bytes from '. + $this->host_.':'.$this->port_); } else { $pre .= $buf; $len -= $sz; -- Will Oberman Civic Science, Inc. 3030 Penn Avenue., First Floor Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com
Re: ParNew (promotion failed)
2011/3/23 ruslan usifov ruslan.usi...@gmail.com: Hello, sometimes I see the following message in the gc log: 2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion failed) Desired survivor size 41943040 bytes, new threshold 2 (max 2) - age 1: 5573024 bytes, 5573024 total - age 2: 5064608 bytes, 10637632 total : 672577K->670749K(737280K), 0.1837950 secs] 14897.288: [CMS: 1602487K->779310K(2326528K), 4.7525580 secs] 2270940K->779310K(3063808K), [CMS Perm : 20073K->19913K(33420K)], 4.9365810 secs] [Times: user=5.06 sys=0.00, real=4.93 secs] Total time for which application threads were stopped: 4.9378750 seconds After investigating, I found that this happens when a memtable flush and compaction happen: at that moment the young part of the heap overflows and a full GC happens. So to resolve this, should I tune the young generation (HEAP_NEWSIZE, -Xmn) and the in_memory_compaction_limit_in_mb config parameter? Also, if memtables are flushed due to memtable_flush_after, would separating the memtable flushes in time help?
Re: Add node to balanced cluster?
2011/3/25 Eric Gilmore e...@datastax.com: Also: http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity I can do what is described there, but I am afraid that when I begin balancing the cluster with the new node, this will be a big stress for it. Maybe there are some strategies for how to do this?
Re: Add node to balanced cluster?
2011/3/25 Eric Gilmore e...@datastax.com: Ruslan, I'm not sure exactly what risks you are referring to -- can you be more specific? Do the CPU-intensive operations one at a time, including doing the cleanup when it will not interfere with other operations, and I think you should be fine, from my understanding. I am afraid about disk IO. I think the move and cleanup operations consume a lot of IO, so while they run the throughput of the cluster can degrade seriously. And this would not be a problem if these operations took 5-10 minutes, but they can run for hours (1.5 - 2), and that is on one node, so a full rebalance could take days, with serious throughput problems during that period. Or do I exaggerate?
Re: debian/ubuntu mirror down?
The Cassandra issue tracker has a ticket for this (a link to the ticket was posted on this list, but I forget where). 2011/3/25 Shashank Tiwari tsha...@gmail.com: The Ubuntu Software Update seems to complain -- Failed to fetch http://www.apache.org/dist/cassandra/debian/dists/unstable/main/binary-amd64/Packages.gz 403 Forbidden [IP: 140.211.11.131 80] Failed to fetch http://www.apache.org/dist/cassandra/debian/dists/unstable/main/source/Sources.gz 403 Forbidden [IP: 140.211.11.131 80] Has something changed or is the mirror down? Thanks, Shashank
Re: ParNew (promotion failed)
2011/3/24 Erik Onnen eon...@gmail.com It's been about 7 months now but at the time G1 would regularly segfault for me under load on Linux x64. I'd advise extra precautions in testing and make sure you test with representative load. Which java version do you use?
Re: error connecting to cassandra 0.7.3
And where is the transport creation for your thrift interface? Cassandra 0.7 uses framed transport by default. 2011/3/24 Anurag Gujral anurag.guj...@gmail.com: I am using the following code to create my client: tr = new TSocket(url, port); TProtocol proto = new TBinaryProtocol(tr); client = new Cassandra.Client(proto); client.set_keyspace(this.keyspace); I am getting the errors I mentioned below. Thanks, Anurag -- Forwarded message -- From: Anurag Gujral anurag.guj...@gmail.com Date: Thu, Mar 24, 2011 at 1:26 AM Subject: error connecting to cassandra 0.7.3 To: user@cassandra.apache.org I am using cassandra-0.7.3 and thrift-0.0.5. I wrote a java client using thrift 0.0.5; when I try to connect to the local cassandra server I get the following error: ERROR com.bluekai.cassandra.validation.ValidationThread - Failed to connect to 127.0.0.1. org.apache.thrift.transport.TTransportException: Cannot write to null outputStream I am able to connect to the local cassandra server using cassandra-cli, though. Any suggestions? Thanks, Anurag
Why was the disk_access_mode config parameter removed from the yaml in the cassandra 0.7 branch?
mmap, which is set as the default on 64-bit platforms, works badly for us (I don't know the reason why this happens, but it happens on 4 machines in my case, so I don't think it is a hardware problem).
Add node to balanced cluster?
Hello. Which strategy should I use to add a new node to a fully balanced cluster (node tokens were generated by this python script: def tokens(nodes): for x in xrange(nodes): print 2 ** 127 / nodes * x tokens(3); )? How do I get a balanced cluster after adding the new node, without big stress on the current cluster?
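The same generator in Python 3 form (integer division; RandomPartitioner's ring spans 0..2**127), in case anyone wants to reuse it:

```python
def tokens(nodes):
    """Evenly spaced initial_token values for RandomPartitioner."""
    return [(2 ** 127 // nodes) * x for x in range(nodes)]

ring3 = tokens(3)  # tokens for a 3-node ring
```

Note that adding a node to an evenly balanced ring means recomputing the tokens for the new size and moving the existing nodes, which is exactly the stressful part the question is about.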
Re: change node IP address
2011/3/23 aaron morton aa...@thelastpickle.com: Which version are you using? It looks like with 0.7.X (and probably 0.6) versions you can just shut down the node and bring it back up with the new IP and It Just Works https://issues.apache.org/jira/browse/CASSANDRA-872 So to replace one machine with another, is it enough to simply copy the cassandra data directory to the new machine, set the previous machine's token on it, and that's all?
ParNew (promotion failed)
Hello. Sometimes I see the following message in the gc log: 2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion failed) Desired survivor size 41943040 bytes, new threshold 2 (max 2) - age 1: 5573024 bytes, 5573024 total - age 2: 5064608 bytes, 10637632 total : 672577K->670749K(737280K), 0.1837950 secs] 14897.288: [CMS: 1602487K->779310K(2326528K), 4.7525580 secs] 2270940K->779310K(3063808K), [CMS Perm : 20073K->19913K(33420K)], 4.9365810 secs] [Times: user=5.06 sys=0.00, real=4.93 secs] Total time for which application threads were stopped: 4.9378750 seconds How can I minimize their frequency, or disable them? My current workload is many small objects (about 200 bytes long), and the sum of my memtables is about 300 MB (16 CFs). My heap is 3G.
Re: ParNew (promotion failed)
2011/3/23 Narendra Sharma narendra.sha...@gmail.com: I think it is due to fragmentation in the old gen, due to which the survivor area cannot be moved to the old gen. A 300MB data size for memtables looks high for a 3G heap. I learned that the in-memory overhead of a memtable can be as high as 10x the memtable data size. So either increase the heap or reduce the memtable thresholds further so that the old gen gets freed up faster. With 16 CFs, I would do both, i.e. increase the heap to say 4GB and reduce the memtable thresholds further. I think you didn't understand me: 300MB is the sum of the thresholds over all 16 CFs, so a single memtable_threshold is about 18MB. Or should I reduce memtable_threshold all the same?
Re: Pauses of GC
After some investigation I think my problem is similar to this: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/reduced-cached-mem-resident-set-size-growth-td5967110.html Now I have disabled mmap for data files by setting disk_access_mode to mmap_index_only.
Re: Pauses of GC
I mean heap fragmentation of the Linux process by malloc, so that at one critical moment all memory is held by the java process in RSS, the OS kernel can't allocate any system resource, and as a result it hangs? Is that possible?
Pauses of GC
Hello Some times i have very long GC pauses: Total time for which application threads were stopped: 0.0303150 seconds 2011-03-17T13:19:56.476+0300: 33295.671: [GC 33295.671: [ParNew: 678855K-20708K(737280K), 0.0271230 secs] 1457643K-806795K(4112384K), 0.027305 0 secs] [Times: user=0.33 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.0291820 seconds 2011-03-17T13:20:32.962+0300: 2.157: [GC 2.157: [ParNew: 676068K-23527K(737280K), 0.0302180 secs] 1462155K-817599K(4112384K), 0.030402 0 secs] [Times: user=0.31 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.1270270 seconds 2011-03-17T13:21:11.908+0300: 33371.103: [GC 33371.103: [ParNew: 678887K-21564K(737280K), 0.0268160 secs] 1472959K-823191K(4112384K), 0.027011 0 secs] [Times: user=0.28 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.0293330 seconds 2011-03-17T13:21:50.482+0300: 33409.677: [GC 33409.677: [ParNew: 676924K-21115K(737280K), 0.0281720 secs] 1478551K-829900K(4112384K), 0.028363 0 secs] [Times: user=0.27 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.0339610 seconds 2011-03-17T13:22:32.849+0300: 33452.044: [GC 33452.044: [ParNew: 676475K-25948K(737280K), 0.0317600 secs] 1485260K-842061K(4112384K), 0.031952 0 secs] [Times: user=0.22 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.0344430 seconds 2011-03-17T13:23:14.924+0300: 33494.119: [GC 33494.119: [ParNew: 681308K-25087K(737280K), 0.0282600 secs] 1497421K-848300K(4112384K), 0.028436 0 secs] [Times: user=0.32 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.0309160 seconds 2011-03-17T13:23:57.192+0300: 33536.387: [GC 33536.387: [ParNew: 680447K-24805K(737280K), 0.0299910 secs] 1503660K-855829K(4112384K), 0.030167 0 secs] [Times: user=0.29 sys=0.01, real=0.03 secs] Total time for which application threads were stopped: 0.0324200 seconds 
2011-03-17T13:24:01.553+0300: 33540.748: [GC 33540.749: [ParNew: 680165K->31886K(737280K), 0.0495620 secs] 1511189K->936503K(4112384K), 0.0497420 secs] [Times: user=0.57 sys=0.00, real=0.05 secs]
Total time for which application threads were stopped: 0.0507030 seconds
2011-03-17T13:37:56.009+0300: 34375.204: [GC 34375.204: [ParNew: 687246K->28727K(737280K), 0.0244720 secs] 1591863K->942459K(4112384K), 0.0246900 secs] [Times: user=0.18 sys=0.00, real=0.02 secs]
Total time for which application threads were stopped: 806.7442720 seconds
Total time for which application threads were stopped: 0.0006590 seconds
Total time for which application threads were stopped: 0.0004360 seconds
Total time for which application threads were stopped: 0.0004630 seconds
Total time for which application threads were stopped: 0.0008120 seconds
2011-03-17T13:37:59.018+0300: 34378.213: [GC 34378.213: [ParNew: 676678K->21640K(737280K), 0.0137740 secs] 1590410K->949991K(4112384K), 0.0139610 secs] [Times: user=0.13 sys=0.02, real=0.01 secs]
Total time for which application threads were stopped: 0.0145920 seconds
Total time for which application threads were stopped: 0.1036080 seconds
Total time for which application threads were stopped: 0.0585600 seconds
Total time for which application threads were stopped: 0.0600550 seconds
Total time for which application threads were stopped: 0.0008560 seconds
Total time for which application threads were stopped: 0.0006770 seconds
Total time for which application threads were stopped: 0.0005910 seconds
Total time for which application threads were stopped: 0.0351330 seconds
Total time for which application threads were stopped: 0.0329020 seconds
Total time for which application threads were stopped: 0.0728490 seconds
Total time for which application threads were stopped: 0.0480990 seconds
Total time for which application threads were stopped: 0.0804250 seconds
2011-03-17T13:38:04.394+0300: 34383.589: [GC 34383.589: [ParNew: 677000K->8375K(737280K), 0.0218310 secs] 1605351K->944271K(4112384K), 0.0220300 secs]

I have the following nodetool cfstats output on the hung node:

Keyspace: fishdom_tuenti
	Read Count: 4970999
	Read Latency: 1.0267005945887335 ms.
	Write Count: 1441619
	Write Latency: 0.013146585887117193 ms.
	Pending Tasks: 0
		Column Family: decor
		SSTable count: 3
		Space used (live): 1296203532
		Space used (total): 1302520037
		Memtable Columns Count: 1066
		Memtable Data Size: 121742
		Memtable Switch Count: 11
		Read Count: 108125
		Read Latency: 2.809 ms.
		Write Count: 11261
		Write Latency: 0.006 ms.
		Pending Tasks: 0
		Key cache capacity: 30
		Key cache size: 46470
		Key cache hit rate: 0.40384615384615385
		Row cache: disabled
		Compacted row minimum size: 36
		Compacted row maximum size: 73457
		Compacted row mean size: 958
		Column Family: adopt
		SSTable count: 1
		Space used
Re: Pauses of GC
2011/3/17 Narendra Sharma narendra.sha...@gmail.com
What heap size are you running with? And which version of Cassandra?
4G, with Cassandra 0.7.4.
Re: Pauses of GC
At these moments Java hangs. Only one thread is working, and it runs mostly in kernel space, with the following strace output:

[pid 1953] 0.050157 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.22
[pid 1953] 0.59 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329, 797618000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050093
[pid 1953] 0.050152 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.21
[pid 1953] 0.67 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329, 847838000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050090
[pid 1953] 0.050150 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.22
[pid 1953] 0.67 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329, 898054000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050086
[pid 1953] 0.050144 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.22
[pid 1953] 0.60 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329, 948258000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050085
[pid 1953] 0.050144 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.21
[pid 1953] 0.67 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329, 998469000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050067
[pid 1953] 0.050127 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.21
[pid 1953] 0.67 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 48664000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050102
[pid 1953] 0.050161 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.21
[pid 1953] 0.59 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 98884000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050102
[pid 1953] 0.050160 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.22
[pid 1953] 0.67 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 149111000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050097
[pid 1953] 0.050157 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.22
[pid 1953] 0.59 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 199327000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050093
[pid 1953] 0.050153 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.22
[pid 1953] 0.67 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 249547000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050095
[pid 1953] 0.050155 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.22
[pid 1953] 0.59 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 299761000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050094
[pid 1953] 0.050154 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.21
[pid 1953] 0.67 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 349981000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050092
[pid 1953] 0.050168 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 0.23
[pid 1953] 0.66 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330, 400216000}, ) = -1 ETIMEDOUT (Connection timed out) 0.050090

This happens when mmap disk access is enabled, and in my case when the virtual address space of the Java process grows beyond 16G. In that case the whole system behaves badly: utilities launch very slowly (but with no swap activity), and when I kill the Java process the system recovers. What this is, I don't know; perhaps it is OS-dependent. I use Ubuntu 10.04 (LTS): Linux slv007 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16 14:58:24 UTC 2010 x86_64 GNU/Linux
2011/3/17 Narendra Sharma narendra.sha...@gmail.com
Depending on your memtable thresholds, the heap may be too small for the deployment. At the same time, I don't see any other log statements around that long pause that you have shown in the log snippet. It looks a little odd to me. All the ParNew collections freed almost the same amount of heap and did not take a lot of time.
Check if it is due to some JVM bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6477891
-Naren
On Thu, Mar 17, 2011 at 9:47 AM, ruslan usifov ruslan.usi...@gmail.com wrote:
2011/3/17 Narendra Sharma narendra.sha...@gmail.com
What heap size are you running with? And which version of Cassandra?
4G with Cassandra 0.7.4
swap setting on linux
Dear community! Please share your swap settings for Linux boxes.
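For context, a sketch of commonly recommended swap settings for a dedicated Cassandra box. These are generic Linux practice, not taken from any reply in this thread, and the file path below is a placeholder:

```shell
# Option 1: remove swap entirely on a dedicated database node.
sudo swapoff -a
# (also comment out the swap entries in /etc/fstab to make this permanent)

# Option 2: keep swap as a safety net, but tell the kernel to avoid it:
echo "vm.swappiness = 1" | sudo tee /etc/sysctl.d/60-no-swap.conf
sudo sysctl -p /etc/sysctl.d/60-no-swap.conf
```

Either way, the goal is the same: never let the JVM heap get paged out, since a paged-out heap turns every GC into a multi-second stall.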
replace one node to onother
Hello. For example, if we want to replace one server with another, with an IP address change as well, what is the easiest way to do that? For now we do nodetool removetoken, then set autobootstrap: true on the new server (with the token that was on the old node).
Move token to another node
Hello. I have the following task: I want to move a token from one node to another. How can I do that?
Re: Move token to another node
2011/3/15 Sasha Dolgy sdo...@gmail.com
Hi Ruslan,
nodetool -h <target node> move <newtoken>
And how do I add a node to the cluster without a token?
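To make the commands in this exchange concrete, here is a sketch of the 0.7-era workflow. The host names and token value are placeholders, and the exact flags may differ by version (check nodetool help):

```shell
# Move the token of an existing node to a new position on the ring:
nodetool -h target-node move 85070591730234615865843651857942052864

# Retiring a node and replacing it with a new server (as in the thread above):
nodetool -h any-live-node removetoken <old-node-token>
# then on the new server, in cassandra.yaml, set:
#   auto_bootstrap: true
#   initial_token: <old-node-token>
# and start Cassandra so it bootstraps into the old node's range.
```

A node started with auto_bootstrap and no initial_token will pick a token itself (bisecting the most loaded range), which answers the "without a token" question, though an explicit token gives a more balanced ring.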
Re: Strange behaivour
I found that this happened after a schema change, and that it hung on a waitpid syscall. What can I do about this?
Calculate memory used for keycache
Hello. How is it possible to calculate this value? I think the key size, if we use RandomPartitioner, will be 16 bytes, so the key cache will take 16 * (number of key cache elements) bytes??
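As a rough sanity check of that arithmetic, here is a minimal sketch. The per-entry overheads below are assumptions, not measured Cassandra internals: besides the 16-byte RandomPartitioner token, each entry also caches an SSTable position, and the JVM adds per-entry cost (object headers, references, map buckets) that usually dominates the 16 bytes:

```python
# Back-of-envelope key-cache memory estimate (a sketch; overhead
# constants are assumptions, not Cassandra internals).

def keycache_bytes(entries, token_bytes=16, position_bytes=8,
                   jvm_overhead_bytes=64):
    """Estimate memory for a key cache of `entries` elements.

    token_bytes:        16 for RandomPartitioner (MD5 token), as in the question.
    position_bytes:     the cached SSTable offset the token maps to.
    jvm_overhead_bytes: assumed per-entry cost of Java object headers,
                        references, and hash-map buckets.
    """
    return entries * (token_bytes + position_bytes + jvm_overhead_bytes)

if __name__ == "__main__":
    n = 200_000
    print(f"{n} entries ~ {keycache_bytes(n) / 1024 / 1024:.1f} MB")
```

So the raw 16 bytes per key is a lower bound; the real footprint per entry is several times larger once JVM overhead is counted.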
Strange error
Hello. In a working cluster of Cassandra 0.7.3, I see 2 similar errors in system.log:

ERROR [ReadStage:12] 2011-03-13 20:27:20,431 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:12,5,main]
java.lang.NullPointerException
	at org.apache.cassandra.db.Column.reconcile(Column.java:177)
	at org.apache.cassandra.db.SuperColumn.addColumn(SuperColumn.java:179)
	at org.apache.cassandra.db.SuperColumn.putColumn(SuperColumn.java:195)
	at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:220)
	at org.apache.cassandra.db.filter.QueryFilter$2.reduce(QueryFilter.java:118)
	at org.apache.cassandra.db.filter.QueryFilter$2.reduce(QueryFilter.java:108)
	at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:62)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
	at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1326)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1203)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)
	at org.apache.cassandra.db.Table.getRow(Table.java:333)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Re: Strange behaivour
2011/3/13 aaron morton aa...@thelastpickle.com
It's difficult to say what's causing the freeze. Was the node rejecting client connections during this time?
Yes. I think the whole JVM hung, because JMX doesn't respond either.
Did any of the other nodes log that the freezing node was down?
Yes.
Is there anything else running on the box?
No.
PS: The graph also shows that the hang is in kernel (system) time.
Re: Strange error
2011/3/14 aaron morton aa...@thelastpickle.com
Looks related to CASSANDRA-1559 (https://issues.apache.org/jira/browse/CASSANDRA-1559), which should be fixed in 0.7.4. However, everyone said it should not happen. Can you provide some more detail about what you did to cause this?
I did nothing (no maintenance work), just the typical workload of a production application.
Re: Strange error
2011/3/14 aaron morton aa...@thelastpickle.com
What sort of workload was that? All reads and writes, or could there have been some deletes as well?
Many reads, some writes, and some deletes.
Re: Poor performance on small data set
Here is the PHP Windows extension, but you must use the trunk version of Thrift.
2011/3/12 Vodnok vod...@gmail.com
Thank you all for your replies.
Nagle + delayed ACK problem: I found a way to change this via regedit, but it had no impact on response time.
THRIFT-638: it seems to be a solution, but I don't know how to apply the patch in my environment.
phpcassa has a C extension, but it's hard for me to build a PHP extension.
(Attachment: php_thrift_protocol.dll, binary data)
Re: memory utilization
2011/3/11 Chris Burroughs chris.burrou...@gmail.com
Is there a more or less constant amount of resident memory, or is it growing over a period of days?
As the Cassandra wiki says: The main argument for using mmap() instead of standard I/O is the fact that reading entails just touching memory - in the case of the memory being resident, you just read it - you don't even take a page fault (so no overhead in entering the kernel and doing a semi-context switch). So resident memory will also grow (but what happens when all physical memory is exhausted, I don't know).
Re: memory utilization
2011/3/12 Jonathan Ellis jbel...@gmail.com
Nothing happens, because it _doesn't have to be resident_.
Hm, but then why in my case does top show 10g RSS when the max HEAP_SIZE is 6G?

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
27650 cassandr 20  0 14.9g  10g 3.8g S   51 86.6 370:15.82 jsvc
20583 zabbix   25  5 18256 1464 1400 S    4  0.0  37:59.88 zabbix_agentd
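The gap between RSS and the heap is consistent with resident pages of mmap'd SSTables being charged to the process. As an illustrative sketch (the smaps format is per Linux's proc(5); the parsing is my own, not a Cassandra tool), this splits a process's resident memory into file-backed and anonymous portions:

```python
# Split resident memory from a /proc/<pid>/smaps dump into file-backed
# pages (e.g. mmap'd SSTables) vs anonymous pages (e.g. the Java heap).

def split_rss(smaps_text):
    """Return (file_backed_kb, anonymous_kb) resident totals from smaps text."""
    file_kb = anon_kb = 0
    is_file_backed = False
    for line in smaps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "-" in fields[0] and ":" not in fields[0]:
            # Mapping header line, e.g.
            # 7fbe00000000-7fbe00100000 r--p 00000000 08:01 12345 /path/to/file
            # A 6th field starting with "/" means the mapping is file-backed.
            is_file_backed = len(fields) >= 6 and fields[5].startswith("/")
        elif fields[0] == "Rss:":
            kb = int(fields[1])  # "Rss:    1024 kB"
            if is_file_backed:
                file_kb += kb
            else:
                anon_kb += kb
    return file_kb, anon_kb

if __name__ == "__main__":
    try:
        with open("/proc/self/smaps") as f:
            file_kb, anon_kb = split_rss(f.read())
        print(f"file-backed: {file_kb} kB, anonymous: {anon_kb} kB")
    except FileNotFoundError:
        print("no /proc/<pid>/smaps on this platform")
```

Run against the Cassandra pid, the file-backed total would show how much of that 10g RSS is just page cache mapped into the process rather than heap.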
cassandra and G1 gc
Hello. Does anybody use the G1 GC in production? What are your impressions?
Re: Nodes frozen in GC
2011/3/8 Chris Goffinet c...@chrisgoffinet.com
How large are your SSTables on disk? My thought was that because you have so many on disk, we have to store the bloom filter + every 128th key from the index in memory.
0.5GB. But as I understand it, that in-memory loading only matters when reads happen, and I do only inserts. And I don't think memory is the problem, because heap usage looks like a saw (at the peak, heap allocation reaches about 5.5 GB, then it drops back to 2GB). Also, when I increased the heap size to 7GB the situation became much better, but node freezes still happen, and in gc.log I still see lines like:
Total time for which application threads were stopped: 20.0686307 seconds
(though not as often as before)
Re: Nodes frozen in GC
2011/3/8 Peter Schuller peter.schul...@infidyne.com
(1) I cannot stress this one enough: Run with -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and collect the output.
(2) Attach to your process with jconsole or some similar tool.
(3) Observe the behavior of the heap over time. Preferably post screenshots so others can look at them.
I'm not sure I was understood, sorry. I launch Cassandra with the following GC logging options (I didn't mention this before because this document, http://www.datastax.com/docs/0.7/troubleshooting/index#nodes-seem-to-freeze-after-some-period-of-time, makes no mention of gc.log):

JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

and I detect the frozen nodes by log entries like:

Total time for which application threads were stopped: 30.957 seconds

and so on. Also, when I think the nodes are frozen, I get UnavailableException and TimedOutException about 20-30 times (I make a number of attempts, 300 with a 1-second sleep, before the final failure). The following fragment of code illustrates what I do:

for (; $l_i < 300; ++$l_i) {
    try {
        $client->batch_mutate($mutations, cassandra_ConsistencyLevel::QUORUM);
        $retval = true;
        break;
    } catch (cassandra_UnavailableException $e) {
        array_push($l_exceptions, get_class($e));
        sleep(1);
    } catch (cassandra_TimedOutException $e) {
        array_push($l_exceptions, get_class($e));
        sleep(1);
    } catch (Exception $e) {
        $loger->err(get_class($e) . ': ' . $e->getMessage());
        $loger->err($mutations);
        break;
    }
}
Re: Several 'TimedOutException' in stress.py
2011/3/8 A J s5a...@gmail.com
Trying out stress.py in an AWS EC2 environment (4 Large instances, each with 2 cores and 7.5GB RAM, all in the same region/zone):
python stress.py -o insert -d 10.253.203.224,10.220.203.48,10.220.17.84,10.124.89.81 -l 2 -e ALL -t 10 -n 500 -S 100 -k
(I want to try a column size of about 1MB. I am assuming the above gives me 10 parallel threads, each executing 50 inserts sequentially (500/10).) I am getting several timeout errors (TimedOutException). With just 10 concurrent writes spread across 4 nodes, I'm kind of surprised to get so many timeouts. Any suggestions?
It may be EC2 disk speed degradation (the I/O speed of EC2 instances is not constant and can vary within wide limits).
Re: Nodes frozen in GC
2011/3/8 Paul Pak p...@yellowseo.com
Hi Ruslan,
Is it possible for you to tell us the details of what you have done that measurably helped your situation, so we can start a best-practices doc on growing Cassandra systems? So far, I see that under load, Cassandra is rarely ready to take heavy traffic in its default configuration, and a number of steps need to be taken to properly size memtables, flushing, and the JVM. Unfortunately, it's very difficult to gauge what the proper or appropriate settings are for a given workload. It would be helpful if you could share what happened in the default config, what steps you took that helped the situation, and how much each step helped. That way we can start a checklist of things to address as we grow in load.
It would be great if you could provide the options that need tuning for best throughput. I know only 3: in cassandra.yaml, binary_memtable_throughput_in_mb; and the JVM options -Xms with -Xmx (for heap size) and -Xmn (for the young generation size).
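A sketch of where those three knobs live in a 0.7-era install. The values below are placeholders, not recommendations; the right numbers depend entirely on workload and hardware:

```shell
# conf/cassandra.yaml (0.7-era option name, as mentioned in the thread):
#   binary_memtable_throughput_in_mb: 256

# conf/cassandra-env.sh: fix the heap size (min == max avoids resize
# pauses) and set an explicit young generation with -Xmn.
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE} -Xmx${MAX_HEAP_SIZE} -Xmn${HEAP_NEWSIZE}"
```

Pinning -Xms to -Xmx and giving ParNew a reasonably sized young generation are the usual first steps before touching anything more exotic.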
Re: Nodes frozen in GC
2011/3/6 aaron morton aa...@thelastpickle.com
Your node is under memory pressure; after the GC there is still 5.7GB in use. In fact, it looks like memory usage went up during the GC process. Can you reduce the memtable size, the caches, or the number of CFs, or increase the JVM size? Also, is this happening under heavy load?
I have the memtable size set, and I insert data into one CF with the biggest row size 1K. How is it possible that all memory is still in use after GC? Maybe this is a memory leak in Cassandra 0.7.3?
Re: Nodes frozen in GC
2011/3/8 Jonathan Ellis jbel...@gmail.com
It sounds like you're complaining that the JVM sometimes does stop-the-world GC. You can mitigate this, but not (for most workloads) eliminate it, with GC option tuning. That's simply the state of the art for Java garbage collection right now.
Hm, but what can be done in these cases? In those moments the throughput of the cluster degrades, and I don't understand what workaround I should apply to prevent these situations.
Re: Nodes frozen in GC
2011/3/8 Chris Goffinet c...@chrisgoffinet.com
Can you tell me how many SSTables are on disk when you see GC pauses? In your 3-node cluster, what's the RF factor?
About 30-40, and I use RF=2 and insert rows at QUORUM consistency level.
Re: Nodes frozen in GC
2011/3/8 Chris Goffinet c...@chrisgoffinet.com
The rows you are inserting, what is your update ratio to those rows?
I don't update them, only insert, at a rate of 16000 per second.