Re: Consistency Level throughput
My question is about the throughput I measured in each case. > In general, cluster throughput = single node throughput * number of > nodes / replication factor. Yes, I think so too. But what I really wanted to ask is why there is no difference in my results. Could you look at the chart I made? http://goo.gl/mACQa 2011/5/27 Maki Watanabe : > I assume your question is on that "how CL will affects on the throughput". > > In theory, I believe CL will not affect on the throughput of the > Cassandra system. > In any CL, the coordinator node needs to submit write/read requests > along the RF specified for the KS. > But for the latency, CL will affects on. Stronger CL will cause larger > latency. > In the real world, it will depends on system configuration, > application design, data, and all of the environment. > However if you found shorter latency with stronger CL, there must be > some reason to explain the behavior. > > maki > > 2011/5/27 Ryu Kobayashi : >> Hi, >> >> Question of Consistency Level throughput. >> >> Environment: >> 6 nodes. Replication factor is 3. >> >> ONE and QUORUM it was not for the throughput difference. >> ALL just extremely slow. >> Not ONE had only half the throughput. >> ONE, TWO and THREE were similar results. >> >> Is there any difference between 2 nodes and 3 nodes? >> >> -- >> >> twitter:@ryu_kobayashi >> > > > > -- > w3m > -- twitter:@ryu_kobayashi
Re: Forcing Cassandra to free up some space
For the purposes of clearing out disk space, you might also occasionally check to see if you have snapshots that you no longer need. Certain operations create snapshots (point-in-time backups of sstables) in the (default) /var/lib/cassandra/data//snapshots directory. If you are absolutely sure that you no longer need a particular snapshot of the sstables, you can reclaim a decent amount of space that way. I'm not sure about all of the other GC discussion going on, but that's one way to reclaim some space. On May 26, 2011, at 1:09 PM, Konstantin Naryshkin wrote: > I have a basic understanding of how Cassandra handles the file system > (flushes Memtables out to SSTables, SSTables get compacted) and I > understand that old files are only deleted when a node is restarted, when > Java does a GC, or when Cassandra feels like it is running out of space. > > My question is, is there some way for us to hurry the process along? We have > data that we do a lot of inserts into and then delete the data several > hours later. We would like it if we could free up disk space (since our > disks, though large, are shared with other applications). So far, the action > sequence to accomplish this is: > nodetool flush -> nodetool repair -> nodetool compact -> ?? > > Is there a way for me to make (or even gently suggest to) Cassandra that it > may be a good time to free up some space?
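To see how much a cleanup would actually buy you, here is a minimal Java sketch (an illustration, not a supported tool) that totals the bytes sitting under each keyspace's snapshots directory. It assumes the default 0.7-style layout of <data_directory>/<keyspace>/snapshots/ and the default /var/lib/cassandra/data path; adjust for your setup. nodetool clearsnapshot can then remove snapshots you are sure you no longer need.

import java.io.File;

// Minimal sketch: report how much disk each keyspace's snapshots directory holds.
public class SnapshotSpace {
    // Recursively sum file sizes under f.
    static long sizeOf(File f) {
        if (!f.isDirectory()) return f.length();
        long total = 0;
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) total += sizeOf(c);
        }
        return total;
    }

    public static void main(String[] args) {
        File dataDir = new File(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        File[] keyspaces = dataDir.listFiles();
        if (keyspaces == null) return;
        for (File ks : keyspaces) {
            File snapshots = new File(ks, "snapshots");
            if (snapshots.isDirectory()) {
                System.out.printf("%-20s %,d bytes in snapshots%n", ks.getName(), sizeOf(snapshots));
            }
        }
    }
}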
Re: Consistency Level throughput
I assume your question is "how will CL affect the throughput?". In theory, I believe CL will not affect the throughput of the Cassandra system. At any CL, the coordinator node needs to submit write/read requests to all RF replicas specified for the keyspace. CL does affect latency, though: a stronger CL will cause larger latency. In the real world it will depend on system configuration, application design, data, and the whole environment. However, if you found shorter latency with a stronger CL, there must be some reason that explains the behavior. maki 2011/5/27 Ryu Kobayashi : > Hi, > > Question of Consistency Level throughput. > > Environment: > 6 nodes. Replication factor is 3. > > ONE and QUORUM it was not for the throughput difference. > ALL just extremely slow. > Not ONE had only half the throughput. > ONE, TWO and THREE were similar results. > > Is there any difference between 2 nodes and 3 nodes? > > -- > > twitter:@ryu_kobayashi > -- w3m
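To illustrate the point, here is a toy simulation (not Cassandra code; the replica response-time distribution is an invented assumption). The coordinator does RF units of replica work per request no matter what, so the work per request is the same at every CL, while the client waits for the CL-th ack, so latency grows with CL:

import java.util.Arrays;
import java.util.Random;

public class ClLatencySketch {
    public static void main(String[] args) {
        int rf = 3;                 // replication factor, as in this thread
        int requests = 100000;
        Random rnd = new Random(42);
        for (int cl = 1; cl <= rf; cl++) {      // CL.ONE, CL.TWO, CL.THREE/ALL
            double total = 0;
            for (int i = 0; i < requests; i++) {
                double[] ackTime = new double[rf];
                for (int r = 0; r < rf; r++) {
                    // hypothetical replica response time in ms
                    ackTime[r] = Math.max(0.1, 1.0 + rnd.nextGaussian() * 0.3);
                }
                Arrays.sort(ackTime);
                total += ackTime[cl - 1];       // coordinator returns after the cl-th ack
            }
            System.out.printf("CL=%d: mean latency %.3f ms, replica requests per operation: %d%n",
                    cl, total / requests, rf);
        }
    }
}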
Re: Consistency Level throughput
I'm afraid I don't quite understand the question. In general, cluster throughput = single node throughput * number of nodes / replication factor. On Thu, May 26, 2011 at 9:39 PM, Ryu Kobayashi wrote: > Hi, > > Question of Consistency Level throughput. > > Environment: > 6 nodes. Replication factor is 3. > > ONE and QUORUM it was not for the throughput difference. > ALL just extremely slow. > Not ONE had only half the throughput. > ONE, TWO and THREE were similar results. > > Is there any difference between 2 nodes and 3 nodes? > > -- > > twitter:@ryu_kobayashi > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
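To make that formula concrete for the setup in this thread (roughly, and assuming an evenly balanced ring with clients spread across nodes): 6 nodes with replication factor 3 gives an expected cluster throughput of about single-node throughput * 6 / 3, i.e. about 2x one node, at any consistency level. The consistency level changes how many acks the coordinator waits for (latency), not how many replicas do the work (throughput); higher latency only shows up as lower measured throughput if the client keeps a fixed, small number of requests in flight.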
ghost node?
A node with IP 10.46.108.102 was removed from the cluster several days ago, but the Cassandra logs are full of these messages. Does anyone know how to permanently remove this information? I'm beginning to think it is affecting the throughput of the live nodes.

INFO [FlushWriter:1] 2011-05-27 04:28:17,976 Memtable.java (line 164) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-95-Data.db (63 bytes)
INFO [ScheduledTasks:1] 2011-05-27 04:29:18,386 Gossiper.java (line 437) FatClient /10.46.108.102 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2011-05-27 04:30:19,902 Gossiper.java (line 610) Node /10.46.108.102 is now part of the cluster
INFO [ScheduledTasks:1] 2011-05-27 04:30:19,903 HintedHandOffManager.java (line 210) Deleting any stored hints for 10.46.108.102
INFO [GossipStage:1] 2011-05-27 04:30:19,903 StorageService.java (line 865) Removing token 42535295865117307932921825928971026432 for /10.46.108.102
INFO [ScheduledTasks:1] 2011-05-27 04:30:19,903 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-HintsColumnFamily@2051849391(0 bytes, 0 operations)
INFO [FlushWriter:1] 2011-05-27 04:30:19,904 Memtable.java (line 157) Writing Memtable-HintsColumnFamily@2051849391(0 bytes, 0 operations)
INFO [FlushWriter:1] 2011-05-27 04:30:26,711 Memtable.java (line 164) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-96-Data.db (63 bytes)
INFO [ScheduledTasks:1] 2011-05-27 04:31:21,420 Gossiper.java (line 437) FatClient /10.46.108.102 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2011-05-27 04:32:23,098 Gossiper.java (line 610) Node /10.46.108.102 is now part of the cluster
INFO [ScheduledTasks:1] 2011-05-27 04:32:23,099 HintedHandOffManager.java (line 210) Deleting any stored hints for 10.46.108.102
INFO [GossipStage:1] 2011-05-27 04:32:23,100 StorageService.java (line 865) Removing token 42535295865117307932921825928971026432 for /10.46.108.102
INFO [ScheduledTasks:1] 2011-05-27 04:32:23,100 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-HintsColumnFamily@639962965(0 bytes, 0 operations)
INFO [FlushWriter:1] 2011-05-27 04:32:23,100 Memtable.java (line 157) Writing Memtable-HintsColumnFamily@639962965(0 bytes, 0 operations)
INFO [FlushWriter:1] 2011-05-27 04:32:23,155 Memtable.java (line 164) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-97-Data.db (63 bytes)
INFO [ScheduledTasks:1] 2011-05-27 04:33:24,457 Gossiper.java (line 437) FatClient /10.46.108.102 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2011-05-27 04:34:25,231 Gossiper.java (line 610) Node /10.46.108.102 is now part of the cluster
INFO [ScheduledTasks:1] 2011-05-27 04:34:25,232 HintedHandOffManager.java (line 210) Deleting any stored hints for 10.46.108.102
INFO [GossipStage:1] 2011-05-27 04:34:25,233 StorageService.java (line 865) Removing token 42535295865117307932921825928971026432 for /10.46.108.102
INFO [ScheduledTasks:1] 2011-05-27 04:34:25,233 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-HintsColumnFamily@1211655714(0 bytes, 0 operations)
INFO [FlushWriter:1] 2011-05-27 04:34:25,234 Memtable.java (line 157) Writing Memtable-HintsColumnFamily@1211655714(0 bytes, 0 operations)
INFO [FlushWriter:1] 2011-05-27 04:34:25,290 Memtable.java (line 164) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-f-98-Data.db (63 bytes)
INFO [ScheduledTasks:1] 2011-05-27 04:35:26,497 Gossiper.java (line 437) FatClient /10.46.108.102 has been silent for 3ms, removing from gossip
Consistency Level throughput
Hi, I have a question about Consistency Level throughput. Environment: 6 nodes, replication factor 3. There was no throughput difference between ONE and QUORUM. ALL was just extremely slow. It was not that ONE had only half the throughput; ONE, TWO and THREE all gave similar results. Is there any difference between 2 nodes and 3 nodes? -- twitter:@ryu_kobayashi
Re: Forcing Cassandra to free up some space
I'm also not sure that will guarantee all space is cleaned up. It really depends on what you are doing inside Cassandra. If you have your own garbage collection that is just in some way tied to the GC run, then it will run when it runs. If, on the other hand, you are associating records in your storage with specific objects in memory and using one of the post-mortem hooks (finalize or PhantomReference) to tell you to clean up a particular record, then it's quite possible they won't all get cleaned up. In general, HotSpot does not find and clean every candidate object on every GC run. It starts with the easiest/fastest to find and then sees what more it thinks it needs to do to create enough memory for anticipated near-future needs. On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis wrote: > In summary, system.gc works fine unless you've deliberately done > something like setting the -XX:-DisableExplicitGC flag. > > On Thu, May 26, 2011 at 5:58 PM, Konstantin Naryshkin > wrote: >> So, in summary, there is no way to predictably and efficiently tell >> Cassandra to get rid of all of the extra space it is using on disk? >> >> - Original Message - >> From: "Jeffrey Kesselman" >> To: user@cassandra.apache.org >> Sent: Thursday, May 26, 2011 8:57:49 PM >> Subject: Re: Forcing Cassandra to free up some space >> >> Which JVM? Which collector? There have been and continue to be many. >> >> Hotspot itself supports a number of different collectors with >> different behaviors. Many of them do not collect every candidate on >> every gc, but merely the easiest ones to find. This is why depending >> on finalizers is a *bad* idea in java code. They may well never get >> run. (Finalizer is one of a few features the Sun Java team always >> regretted putting in Java to start with. It has caused quite a few >> application problems over the years) >> >> The really important thing is that NONE of these behaviors of the >> colelctors are guaranteed by specification not to change from version >> to version. Basing your code on non-specified behaviors is a good way >> to hit mysterious failures on updates. >> >> For instance, in the mid 90s, IBM had a mode of their Vm called >> "infinite heap." it *never* garbage collected, even if you called >> System.gc. Instead it just threw away address space and counted on >> the total memory needs for the life of the program being less then the >> total addressable space of the processor. >> >> It was *very* fast for certain kinds of applications. >> >> Far from being pedantic, not depending on undocumented behavior is >> simply good engineering. >> >> >> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis wrote: >>> I've read the relevant source. While you're pedantically correct re >>> the spec, you're wrong as to what the JVM actually does. >>> >>> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman wrote: Some references... "An object enters an unreachable state when no more strong references to it exist. When an object is unreachable, it is a candidate for collection. Note the wording: Just because an object is a candidate for collection doesn't mean it will be immediately collected. The JVM is free to delay collection until there is an immediate need for the memory being consumed by the object." 
http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 and "Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects" http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() It goes on to say that the VM will make a "best effort", but "best effort" is *deliberately* left up to the definition of the gc implementor. I guess you missed the many lectures I have given on this subject over the years at Java One Conferences On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: > It's a common misunderstanding that system.gc is only a suggestion; on > any VM you're likely to run Cassandra on, System.gc will actually > invoke a full collection. > > On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman > wrote: >> Actually this is no gaurantee. Its a common misunderstanding that >> System.gc "forces" gc. It does not. It is a suggestion only. The vm >> always >> has the option as to when and how much it gcs >> >> On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- It's always darkest just before you are eaten by a grue. >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >>> >> >> >
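For anyone unfamiliar with the "post-mortem hook" pattern being discussed, here is a self-contained toy example (the record id is hypothetical). The cleanup fires only if and when the collector happens to enqueue the PhantomReference, which the spec never promises to do promptly; that is exactly the caveat above.

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PostMortemSketch {
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<Object>();

    // Ties an on-disk record id to the lifetime of an in-memory owner object.
    static class RecordRef extends PhantomReference<Object> {
        final String recordId;
        RecordRef(Object owner, String recordId) {
            super(owner, QUEUE);
            this.recordId = recordId;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Object owner = new Object();
        RecordRef ref = new RecordRef(owner, "record-42");
        owner = null;        // the owner is now garbage...
        System.gc();         // ...but this is only a request
        Reference<?> dead = QUEUE.remove(1000);   // may time out and return null
        System.out.println(dead == ref
                ? "would delete " + ref.recordId + " now"
                : "no cleanup signal yet; the record lingers, as described above");
    }
}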
Re: Forcing Cassandra to free up some space
You really should qualify that with "on all currently known versions of Hotspot" Not trying to give you grief, really, but its an important limitation to understand. On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis wrote: > In summary, system.gc works fine unless you've deliberately done > something like setting the -XX:-DisableExplicitGC flag. > > On Thu, May 26, 2011 at 5:58 PM, Konstantin Naryshkin > wrote: >> So, in summary, there is no way to predictably and efficiently tell >> Cassandra to get rid of all of the extra space it is using on disk? >> >> - Original Message - >> From: "Jeffrey Kesselman" >> To: user@cassandra.apache.org >> Sent: Thursday, May 26, 2011 8:57:49 PM >> Subject: Re: Forcing Cassandra to free up some space >> >> Which JVM? Which collector? There have been and continue to be many. >> >> Hotspot itself supports a number of different collectors with >> different behaviors. Many of them do not collect every candidate on >> every gc, but merely the easiest ones to find. This is why depending >> on finalizers is a *bad* idea in java code. They may well never get >> run. (Finalizer is one of a few features the Sun Java team always >> regretted putting in Java to start with. It has caused quite a few >> application problems over the years) >> >> The really important thing is that NONE of these behaviors of the >> colelctors are guaranteed by specification not to change from version >> to version. Basing your code on non-specified behaviors is a good way >> to hit mysterious failures on updates. >> >> For instance, in the mid 90s, IBM had a mode of their Vm called >> "infinite heap." it *never* garbage collected, even if you called >> System.gc. Instead it just threw away address space and counted on >> the total memory needs for the life of the program being less then the >> total addressable space of the processor. >> >> It was *very* fast for certain kinds of applications. >> >> Far from being pedantic, not depending on undocumented behavior is >> simply good engineering. >> >> >> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis wrote: >>> I've read the relevant source. While you're pedantically correct re >>> the spec, you're wrong as to what the JVM actually does. >>> >>> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman wrote: Some references... "An object enters an unreachable state when no more strong references to it exist. When an object is unreachable, it is a candidate for collection. Note the wording: Just because an object is a candidate for collection doesn't mean it will be immediately collected. The JVM is free to delay collection until there is an immediate need for the memory being consumed by the object." http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 and "Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects" http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() It goes on to say that the VM will make a "best effort", but "best effort" is *deliberately* left up to the definition of the gc implementor. I guess you missed the many lectures I have given on this subject over the years at Java One Conferences On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: > It's a common misunderstanding that system.gc is only a suggestion; on > any VM you're likely to run Cassandra on, System.gc will actually > invoke a full collection. > > On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman > wrote: >> Actually this is no gaurantee. 
Its a common misunderstanding that >> System.gc "forces" gc. It does not. It is a suggestion only. The vm >> always >> has the option as to when and how much it gcs >> >> On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- It's always darkest just before you are eaten by a grue. >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >>> >> >> >> >> -- >> It's always darkest just before you are eaten by a grue. >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- It's always darkest just before you are eaten by a grue.
Re: Forcing Cassandra to free up some space
In summary, system.gc works fine unless you've deliberately done something like setting the -XX:+DisableExplicitGC flag. On Thu, May 26, 2011 at 5:58 PM, Konstantin Naryshkin wrote: > So, in summary, there is no way to predictably and efficiently tell Cassandra > to get rid of all of the extra space it is using on disk? > > - Original Message - > From: "Jeffrey Kesselman" > To: user@cassandra.apache.org > Sent: Thursday, May 26, 2011 8:57:49 PM > Subject: Re: Forcing Cassandra to free up some space > > Which JVM? Which collector? There have been and continue to be many. > > Hotspot itself supports a number of different collectors with > different behaviors. Many of them do not collect every candidate on > every gc, but merely the easiest ones to find. This is why depending > on finalizers is a *bad* idea in java code. They may well never get > run. (Finalizer is one of a few features the Sun Java team always > regretted putting in Java to start with. It has caused quite a few > application problems over the years) > > The really important thing is that NONE of these behaviors of the > colelctors are guaranteed by specification not to change from version > to version. Basing your code on non-specified behaviors is a good way > to hit mysterious failures on updates. > > For instance, in the mid 90s, IBM had a mode of their Vm called > "infinite heap." it *never* garbage collected, even if you called > System.gc. Instead it just threw away address space and counted on > the total memory needs for the life of the program being less then the > total addressable space of the processor. > > It was *very* fast for certain kinds of applications. > > Far from being pedantic, not depending on undocumented behavior is > simply good engineering. > > > On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis wrote: >> I've read the relevant source. While you're pedantically correct re >> the spec, you're wrong as to what the JVM actually does. >> >> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman wrote: >>> Some references... >>> >>> "An object enters an unreachable state when no more strong references >>> to it exist. When an object is unreachable, it is a candidate for >>> collection. Note the wording: Just because an object is a candidate >>> for collection doesn't mean it will be immediately collected. The JVM >>> is free to delay collection until there is an immediate need for the >>> memory being consumed by the object." >>> >>> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 >>> >>> and "Calling the gc method suggests that the Java Virtual Machine >>> expend effort toward recycling unused objects" >>> >>> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() >>> >>> It goes on to say that the VM will make a "best effort", but "best >>> effort" is *deliberately* left up to the definition of the gc >>> implementor. >>> >>> I guess you missed the many lectures I have given on this subject over >>> the years at Java One Conferences >>> >>> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: It's a common misunderstanding that system.gc is only a suggestion; on any VM you're likely to run Cassandra on, System.gc will actually invoke a full collection. On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: > Actually this is no gaurantee. Its a common misunderstanding that > System.gc "forces" gc. It does not. It is a suggestion only. 
The vm > always > has the option as to when and how much it gcs > > On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com >>> >>> >>> >>> -- >>> It's always darkest just before you are eaten by a grue. >>> >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> > > > > -- > It's always darkest just before you are eaten by a grue. > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Forcing Cassandra to free up some space
"Is there a way for me to make (or even gently suggest to) Cassandra that it may be a good time to free up some space?" Disregarding what's been said and until ref-counting is implemented this is a useful tool to gently suggest cleanup: https://github.com/ceocoder/jmxgc On Thu, May 26, 2011 at 2:09 PM, Konstantin Naryshkin wrote: > I have a basic understanding of how Cassandra handles the file system > (flushes in Memtables out to SSTables, SSTables get compacted) and I > understand that old files are only deleted when a node is restarted, when > Java does a GC, or when Cassandra feels like it is running out of space. > > My question is, is there some way for us to hurry the process along? We > have a data that we do a lot of inserts into and then delete the data > several hours later. We would like it if we could free up disk space (since > our disks, though large, are shared with other applications). So far, the > action sequence to accomplish this is: > nodetoo flush -> nodetool repair -> nodetool compact -> ?? > > Is there a way for me to make (or even gently suggest to) Cassandra that it > may be a good time to free up some space? > -- http://twitter.com/tjake
Re: Forcing Cassandra to free up some space
Not if it depends on a side effect of garbage collection such as finalizers. It ought to publish its own JMX control to cause that to happen. On Thu, May 26, 2011 at 6:58 PM, Konstantin Naryshkin wrote: > So, in summary, there is no way to predictably and efficiently tell Cassandra > to get rid of all of the extra space it is using on disk? > > - Original Message - > From: "Jeffrey Kesselman" > To: user@cassandra.apache.org > Sent: Thursday, May 26, 2011 8:57:49 PM > Subject: Re: Forcing Cassandra to free up some space > > Which JVM? Which collector? There have been and continue to be many. > > Hotspot itself supports a number of different collectors with > different behaviors. Many of them do not collect every candidate on > every gc, but merely the easiest ones to find. This is why depending > on finalizers is a *bad* idea in java code. They may well never get > run. (Finalizer is one of a few features the Sun Java team always > regretted putting in Java to start with. It has caused quite a few > application problems over the years) > > The really important thing is that NONE of these behaviors of the > colelctors are guaranteed by specification not to change from version > to version. Basing your code on non-specified behaviors is a good way > to hit mysterious failures on updates. > > For instance, in the mid 90s, IBM had a mode of their Vm called > "infinite heap." it *never* garbage collected, even if you called > System.gc. Instead it just threw away address space and counted on > the total memory needs for the life of the program being less then the > total addressable space of the processor. > > It was *very* fast for certain kinds of applications. > > Far from being pedantic, not depending on undocumented behavior is > simply good engineering. > > > On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis wrote: >> I've read the relevant source. While you're pedantically correct re >> the spec, you're wrong as to what the JVM actually does. >> >> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman wrote: >>> Some references... >>> >>> "An object enters an unreachable state when no more strong references >>> to it exist. When an object is unreachable, it is a candidate for >>> collection. Note the wording: Just because an object is a candidate >>> for collection doesn't mean it will be immediately collected. The JVM >>> is free to delay collection until there is an immediate need for the >>> memory being consumed by the object." >>> >>> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 >>> >>> and "Calling the gc method suggests that the Java Virtual Machine >>> expend effort toward recycling unused objects" >>> >>> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() >>> >>> It goes on to say that the VM will make a "best effort", but "best >>> effort" is *deliberately* left up to the definition of the gc >>> implementor. >>> >>> I guess you missed the many lectures I have given on this subject over >>> the years at Java One Conferences >>> >>> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: It's a common misunderstanding that system.gc is only a suggestion; on any VM you're likely to run Cassandra on, System.gc will actually invoke a full collection. On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: > Actually this is no gaurantee. Its a common misunderstanding that > System.gc "forces" gc. It does not. It is a suggestion only. 
The vm > always > has the option as to when and how much it gcs > > On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com >>> >>> >>> >>> -- >>> It's always darkest just before you are eaten by a grue. >>> >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> > > > > -- > It's always darkest just before you are eaten by a grue. > -- It's always darkest just before you are eaten by a grue.
Re: Forcing Cassandra to free up some space
So, in summary, there is no way to predictably and efficiently tell Cassandra to get rid of all of the extra space it is using on disk? - Original Message - From: "Jeffrey Kesselman" To: user@cassandra.apache.org Sent: Thursday, May 26, 2011 8:57:49 PM Subject: Re: Forcing Cassandra to free up some space Which JVM? Which collector? There have been and continue to be many. Hotspot itself supports a number of different collectors with different behaviors. Many of them do not collect every candidate on every gc, but merely the easiest ones to find. This is why depending on finalizers is a *bad* idea in java code. They may well never get run. (Finalizer is one of a few features the Sun Java team always regretted putting in Java to start with. It has caused quite a few application problems over the years) The really important thing is that NONE of these behaviors of the colelctors are guaranteed by specification not to change from version to version. Basing your code on non-specified behaviors is a good way to hit mysterious failures on updates. For instance, in the mid 90s, IBM had a mode of their Vm called "infinite heap." it *never* garbage collected, even if you called System.gc. Instead it just threw away address space and counted on the total memory needs for the life of the program being less then the total addressable space of the processor. It was *very* fast for certain kinds of applications. Far from being pedantic, not depending on undocumented behavior is simply good engineering. On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis wrote: > I've read the relevant source. While you're pedantically correct re > the spec, you're wrong as to what the JVM actually does. > > On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman wrote: >> Some references... >> >> "An object enters an unreachable state when no more strong references >> to it exist. When an object is unreachable, it is a candidate for >> collection. Note the wording: Just because an object is a candidate >> for collection doesn't mean it will be immediately collected. The JVM >> is free to delay collection until there is an immediate need for the >> memory being consumed by the object." >> >> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 >> >> and "Calling the gc method suggests that the Java Virtual Machine >> expend effort toward recycling unused objects" >> >> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() >> >> It goes on to say that the VM will make a "best effort", but "best >> effort" is *deliberately* left up to the definition of the gc >> implementor. >> >> I guess you missed the many lectures I have given on this subject over >> the years at Java One Conferences >> >> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: >>> It's a common misunderstanding that system.gc is only a suggestion; on >>> any VM you're likely to run Cassandra on, System.gc will actually >>> invoke a full collection. >>> >>> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: Actually this is no gaurantee. Its a common misunderstanding that System.gc "forces" gc. It does not. It is a suggestion only. The vm always has the option as to when and how much it gcs On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >>> >> >> >> >> -- >> It's always darkest just before you are eaten by a grue. 
>> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- It's always darkest just before you are eaten by a grue.
Re: Re: nodetool move trying to stream data to node no longer in cluster
Hi Aaron - Thanks a lot for the great feedback. I'll try your suggestion on removing it as an endpoint with JMX. On , aaron morton wrote: Off the top of my head, the simple way to stop invalid endpoint state being passed around is a full cluster stop. Obviously that's not an option. The problem is that if one node has the IP it will share it around with the others. Out of interest, take a look at the o.a.c.db.FailureDetector MBean getAllEndpointStates() function. That returns the end point state held by the Gossiper. I think you should see the phantom IP listed in there. If it's only on some nodes *perhaps* restarting the node with the JVM option -Dcassandra.load_ring_state=false *may* help. That will stop the node from loading its saved ring state and force it to get it via gossip. Again, if there are other nodes with the phantom IP it may just get it again. I'll do some digging and try to get back to you. This pops up from time to time and thinking out loud I wonder if it would be possible to add a new application state that purges an IP from the ring. e.g. VersionedValue.STATUS_PURGED that works with a ttl so it goes through X number of gossip rounds and then disappears. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 19:58, Jonathan Colby wrote: > @Aaron - > > Unfortunately I'm still seeing message like: " is down", removing from gossip, although with not the same frequency. > > And repair/move jobs don't seem to try to stream data to the removed node anymore. > > Anyone know how to totally purge any stored gossip/endpoint data on nodes that were removed from the cluster. Or what might be happening here otherwise? > > > On May 26, 2011, at 9:10 AM, aaron morton wrote: > >> cool. I was going to suggest that but as you already had the move running I thought it may be a little drastic. >> >> Did it show any progress ? If the IP address is not responding there should have been some sort of error. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 26 May 2011, at 15:28, jonathan.co...@gmail.com wrote: >> >>> Seems like it had something to do with stale endpoint information. I did a rolling restart of the whole cluster and that seemed to trigger the nodes to remove the node that was decommissioned. >>> >>> On , aaron morton aa...@thelastpickle.com> wrote: Is it showing progress ? It may just be a problem with the information printed out. Can you check from the other nodes in the cluster to see if they are receiving the stream ? cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 00:42, Jonathan Colby wrote: > I recently removed a node (with decommission) from our cluster. > > I added a couple new nodes and am now trying to rebalance the cluster using nodetool move. > > However, netstats shows that the node being "moved" is trying to stream data to the node that I already decommissioned yesterday. > > The removed node was powered-off, taken out of dns, its IP is not even pingable. It was never a seed neither. > > This is cassandra 0.7.5 on 64bit linux. How do I tell the cluster that this node is gone? Gossip should have detected this. The ring commands shows the correct cluster IPs. > > Here is a portion of netstats. 10.46.108.102 is the node which was removed. 
> > Mode: Leaving: streaming data to other nodes > Streaming to: /10.46.108.102 > /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97 > ... > 5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,1481
Re: EC2 node adding trouble
This is the *most* useful page on the wiki http://wiki.apache.org/cassandra/Operations Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 27 May 2011, at 02:06, Marcus Bointon wrote: > On 26 May 2011, at 15:21, Sasha Dolgy wrote: > >> Turn the node off, remove the node from the ring using nodetool and >> removetoken i've found this to be the best problem-free way. >> Maybe it's better now ... >> http://blog.sasha.dolgy.com/2011/03/apache-cassandra-nodetool.html > > So I'd need to have at least replication=2 in order to do that safely? Your > article makes it sound like draining/decommission doesn't work? > > Has anyone automated node addition/removal using chef or similar? > > Marcus
Re: EC2 node adding trouble
This ticket may be just the ticket :) https://issues.apache.org/jira/browse/CASSANDRA-2452 Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 27 May 2011, at 01:16, Sasha Dolgy wrote: > As an aside, you can also use that command to pull meta-data about > instances in AWS. I have implemented this to maintain a list of seed > nodes. This way, when a new instance is brought online, the default > cassandra.yaml is `enhanced` to contain a dynamic list of valid seeds, > proper hostname and a few other bits of useful information. > > Finally, if you aren't using a single security group for all of your > cassandra instances, maybe this may be of help to you. When we add > new nodes to our ring, we add them to a single cassandra security > group. No messing about with security groups per instance... > > -sd > > On Thu, May 26, 2011 at 2:36 PM, Marcus Bointon > wrote: >> Thanks for all your helpful suggestions - I've now got it working. It was >> down to a combination of things. >> >> 1. A missing rule in a security group >> 2. A missing DNS name for the new node, so its default name was defaulting >> to localhost >> 3. Google DNS caching the failed DNS lookup for the full duration of the >> SOA's TTL >> >> In order to avoid the whole problem with assigning IPs using the >> internal/external trick and using up elastic IPs, I found this service which >> I'd not seen before: >> http://www.ducea.com/2009/06/01/howto-update-dns-hostnames-automatically-for-your-amazon-ec2-instances/ >> >> This means you can reliably set (and reset as necessary) a listen address >> with this command: >> >> sed -i "s/^listen_address:.*/listen_address: `curl >> http://169.254.169.254/latest/meta-data/local-ipv4`/"; >> /etc/cassandra/cassandra.yaml >> >> It's not quite as good as having a true dynamic hostname, but at least you >> can drop it in a startup script and forget it. >> >> Marcus
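If you generate cassandra.yaml from a deploy tool instead of sed, the same lookup is easy from code. A small sketch (169.254.169.254 is AWS's standard instance-metadata address; everything else here is illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class Ec2LocalIp {
    public static void main(String[] args) throws Exception {
        // Ask the EC2 metadata service for this instance's private IP.
        URL url = new URL("http://169.254.169.254/latest/meta-data/local-ipv4");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        try {
            System.out.println("listen_address: " + in.readLine());
        } finally {
            in.close();
        }
    }
}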
Re: nodetool move trying to stream data to node no longer in cluster
Off the top of my head, the simple way to stop invalid endpoint state being passed around is a full cluster stop. Obviously that's not an option. The problem is that if one node has the IP it will share it around with the others. Out of interest, take a look at the o.a.c.db.FailureDetector MBean getAllEndpointStates() function. That returns the end point state held by the Gossiper. I think you should see the phantom IP listed in there. If it's only on some nodes *perhaps* restarting the node with the JVM option -Dcassandra.load_ring_state=false *may* help. That will stop the node from loading its saved ring state and force it to get it via gossip. Again, if there are other nodes with the phantom IP it may just get it again. I'll do some digging and try to get back to you. This pops up from time to time and thinking out loud I wonder if it would be possible to add a new application state that purges an IP from the ring. e.g. VersionedValue.STATUS_PURGED that works with a ttl so it goes through X number of gossip rounds and then disappears. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 19:58, Jonathan Colby wrote: > @Aaron - > > Unfortunately I'm still seeing message like: " is down", > removing from gossip, although with not the same frequency. > > And repair/move jobs don't seem to try to stream data to the removed node > anymore. > > Anyone know how to totally purge any stored gossip/endpoint data on nodes > that were removed from the cluster. Or what might be happening here > otherwise? > > > On May 26, 2011, at 9:10 AM, aaron morton wrote: > >> cool. I was going to suggest that but as you already had the move running I >> thought it may be a little drastic. >> >> Did it show any progress ? If the IP address is not responding there should >> have been some sort of error. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 26 May 2011, at 15:28, jonathan.co...@gmail.com wrote: >> >>> Seems like it had something to do with stale endpoint information. I did a >>> rolling restart of the whole cluster and that seemed to trigger the nodes >>> to remove the node that was decommissioned. >>> >>> On , aaron morton wrote: Is it showing progress ? It may just be a problem with the information printed out. Can you check from the other nodes in the cluster to see if they are receiving the stream ? cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 00:42, Jonathan Colby wrote: > I recently removed a node (with decommission) from our cluster. > > I added a couple new nodes and am now trying to rebalance the cluster > using nodetool move. > > However, netstats shows that the node being "moved" is trying to stream > data to the node that I already decommissioned yesterday. > > The removed node was powered-off, taken out of dns, its IP is not even > pingable. It was never a seed neither. > > This is cassandra 0.7.5 on 64bit linux. How do I tell the cluster that > this node is gone? Gossip should have detected this. The ring commands > shows the correct cluster IPs. > > Here is a portion of netstats. 10.46.108.102 is the node which was > removed. 
> > Mode: Leaving: streaming data to other nodes > Streaming to: /10.46.108.102 > /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97 > ... > 5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,14811670699),(14811670699,14815125177),(14815125177,14819765003),(14820229433,14820858266) > progress=280574376402/12434049900 - 2256% > . > > > Note 10.46.108.102 is NOT part of the ring. > > Address Status State LoadOwnsToken
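If you would rather script the check Aaron describes than browse it in JConsole, something like the sketch below dumps the gossip endpoint state so you can grep for the phantom IP. The MBean ObjectName and the JMX port are assumptions for 0.7; verify the exact name in JConsole for your version.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DumpEndpointStates {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi"); // assumed JMX port
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // getAllEndpointStates() surfaces over JMX as the "AllEndpointStates" attribute.
            ObjectName fd = new ObjectName("org.apache.cassandra.net:type=FailureDetector");
            System.out.println(mbs.getAttribute(fd, "AllEndpointStates"));
        } finally {
            connector.close();
        }
    }
}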
Re: Forcing Cassandra to free up some space
Which JVM? Which collector? There have been and continue to be many. Hotspot itself supports a number of different collectors with different behaviors. Many of them do not collect every candidate on every GC, but merely the easiest ones to find. This is why depending on finalizers is a *bad* idea in Java code. They may well never get run. (Finalizer is one of a few features the Sun Java team always regretted putting in Java to start with. It has caused quite a few application problems over the years.) The really important thing is that NONE of these behaviors of the collectors are guaranteed by specification not to change from version to version. Basing your code on non-specified behaviors is a good way to hit mysterious failures on updates. For instance, in the mid-90s, IBM had a mode of their VM called "infinite heap." It *never* garbage collected, even if you called System.gc. Instead it just threw away address space and counted on the total memory needs for the life of the program being less than the total addressable space of the processor. It was *very* fast for certain kinds of applications. Far from being pedantic, not depending on undocumented behavior is simply good engineering. On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis wrote: > I've read the relevant source. While you're pedantically correct re > the spec, you're wrong as to what the JVM actually does. > > On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman wrote: >> Some references... >> >> "An object enters an unreachable state when no more strong references >> to it exist. When an object is unreachable, it is a candidate for >> collection. Note the wording: Just because an object is a candidate >> for collection doesn't mean it will be immediately collected. The JVM >> is free to delay collection until there is an immediate need for the >> memory being consumed by the object." >> >> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 >> >> and "Calling the gc method suggests that the Java Virtual Machine >> expend effort toward recycling unused objects" >> >> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() >> >> It goes on to say that the VM will make a "best effort", but "best >> effort" is *deliberately* left up to the definition of the gc >> implementor. >> >> I guess you missed the many lectures I have given on this subject over >> the years at Java One Conferences >> >> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: >>> It's a common misunderstanding that system.gc is only a suggestion; on >>> any VM you're likely to run Cassandra on, System.gc will actually >>> invoke a full collection. >>> >>> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: Actually this is no gaurantee. Its a common misunderstanding that System.gc "forces" gc. It does not. It is a suggestion only. The vm always has the option as to when and how much it gcs On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >>> >> >> >> >> -- >> It's always darkest just before you are eaten by a grue. >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- It's always darkest just before you are eaten by a grue.
Re: OOM recovering failed node with many CFs
We've applied a fix to the 0.7 branch in https://issues.apache.org/jira/browse/CASSANDRA-2714. The patch probably applies to 0.7.6 as well. On Thu, May 26, 2011 at 11:36 AM, Flavio Baronti wrote: > I tried the manual copy you suggest, but the SystemTable.checkHealth() > function > complains it can't load the system files. Log follows, I will gather some > more > info and create a ticket as soon as possible. > > INFO [main] 2011-05-26 18:25:36,147 AbstractCassandraDaemon.java Logging > initialized > INFO [main] 2011-05-26 18:25:36,172 AbstractCassandraDaemon.java Heap size: > 4277534720/4277534720 > INFO [main] 2011-05-26 18:25:36,174 CLibrary.java JNA not found. Native > methods will be disabled. > INFO [main] 2011-05-26 18:25:36,190 DatabaseDescriptor.java Loading > settings from file:/C:/Cassandra/conf/hscassandra9170.yaml > INFO [main] 2011-05-26 18:25:36,344 DatabaseDescriptor.java DiskAccessMode > 'auto' determined to be mmap, indexAccessMode is mmap > INFO [main] 2011-05-26 18:25:36,532 SSTableReader.java Opening > G:\Cassandra\data\system\Schema-f-2746 > INFO [main] 2011-05-26 18:25:36,577 SSTableReader.java Opening > G:\Cassandra\data\system\Schema-f-2729 > INFO [main] 2011-05-26 18:25:36,590 SSTableReader.java Opening > G:\Cassandra\data\system\Schema-f-2745 > INFO [main] 2011-05-26 18:25:36,599 SSTableReader.java Opening > G:\Cassandra\data\system\Migrations-f-2167 > INFO [main] 2011-05-26 18:25:36,600 SSTableReader.java Opening > G:\Cassandra\data\system\Migrations-f-2131 > INFO [main] 2011-05-26 18:25:36,602 SSTableReader.java Opening > G:\Cassandra\data\system\Migrations-f-1041 > INFO [main] 2011-05-26 18:25:36,603 SSTableReader.java Opening > G:\Cassandra\data\system\Migrations-f-1695 > ERROR [main] 2011-05-26 18:25:36,634 AbstractCassandraDaemon.java Fatal > exception during initialization > org.apache.cassandra.config.ConfigurationException: Found system table > files, but they couldn't be loaded. Did you change the partitioner? > at > org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:236) > at > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:127) > at > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314) > at > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79) > > > Il 5/26/2011 6:04 PM, Jonathan Ellis ha scritto: >> >> Sounds like a legitimate bug, although looking through the code I'm >> not sure what would cause a tight retry loop on migration >> announce/rectify. Can you create a ticket at >> https://issues.apache.org/jira/browse/CASSANDRA ? >> >> As a workaround, I would try manually copying the Migrations and >> Schema sstable files from the system keyspace of the live node, then >> restart the recovering one. >> >> On Thu, May 26, 2011 at 9:27 AM, Flavio Baronti >> wrote: >>> >>> I can't seem to be able to recover a failed node on a database where i >>> did >>> many updates to the schema. >>> >>> I have a small cluster with 2 nodes, around 1000 CF (I know it's a lot, >>> but >>> it can't be changed right now), and ReplicationFactor=2. >>> I shut down a node and cleaned its data entirely, then tried to bring it >>> back up. The node starts fetching schema updates from the live node, but >>> the >>> operation fails halfway with an OOME. >>> After some investigation, what I found is that: >>> >>> - I have a lot of schema updates (there are 2067 rows in the >>> system.Schema >>> CF). 
>>> - The live node loads migrations 1-1000, and sends them to the recovering >>> node (Migration.getLocalMigrations()) >>> - Soon afterwards, the live node checks the schema version on the >>> recovering >>> node and finds it has moved by a little - say it has applied the first 3 >>> migrations. It then loads migrations 3-1003, and sends them to the node. >>> - This process is repeated very quickly (sends migrations 6-1006, 9-1009, >>> etc). >>> >>> Analyzing the memory dump and the logs, it looks like each of these 1000 >>> migration blocks are composed in a single message and sent to the >>> OutboundTcpConnection queue. However, since the schema is big, the >>> messages >>> occupy a lot of space, and are built faster than the connection can send >>> them. Therefore, they accumulate in OutboundTcpConnection.queue, until >>> memory is completely filled. >>> >>> Any suggestions? Can I change something to make this work, apart from >>> reducing the number of CFs? >>> >>> Flavio >>> >> >> >> > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Forcing Cassandra to free up some space
I've read the relevant source. While you're pedantically correct re the spec, you're wrong as to what the JVM actually does. On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman wrote: > Some references... > > "An object enters an unreachable state when no more strong references > to it exist. When an object is unreachable, it is a candidate for > collection. Note the wording: Just because an object is a candidate > for collection doesn't mean it will be immediately collected. The JVM > is free to delay collection until there is an immediate need for the > memory being consumed by the object." > > http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 > > and "Calling the gc method suggests that the Java Virtual Machine > expend effort toward recycling unused objects" > > http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() > > It goes on to say that the VM will make a "best effort", but "best > effort" is *deliberately* left up to the definition of the gc > implementor. > > I guess you missed the many lectures I have given on this subject over > the years at Java One Conferences > > On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: >> It's a common misunderstanding that system.gc is only a suggestion; on >> any VM you're likely to run Cassandra on, System.gc will actually >> invoke a full collection. >> >> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: >>> Actually this is no gaurantee. Its a common misunderstanding that >>> System.gc "forces" gc. It does not. It is a suggestion only. The vm always >>> has the option as to when and how much it gcs >>> >>> On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: >>> >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> > > > > -- > It's always darkest just before you are eaten by a grue. > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Forcing Cassandra to free up some space
Some references... "An object enters an unreachable state when no more strong references to it exist. When an object is unreachable, it is a candidate for collection. Note the wording: Just because an object is a candidate for collection doesn't mean it will be immediately collected. The JVM is free to delay collection until there is an immediate need for the memory being consumed by the object." http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 and "Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects" http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc() It goes on to say that the VM will make a "best effort", but "best effort" is *deliberately* left up to the definition of the gc implementor. I guess you missed the many lectures I have given on this subject over the years at Java One Conferences On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: > It's a common misunderstanding that system.gc is only a suggestion; on > any VM you're likely to run Cassandra on, System.gc will actually > invoke a full collection. > > On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: >> Actually this is no gaurantee. Its a common misunderstanding that >> System.gc "forces" gc. It does not. It is a suggestion only. The vm always >> has the option as to when and how much it gcs >> >> On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- It's always darkest just before you are eaten by a grue.
Re: Forcing Cassandra to free up some space
I'm sorry. This was my business at Sun. You are certainly wrong about the Hotspot VM. See this chapter of my book: http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394 On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis wrote: > It's a common misunderstanding that system.gc is only a suggestion; on > any VM you're likely to run Cassandra on, System.gc will actually > invoke a full collection. > > On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: >> Actually this is no gaurantee. Its a common misunderstanding that >> System.gc "forces" gc. It does not. It is a suggestion only. The vm always >> has the option as to when and how much it gcs >> >> On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- It's always darkest just before you are eaten by a grue.
Re: Forcing Cassandra to free up some space
It's a common misunderstanding that system.gc is only a suggestion; on any VM you're likely to run Cassandra on, System.gc will actually invoke a full collection. On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman wrote: > Actually this is no gaurantee. Its a common misunderstanding that > System.gc "forces" gc. It does not. It is a suggestion only. The vm always > has the option as to when and how much it gcs > > On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote: > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Forcing Cassandra to free up some space
Actually this is no guarantee. It's a common misunderstanding that System.gc "forces" GC. It does not. It is a suggestion only. The VM always has the option as to when and how much it GCs. On May 26, 2011 2:51 PM, "Jonathan Ellis" wrote:
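A tiny self-contained demo of that point. What it prints depends on the VM and its flags (for example, -XX:+DisableExplicitGC turns the call into a no-op), which is exactly why it is only a suggestion:

public class GcHint {
    public static void main(String[] args) {
        byte[][] junk = new byte[1000][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[10000];
        }
        junk = null; // everything allocated above is now unreachable
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();
        System.gc(); // a request, not a command
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("used before: %,d bytes, used after: %,d bytes%n", before, after);
    }
}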
Re: PHP CQL Driver
yep, works perfectly @ http://caqel.deadcafe.org/ I will try my luck @ phpcassa. Thanks for your time gentlemen. On Thu, May 26, 2011 at 8:59 PM, Sasha Dolgy wrote: > maybe you'd have more luck discussing this on the phpcassa list? > https://groups.google.com/forum/#!forum/phpcassa > > more experience there with PHP and Cassandra ... > > Are you able to validate the query works when not using PHP? > > On Thu, May 26, 2011 at 8:51 PM, Kwasi Gyasi - Agyei > wrote: > > got system in debug mode > > > > the following query fails > > --- > > > > CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey ) WITH comparator > = > > text AND default_validation = text > > > > PHP error reads > > - > > > > #0 > > > /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra_execute_cql_query_result.php(52): > > TBase->_read('Cassandra_execu...', Array, Object(TBinaryProtocol)) #1 > > > /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra.client.php(1771): > > > cassandra_Cassandra_execute_cql_query_result->read(Object(TBinaryProtocol)) > > #2 > > > /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra.client.php(1731): > > CassandraClient->recv_execute_cql_query() #3 > > /Volumes/DATA/Project/libs/php/phpCQL/test/index.php(34): > > CassandraClient->execute_cql_query('CREATE COLUMNFA...', 2) #4 {main} > > > > Cassandra logs read > > -- > > > > DEBUG 20:48:10,659 Disseminating load info ... > > DEBUG 20:49:10,661 Disseminating load info ... > > DEBUG 20:49:22,867 CQL statement type: USE > > DEBUG 20:49:22,870 logged out: # > > > > > > here is the code I'm using to test > > > > > > phpCQLAutoloader::register(); > > > > $socketPool = new TSocketPool(); > > $socketPool->addServer( "127.0.0.1", 9160 ); > > $socketPool->setDebug( true ); > > > > $framedTransport = new TFramedTransport( $socketPool, true, true ); > > $bufferedProtocol = new TBinaryProtocol( $framedTransport, true, true ); > > //new TBinaryProtocolAccelerated( $framedTransport ); > > $cassandraClient = new CassandraClient( $bufferedProtocol, > > $bufferedProtocol ); > > > > try{ > > > > echo "opening connection "; > > $framedTransport->open(); > > > > try{ > > > > echo "settign keyspace to use "; > > $result = $cassandraClient->execute_cql_query( "use nnduronic" , > > cassandra_Compression::NONE); > > print_r( $result ); > > > > }catch( cassandra_InvalidRequestException $exrs ){ > > > > echo "USE error occuired -- " . $exrs->getTraceAsString() . > " > > "; > > } > > > > try{ > > > > echo "Executing create column query "; > > $query = "CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, > > monkey ) WITH comparator = text AND default_validation = text"; > > $result = $cassandraClient->execute_cql_query( $query , > > cassandra_Compression::NONE ); > > > > echo "|". print_r($result) . "|" . ""; > > > > }catch( cassandra_InvalidRequestException $exrs ){ > > echo "COLUMNFAMILY error occuired -- " . > > $exrs->getTraceAsString() . " "; > > } > > echo "closing connnection "; > > $framedTransport->close(); > > > > > > I'm lost :( > > > > On Thu, May 26, 2011 at 9:17 AM, aaron morton > > wrote: > >> > >> Cool, this may be a better discussion for the client-dev list > >> http://www.mail-archive.com/client-dev@cassandra.apache.org/ > >> > >> I would start by turning up the server logging to DEBUG and watching > your > >> update / select queries. 
> >> > >> Cheers > >> - > >> Aaron Morton > >> Freelance Cassandra Developer > >> @aaronmorton > >> http://www.thelastpickle.com > >> On 26 May 2011, at 16:15, Kwasi Gyasi - Agyei wrote: > >> > >> Hi, > >> > >> I have manged to generate thrift interface for php along with > implementing > >> auto-loading of both Cassandra and thrift core class. > >> > >> However during my testing the only query that works as expected is the > >> create keyspace cql query... all other queries don't do or return any > >> results nor do they throw exceptions even in try catch statement I get > >> nothing. > >> > >> -- > >> 4Things > >> Multimedia and Communication | Property | Entertainment > >> Kwasi Owusu Gyasi - Agyei > >> > >> cell(+27) (0) 76 466 4488 > >> website www.4things.co.za > >> email kwasi.gyasiag...@4things.co.za > >> skypekwasi.gyasiagyei > >> roleDeveloper.Designer.Software Architect > >> > > > > > > > > -- > > 4Things > > Multimedia and Communication | Property | Entertainment > > Kwasi Owusu Gyasi - Agyei > > > > cell(+27) (0) 76 466 4488 > > website www.4things.co.za > > email kwasi.gyasiag...@4things.co.za > > skypekwasi.gyasiagyei > > roleDeveloper.Designer.Software Architect > > > > > > -- > Sasha Dolgy > sasha.do...@gmail.com > -- *4T
Re: PHP CQL Driver
maybe you'd have more luck discussing this on the phpcassa list? https://groups.google.com/forum/#!forum/phpcassa more experience there with PHP and Cassandra ... Are you able to validate the query works when not using PHP? On Thu, May 26, 2011 at 8:51 PM, Kwasi Gyasi - Agyei wrote: > got system in debug mode > > the following query fails > --- > > CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey ) WITH comparator = > text AND default_validation = text > > PHP error reads > - > > #0 > /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra_execute_cql_query_result.php(52): > TBase->_read('Cassandra_execu...', Array, Object(TBinaryProtocol)) #1 > /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra.client.php(1771): > cassandra_Cassandra_execute_cql_query_result->read(Object(TBinaryProtocol)) > #2 > /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra.client.php(1731): > CassandraClient->recv_execute_cql_query() #3 > /Volumes/DATA/Project/libs/php/phpCQL/test/index.php(34): > CassandraClient->execute_cql_query('CREATE COLUMNFA...', 2) #4 {main} > > Cassandra logs read > -- > > DEBUG 20:48:10,659 Disseminating load info ... > DEBUG 20:49:10,661 Disseminating load info ... > DEBUG 20:49:22,867 CQL statement type: USE > DEBUG 20:49:22,870 logged out: # > > > here is the code I'm using to test > > > phpCQLAutoloader::register(); > > $socketPool = new TSocketPool(); > $socketPool->addServer( "127.0.0.1", 9160 ); > $socketPool->setDebug( true ); > > $framedTransport = new TFramedTransport( $socketPool, true, true ); > $bufferedProtocol = new TBinaryProtocol( $framedTransport, true, true ); > //new TBinaryProtocolAccelerated( $framedTransport ); > $cassandraClient = new CassandraClient( $bufferedProtocol, > $bufferedProtocol ); > > try{ > > echo "opening connection "; > $framedTransport->open(); > > try{ > > echo "settign keyspace to use "; > $result = $cassandraClient->execute_cql_query( "use nnduronic" , > cassandra_Compression::NONE); > print_r( $result ); > > }catch( cassandra_InvalidRequestException $exrs ){ > > echo "USE error occuired -- " . $exrs->getTraceAsString() . " > "; > } > > try{ > > echo "Executing create column query "; > $query = "CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, > monkey ) WITH comparator = text AND default_validation = text"; > $result = $cassandraClient->execute_cql_query( $query , > cassandra_Compression::NONE ); > > echo "|". print_r($result) . "|" . ""; > > }catch( cassandra_InvalidRequestException $exrs ){ > echo "COLUMNFAMILY error occuired -- " . > $exrs->getTraceAsString() . " "; > } > echo "closing connnection "; > $framedTransport->close(); > > > I'm lost :( > > On Thu, May 26, 2011 at 9:17 AM, aaron morton > wrote: >> >> Cool, this may be a better discussion for the client-dev list >> http://www.mail-archive.com/client-dev@cassandra.apache.org/ >> >> I would start by turning up the server logging to DEBUG and watching your >> update / select queries. >> >> Cheers >> - >> Aaron Morton >> Freelance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> On 26 May 2011, at 16:15, Kwasi Gyasi - Agyei wrote: >> >> Hi, >> >> I have manged to generate thrift interface for php along with implementing >> auto-loading of both Cassandra and thrift core class. >> >> However during my testing the only query that works as expected is the >> create keyspace cql query... 
all other queries don't do or return any >> results nor do they throw exceptions even in try catch statement I get >> nothing. >> >> -- >> 4Things >> Multimedia and Communication | Property | Entertainment >> Kwasi Owusu Gyasi - Agyei >> >> cell (+27) (0) 76 466 4488 >> website www.4things.co.za >> email kwasi.gyasiag...@4things.co.za >> skype kwasi.gyasiagyei >> role Developer.Designer.Software Architect >> > > > > -- > 4Things > Multimedia and Communication | Property | Entertainment > Kwasi Owusu Gyasi - Agyei > > cell (+27) (0) 76 466 4488 > website www.4things.co.za > email kwasi.gyasiag...@4things.co.za > skype kwasi.gyasiagyei > role Developer.Designer.Software Architect > -- Sasha Dolgy sasha.do...@gmail.com
Re: PHP CQL Driver
got system in debug mode the following query fails --- CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey ) WITH comparator = text AND default_validation = text PHP error reads - #0 /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra_execute_cql_query_result.php(52): TBase->_read('Cassandra_execu...', Array, Object(TBinaryProtocol)) #1 /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra.client.php(1771): cassandra_Cassandra_execute_cql_query_result->read(Object(TBinaryProtocol)) #2 /Volumes/DATA/Project/libs/php/phpCQL/vendor/cassandra/cassandra.Cassandra.client.php(1731): CassandraClient->recv_execute_cql_query() #3 /Volumes/DATA/Project/libs/php/phpCQL/test/index.php(34): CassandraClient->execute_cql_query('CREATE COLUMNFA...', 2) #4 {main} Cassandra logs read -- DEBUG 20:48:10,659 Disseminating load info ... DEBUG 20:49:10,661 Disseminating load info ... DEBUG 20:49:22,867 CQL statement type: USE DEBUG 20:49:22,870 logged out: # here is the code I'm using to test phpCQLAutoloader::register(); $socketPool = new TSocketPool(); $socketPool->addServer( "127.0.0.1", 9160 ); $socketPool->setDebug( true ); $framedTransport = new TFramedTransport( $socketPool, true, true ); $bufferedProtocol = new TBinaryProtocol( $framedTransport, true, true ); //new TBinaryProtocolAccelerated( $framedTransport ); $cassandraClient = new CassandraClient( $bufferedProtocol, $bufferedProtocol ); try{ echo "opening connection "; $framedTransport->open(); try{ echo "settign keyspace to use "; $result = $cassandraClient->execute_cql_query( "use nnduronic" , cassandra_Compression::NONE); print_r( $result ); }catch( cassandra_InvalidRequestException $exrs ){ echo "USE error occuired -- " . $exrs->getTraceAsString() . " "; } try{ echo "Executing create column query "; $query = "CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey ) WITH comparator = text AND default_validation = text"; $result = $cassandraClient->execute_cql_query( $query , cassandra_Compression::NONE ); echo "|". print_r($result) . "|" . ""; }catch( cassandra_InvalidRequestException $exrs ){ echo "COLUMNFAMILY error occuired -- " . $exrs->getTraceAsString() . " "; } echo "closing connnection "; $framedTransport->close(); I'm lost :( On Thu, May 26, 2011 at 9:17 AM, aaron morton wrote: > Cool, this may be a better discussion for the client-dev list > http://www.mail-archive.com/client-dev@cassandra.apache.org/ > > I would start by turning up the server logging to DEBUG and watching your > update / select queries. > > Cheers > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 26 May 2011, at 16:15, Kwasi Gyasi - Agyei wrote: > > Hi, > > I have manged to generate thrift interface for php along with implementing > auto-loading of both Cassandra and thrift core class. > > However during my testing the only query that works as expected is the > create keyspace cql query... all other queries don't do or return any > results nor do they throw exceptions even in try catch statement I get > nothing. 
> > -- > *4Things* > Multimedia and Communication | Property | Entertainment > Kwasi Owusu Gyasi - Agyei > > *cell*(+27) (0) 76 466 4488 > *website *www.4things.co.za > *email *kwasi.gyasiag...@4things.co.za > *skype*kwasi.gyasiagyei > *role*Developer.Designer.Software Architect > > > -- *4Things* Multimedia and Communication | Property | Entertainment Kwasi Owusu Gyasi - Agyei *cell*(+27) (0) 76 466 4488 *website *www.4things.co.za *email *kwasi.gyasiag...@4things.co.za *skype*kwasi.gyasiagyei *role*Developer.Designer.Software Architect
Re: Forcing Cassandra to free up some space
You'd have to call system.gc via JMX. https://issues.apache.org/jira/browse/CASSANDRA-2521 is open to address this, btw. On Thu, May 26, 2011 at 1:09 PM, Konstantin Naryshkin wrote: > I have a basic understanding of how Cassandra handles the file system > (flushes in Memtables out to SSTables, SSTables get compacted) and I > understand that old files are only deleted when a node is restarted, when > Java does a GC, or when Cassandra feels like it is running out of space. > > My question is, is there some way for us to hurry the process along? We have > a data that we do a lot of inserts into and then delete the data several > hours later. We would like it if we could free up disk space (since our > disks, though large, are shared with other applications). So far, the action > sequence to accomplish this is: > nodetoo flush -> nodetool repair -> nodetool compact -> ?? > > Is there a way for me to make (or even gently suggest to) Cassandra that it > may be a good time to free up some space? > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
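For reference, a minimal Java sketch of what "call system.gc via JMX" can look like from a standalone client. The host, the port (7199 here; older configurations use a different JMX port) and the absence of JMX authentication are assumptions, so adjust them for your own cluster:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceGc {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost"; // assumed host
        String port = args.length > 1 ? args[1] : "7199";      // assumed JMX port
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Invoke the gc() operation on the standard java.lang:type=Memory MBean.
            // As noted elsewhere in this thread, this is still only a request;
            // the JVM decides when and how much it actually collects.
            mbs.invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null);
        } finally {
            connector.close();
        }
    }
}

The same gc() operation is what jconsole exposes as the "Perform GC" button on the Memory tab, so that is an equivalent manual alternative.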
Forcing Cassandra to free up some space
I have a basic understanding of how Cassandra handles the file system (it flushes Memtables out to SSTables, and SSTables get compacted) and I understand that old files are only deleted when a node is restarted, when Java does a GC, or when Cassandra feels like it is running out of space. My question is, is there some way for us to hurry the process along? We have data that we do a lot of inserts into and then delete several hours later. We would like it if we could free up disk space (since our disks, though large, are shared with other applications). So far, the action sequence to accomplish this is: nodetool flush -> nodetool repair -> nodetool compact -> ?? Is there a way for me to make (or even gently suggest to) Cassandra that it may be a good time to free up some space?
Re: OOM recovering failed node with many CFs
I tried the manual copy you suggest, but the SystemTable.checkHealth() function complains it can't load the system files. Log follows, I will gather some more info and create a ticket as soon as possible. INFO [main] 2011-05-26 18:25:36,147 AbstractCassandraDaemon.java Logging initialized INFO [main] 2011-05-26 18:25:36,172 AbstractCassandraDaemon.java Heap size: 4277534720/4277534720 INFO [main] 2011-05-26 18:25:36,174 CLibrary.java JNA not found. Native methods will be disabled. INFO [main] 2011-05-26 18:25:36,190 DatabaseDescriptor.java Loading settings from file:/C:/Cassandra/conf/hscassandra9170.yaml INFO [main] 2011-05-26 18:25:36,344 DatabaseDescriptor.java DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO [main] 2011-05-26 18:25:36,532 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2746 INFO [main] 2011-05-26 18:25:36,577 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2729 INFO [main] 2011-05-26 18:25:36,590 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2745 INFO [main] 2011-05-26 18:25:36,599 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-2167 INFO [main] 2011-05-26 18:25:36,600 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-2131 INFO [main] 2011-05-26 18:25:36,602 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-1041 INFO [main] 2011-05-26 18:25:36,603 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-1695 ERROR [main] 2011-05-26 18:25:36,634 AbstractCassandraDaemon.java Fatal exception during initialization org.apache.cassandra.config.ConfigurationException: Found system table files, but they couldn't be loaded. Did you change the partitioner? at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:236) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:127) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79) Il 5/26/2011 6:04 PM, Jonathan Ellis ha scritto: Sounds like a legitimate bug, although looking through the code I'm not sure what would cause a tight retry loop on migration announce/rectify. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA ? As a workaround, I would try manually copying the Migrations and Schema sstable files from the system keyspace of the live node, then restart the recovering one. On Thu, May 26, 2011 at 9:27 AM, Flavio Baronti wrote: I can't seem to be able to recover a failed node on a database where i did many updates to the schema. I have a small cluster with 2 nodes, around 1000 CF (I know it's a lot, but it can't be changed right now), and ReplicationFactor=2. I shut down a node and cleaned its data entirely, then tried to bring it back up. The node starts fetching schema updates from the live node, but the operation fails halfway with an OOME. After some investigation, what I found is that: - I have a lot of schema updates (there are 2067 rows in the system.Schema CF). - The live node loads migrations 1-1000, and sends them to the recovering node (Migration.getLocalMigrations()) - Soon afterwards, the live node checks the schema version on the recovering node and finds it has moved by a little - say it has applied the first 3 migrations. It then loads migrations 3-1003, and sends them to the node. - This process is repeated very quickly (sends migrations 6-1006, 9-1009, etc). 
Analyzing the memory dump and the logs, it looks like each of these 1000 migration blocks are composed in a single message and sent to the OutboundTcpConnection queue. However, since the schema is big, the messages occupy a lot of space, and are built faster than the connection can send them. Therefore, they accumulate in OutboundTcpConnection.queue, until memory is completely filled. Any suggestions? Can I change something to make this work, apart from reducing the number of CFs? Flavio
Re: OOM recovering failed node with many CFs
Sounds like a legitimate bug, although looking through the code I'm not sure what would cause a tight retry loop on migration announce/rectify. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA ? As a workaround, I would try manually copying the Migrations and Schema sstable files from the system keyspace of the live node, then restart the recovering one. On Thu, May 26, 2011 at 9:27 AM, Flavio Baronti wrote: > I can't seem to be able to recover a failed node on a database where i did > many updates to the schema. > > I have a small cluster with 2 nodes, around 1000 CF (I know it's a lot, but > it can't be changed right now), and ReplicationFactor=2. > I shut down a node and cleaned its data entirely, then tried to bring it > back up. The node starts fetching schema updates from the live node, but the > operation fails halfway with an OOME. > After some investigation, what I found is that: > > - I have a lot of schema updates (there are 2067 rows in the system.Schema > CF). > - The live node loads migrations 1-1000, and sends them to the recovering > node (Migration.getLocalMigrations()) > - Soon afterwards, the live node checks the schema version on the recovering > node and finds it has moved by a little - say it has applied the first 3 > migrations. It then loads migrations 3-1003, and sends them to the node. > - This process is repeated very quickly (sends migrations 6-1006, 9-1009, > etc). > > Analyzing the memory dump and the logs, it looks like each of these 1000 > migration blocks are composed in a single message and sent to the > OutboundTcpConnection queue. However, since the schema is big, the messages > occupy a lot of space, and are built faster than the connection can send > them. Therefore, they accumulate in OutboundTcpConnection.queue, until > memory is completely filled. > > Any suggestions? Can I change something to make this work, apart from > reducing the number of CFs? > > Flavio > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
OOM recovering failed node with many CFs
I can't seem to be able to recover a failed node on a database where i did many updates to the schema. I have a small cluster with 2 nodes, around 1000 CF (I know it's a lot, but it can't be changed right now), and ReplicationFactor=2. I shut down a node and cleaned its data entirely, then tried to bring it back up. The node starts fetching schema updates from the live node, but the operation fails halfway with an OOME. After some investigation, what I found is that: - I have a lot of schema updates (there are 2067 rows in the system.Schema CF). - The live node loads migrations 1-1000, and sends them to the recovering node (Migration.getLocalMigrations()) - Soon afterwards, the live node checks the schema version on the recovering node and finds it has moved by a little - say it has applied the first 3 migrations. It then loads migrations 3-1003, and sends them to the node. - This process is repeated very quickly (sends migrations 6-1006, 9-1009, etc). Analyzing the memory dump and the logs, it looks like each of these 1000 migration blocks are composed in a single message and sent to the OutboundTcpConnection queue. However, since the schema is big, the messages occupy a lot of space, and are built faster than the connection can send them. Therefore, they accumulate in OutboundTcpConnection.queue, until memory is completely filled. Any suggestions? Can I change something to make this work, apart from reducing the number of CFs? Flavio
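To make the failure mode described above concrete, here is a small, self-contained Java illustration (not Cassandra code) of what happens when large messages are produced faster than a slow connection drains an unbounded outbound queue; the message sizes and send delay are made up:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OutboundQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Unbounded queue, standing in for an outbound connection queue with no cap.
        // Passing a capacity (e.g. new LinkedBlockingQueue<byte[]>(64)) would make
        // put() block and apply backpressure instead of letting the heap fill up.
        final BlockingQueue<byte[]> outbound = new LinkedBlockingQueue<byte[]>();

        Thread slowSender = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        outbound.take();   // "send" one message...
                        Thread.sleep(50);  // ...over a slow link
                    }
                } catch (InterruptedException ignored) {
                }
            }
        });
        slowSender.setDaemon(true);
        slowSender.start();

        // Fast producer: keeps queueing multi-megabyte "migration" messages.
        while (true) {
            outbound.put(new byte[5 * 1024 * 1024]); // grows without bound -> eventual OOM
            System.out.println("queued messages: " + outbound.size());
        }
    }
}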
Re: EC2 node adding trouble
On 26 May 2011, at 15:21, Sasha Dolgy wrote: > Turn the node off, remove the node from the ring using nodetool and > removetoken i've found this to be the best problem-free way. > Maybe it's better now ... > http://blog.sasha.dolgy.com/2011/03/apache-cassandra-nodetool.html So I'd need to have at least replication=2 in order to do that safely? Your article makes it sound like draining/decommission doesn't work? Has anyone automated node addition/removal using chef or similar? Marcus
Re: EC2 node adding trouble
On Thu, May 26, 2011 at 3:12 PM, Marcus Bointon wrote: > I'd like to make sure I've got the right sequence of operations for adding a > node without downtime. If I'm going from 2 to 3 nodes: > > 1 Calculate new initial_token values using the python script > 2 Change token values in existing nodes and restart them > 3 Install/configure new node > 4 Insert new node's token value > 5 Set new node to auto-bootstrap > 6 Start cassandra on new node > 7 Wait for the ring to rebalance > > With token changes (using values from the python script), it's clear that all > nodes will have some data moved. Does this mean that there's a possibility of > overlap between regions if token changes are not absolutely simultaneous on > all nodes? That sounds dangerous to me... Or shouldn't token values be > changed on nodes containing data? > nodetool repair is good. when we add new nodes, we add a new one without specifying the new token. after everything is up and healthy, we determine new tokens and see if there is a need to renumber nodes. if we do, we do one at a time and wait until the nodetool repair is finished on one node before moving to another > Is there a corresponding sequence for removing nodes? I'm guessing draining > is involved. Turn the node off, remove the node from the ring using nodetool and removetoken i've found this to be the best problem-free way. Maybe it's better now ... http://blog.sasha.dolgy.com/2011/03/apache-cassandra-nodetool.html
Re: EC2 node adding trouble
As an aside, you can also use that command to pull meta-data about instances in AWS. I have implemented this to maintain a list of seed nodes. This way, when a new instance is brought online, the default cassandra.yaml is `enhanced` to contain a dynamic list of valid seeds, proper hostname and a few other bits of useful information. Finally, if you aren't using a single security group for all of your cassandra instances, maybe this may be of help to you. When we add new nodes to our ring, we add them to a single cassandra security group. No messing about with security groups per instance... -sd On Thu, May 26, 2011 at 2:36 PM, Marcus Bointon wrote: > Thanks for all your helpful suggestions - I've now got it working. It was > down to a combination of things. > > 1. A missing rule in a security group > 2. A missing DNS name for the new node, so its default name was defaulting to > localhost > 3. Google DNS caching the failed DNS lookup for the full duration of the > SOA's TTL > > In order to avoid the whole problem with assigning IPs using the > internal/external trick and using up elastic IPs, I found this service which > I'd not seen before: > http://www.ducea.com/2009/06/01/howto-update-dns-hostnames-automatically-for-your-amazon-ec2-instances/ > > This means you can reliably set (and reset as necessary) a listen address > with this command: > > sed -i "s/^listen_address:.*/listen_address: `curl > http://169.254.169.254/latest/meta-data/local-ipv4`/"; > /etc/cassandra/cassandra.yaml > > It's not quite as good as having a true dynamic hostname, but at least you > can drop it in a startup script and forget it. > > Marcus
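A rough sketch of the bootstrap idea described above, pulling a value from the EC2 instance metadata service (the same endpoint as the curl one-liner quoted below) and splicing it into cassandra.yaml. The file path, the single key rewritten here, and the use of local-ipv4 are assumptions; a real bootstrap script would also template the seed list:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Ec2CassandraConfig {
    // Read one value from the EC2 instance metadata service.
    static String metadata(String key) throws Exception {
        URL url = new URL("http://169.254.169.254/latest/meta-data/" + key);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
        try {
            return in.readLine().trim();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String localIp = metadata("local-ipv4");
        Path yaml = Paths.get("/etc/cassandra/cassandra.yaml"); // assumed location
        List<String> patched = new ArrayList<String>();
        for (String line : Files.readAllLines(yaml, StandardCharsets.UTF_8)) {
            // Rewrite only the listen_address line; everything else passes through.
            patched.add(line.startsWith("listen_address:")
                    ? "listen_address: " + localIp : line);
        }
        Files.write(yaml, patched, StandardCharsets.UTF_8);
    }
}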
Re: EC2 node adding trouble
On 24 May 2011, at 23:58, Sameer Farooqui wrote: > So, once you know what token each of the 3 nodes should have, shut down the > first two nodes, change their tokens and add the correct token to the 3rd > node (in the YAML file). I'd like to make sure I've got the right sequence of operations for adding a node without downtime. If I'm going from 2 to 3 nodes: 1 Calculate new initial_token values using the python script 2 Change token values in existing nodes and restart them 3 Install/configure new node 4 Insert new node's token value 5 Set new node to auto-bootstrap 6 Start cassandra on new node 7 Wait for the ring to rebalance With token changes (using values from the python script), it's clear that all nodes will have some data moved. Does this mean that there's a possibility of overlap between regions if token changes are not absolutely simultaneous on all nodes? That sounds dangerous to me... Or shouldn't token values be changed on nodes containing data? Can cassandra nodes restart without downtime? I'm looking at http://wiki.apache.org/cassandra/MultinodeCluster but as it says it's deliberately simplistic. Is there a corresponding sequence for removing nodes? I'm guessing draining is involved. Marcus
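For step 1, the usual calculation for evenly spaced RandomPartitioner tokens is token_i = i * (2^127 / N). A tiny Java equivalent of the python script mentioned above (assuming the RandomPartitioner and its 0..2^127 token space):

import java.math.BigInteger;

public class InitialTokens {
    public static void main(String[] args) {
        int nodes = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        // Size of one evenly spaced slice of the 2^127 token space.
        BigInteger slice = BigInteger.valueOf(2).pow(127)
                                     .divide(BigInteger.valueOf(nodes));
        for (int i = 0; i < nodes; i++) {
            // token_i = i * (2^127 / N)
            BigInteger token = slice.multiply(BigInteger.valueOf(i));
            System.out.println("node " + i + ": initial_token = " + token);
        }
    }
}

Run with an argument of 3, this prints 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070484.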
Re: EC2 node adding trouble
Thanks for all your helpful suggestions - I've now got it working. It was down to a combination of things. 1. A missing rule in a security group 2. A missing DNS name for the new node, so its default name was defaulting to localhost 3. Google DNS caching the failed DNS lookup for the full duration of the SOA's TTL In order to avoid the whole problem with assigning IPs using the internal/external trick and using up elastic IPs, I found this service which I'd not seen before: http://www.ducea.com/2009/06/01/howto-update-dns-hostnames-automatically-for-your-amazon-ec2-instances/ This means you can reliably set (and reset as necessary) a listen address with this command: sed -i "s/^listen_address:.*/listen_address: `curl http://169.254.169.254/latest/meta-data/local-ipv4`/"; /etc/cassandra/cassandra.yaml It's not quite as good as having a true dynamic hostname, but at least you can drop it in a startup script and forget it. Marcus
Re: Corrupted Counter Columns
Some additional information on the settings: I'm using CL.ONE for both reading and writing; and replicate_on_write is true on the Counters CF. I think the problem occurs after a restart when the commitlogs are read. On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu wrote: > Hello, > > I'm using the the 0.8.0-rc1, with RF=2 and 4 nodes. > > Strangely counters are corrupted. Say, the actual value should be : 51664 > and the value that cassandra sometimes outputs is: either 51664 or 18651001. > > And I have no idea on how to diagnose the problem or reproduce it. > > Can you help me in fixing this issue? > > Regards, > Utku >
Corrupted Counter Columns
Hello, I'm using the 0.8.0-rc1, with RF=2 and 4 nodes. Strangely, counters are getting corrupted. Say the actual value should be 51664; the value that Cassandra outputs is sometimes 51664 and sometimes 18651001. And I have no idea how to diagnose the problem or reproduce it. Can you help me in fixing this issue? Regards, Utku
Re: Priority queue in a single row - performance falls over time
persistent [priority] queues are better suited to something like HornetQ than Cassandra. On Wed, May 25, 2011 at 9:10 PM, Dan Kuebrich wrote: > It sounds like the problem is that the row is getting filled up with > tombstones and becoming enormous? Another idea then, which might not be > worth the added complexity, is to progressively use new rows. Depending on > volume, this could mean having 5-minute-window rows, or 1 minute, or > whatever works best. > > Read: Assuming you're not falling behind, you only need to query the row > that the current time falls in and the one immediately prior. If you do > fall behind, you'll have to walk backwards in buckets until you find them > empty. > Write: Write column to the bucket (row) that corresponds to the correct > time window. > Delete: Delete the column from the row it was read from. When all columns > in the row are deleted the row can GC. > > Again, cassandra might not be the correct datastore. > > On Wed, May 25, 2011 at 3:56 PM, Jonathan Ellis wrote: > >> You're basically intentionally inflicting the worst case scenario on >> the Cassandra storage engine: >> http://wiki.apache.org/cassandra/DistributedDeletes >> >> You could play around with reducing gc_grace_seconds but a PQ with >> "millions" of items is something you should probably just do in memory >> these days. >> >> On Wed, May 25, 2011 at 10:43 AM, wrote: >> > >> > >> > Hi all, >> > >> > I'm trying to implement a priority queue for holding a large number >> (millions) >> > of items that need to be processed in time order. My solution works - >> but gets >> > slower and slower until performance is unacceptable - even with a small >> number >> > of items. >> > >> > Each item essentially needs to be popped off the queue (some arbitrary >> work is >> > then done) and then the item is returned to the queue with a new >> timestamp >> > indicating when it should be processed again. We thus cycle through all >> work >> > items eventually, but some may come around more frequently than others. >> > >> > I am implementing this as a single Cassandra row, in a CF with a >> TimeUUID >> > comparator. >> > >> > Each column name is a TimeUUID, with an arbitrary column value >> describing the >> > work item; the columns are thus sorted in time order. >> > >> > To pop items, I do a get() such as: >> > >> > cf.get(row_key, column_finish=now, column_start=yesterday, >> column_count=1000) >> > >> > to get all the items at the head of the queue (if any) whose time >> exceeds the >> > current system time. >> > >> > For each item retrieved, I do a delete to remove the old column, then an >> insert >> > with a fresh TimeUUID column name (system time + arbitrary increment), >> thus >> > putting the item back somewhere in the queue (currently, the back of the >> queue) >> > >> > I do a batch_mutate for all these deletes and inserts, with a queue size >> of >> > 2000. These are currently interleaved i.e. >> delete1-insert1-delete2-insert2... >> > >> > This all appears to work correctly, but the performance starts at around >> 8000 >> > cycles/sec, falls to around 1800/sec over the first 250K cycles, and >> continues >> > to fall over time, down to about 150/sec, after a few million cycles. >> This >> > happens regardless of the overall size of the row (I have tried sizes >> from 1000 >> > to 100,000 items). My target performance is 1000 cycles/sec (but my data >> store >> > will need to handle other work concurrently). 
>> > >> > I am currently using just a single node running on localhost, using a >> pycassa >> > client. 4 core, 4GB machine, Fedora 14. >> > >> > Is this expected behaviour (is there just too much churn for a single >> row to >> > perform well), or am I doing something wrong? >> > >> > Would https://issues.apache.org/jira/browse/CASSANDRA-2583 in version >> 0.8.1 fix >> > this problem (I am using version 0.7.6)? >> > >> > Thanks! >> > >> > David. >> > >> > >> > This message was sent using IMP, the Internet Messaging Program. >> > >> > This email and any attachments to it may be confidential and are >> > intended solely for the use of the individual to whom it is addressed. >> > If you are not the intended recipient of this email, you must neither >> > take any action based upon its contents, nor copy or show it to anyone. >> > Please contact the sender if you believe you have received this email in >> > error. QinetiQ may monitor email traffic data and also the content of >> > email for the purposes of security. QinetiQ Limited (Registered in >> > England & Wales: Company Number: 3796233) Registered office: Cody >> Technology >> > Park, Ively Road, Farnborough, Hampshire, GU14 0LX >> http://www.qinetiq.com. >> > >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> > > -- - Pa
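A small sketch of the row-bucketing idea suggested above, kept storage-agnostic on purpose: it only derives the bucket row keys a writer and a non-lagging reader would touch. The 5-minute window and the key format are assumptions:

import java.util.Arrays;
import java.util.List;

public class TimeBuckets {
    static final long WINDOW_MS = 5 * 60 * 1000L; // assumed 5-minute bucket rows

    // Row key of the bucket a given timestamp falls into, e.g. "queue:5437288"
    static String bucketKey(long timestampMs) {
        return "queue:" + (timestampMs / WINDOW_MS);
    }

    // A reader that is keeping up only needs the current bucket and the previous one;
    // a lagging reader would walk further back until it finds empty buckets.
    static List<String> bucketsToRead(long nowMs) {
        return Arrays.asList(bucketKey(nowMs - WINDOW_MS), bucketKey(nowMs));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("writes go to: " + bucketKey(now));
        System.out.println("reads scan:   " + bucketsToRead(now));
        // Deletes target only the bucket a column was read from, so whole rows age out
        // together and their tombstones stop accumulating in one ever-growing hot row.
    }
}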
Re: How to programmatically index an existing column?
Hi Aaron, Thank you for your reminder. I've found out the solution myself, and I share it here:

KeyspaceDefinition keyspaceDefinition = cluster.describeKeyspace(KEYSPACE);
ColumnFamilyDefinition cdf = keyspaceDefinition.getCfDefs().get(0);
BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition(cdf);
BasicColumnDefinition bcdf = new BasicColumnDefinition();
bcdf.setName(StringSerializer.get().toByteBuffer("birthyear"));
bcdf.setIndexName("birthyearidx");
bcdf.setIndexType(ColumnIndexType.KEYS);
bcdf.setValidationClass(ComparatorType.LONGTYPE.getClassName());
columnFamilyDefinition.addColumnDefinition(bcdf);
cluster.updateColumnFamily(new ThriftCfDef(columnFamilyDefinition));

-- Dikang Gu 0086 - 18611140205 On Thursday, May 26, 2011 at 3:16 PM, aaron morton wrote: > Please post to one list at a time. Otherwise people may spend their time > helping you when someone already has. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 26 May 2011, at 17:35, Dikang Gu wrote: > > > > > I want to build a secondary index on an existed column, how to > > programmatically do this using hector API? > > > > Thanks. > > > > -- > > Dikang Gu > > 0086 - 18611140205 >
Re: nodetool move trying to stream data to node no longer in cluster
@Aaron - Unfortunately I'm still seeing message like: " is down", removing from gossip, although with not the same frequency. And repair/move jobs don't seem to try to stream data to the removed node anymore. Anyone know how to totally purge any stored gossip/endpoint data on nodes that were removed from the cluster. Or what might be happening here otherwise? On May 26, 2011, at 9:10 AM, aaron morton wrote: > cool. I was going to suggest that but as you already had the move running I > thought it may be a little drastic. > > Did it show any progress ? If the IP address is not responding there should > have been some sort of error. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 26 May 2011, at 15:28, jonathan.co...@gmail.com wrote: > >> Seems like it had something to do with stale endpoint information. I did a >> rolling restart of the whole cluster and that seemed to trigger the nodes to >> remove the node that was decommissioned. >> >> On , aaron morton wrote: >>> Is it showing progress ? It may just be a problem with the information >>> printed out. >>> >>> >>> >>> Can you check from the other nodes in the cluster to see if they are >>> receiving the stream ? >>> >>> >>> >>> cheers >>> >>> >>> >>> - >>> >>> Aaron Morton >>> >>> Freelance Cassandra Developer >>> >>> @aaronmorton >>> >>> http://www.thelastpickle.com >>> >>> >>> >>> On 26 May 2011, at 00:42, Jonathan Colby wrote: >>> >>> >>> I recently removed a node (with decommission) from our cluster. >>> >>> I added a couple new nodes and am now trying to rebalance the cluster using nodetool move. >>> >>> However, netstats shows that the node being "moved" is trying to stream data to the node that I already decommissioned yesterday. >>> >>> The removed node was powered-off, taken out of dns, its IP is not even pingable. It was never a seed neither. >>> >>> This is cassandra 0.7.5 on 64bit linux. How do I tell the cluster that this node is gone? Gossip should have detected this. The ring commands shows the correct cluster IPs. >>> >>> Here is a portion of netstats. 10.46.108.102 is the node which was removed. >>> >>> Mode: Leaving: streaming data to other nodes >>> Streaming to: /10.46.108.102 >>> /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97 >>> ... >>> 5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,14811670699),(14811670699,14815125177),(14815125177,14819765003),(14820229433,14820858266) >>> progress=280574376402/12434049900 - 2256% >>> . >>> >>> >>> Note 10.46.108.102 is NOT part of the ring. 
>>> >>> Address Status State LoadOwnsToken >>> 148873535527910577765226390751398592512 >>> 10.46.108.100 Up Normal 71.73 GB12.50% 0 >>> 10.46.108.101 Up Normal 109.69 GB 12.50% 21267647932558653966460912964485513216 >>> 10.47.108.100 Up Leaving 281.95 GB 37.50% 85070591730234615865843651857942052863 10.47.108.102 Up Normal 210.77 GB 0.00% 85070591730234615865843651857942052864 >>> 10.47.108.101 Up Normal 289.59 GB 16.67% 113427455640312821154458202477256070484 >>> 10.46.108.103 Up Normal 299.87 GB 8.33% 127605887595351923798765477786913079296 >>> 10.47.108.103 Up Normal 94.99 GB12.50% 148873535527910577765226390751398592511 >>> 10.46.108.104 Up Normal 103.01 GB 0.00% 148873535527910577765226390751398592512 >>> >>> >>> >>> >>> >>> >
Re: EC2 node adding trouble
On 26 May 2011, at 00:17, aaron morton wrote: > I've seen discussion of using the EIP but I do not have direct experience. The idea is not to use the external IP, but the external DNS name, because of this very useful trick (please excuse me if you already know this!): Say the DNS name of an elastic IP assigned to an instance is ec2-50-18-223-109.compute-1.amazonaws.com. Then from outside EC2: #host ec2-50-18-223-109.compute-1.amazonaws.com ec2-50-18-223-109.compute-1.amazonaws.com has address 50.18.223.109 But from inside EC2: #host ec2-50-18-223-109.compute-1.amazonaws.com ec2-50-18-223-109.compute-1.amazonaws.com has address 10.126.13.22 If you suspend and resume an instance, its internal IP will change but the external one will not (as long as it keeps the same elastic IP), so if you use the external name you get consistent behaviour whatever happens. This is extremely useful! Of course, it would be even better if we could get this behaviour without having to assign an elastic IP, as it is otherwise a waste of IPs. Marcus
Re: PHP CQL Driver
Cool, this may be a better discussion for the client-dev list http://www.mail-archive.com/client-dev@cassandra.apache.org/ I would start by turning up the server logging to DEBUG and watching your update / select queries. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 16:15, Kwasi Gyasi - Agyei wrote: > Hi, > > I have manged to generate thrift interface for php along with implementing > auto-loading of both Cassandra and thrift core class. > > However during my testing the only query that works as expected is the create > keyspace cql query... all other queries don't do or return any results nor do > they throw exceptions even in try catch statement I get nothing. > > -- > 4Things > Multimedia and Communication | Property | Entertainment > Kwasi Owusu Gyasi - Agyei > > cell(+27) (0) 76 466 4488 > website www.4things.co.za > email kwasi.gyasiag...@4things.co.za > skypekwasi.gyasiagyei > roleDeveloper.Designer.Software Architect
Re: How to programmatically index an existing column?
Please post to one list at a time. Otherwise people may spend their time helping you when someone already has. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 17:35, Dikang Gu wrote: > > I want to build a secondary index on an existed column, how to > programmatically do this using hector API? > > Thanks. > > -- > Dikang Gu > 0086 - 18611140205
Re: nodetool move trying to stream data to node no longer in cluster
cool. I was going to suggest that but as you already had the move running I thought it may be a little drastic. Did it show any progress ? If the IP address is not responding there should have been some sort of error. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 15:28, jonathan.co...@gmail.com wrote: > Seems like it had something to do with stale endpoint information. I did a > rolling restart of the whole cluster and that seemed to trigger the nodes to > remove the node that was decommissioned. > > On , aaron morton wrote: > > Is it showing progress ? It may just be a problem with the information > > printed out. > > > > > > > > Can you check from the other nodes in the cluster to see if they are > > receiving the stream ? > > > > > > > > cheers > > > > > > > > - > > > > Aaron Morton > > > > Freelance Cassandra Developer > > > > @aaronmorton > > > > http://www.thelastpickle.com > > > > > > > > On 26 May 2011, at 00:42, Jonathan Colby wrote: > > > > > > > > > I recently removed a node (with decommission) from our cluster. > > > > > > > > > > I added a couple new nodes and am now trying to rebalance the cluster > > > using nodetool move. > > > > > > > > > > However, netstats shows that the node being "moved" is trying to stream > > > data to the node that I already decommissioned yesterday. > > > > > > > > > > The removed node was powered-off, taken out of dns, its IP is not even > > > pingable. It was never a seed neither. > > > > > > > > > > This is cassandra 0.7.5 on 64bit linux. How do I tell the cluster that > > > this node is gone? Gossip should have detected this. The ring commands > > > shows the correct cluster IPs. > > > > > > > > > > Here is a portion of netstats. 10.46.108.102 is the node which was > > > removed. > > > > > > > > > > Mode: Leaving: streaming data to other nodes > > > > > Streaming to: /10.46.108.102 > > > > > > > > /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97 > > > > > ... > > > > > 5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,14811670699),(14811670699,14815125177),(14815125177,14819765003),(14820229433,14820858266) > > > > > progress=280574376402/12434049900 - 2256% > > > > > . > > > > > > > > > > > > > > > Note 10.46.108.102 is NOT part of the ring. 
> > > > > > > > > > Address Status State LoadOwnsToken > > > > > > > > 148873535527910577765226390751398592512 > > > > > 10.46.108.100 Up Normal 71.73 GB12.50% 0 > > > > > 10.46.108.101 Up Normal 109.69 GB 12.50% > > > 21267647932558653966460912964485513216 > > > > > 10.47.108.100 Up Leaving 281.95 GB 37.50% > > > 85070591730234615865843651857942052863 > > > 10.47.108.102 Up Normal 210.77 GB 0.00% > > > 85070591730234615865843651857942052864 > > > > > 10.47.108.101 Up Normal 289.59 GB 16.67% > > > 113427455640312821154458202477256070484 > > > > > 10.46.108.103 Up Normal 299.87 GB 8.33% > > > 127605887595351923798765477786913079296 > > > > > 10.47.108.103 Up Normal 94.99 GB12.50% > > > 148873535527910577765226390751398592511 > > > > > 10.46.108.104 Up Normal 103.01 GB 0.00% > > > 148873535527910577765226390751398592512 > > > > > > > > > > > > > > > > > > > > >