Get cassandra SuperColumn only!
Hi, I have a Cassandra datastore laid out as follows:

key : { supercol (utf8) : { subcol (timeuuid) : data } }

Now, for a particular use case I want to slice on two levels: first on supercols, then, within the selected supercols, slice the subcols (mostly to restrict the number of items fetched into memory). I have tried various APIs and there doesn't seem to be a way to do this, because when I slice supercols I get the subcols in the result too! Of course, I can add another index as follows:

key : { supercol (utf8) : emptydata }

I haven't looked at Cassandra storage in much detail, but I'm hoping there is a better solution! Thanks in advance.
0.7 live schema updates
Hi! I like the new feature of making live schema updates. You can add, drop and rename columns and keyspaces via thrift, but how do you modify column attributes like key_cache_size or rows_cached? Thank you.
Re: 0.7 live schema updates
You can change these attributes via the JMX interface. Take a look at the org.apache.cassandra.tools.NodeProbe setCacheCapacities method.
busy thread on IncomingStreamReader
Hi - has anyone made any progress with this issue? We are having the same problem with our Cassandra nodes in production. At some point a node (and sometimes all 3) will jump to 100% CPU usage and stay there for hours until restarted. Stack traces reveal several threads in a seemingly endless loop doing this:

Thread-21770 - Thread t...@25278
  java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileChannelImpl.size0(Native Method)
    at sun.nio.ch.FileChannelImpl.size(Unknown Source)
    - locked java.lang.obj...@7a2c843d
    at sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

My understanding from reading the code is that this trace shows a thread belonging to the StreamingService which is writing an incoming stream to disk. There seems to be some kind of bizarre problem which is causing the FileChannel.size() function to spin with high CPU. Also, this problem is not easy to replicate, so I would appreciate any information on how the StreamingService works and what triggers it to transfer these file streams. Thanks, Joseph Mermelstein LivePerson http://solutions.liveperson.com

Hi all, We set up two nodes and simply set replication factor=2 for a test run. After both nodes, say node A and node B, had served for several hours, we found that node A always keeps 300% CPU usage.
(the other node is under 100% CPU, which is normal). A thread dump on node A shows that there are 3 busy threads related to IncomingStreamReader:

Thread-66 prio=10 tid=0x2aade4018800 nid=0x69e7 runnable [0x4030a000]
  java.lang.Thread.State: RUNNABLE
    at sun.misc.Unsafe.setMemory(Native Method)
    at sun.nio.ch.Util.erase(Util.java:202)
    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

Thread-65 prio=10 tid=0x2aade4017000 nid=0x69e6 runnable [0x4d44b000]
  java.lang.Thread.State: RUNNABLE
    at sun.misc.Unsafe.setMemory(Native Method)
    at sun.nio.ch.Util.erase(Util.java:202)
    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

Thread-62 prio=10 tid=0x2aade4014800 nid=0x4150 runnable [0x4d34a000]
  java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileChannelImpl.size0(Native Method)
    at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:309)
    - locked 0x2aaac450dcd0 (a java.lang.Object)
    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:597)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

Has anyone experienced a similar issue? Environments: OS: CentOS 5.4, Linux 2.6.18-164.15.1.el5 SMP x86_64 GNU/Linux. Java: build 1.6.0_16-b01, Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode). Cassandra: 0.6.0. Node configuration: node A and node B;
both nodes use node A as seed. Client: Java Thrift clients pick one node randomly to do reads and writes. -- Ingram Chen online share order: http://dinbendon.net blog: http://www.javaworld.com.tw/roller/page/ingramchen
Getting client only example to work
Hi, I am using 0.7.0-beta1 and trying to get the contrib/client_only example to work. I am running Cassandra on host1 and trying to access it from host2. When using Thrift (via cassandra-cli) and in my application, I am able to connect and do all operations as expected. But I am not able to connect to Cassandra when using the code in client_only (or, for that matter, using contrib/bmt_example). Since my test requires bulk insertion of about 1.4 TB of data, I need to use a non-Thrift interface. The error that I am getting follows (the keyspace and the column family exist and can be used via Thrift):

10/09/16 12:35:31 INFO config.DatabaseDescriptor: DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
10/09/16 12:35:31 INFO service.StorageService: Starting up client gossip
Exception in thread "main" java.lang.IllegalArgumentException: Unknown ColumnFamily Standard1 in keyspace Keyspace1
    at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:1009)
    at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:418)
    at gaia.cu7.cassandra.input.Ingestor.testWriting(Ingestor.java:103)
    at gaia.cu7.cassandra.input.Ingestor.main(Ingestor.java:187)

I am using the following code (from the client_only example), also passing the JVM parameter -Dstorage-config=path_2_cassandra.yaml:

public static void main(String[] args) throws Exception
{
    System.setProperty("storage-config", "cassandra.yaml");
    testWriting();
}

// from client_only example
private static void testWriting() throws Exception
{
    StorageService.instance.initClient();
    // sleep for a bit so that gossip can do its thing.
    try
    {
        Thread.sleep(1L);
    }
    catch (Exception ex)
    {
        throw new AssertionError(ex);
    }
    // do some writing.
    final AbstractType comp = ColumnFamily.getComparatorFor("Keyspace1", "Standard1", null);
    for (int i = 0; i < 100; i++)
    {
        RowMutation change = new RowMutation("Keyspace1", ("key" + i).getBytes());
        ColumnPath cp = new ColumnPath("Standard1").setColumn(("colb").getBytes());
        change.add(new QueryPath(cp), ("value" + i).getBytes(), new TimestampClock(0));
        // don't call change.apply(). The reason is that it makes a static call into Table, which will perform
        // local storage initialization, which creates local directories.
        // change.apply();
        StorageProxy.mutate(Arrays.asList(change));
        System.out.println("wrote key" + i);
    }
    System.out.println("Done writing.");
    StorageService.instance.stopClient();
}
RE: 0.7 live schema updates
But you'll lose these settings after a Cassandra restart. -Original Message- From: Oleg Anastasyev [mailto:olega...@gmail.com] Sent: Thursday, September 16, 2010 11:21 AM To: user@cassandra.apache.org Subject: Re: 0.7 live schema updates You can change these attributes via the JMX interface. Take a look at the org.apache.cassandra.tools.NodeProbe setCacheCapacities method.
Indexing & Locking in Cassandra
Hello, I have a few questions about indexing and locking in Cassandra: - if I understood well, only row-level indexing exists prior to v0.7. I mean only the primary keys are indexed. Is that true? - is it possible to use composite primary keys? For instance I have a user object: User(name,birthday,gender,address) and I want to have the (name,birthday) columns as PK. Can I do that? If yes, how? - does Cassandra support CF (table) level locking? Could someone explain / provide a link how? Thanks in advance, Sandor
Re: Indexing & Locking in Cassandra
> Hello, I have a few questions about indexing and locking in Cassandra:
> - if I understood well only row level indexing exists prior to v0.7. I mean only the primary keys are indexed. Is that true?

Yes and no. The row name is the key which you use to fetch the row from Cassandra. There are methods to iterate through rows, but that's not efficient and should be used only in batch operations. Columns inside rows are sorted by their names, so they are also indexed, as you use the column name to fetch the contents of the column. If you want to index data in other ways you need to build your own application code which maintains such indexes; the upcoming 0.7 version will bring some handy features which make the coder's job much easier.

> - is it possible to use composite primary keys? For instance I have a user object: User(name,birthday,gender,address) and I want to have the (name,birthday) columns as PK. Can I do? If yes, how?

You can always create your row key as a string like $name_$birthday. Did this answer your question?

> - does Cassandra support CF (table) level locking? Could someone explain me/provide a link how?

No, Cassandra doesn't have any locking capabilities. You can always use an external locking mechanism like ZooKeeper [http://hadoop.apache.org/zookeeper/] or implement your own solution on top of Cassandra (not recommended, as it's quite hard to get it right). - Juho Mäkinen / Garo
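A minimal sketch of the $name_$birthday composite-key idea described above. The class name, separator, and fixed-width yyyyMMdd date are illustrative assumptions, not Cassandra API; the fixed width matters if you ever want the keys to sort sensibly under an order-preserving partitioner.

```java
public class CompositeKeyExample {
    // Build a composite row key from name and birthday.
    // A fixed-width date (yyyyMMdd) and an explicit separator keep the
    // key unambiguous ("ann" + "a1985..." vs "anna" + "1985...").
    static String rowKey(String name, String birthdayYyyyMmDd) {
        return name + ":" + birthdayYyyyMmDd;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("alice", "19850417")); // prints alice:19850417
    }
}
```

The obvious limitation, which is why this only partially substitutes for real composite keys: you can only look up rows when you know the full (name, birthday) pair, since the key is opaque to Cassandra.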
RE: Indexing & Locking in Cassandra
Thanks for your fast answer. Regarding the composite keys: that's what I thought by default, I just needed some confirmation. Unfortunately I cannot use this approach in our application, so I will figure out something else. I will check out ZooKeeper to see if I can use it. Thanks again!
Re: Getting client only example to work
I discovered some problems with the fat client earlier this week when I tried using it. It needs some fixes to keep up with all the 0.7 changes. Gary.

On Thu, Sep 16, 2010 at 05:48, Asif Jan asif@gmail.com wrote: [...]
Re: 0.7 live schema updates
beta-2 will include the ability to set these values and others. Look for the system_update_column_family() and system_update_keyspace() methods. Gary. On Thu, Sep 16, 2010 at 02:38, Marc Canaleta mcanal...@gmail.com wrote: Hi! I like the new feature of making live schema updates. You can add, drop and rename columns and keyspaces via thrift, but how do you modify column attributes like key_cache_size or rows_cached? Thank you.
Re: Build an index to for join query
Alvin - assuming I understand what you're after correctly, why not make a CF Name_Address(name, address)? Modifying the Cassandra methods to do the join you describe seems like overkill to me... -Paul On Sep 15, 2010, at 7:34 PM, Alvin UW wrote: Hello, I am going to build an index to join two CFs. First, we see this index as a CF/SCF. The difference is I don't materialise it. Assume we have two tables: ID_Address(id, address), Name_ID(name, id). Then, the index is: Name_Address(name, address). When the application tries to query on Name_Address, the value of name is given by the application. I want to direct the read operation to Name_ID to get the id value, then go to ID_Address to get the address value by the id value. So far, I am considering only the read operation. This way, the join query is transparent to the user. So I think I should find out which methods or classes are in charge of the read operation in the above scenario. For example, the CLI operation get Keyspace1.Standard2['jsmith'] calls exactly which methods on the server side? I noted that CassandraServer is used to listen to clients, and there are some methods such as get(), get_slice(). Is it the right place I can modify to implement my idea? Thanks. Alvin
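The two-step read Alvin describes can be sketched client-side, which is Paul's point: no server changes are needed. The maps below merely stand in for the Name_ID and ID_Address column families; all class and variable names are illustrative, not Cassandra API.

```java
import java.util.HashMap;
import java.util.Map;

// Client-side "join": resolve name -> id in Name_ID, then id -> address
// in ID_Address. Two reads, no server-side modification.
public class JoinByIndex {
    // Stand-ins for the two column families.
    static Map<String, String> nameToId = new HashMap<>();
    static Map<String, String> idToAddress = new HashMap<>();

    static String addressByName(String name) {
        String id = nameToId.get(name);               // first read: Name_ID
        if (id == null) return null;                  // name not indexed
        return idToAddress.get(id);                   // second read: ID_Address
    }

    public static void main(String[] args) {
        nameToId.put("jsmith", "u42");
        idToAddress.put("u42", "12 Main St");
        System.out.println(addressByName("jsmith")); // prints 12 Main St
    }
}
```

The trade-off versus Paul's materialized Name_Address CF: this version costs two round trips per query but needs no extra write path to keep the denormalized CF in sync.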
Pb with memtable_throughput_in_mb?
Hi, I am trying out the latest trunk version and I get an error when starting Cassandra with -Xmx3G:

Fatal error: memtable_operations_in_millions must be a positive double

I guess it is caused by line 76 in org/apache/cassandra/config/Config.java [0]:

public Integer memtable_throughput_in_mb = (int) Runtime.getRuntime().maxMemory() / 8;

The cast to (int) is applied to maxMemory() alone, but this method returns a long, so for -Xmx3G the value overflows to a negative integer before the division. Thus memtable_operations_in_millions becomes negative (Double memtable_operations_in_millions = memtable_throughput_in_mb / 64 * 0.3) and the exception is thrown. Also, maxMemory() is measured in bytes, but memtable_throughput_in_mb should be in MB (as its name implies), which is not the case here. What do you think? Thanks for any input, Cheers [0] http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/config/Config.java
Re: Pb with memtable_throughput_in_mb?
On Thu, Sep 16, 2010 at 11:00 AM, Thomas Boucher ethx...@gmail.com wrote: [...] Oops, good catch. Fixed in r997841. -Brandon
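The two bugs Thomas identified can be demonstrated in isolation: the (int) cast binds to maxMemory() alone, so the value wraps negative before the division, and the result is in bytes rather than MB anyway. The "fixed" line below is a sketch of the kind of change needed (divide down to MB first, then narrow), not the actual r997841 diff.

```java
public class CastOverflowDemo {
    public static void main(String[] args) {
        long maxMemory = 3L * 1024 * 1024 * 1024; // simulate -Xmx3G (3221225472 bytes)

        // Broken: cast narrows the long to int first, wrapping it to a
        // negative value, and the division by 8 still leaves bytes, not MB.
        int broken = (int) maxMemory / 8;

        // Sketch of a fix: convert bytes -> MB, take an eighth of the heap,
        // and only then narrow to int (now safely in range).
        int fixed = (int) (maxMemory / 1048576 / 8);

        System.out.println(broken); // prints -134217728
        System.out.println(fixed);  // prints 384
    }
}
```

Narrowing a long above Integer.MAX_VALUE silently keeps only the low 32 bits, which is why the failure shows up as the puzzling "must be a positive double" message rather than an overflow error.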
Building a Ubuntu / Debian package for Cassandra
Guys, I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on Ubuntu. I see that you have a ./debian directory in the source builds; do you have a bit more background on how it is used and built? P.S. I am new to Ubuntu/Debian packaging, so any kind of pointer will help. Thanks, FR Francois Richard
Re: Building a Ubuntu / Debian package for Cassandra
Hello Francois, There are already .debs available here: http://wiki.apache.org/cassandra/DebianPackaging I've also set up a PPA to build the packages on Ubuntu here: https://launchpad.net/~cassandra-ubuntu/+archive/stable It's currently still at v0.6.4, but I am in the process of uploading 0.6.5 as I write this email. The .debs are nearly identical. The only difference is that I've packaged the jars necessary to build, so that you get the exact same versions of all libraries if you need to patch and repeat the build. Also, these are built specifically for Ubuntu releases, so if we find any incompatibilities between Debian and Ubuntu we can fix them for Ubuntu users. I hope this helps!
Re: Get cassandra SuperColumn only!
AFAIK there is no way to get a list of the super columns without also getting the sub columns. I do not know if there is a technical reason that would prevent this from being added. In general it's more efficient to make 1 request that pulls back more data than two or more that pull back just enough data. But you also want to design to answer the queries you need to make. Keeping an index of super column names in another CF does not sound too bad. It might pay to take another look at why you are using a super CF. It may be better to use two standard CFs if, say, you want to have one sort of request that gets a list of things, and another sort of request that gets the details for a number of things. Aaron

On 16 Sep, 2010, at 07:25 PM, Saurabh Raje saur...@webaroo.com wrote: [...]
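The "index of super column names in another CF" pattern Aaron suggests can be sketched with plain sorted maps standing in for comparator-sorted columns. This is an illustration of the access pattern only (slice the name index first, then fetch a bounded number of subcolumns per selected super column), not Cassandra API; all names are hypothetical.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Two-step slice via a name index. TreeMap stands in for Cassandra's
// comparator-sorted columns within a row.
public class SuperColumnIndexSketch {
    // "index CF": sorted super column names for one row (values unused).
    static SortedMap<String, String> nameIndex = new TreeMap<>();
    // "data CF": superColName -> (subColName -> data) for the same row.
    static SortedMap<String, SortedMap<String, String>> data = new TreeMap<>();

    // Slice super column names in [start, end), then fetch at most
    // `count` subcolumns from each selected super column, bounding
    // how much is pulled into memory.
    static SortedMap<String, SortedMap<String, String>> slice(String start, String end, int count) {
        SortedMap<String, SortedMap<String, String>> out = new TreeMap<>();
        for (String sc : nameIndex.subMap(start, end).keySet()) {
            SortedMap<String, String> subs = new TreeMap<>();
            SortedMap<String, String> stored = data.get(sc); // may lag the index
            if (stored != null) {
                for (Map.Entry<String, String> e : stored.entrySet()) {
                    if (subs.size() >= count) break;
                    subs.put(e.getKey(), e.getValue());
                }
            }
            out.put(sc, subs);
        }
        return out;
    }
}
```

The cost of the pattern, as the thread notes, is two requests instead of one, plus keeping the index CF in sync on every write.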
RE: Building a Ubuntu / Debian package for Cassandra
Thanks Clint, I am going to look up the links below. I am pretty new to DEB packaging in general, and from what I have seen so far a lot of the tutorials on the web are based on the classic [ ./configure | make | make install ] flow of an application built in C. In this case I wanted to figure out DEB packaging in the context of a Java application. I'll read more and will stay in touch. My goal, at the end of the day, is to install the stock package for Cassandra and then to create a special Cassandra-config package that would deploy my customized configuration files on the system. Thanks, FR

-Original Message- From: Clint Byrum [mailto:cl...@ubuntu.com] Sent: Thursday, September 16, 2010 10:54 AM To: user@cassandra.apache.org Subject: Re: Building a Ubuntu / Debian package for Cassandra [...]
Re: Getting client only example to work
OK - did something change about the message service in the initClient method? Essentially, one can now not call initClient when a Cassandra instance is running on the same machine. Thanks

On Sep 16, 2010, at 3:48 PM, Gary Dusbabek wrote: I discovered some problems with the fat client earlier this week when I tried using it. It needs some fixes to keep up with all the 0.7 changes. Gary.

On Thu, Sep 16, 2010 at 05:48, Asif Jan asif@gmail.com wrote: [...]
Re: Bootstrapping stays stuck
Thanks to driftx from the Cassandra IRC channel for helping out. This was resolved by increasing the rpc timeout for the bootstrap process.

On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh gurpreet.si...@gmail.com wrote: This problem still stays unresolved despite numerous restarts of the cluster. I can't seem to find a way out of this one, and I am not really looking for a workaround; I need this to work if I am to go to production. I turned on ALL logging in log4j, and now I see the following exception (EOFException) on the destination. After receiving each file, it seems to throw this exception. The transfer is successful except for this exception. The source successfully declares the transfer complete, but the destination does not move out of bootstrapping mode and just sits there.

DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more
DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369
DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@54828e7 to IP1
TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line 82) eof reading from socket; closing
java.io.EOFException
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 51) Receiving stream
DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 54) Creating file for /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db
DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more
DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045
DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@7b41a32f to IP1
TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line 82) eof reading from socket; closing
java.io.EOFException
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)

/G

On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh gurpreet.si...@gmail.com wrote: Hi Vineet, I have tracked the nodetool streams to completion each time. Below are the logs on the source and destination node. There are 3 sstables being transferred, and the transfer seems to be successful. However, after the streams finish, the source prints out messages about dropped messages, which may point to the problem. Ideas? I checked that port 7000 is open for communication. 9160 is not up on the node being bootstrapped, but that comes up after the node is bootstrapped - is that right? Thanks a ton, /G

*Logs on the source node (IP2):*

INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79) Flushing memtables for userdata...
INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95) Performing anticompaction ...
INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129) GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is 4415946752
INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java (line 396) AntiCompacted to /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db. 49074138589/36770836242 bytes for 5990912 keys. Time: 1438607ms.
INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java (line 396) AntiCompacted to /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db. 28185244/21126422 bytes for 47722 keys. Time: 2310ms.
INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java (line 396) AntiCompacted to
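For reference, the rpc timeout that resolved this thread is, in the 0.6 line, a setting in storage-conf.xml. A sketch of the change (the element name is from memory of the 0.6 config format and the value is purely illustrative, not a recommendation):

```xml
<!-- storage-conf.xml: raise the RPC timeout so that large bootstrap
     streams are not abandoned mid-transfer. 20000 ms is an illustrative
     value; tune it to your environment. -->
<RpcTimeoutInMillis>20000</RpcTimeoutInMillis>
```

All nodes should agree on this value, and each node needs a restart for the change to take effect.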
questions on cassandra (repair and multi-datacenter)
Hi, I have a few questions I am looking to answer. I have a cluster of 7 Cassandra 0.6.5 nodes in my test setup, RF=2. The original data size is about 100 gigs; with RF=2 I see the total load on the cluster is about 200 gigs - all good.

1. I was looking to increase the RF to 3. This process entails changing the config and calling repair on the keyspace one node at a time, right? So I started with one node: changed the config file on that node for the keyspace, restarted the node, and then called a nodetool repair on it. I followed these same steps for every node after that, as I read somewhere that repair should be invoked one node at a time. (a) What is the best way to ascertain that the repair has completed on a node? (b) After the repair finished, I was expecting the total data load to be 300 gigs. However, the ring command shows the total load to be 370 gigs. I double checked, and the config on all machines says RF=3. I am calling a cleanup on each node right now. Is a cleanup required after calling a repair? Am I missing something?

2. This question is regarding multi-datacenter support. I plan to have a cluster of 6 machines across 2 datacenters, with the machines from the datacenters alternating on the ring; RF=3 is the plan. I already have a test setup as described above, which has most of the data, but it is still configured with the default RackUnAware strategy. I was hoping to find the right steps to move it to the RackAware strategy with the PropertyFileEndpointSnitch that I read about somewhere (not sure if that's supported in 0.6.5, but the CustomEndPointSnitch is the same, right?), all this without having to repopulate any data.
Currently there is only 1 datacenter, but I was still planning to set the cluster up as it would be in a multi-datacenter deployment, run it like that in the one datacenter, and when the second datacenter comes up, just copy all the files across to the new nodes in the second datacenter and bring the whole cluster up. Will this work? I have tried copying files to a new node, shutting down all nodes, and bringing everything back up, and it recognized the new IPs. Thanks Gurpreet
What Thrift version does Cassandra 0.7 beta use?
What Thrift version does Cassandra 0.7 beta use? -- Best regards, Ivy Tang
Re: What Thrift version does Cassandra 0.7 beta use?
It doesn't use a specific release version - it uses a specific Subversion revision. The revision number is appended to the thrift jar in the cassandra lib folder. On Sep 16, 2010, at 9:10 PM, Ying Tang wrote: What Thrift version does Cassandra 0.7 beta use? -- Best regards, Ivy Tang
Re: What Thrift version does Cassandra 0.7 beta use?
So the Thrift lib may change as Cassandra is updated? On Thu, Sep 16, 2010 at 10:36 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: It doesn't use a specific release version - it uses a specific Subversion revision. The revision number is appended to the thrift jar in the cassandra lib folder. -- Best regards, Ivy Tang
Re: questions on cassandra (repair and multi-datacenter)
On Thu, Sep 16, 2010 at 3:19 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote: 1. I was looking to increase the RF to 3. This process entails changing the config and calling repair on the keyspace one at a time, right? So, I started with one node at a time, changed the config file on the first node for the keyspace, restarted the node. And then called a nodetool repair on the node. You need to change the RF on _all_ nodes in the cluster _before_ running repair on _any_ of them. If nodes disagree on which nodes should have replicas for keys, repair will not work correctly. Different RF for the same keyspace creates that disagreement. b