Hello, I did find these exceptions. I issued the loadbalance command on node 192.168.2.10.

INFO [MESSAGING-SERVICE-POOL:3] 2010-03-01 10:34:40,764 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:55973 remote=/192.168.2.13:7000]
WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-03-01 10:34:40,964 MessagingService.java (line 555) Running on default stage - beware
WARN [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/192.168.2.10:40758 remote=/192.168.2.13:7000]
WARN [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 485) Exception was generated at : 03/01/2010 10:34:40 on thread MESSAGING-SERVICE-POOL:1
Reached an EOL or something bizzare occured. Reading from: /192.168.2.13BufferSizeRemaining: 16
java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 BufferSizeRemaining: 16
        at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
        at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
        at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
        at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
INFO [MESSAGING-SERVICE-POOL:1] 2010-03-01 10:34:40,964 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:40758 remote=/192.168.2.13:7000]
INFO [MESSAGE-STREAMING-POOL:1] 2010-03-01 10:35:23,171 TcpConnection.java (line 315) Closing errored connection java.nio.channels.SocketChannel[connected local=/192.168.2.10:56728 remote=/192.168.2.13:7000]
INFO [MESSAGE-STREAMING-POOL:1] 2010-03-01 10:35:23,221 FileStreamTask.java (line 79) Exception was generated at : 03/01/2010 10:35:23 on thread MESSAGE-STREAMING-POOL:1
Value too large for defined data type
java.io.IOException: Value too large for defined data type
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
        at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
        at org.apache.cassandra.net.TcpConnection.stream(TcpConnection.java:226)
        at org.apache.cassandra.net.FileStreamTask.run(FileStreamTask.java:55)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

I can certainly upgrade to 0.6 and try a loadbalance there, do you still think it is advisable?

All of my key/value entries are well under 1024 bytes but I have millions of them. Do you think I have a data corruption problem?

Thanks,
Jon

On Mon, Mar 1, 2010 at 2:54 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Mon, Mar 1, 2010 at 3:18 PM, Jon Graham <sjclou...@gmail.com> wrote:
> > Thanks Jonathan.
> >
> > It seems like the load balance operation isn't moving. I haven't seen any
> > data file time changes in 2 hours and no location file time
> > changes in over an hour.
> >
> > I can see a tcp port # 7000 opened on the node where I ran the loadbalance
> > command. It is connected to
> > port 39033 on the node receiving the data. The CPU usage on both systems is
> > very low. There are about 10
> > million records on the node where the load balance command was issued.
>
> Did you check logs for exceptions?
>
> > My six node Cassandra ring consists of tokens for nodes 1-6 of: 0
> > (ascii 0x30) 6 B H O (the letter O) T
> >
> > The load balance target node initially had a token of 'H' (using ordered
> > partitioning). The source node has a key of 0 (ascii 0x30). Most of the data
> > on the source node has keys starting with '/'.
> > Slash falls between tokens T
> > and 0 in my ring so most of the data landed on the node with token 0 with
> > replicas on the next 2 nodes. My token space is badly divided for the data I
> > have already inserted.
> >
> > Does the initial token value of the load balance target node selected by
> > Cassandra need to be cleared or set to a specific value beforehand to
> > accommodate the load balance data transfer?
>
> No.
>
> > Would I have better luck decommissioning nodes 4,5,6 and trying to
> > bootstrap these nodes one at a time
> > with better initial token values?
>
> LoadBalance is basically sugar for decommission + bootstrap, so no.
>
> > I am looking for a good way to move/split/re-balance data from nodes 1,2,3
> > to nodes 4, 5, 6 while achieving a better token space distribution.
>
> I would upgrade to the 0.6 beta and try loadbalance again.
>
> -Jonathan
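
For reference, the key placement described in the quoted text (keys starting with '/' landing on the node with token 0) can be sketched roughly as below. This is only an illustration, not Cassandra's code: it assumes plain byte-wise string comparison for the ordered partitioner, that each node owns the range from the previous token (exclusive) up to its own token (inclusive), and a replication factor of 3, inferred from "replicas on the next 2 nodes". The helper name owners is made up for this example.

```python
from bisect import bisect_left

# Tokens of the six-node ring from the thread, in ASCII sort order.
RING = ["0", "6", "B", "H", "O", "T"]

def owners(key, ring=RING, replicas=3):
    """Return the tokens of the nodes assumed to hold `key` (hypothetical helper).

    Assumption: a node owns (previous token, its own token], and the extra
    replicas go to the following nodes clockwise around the ring.
    """
    tokens = sorted(ring)
    # First token >= key; wrap to the first node if the key sorts after every token.
    start = bisect_left(tokens, key) % len(tokens)
    return [tokens[(start + i) % len(tokens)] for i in range(replicas)]

# '/' is 0x2F, just below '0' (0x30), so such keys fall in the range (T, 0].
print(owners("/some/path"))
# A key that sorts after 'T' wraps around to the node with token '0' as well.
print(owners("Zebra"))
```

Under these assumptions every key beginning with '/' maps to the node with token 0 plus the next two nodes, which matches the imbalance described in the quoted message.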