Knowing when there is a *real* need to add nodes

2011-05-12 Thread Tomer B
Hi

I'm trying to predict when my cluster will soon need new nodes added. I want
a continuous graph of my cluster's health so that when I see the cluster
becoming busier (I want numeric measurements), I know it is time to start
purchasing more machines and adding them to the cluster; in other words, I
want to know beforehand.
Below is what I came up with after doing some research on the net. I would
highly appreciate any additional gauge measurements and ranges for testing
my cluster's health, so I know beforehand when I'm soon going to need more
nodes. Although I'm writing down green/yellow/red gauges, I'm also trying to
find a continuous measure that tells me where our cluster stands (as much as
possible...).

Also, my recommendation is to always do the following before adding new nodes:

1. Make sure all nodes are balanced, and rebalance them if not.
2. Separate the commit log drive from the data (SSTables) drive.
3. Use mmap for the index only (disk_access_mode: mmap_index_only) rather than auto.
4. Increase disk IO capacity if possible.
5. Avoid swapping as much as possible.


As for my gauge tests for when to add new nodes:

test: nodetool tpstats -h cassandra_host
green gauge: no pending operations, or only a handful
yellow gauge: 100-3000 pending operations
red gauge: more than 3000 pending operations

test: iostat -x -n -p -z 5 10 and iostat -xcn 5
green gauge: kw/s + kr/s is below 20% of the disk's IO capacity
yellow gauge: 20%-50%
red gauge: above 50%

test: iostat -x -n -p -z 5 10 and check the %b column
green gauge: less than 10%
yellow gauge: 10%-80%
red gauge: above 80%
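For the %b check, a small awk helper can surface the busiest device automatically. This is a sketch only: it assumes the Solaris-style `iostat -xn` layout, where %b and the device name are the last two fields; verify the column order on your platform first.

```shell
# Read `iostat -xn`-style output on stdin and print the device with the
# highest %b (percent busy) together with that value. The first two lines
# (section title and column header) are skipped.
busiest_device() {
    awk 'NR > 2 { if ($(NF-1) + 0 >= max) { max = $(NF-1) + 0; dev = $NF } }
         END { print dev, max }'
}

# Example: iostat -xn 5 2 | busiest_device
```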

test: nodetool cfstats --host localhost
green gauge: “SSTable count” item does not continually grow over time
yellow gauge:
red gauge: “SSTable count” item continually grows over time

test: ./nodetool cfstats --host localhost | grep -i pending
green gauge: 0-2
yellow gauge: 3-100
red gauge: 101+

Again: I would highly appreciate any additional gauge measurements and
ranges for testing cluster health, so I know ***beforehand*** when I'm soon
going to need more nodes.
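As a starting point, the tpstats gauge above can be scripted for graphing. A rough sketch: it assumes `nodetool` is on the PATH, the 0.7-era `tpstats` column order (pool name, Active, Pending, Completed), and the thresholds proposed above; adjust all three for your setup.

```shell
# Classify a total pending-operations count into the proposed gauges.
classify_pending() {
    pending=$1
    if [ "$pending" -lt 100 ]; then
        echo green
    elif [ "$pending" -le 3000 ]; then
        echo yellow
    else
        echo red
    fi
}

# Sum the Pending column of `nodetool tpstats` (skipping the header row).
total_pending() {
    nodetool -h "$1" tpstats |
        awk 'NR > 1 && $3 ~ /^[0-9]+$/ { sum += $3 } END { print sum + 0 }'
}

# Example: classify_pending "$(total_pending cassandra_host)"
```

Feeding the summed number into a time-series tool (Ganglia, Munin, etc.) gives the continuous graph asked for.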


Re: network topology issue

2011-05-12 Thread Sylvain Lebresne
On Thu, May 12, 2011 at 1:58 AM, Anurag Gujral anurag.guj...@gmail.com wrote:
 Hi All,
  I am testing the network topology strategy in Cassandra. I am using
 two nodes, one node in each of two different data centers.
 Since the nodes are in different DCs I assigned token 0 to both the nodes.
 I added both the nodes as seeds in cassandra.yaml and I am using
 PropertyFileSnitch as the endpoint snitch, where I have specified the colo
 details.

 I started the first node; then when I restarted the second node I got an
 error that token 0 is already being used. Why am I getting this error?

You cannot have two nodes with the same token, so you'll have to use 0 and
1 for instance. It's true that with NTS you have to think of each datacenter
as a separate ring, but there is still this restriction that each token must be
different across the whole cluster.


 Second Question: I already have cassandra running in two different data
 centers I want to add a new keyspace which uses networkTopology strategy
 in the light of above errors how can I accomplish this.


 Thanks
 Anurag



Import/Export of Schema Migrations

2011-05-12 Thread David Boxenhorn
My use case is like this: I have a development cluster, a staging cluster
and a production cluster. When I finish a set of migrations (i.e. changes)
on the development cluster, I want to apply them to the staging cluster, and
eventually the production cluster. I don't want to do it by hand, because
it's a painful and error-prone process. What I would like to do is export
the last N migrations from the development cluster as a text file, with
exactly the same format as the original text commands, and import them to
the staging and production clusters.

I think the best place to do this might be the CLI, since you would probably
want to view your migrations before exporting them. Something like this:

show migrations N;              Shows the last N migrations.
export migrations N fileName;   Exports the last N migrations to file fileName.
import migrations fileName;     Imports migrations from fileName.

The import process would apply the migrations one at a time, giving you
feedback like "applying migration: update column family ...". If a migration
fails, the process should give an appropriate message and stop.

Is anyone else interested in this? I have created a Jira ticket for it here:

https://issues.apache.org/jira/browse/CASSANDRA-2636


Re: Excessive allocation during hinted handoff

2011-05-12 Thread Terje Marthinussen
Just out of curiosity is this on the receiver or sender side?

I have been wondering a bit if the hint playback could need some
adjustment.
There are potentially quite big differences in how much is sent per throttle
delay, depending on what your data looks like.

Early 0.7 releases also built up hints very easily under load due to nodes
quickly getting marked as down due to gossip sharing the same thread as many
other operations.

Terje

On Thu, May 12, 2011 at 1:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Doesn't really look abnormal to me for a heavy write load situation
 which is what receiving hints is.

 On Wed, May 11, 2011 at 1:55 PM, Gabriel Tataranu gabr...@wajam.com
 wrote:
  Greetings,
 
  I'm experiencing some issues with 2 nodes (out of more than 10). Right
  after startup (Listening for thrift clients...) the nodes will create
  objects at high rate using all available CPU cores:
 
   INFO 18:13:15,350 GC for PS Scavenge: 292 ms, 494902976 reclaimed
  leaving 2024909864 used; max is 6658457600
   INFO 18:13:20,393 GC for PS Scavenge: 252 ms, 478691280 reclaimed
  leaving 2184252600 used; max is 6658457600
  
   INFO 18:15:23,909 GC for PS Scavenge: 283 ms, 452943472 reclaimed
  leaving 5523891120 used; max is 6658457600
   INFO 18:15:24,912 GC for PS Scavenge: 273 ms, 466157568 reclaimed
  leaving 5594606128 used; max is 6658457600
 
  This will eventually trigger old-gen GC and then the process repeats
  until hinted handoff finishes.
 
  The build version was updated from 0.7.2 to 0.7.5 but the behavior was
  exactly the same.
 
  Thank you.
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Excessive allocation during hinted handoff

2011-05-12 Thread Terje Marthinussen
And if you have 10 nodes, do all of them happen to send hints to the two with
GC?

Terje

On Thu, May 12, 2011 at 6:10 PM, Terje Marthinussen tmarthinus...@gmail.com
 wrote:

 Just out of curiosity is this on the receiver or sender side?

 I have been wondering a bit if the hint playback could need some
 adjustment.
 There is potentially quite big differences on how much is sent per throttle
 delay time depending on what your data looks like.

 Early 0.7 releases also built up hints very easily under load due to nodes
 quickly getting marked as down due to gossip sharing the same thread as many
 other operations.

 Terje

 On Thu, May 12, 2011 at 1:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Doesn't really look abnormal to me for a heavy write load situation
 which is what receiving hints is.

 On Wed, May 11, 2011 at 1:55 PM, Gabriel Tataranu gabr...@wajam.com
 wrote:
  Greetings,
 
  I'm experiencing some issues with 2 nodes (out of more than 10). Right
  after startup (Listening for thrift clients...) the nodes will create
  objects at high rate using all available CPU cores:
 
   INFO 18:13:15,350 GC for PS Scavenge: 292 ms, 494902976 reclaimed
  leaving 2024909864 used; max is 6658457600
   INFO 18:13:20,393 GC for PS Scavenge: 252 ms, 478691280 reclaimed
  leaving 2184252600 used; max is 6658457600
  
   INFO 18:15:23,909 GC for PS Scavenge: 283 ms, 452943472 reclaimed
  leaving 5523891120 used; max is 6658457600
   INFO 18:15:24,912 GC for PS Scavenge: 273 ms, 466157568 reclaimed
  leaving 5594606128 used; max is 6658457600
 
  This will eventually trigger old-gen GC and then the process repeats
  until hinted handoff finishes.
 
  The build version was updated from 0.7.2 to 0.7.5 but the behavior was
  exactly the same.
 
  Thank you.
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





Cassandra causing very high load on CPU's 0.6.3

2011-05-12 Thread Ali Ahsan

Hi All

I am experiencing a problem with my two Cassandra nodes with RF=2. Both
nodes' CPU usage is very high (load average: 9.47, 5.72, 5.11), and this is
causing my application to time out. I have a Xeon with 8 processors and 16
GB of RAM, and an LVM setup for Cassandra. How can I trace the main cause of
the load? I have no swap, and swappiness is also set to 0. I am running
CentOS 5.5 64-bit.


tpstats
--

Pool NameActive   Pending  Completed
STREAM-STAGE  0 0  0
RESPONSE-STAGE0 0  18439
ROW-READ-STAGE0 0  16235
LB-OPERATIONS 0 0  0
MESSAGE-DESERIALIZER-POOL 0 0  63673
GMFD  0 0    632
LB-TARGET 0 0  0
CONSISTENCY-MANAGER   0 0   4008
ROW-MUTATION-STAGE1 1  70414
MESSAGE-STREAMING-POOL0 0  0
LOAD-BALANCER-STAGE   0 0  0
FLUSH-SORTER-POOL 0 0  0
MEMTABLE-POST-FLUSHER 0 0 32
FLUSH-WRITER-POOL 0 0 32
AE-SERVICE-STAGE  0 0  0
HINTED-HANDOFF-POOL   1 1  0

--
S.Ali Ahsan





CounterColumn increments gone after restart

2011-05-12 Thread Utku Can Topçu
Hi guys,

I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the
way it should be, but:
- I create a ColumnFamily named Counters
- do a few increments on a column.
- kill cassandra
- start cassandra

When I look at the counter column, the value is 1.

See the following pastebin please: http://pastebin.com/9jYdDiRY


Re: Excessive allocation during hinted handoff

2011-05-12 Thread Gabriel Tataranu
Greetings,

 Just out of curiosity is this on the receiver or sender side?

Looks like sender side, although the 2 nodes were replicating to each
other so it's hard to tell.

 
 I have been wondering a bit if the hint playback could need some
 adjustment. 
 There is potentially quite big differences on how much is sent per
 throttle delay time depending on what your data looks like.
 
 Early 0.7 releases also built up hints very easily under load due to
 nodes quickly getting marked as down due to gossip sharing the same
 thread as many other operations.

Like I said, cassandra was updated to 0.7.5 (latest build as of today)
following the advice on IRC.  There was no change in behavior.

Best,

Gabriel



Re: Excessive allocation during hinted handoff

2011-05-12 Thread Gabriel Tataranu

 And if you have 10 nodes, do all of them happen to send hints to the two
 with GC?

The 2 nodes are adjacent in token range. They are replicating to each other.

Other nodes have no data to replicate so there's no proof one way or
another.


Best,

Gabriel



Re: CounterColumn increments gone after restart

2011-05-12 Thread Utku Can Topçu
see the ticket https://issues.apache.org/jira/browse/CASSANDRA-2642 please

On Thu, May 12, 2011 at 3:28 PM, Utku Can Topçu u...@topcu.gen.tr wrote:

 Hi guys,

 I have strange problem with 0.8.0-rc1. I'm not quite sure if this is the
 way it should be but:
 - I create a ColumnFamily named Counters
 - do a few increments on a column.
 - kill cassandra
 - start cassandra

 When I look at the counter column, the value is 1.

 See the following pastebin please: http://pastebin.com/9jYdDiRY



Re: Excessive allocation during hinted handoff

2011-05-12 Thread Gabriel Tataranu
Greetings,

 Doesn't really look abnormal to me for a heavy write load situation
 which is what receiving hints is.

I would agree with you but this raises some questions about write
performance. Plus I've only seen this kind of behavior recently and only
on 2 adjacent nodes. So I have good reason to believe this is the
exception and not the rule.

Best,

Gabriel



Re: Excessive allocation during hinted handoff

2011-05-12 Thread Gabriel Tataranu

 I'm assuming the two nodes are the ones receiving the HH after they were 
 down. 

Adjacent, so yes.

 
 Are there a lot of hints collected while they are down ? you can check the 
 HintedHandOffManager MBean in JConsole

There wasn't any downtime - that's something else that's weird.

 
 What does the TPStats look like on the nodes under pressure ? And how many 
 nodes are delivering hints to the nodes when they restart?

TPStats does show activity on the HH. I'll have some examples later if
the nodes decide to do this again.

 
 Finally hinted_handoff_throttle_delay_in_ms in conf/cassandra.yaml will let 
 you slow down the delivery rate if HH is indeed the problem. 

That's useful info. Thanks.

Best,

Gabriel



Re: Excessive allocation during hinted handoff

2011-05-12 Thread Gabriel Tataranu
 What does the TPStats look like on the nodes under pressure ? And how many 
 nodes are delivering hints to the nodes when they restart?

$nodetool -h 127.0.0.1 tpstats
Pool NameActive   Pending  Completed
ReadStage 1 11992475
RequestResponseStage  0 0    2247486
MutationStage 0 0    1631349
ReadRepairStage   0 0 583432
GossipStage   0 0 241324
AntiEntropyStage  0 0  0
MigrationStage0 0  0
MemtablePostFlusher   0 0 46
StreamStage   0 0  0
FlushWriter   0 0 46
MiscStage 0 0  0
FlushSorter   0 0  0
InternalResponseStage 0 0  0
HintedHandoff 1 5152


dstat -cmdln during the event:

total-cpu-usage --memory-usage- ---load-avg--- -dsk/total- -net/total-
usr sys idl wai hiq siq| used  buff  cach  free| 1m   5m  15m | read  writ| recv  send
 87   6   6   0   0   1|6890M 32.1M 1001M 42.8M|2.36 2.87 1.73|   0     0 |  75k  144k
 88  10   2   0   0   0|6889M 32.2M 1002M 41.6M|3.05 3.00 1.78|   0     0 |  60k  102k
 89   9   2   0   0   0|6890M 32.2M 1003M 41.0M|3.05 3.00 1.78|   0     0 |  38k   70k
 89  10   1   0   0   0|6890M 32.2M 1003M 40.7M|3.05 3.00 1.78|   0     0 |  26k   24k
 93   6   2   0   0   0|6890M 32.2M 1003M 40.9M|3.05 3.00 1.78|   0     0 |  37k   31k
 90   8   2   0   0   0|6890M 32.2M 1003M 39.9M|3.05 3.00 1.78|   0     0 |  67k   69k
 87   8   4   0   0   1|6890M 32.2M 1004M 38.7M|4.09 3.22 1.85|   0     0 | 123k  262k
 83  13   2   0   0   2|6890M 32.2M 1004M 38.3M|4.09 3.22 1.85|   0     0 | 445k   18M
 90   6   3   0   0   0|6890M 32.2M 1005M 38.2M|4.09 3.22 1.85|   0     0 |  72k   91k
 40   7  25  27   0   0|6890M 32.2M 1005M 37.8M|4.09 3.22 1.85|   0     0 | 246k 8034k
  0   0  59  41   0   0|6890M 32.2M 1005M 37.7M|4.09 3.22 1.85|   0     0 |  19k 6490B
  1   2  45  52   0   0|6891M 32.2M  999M 43.1M|4.00 3.21 1.86|   0     0 |  29k   18k
 72   8  15   3   0   1|6892M 32.2M  999M 41.6M|4.00 3.21 1.86|   0     0 | 431k   11M
 88   9   2   0   0   1|6907M 32.0M  985M 41.1M|4.00 3.21 1.86|   0     0 |  99k   77k
 88  10   1   0   0   1|6913M 31.9M  977M 44.1M|4.00 3.21 1.86|   0     0 | 112k  619k
 89   9   1   0   0   1|6892M 31.9M  977M 64.4M|4.00 3.21 1.86|   0     0 | 109k  369k
 90   8   1   0   0   0|6892M 31.9M  979M 62.5M|4.80 3.39 1.92|   0     0 | 130k   97k
 83  13   1   0   0   3|6893M 32.0M  981M 59.8M|4.80 3.39 1.92|   0     0 | 503k   18M
 78  11  10   0   0   0|6893M 32.0M  981M 59.5M|4.80 3.39 1.92|   0     0 | 102k  110k


The low cpu periods are due to major GC (JVM frozen).

 
 TPStats do show activity on the HH. I'll have some examples latter if
 the nodes decide to do this again.
 

 Finally hinted_handoff_throttle_delay_in_ms in conf/cassandra.yaml will let 
 you slow down the delivery rate if HH is indeed the problem. 
 


Best,

Gabriel



Re: Cassandra causing very high load on CPU's 0.6.3

2011-05-12 Thread Ali Ahsan

On 05/12/2011 04:08 PM, Ali Ahsan wrote:

Hi All

I am experience some problem with me two Cassandra node with RF=2,Both 
node CPU's usage is very high,load average: 9.47, 5.72, 5.11 and this 
causing my application to time out .I have xeon with 8 processor and 
16 GB of Ram.and LVM setup for Cassandra.How can i trace the main 
issue of load i have no swap and swappies also set to 0.I am running 
Centos 5.5 64 Bit. 


Adding to this: I am using OpenJDK, not the Sun JDK. Will this be an issue?


Re: Cassandra causing very high load on CPU's 0.6.3

2011-05-12 Thread Sylvain Lebresne
On Thu, May 12, 2011 at 6:04 PM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
 On 05/12/2011 04:08 PM, Ali Ahsan wrote:

 Hi All

 I am experience some problem with me two Cassandra node with RF=2,Both
 node CPU's usage is very high,load average: 9.47, 5.72, 5.11 and this
 causing my application to time out .I have xeon with 8 processor and 16 GB
 of Ram.and LVM setup for Cassandra.How can i trace the main issue of load i
 have no swap and swappies also set to 0.I am running Centos 5.5 64 Bit.

 Add to this i am using openjdk not sunjdk will this be an issue ?

It is indeed advised to use the Sun JDK, as OpenJDK is a bit behind as far
as bug fixes are concerned.

Moreover, 0.6.3 is pretty old now and we have fixed a number of issues
related to load spikes, so before investigating further, the best advice I
can give you is to upgrade (either to 0.6.13 if you really feel like staying
on 0.6, or to 0.7.5).

--
Sylvain


Monitoring bytes read per cf

2011-05-12 Thread Daniel Doubleday
Hi all

got a question for folks with some code insight again.

To be able to better understand where our IO load is coming from we want to 
monitor the number of bytes read from disc per cf. (we love stats)

What I have done is wrap the FileDataInput in SSTableReader to sum the bytes
read in CFS. This will only record data file access, but that would be good
enough for us.

It seems to work fine, but maybe someone here knows a reason why this is not
a good idea...


Cheers,
Daniel

Some code:

SSTableReader:

private static final boolean KEEP_IO_STATISTICS =
    Boolean.getBoolean("cassandra.keepIOStats");

public FileDataInput getFileDataInput(DecoratedKey decoratedKey, int bufferSize)
{
    long position = getPosition(decoratedKey, Operator.EQ);
    if (position < 0)
        return null;

    FileDataInput segment = dfile.getSegment(position, bufferSize);
    return KEEP_IO_STATISTICS ? new MonitoringFileDataInput(metadata, segment)
                              : segment;
}

with MonitoringFileDataInput:

public class MonitoringFileDataInput implements FileDataInput, Closeable
{
    private final FileDataInput fileDataInput;
    private final ColumnFamilyStore columnFamilyStore;

    public MonitoringFileDataInput(CFMetaData cfMetaData, FileDataInput fileDataInput)
    {
        columnFamilyStore =
            Table.open(cfMetaData.tableName).getColumnFamilyStore(cfMetaData.cfId);
        this.fileDataInput = fileDataInput;
    }

    @Override
    public boolean readBoolean() throws IOException
    {
        // count every byte consumed against the owning column family
        columnFamilyStore.addBytesRead(1);
        return fileDataInput.readBoolean();
    }

    // ... etc (the other read methods follow the same pattern)

and ColumnFamilyStore:

private final AtomicLong bytesRead = new AtomicLong(0L);

@Override // ColumnFamilyStoreMBean
public long getBytesRead()
{
    return bytesRead.get();
}

public void addBytesRead(int num)
{
    bytesRead.addAndGet(num);
}

  

Re: Cassandra causing very high load on CPU's 0.6.3

2011-05-12 Thread Jonathan Ellis
https://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.6.13/CHANGES.txt

On Thu, May 12, 2011 at 11:56 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:

 It is indeed advised to use sunjdk as openjdk is a bit behind as far
 as bug fixes are
 concerned.

 Moreover, 0.6.3 is pretty old now and we do have fixed a number of
 issue related to
 load spikes, so before investigating further the best advice I can
 give you is to upgrade
 (either to 0.6.13 if you really feel like staying on 0.6, or to 0.7.5).

Thanks, let me discuss that with my team. How many changes do we need, and
where?

 --
 S.Ali Ahsan

 Senior System Engineer

 e-Business (Pvt) Ltd

 49-C Jail Road, Lahore, P.O. Box 676
 Lahore 54000, Pakistan

 Tel: +92 (0)42 3758 7140 Ext. 128

 Mobile: +92 (0)345 831 8769

 Fax: +92 (0)42 3758 0027

 Email: ali.ah...@panasiangroup.com



 www.ebusiness-pg.com

 www.panasiangroup.com






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: network topology issue

2011-05-12 Thread Anurag Gujral
Thanks everyone for your responses.

On Thu, May 12, 2011 at 1:18 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Thu, May 12, 2011 at 1:58 AM, Anurag Gujral anurag.guj...@gmail.com
 wrote:
  Hi All,
   I am testing network topology strategy in cassandra I am
 using
  two nodes , one node each in different data center.
  Since the nodes are in different dc I assigned token 0 to both the nodes.
  I added both the nodes as seeds in the cassandra.yaml and  I am  using
  properyfilesnitch as endpoint snitch where I have specified the colo
  details.
 
  I started first node then I when I restarted second node I got an error
 that
  token 0 is already being used.Why am I getting this error.

 You cannot have two nodes with the same token, so you'll have to use 0
 and
 1 for instance. It's true that with NTS you have to think of each
 datacenter
 as a separate ring, but there is still this restriction that each token
 must be
 different across the whole cluster.

 
  Second Question: I already have cassandra running in two different data
  centers I want to add a new keyspace which uses networkTopology strategy
  in the light of above errors how can I accomplish this.
 
 
  Thanks
  Anurag
 



Commitlog Disk Full

2011-05-12 Thread Sanjeev Kulkarni
Hey guys,
I have an ec2 debian cluster consisting of several nodes running 0.7.5 on
ephemeral disks.
These are fresh installs and not upgrades.
The commitlog is set to the smaller of the disks, which is around 10G in
size, and the datadir is set to the bigger disk.
The config file is basically the same as the one supplied by the default
installation.
Our applications write to the cluster. After about a day of writing, we
started noticing the commitlog disk filling up. Soon we went over the disk
limit and writes started failing. At this point we stopped the cluster.
Over the course of the day we inserted around 25G of data. Our column
values are pretty small.
I understand that cassandra periodically cleans up the commitlog directories
by generating sstables in the datadir. Is there any way to speed up this
movement from commitlog to datadir?
Thanks!


Re: Commitlog Disk Full

2011-05-12 Thread Peter Schuller
 I understand that cassandra periodically cleans up the commitlog directories
 by generating sstables in datadir. Is there any way to speed up this
 movement from commitog to datadir?

commitlog_rotation_threshold_in_mb could cause problems if it was set
very very high, but with the default of 128mb it should not be an
issue.

I suspect the most likely reason is that you have a column family
whose memtable flush settings are extreme. A commit log segment cannot
be removed until the corresponding data has been flushed to an
sstable. For high-throughput memtables where you flush regularly this
should happen often. For idle or almost idle memtables you may be
waiting on the timeout criteria to trigger. So in general, having a
memtable with a long expiry time will have the potential to generate
commit logs of whatever size is implied by the write traffic during
that period.

The memtable setting in question is the memtable_flush_after
setting. Do you have that set to something very high on one of your
column families?

You can use "describe keyspace name_of_keyspace" in cassandra-cli to check
current settings.
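As an illustration, the check and the fix could look like this in cassandra-cli (the keyspace and column family names here are placeholders, and the attribute syntax may differ slightly between 0.7.x releases):

```
$ bin/cassandra-cli --host localhost
[default@unknown] use MyKeyspace;
[default@MyKeyspace] describe keyspace MyKeyspace;
[default@MyKeyspace] update column family MyCF with memtable_flush_after = 60;
```

Lowering memtable_flush_after (in minutes) makes near-idle memtables flush sooner, which lets the corresponding commit log segments be discarded.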

-- 
/ Peter Schuller


Hinted Handoff

2011-05-12 Thread Anurag Gujral
Hi All,
   I have two questions:
a) Is there a way to turn hinted handoff on and off per keyspace, rather
than for multiple keyspaces at once?
b) It looks like cassandra stores hinted handoff data in one row. Is it
true? Does having one row for hinted handoff imply that if nodes are down
for a longer period of time, not all the data which needs to be replicated
will be on the node which is alive?

Thanks
Anurag


running TPC-C on cassandra clusters

2011-05-12 Thread Xiaowei Wang
Hi all,

My partner and I are currently using a cassandra cluster to run TPC-C. We
first use 2 ec2 nodes to load 20 warehouses. One (the client node) has 8
cores; the other (the worker node) has 4 cores. During the loading time,
either the client node or the worker node will go down (cannot be detected)
randomly and then come back up a short time later. If both nodes go down,
the load fails. If only one of them goes down, we can continue to load data.

The problem is that if we use multiple threads (we wrote multiprocess code),
say 4 client threads, some of them may stop at the point one of the nodes
first goes down, and the dead threads never come back. This not only
lengthens our loading time, but also affects the amount of data we can load.

So we need to figure out why the nodes keep going up and down, and fix this
problem.

Thanks for any help!

Best,
Xiaowei


Re: Hinted Handoff

2011-05-12 Thread Sameer Farooqui
I'm not sure about your first question.

I believe the internal system keyspace holds the hinted handoff information.

In 0.6 and earlier, HintedHandoffManager.sendMessage used to read the entire
row into memory and then send the row back to the client in a single
message. As of 0.7, Cassandra pages within a single hinted row instead
(which improves performance for wide rows).



On Thu, May 12, 2011 at 11:48 AM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Hi All,
I have two questions:
 a) Is there  a way to turn on and off hinted handoff per keyspace rather
 than for multiple keyspaces.
 b)It looks like cassandra stores hinted handoff data in one row.Is it true?
 .Does having one row for hinted handoff implies
 if nodes are down for longer period of time not all the data which needs to
 be replicated will be on the node which is alive.

 Thanks
 Anurag



Re: running TPC-C on cassandra clusters

2011-05-12 Thread Jonathan Ellis
I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6)

On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com wrote:
 Hi all,

 My partner and I currently using cassandra cluster to run TPC-C. We first
 use 2 ec2 nodes to load 20 warehouses. One(client node)  has 8 cores,  the
 other(worker node) has 4 cores. During the loading time, either the client
 node or the worker node will down(cannot be detected) randomly and then
 up again in a short time. If the two nodes both down, we failed in
 loading. If only one of them down, we can continue to load data.

 The problem is if we use multiple threads(we write multiprocess code), say 4
 clients threads, some of them might be stop at the point one of the nodes
 first down, and the dead threads will never come back This will not only
 enlarge our loading time, but also effect the amount of data we can load.

 So we need to figure out why the nodes continue to be up and down and fix
 this problem.

 Thanks for any help!

 Best,
 Xiaowei





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Crash when uploading large data sets

2011-05-12 Thread James Cipar
I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
unique data), to a cluster of 10 servers.  I'm using batch_mutate, and breaking 
the data up into chunks of about 10k records.  Each record is about 5KB, so a 
total of about 50MB per batch.  When I upload a smaller 2 GB data set, 
everything works fine.  When I upload the 20 GB data set, servers will 
occasionally crash.  Currently I have my client code automatically detect this 
and restart the server, but that is less than ideal.

I'm not sure what information to gather to determine what's going on here.  
Here is a sample of a log file from when a crash occurred.  The crash was 
immediately after the log entry tagged 2011-05-12 19:02:19,377.  Any idea 
what's going on here?  Any other info I can gather to try to debug this?







 INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC 
for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
7774142464
 INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC 
for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
7774142464
 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
50) Creating new commitlog segment 
/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
 INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 
operations)
 INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing 
Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
 INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC 
for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 
7774142464
 INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC 
for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 
7774142464
 INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) 
Completed flushing 
/mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 
bytes)
 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) 
Discarding obsolete commit 
log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
 INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC 
for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 
7774142464
 INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 
1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 
operations)
 INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing 
Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
 INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) 
Logging initialized
 INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) 
Heap size: 7634681856/7635730432
 INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. 
Native methods will be disabled.
 INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading 
settings from 
file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
 INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
 INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
 INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
 INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
 INFO [main] 2011-05-12 19:02:21,342 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-2
 INFO [main] 2011-05-12 19:02:21,344 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-1
 INFO 

Re: Crash when uploading large data sets

2011-05-12 Thread Sameer Farooqui
The key JVM options for Cassandra are in cassandra.in.sh.

What is your min and max heap size?

The default setting of max heap size is 1GB. How much RAM do your nodes
have? You may want to increase this setting. You can also set the -Xmx and
-Xms options to the same value to keep Java from having to manage heap
growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you
can get a lot more on 64-bit.

Try messing with some of the other settings in the cassandra.in.sh file.

You may not have DEBUG mode turned on for Cassandra and therefore may not be
getting the full details of what's going on when the server crashes. In the
cassandra-home/conf/log4j-server.properties file, change this line from the
default of INFO to DEBUG, so that it reads:

log4j.rootLogger=DEBUG,stdout,R
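The edit can also be scripted; a hedged one-liner (the path is an example, point it at your own conf/log4j-server.properties):

```shell
# Flip the root logger from INFO to DEBUG in place.
# -i.bak keeps a backup copy of the original file.
sed -i.bak 's/^log4j.rootLogger=INFO/log4j.rootLogger=DEBUG/' \
  conf/log4j-server.properties
```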


Also, you haven't configured JNA on this server. Here's some info about it
and how to configure it:

JNA provides Java programs easy access to native shared libraries without
writing anything but Java code.

Note from Cassandra developers for why JNA is needed:
*Linux aggressively swaps out infrequently used memory to make more room
for its file system buffer cache. Unfortunately, modern generational garbage
collectors like the JVM's leave parts of its heap un-touched for relatively
large amounts of time, leading Linux to swap it out. When the JVM finally
goes to use or GC that memory, swap hell ensues.

Setting swappiness to zero can mitigate this behavior but does not eliminate
it entirely. Turning off swap entirely is effective. But to avoid surprising
people who don't know about this behavior, the best solution is to tell
Linux not to swap out the JVM, and that is what we do now with mlockall via
JNA.
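The swappiness tuning mentioned above can be sketched as follows (requires root; mlockall via JNA remains the fuller fix, since lowering swappiness only mitigates the behavior):

```shell
# Sketch: inspect and lower the kernel's swap aggressiveness.
cat /proc/sys/vm/swappiness                 # current value
sudo sysctl -w vm.swappiness=0              # apply immediately
echo 'vm.swappiness=0' | sudo tee -a /etc/sysctl.conf   # persist across reboots
```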

Because of licensing issues, we can't distribute JNA with Cassandra, so you
must manually add it to the Cassandra lib/ directory or otherwise place it
on the classpath. If the JNA jar is not present, Cassandra will continue as
before.*

Get JNA with:
*cd ~
wget
http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb*

To install:
*techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
(Reading database ... 44334 files and directories currently installed.)
Preparing to replace libjna-java 3.2.4-2 (using
libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
Unpacking replacement libjna-java ...
Setting up libjna-java (3.2.7-0~nmu.2) ...*


The deb package will install the JNA jar file to /usr/share/java/jna.jar,
but Cassandra only loads it if it's on the classpath. The easy way to do
this is to create a symlink into your Cassandra lib directory (note:
replace /home/techlabs with your home dir location):
*ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib*

Research:
http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/


- Sameer


On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote:

 I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB
 unique data), to a cluster of 10 servers.  I'm using batch_mutate, and
 breaking the data up into chunks of about 10k records.  Each record is about
 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data
 set, everything works fine.  When I upload the 20 GB data set, servers will
 occasionally crash.  Currently I have my client code automatically detect
 this and restart the server, but that is less than ideal.

 I'm not sure what information to gather to determine what's going on here.
  Here is a sample of a log file from when a crash occurred.  The crash was
 immediately after the log entry tagged 2011-05-12 19:02:19,377.  Any idea
 what's going on here?  Any other info I can gather to try to debug this?







  INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line
 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max
 is 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line
 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max
 is 7774142464
  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java
 (line 50) Creating new commitlog segment
 /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
  INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java
 (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529
 bytes, 1115783 operations)
  INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158)
 Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
  INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line
 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max
 is 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line
 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max
 is 7774142464
  INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165)
 Completed flushing
 

Re: Unable to add columns to empty row in Column family: Cassandra

2011-05-12 Thread Narendra Sharma
Can you share the code?

On Mon, May 2, 2011 at 11:34 PM, anuya joshi anu...@gmail.com wrote:

 Hello,

 I am using Cassandra for my application.My Cassandra client uses Thrift
 APIs directly. The problem I am facing currently is as follows:

 1) I added a row and columns in it dynamically via Thrift API Client
 2) Next, I used command line client to delete row which actually deleted
 all the columns in it, leaving empty row with original row id.
 3) Now, I am trying to add columns dynamically using client program into
 this empty row with same row key
 However, columns are not being inserted.
 But, when tried from command line client, it worked correctly.

 Any pointer on this would be of great use

 Thanks in  advance,

 Regards,
 Anuya




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Crash when uploading large data sets

2011-05-12 Thread James Cipar
It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my 
physical memory.  These are 15GB VMs, so that's 7.5GB for Cassandra.  I would 
have expected that to work, but I will override to 13 GB just to see what 
happens.

I've also got the JNA thing set up.  Do you think this would cause the crashes, 
or is it just a performance improvement?



On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote:

 The key JVM options for Cassandra are in cassandra.in.sh.
 
 What is your min and max heap size?
 
 The default setting of max heap size is 1GB. How much RAM do your nodes have? 
 You may want to increase this setting. You can also set the -Xmx and -Xms 
 options to the same value to keep Java from having to manage heap growth. On 
 a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a 
 lot more on 64-bit.
 
 Try messing with some of the other settings in the cassandra.in.sh file.
 
 You may not have DEBUG mode turned on for Cassandra and therefore may not be 
 getting the full details of what's going on when the server crashes. In the 
 cassandra-home/conf/log4j-server.properties file, set this line from the 
 default of INFO to DEBUG:
 
 log4j.rootLogger=INFO,stdout,R
 
 
 Also, you haven't configured JNA on this server. Here's some info about it 
 and how to configure it:
 
 JNA provides Java programs easy access to native shared libraries without 
 writing anything but Java code.
 
 Note from Cassandra developers for why JNA is needed:
 Linux aggressively swaps out infrequently used memory to make more room for 
 its file system buffer cache. Unfortunately, modern generational garbage 
 collectors like the JVM's leave parts of its heap un-touched for relatively 
 large amounts of time, leading Linux to swap it out. When the JVM finally 
 goes to use or GC that memory, swap hell ensues.
 
 Setting swappiness to zero can mitigate this behavior but does not eliminate 
 it entirely. Turning off swap entirely is effective. But to avoid surprising 
 people who don't know about this behavior, the best solution is to tell Linux 
 not to swap out the JVM, and that is what we do now with mlockall via JNA.
 
 Because of licensing issues, we can't distribute JNA with Cassandra, so you 
 must manually add it to the Cassandra lib/ directory or otherwise place it on 
 the classpath. If the JNA jar is not present, Cassandra will continue as 
 before.
 
 Get JNA with: 
 cd ~
 wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb
 
 To install: 
 techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
 (Reading database ... 44334 files and directories currently installed.)
 Preparing to replace libjna-java 3.2.4-2 (using 
 libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
 Unpacking replacement libjna-java ...
 Setting up libjna-java (3.2.7-0~nmu.2) ...
 
 
 The deb package will install the JNA jar file to /usr/share/java/jna.jar, but 
 Cassandra only loads it if its in the class path. The easy way to do this is 
 just create a symlink into your Cassandra lib directory (note: replace 
 /home/techlabs with your home dir location):
 ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib
 
 Research:
 http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/
 
 
 - Sameer
 
 
 On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote:
 I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
 unique data), to a cluster of 10 servers.  I'm using batch_mutate, and 
 breaking the data up into chunks of about 10k records.  Each record is about 
 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data 
 set, everything works fine.  When I upload the 20 GB data set, servers will 
 occasionally crash.  Currently I have my client code automatically detect 
 this and restart the server, but that is less than ideal.
 
 I'm not sure what information to gather to determine what's going on here.  
 Here is a sample of a log file from when a crash occurred.  The crash was 
 immediately after the log entry tagged 2011-05-12 19:02:19,377.  Any idea 
 what's going on here?  Any other info I can gather to try to debug this?
 
 
 
 
 
 
 
  INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) 
 GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) 
 GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
 7774142464
  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
 50) Creating new commitlog segment 
 /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
  INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 
 1115783 operations)
  INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) 
 

Re: Crash when uploading large data sets

2011-05-12 Thread James Cipar
Oh, forgot this detail:  I have no swap configured, so swapping is not the 
cause of the crash.  Could it be that I'm running out of memory on a 15GB 
machine?  That seems unlikely.  I grepped dmesg for oom and didn't see 
anything from the oom killer, and I used the instructions from the following 
web page and didn't see that the oom killer had killed anything.

http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer

jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case killed 
process
jcipar@172-19-149-62:~$ 
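A hedged sketch of a few other places the OOM killer leaves evidence; log file names vary by distribution (Debian/Ubuntu use /var/log/syslog, Red Hat-style systems /var/log/messages):

```shell
# Look for OOM-killer traces in the kernel ring buffer and system logs.
dmesg | grep -iE 'out of memory|oom-killer|killed process'
sudo grep -ihE 'oom|killed process' /var/log/messages /var/log/syslog 2>/dev/null
```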



Also, this is pretty subjective, so I can't say for sure until it finishes, but 
this seems to be running *much* slower after setting the heap size and setting 
up JNA.



On May 12, 2011, at 7:52 PM, James Cipar wrote:

 It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my 
 physical memory.  These are 15GB VMs, so that's 7.5GB for Cassandra.  I would 
 have expected that to work, but I will override to 13 GB just to see what 
 happens.
 
 I've also got the JNA thing set up.  Do you think this would cause the 
 crashes, or is it just a performance improvement?
 
 
 

Re: Crash when uploading large data sets

2011-05-12 Thread Jonathan Ellis
If it's a JVM crash there should be an hs_err_pid<pid>.log file left around
in the directory you started Cassandra from.
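A quick way to hunt for such crash reports; note that the JVM writes them to its working directory by default, unless -XX:ErrorFile redirects them elsewhere:

```shell
# Sketch: find HotSpot crash reports modified in the last day,
# starting from the current directory.
find . -name 'hs_err_pid*.log' -mtime -1 2>/dev/null
```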

On Thu, May 12, 2011 at 6:15 PM, James Cipar jci...@cmu.edu wrote:
 I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
 unique data), to a cluster of 10 servers.  I'm using batch_mutate, and 
 breaking the data up into chunks of about 10k records.  Each record is about 
 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data 
 set, everything works fine.  When I upload the 20 GB data set, servers will 
 occasionally crash.  Currently I have my client code automatically detect 
 this and restart the server, but that is less than ideal.

 I'm not sure what information to gather to determine what's going on here.  
 Here is a sample of a log file from when a crash occurred.  The crash was 
 immediately after the log entry tagged 2011-05-12 19:02:19,377.  Any idea 
 what's going on here?  Any other info I can gather to try to debug this?







  INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) 
 GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) 
 GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
 7774142464
  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
 50) Creating new commitlog segment 
 /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
  INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 
 1115783 operations)
  INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) 
 Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
  INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) 
 GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 
 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) 
 GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 
 7774142464
  INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) 
 Completed flushing 
 /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 
 bytes)
  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) 
 Discarding obsolete commit 
 log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
  INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) 
 GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 
 7774142464
  INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 
 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 
 operations)
  INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) 
 Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
  INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) 
 Logging initialized
  INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) 
 Heap size: 7634681856/7635730432
  INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. 
 Native methods will be disabled.
  INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) 
 Loading settings from 
 file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
  INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) 
 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
  INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
  INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
  INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
  INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
  INFO [main] 

Re: Crash when uploading large data sets

2011-05-12 Thread Jeffrey Kesselman
Is this a 64-bit VM?

A 32-bit Java VM with default C-heap settings can only actually use
about 2 GB of Java heap.
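One quick way to check: a 64-bit HotSpot build reports "64-Bit" in its version banner, while a 32-bit build does not. A minimal sketch:

```shell
# Sketch: check the running JVM's bitness from its version banner.
java -version 2>&1 | grep -qi '64-bit' \
  && echo '64-bit JVM' || echo '32-bit (or non-HotSpot) JVM'
```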

On Thu, May 12, 2011 at 8:08 PM, James Cipar jci...@cmu.edu wrote:
 Oh, forgot this detail:  I have no swap configured, so swapping is not the 
 cause of the crash.  Could it be that I'm running out of memory on a 15GB 
 machine?  That seems unlikely.  I grepped dmesg for oom and didn't see 
 anything from the oom killer, and I used the instructions from the following 
 web page and didn't see that the oom killer had killed anything.

 http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer

 jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case 
 killed process
 jcipar@172-19-149-62:~$




Re: running TPC-C on cassandra clusters

2011-05-12 Thread Xiaowei Wang
Thanks Jonathan, but can you provide a link to the 0.7 svn branch?

2011/5/12 Jonathan Ellis jbel...@gmail.com

 I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6)

 On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com
 wrote:
   Hi all,
  
   My partner and I are currently using a Cassandra cluster to run TPC-C. We
   first use 2 EC2 nodes to load 20 warehouses. One (the client node) has 8
   cores, the other (the worker node) has 4 cores. During loading, either the
   client node or the worker node will randomly go down (stop being
   detectable) and then come back up a short time later. If both nodes go
   down, loading fails. If only one of them goes down, we can continue to
   load data.
  
   The problem is that if we use multiple threads (we write multiprocess
   code), say 4 client threads, some of them might stop at the point one of
   the nodes first goes down, and the dead threads never come back. This not
   only lengthens our loading time, but also affects the amount of data we
   can load.
  
   So we need to figure out why the nodes keep going down and coming back up,
   and fix this problem.
  
   Thanks for any help!
  
   Best,
   Xiaowei
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: running TPC-C on cassandra clusters

2011-05-12 Thread Jonathan Ellis
https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7

On Thu, May 12, 2011 at 8:33 PM, Xiaowei Wang xiaowei...@gmail.com wrote:
 Thanks Jonathan, but can you provide some links about 0.7 svn branch?




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: running TPC-C on cassandra clusters

2011-05-12 Thread Xiaowei Wang
Oh sorry, we are already using cassandra-0.7.4. Is that version fine?

2011/5/12 Jonathan Ellis jbel...@gmail.com

 https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7



assertion error in cassandra when doing nodetool move

2011-05-12 Thread Anurag Gujral
Hi All,
I run following command on one of my nodes to  move the token
from 0 to 2.
/usr/cassandra/cassandra/bin/nodetool -h 10.170.195.204 -p 8080 move 2. I
dont understand why is this happening?

I am getting the following assertion error:
Exception in thread "main" java.lang.AssertionError
at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:389)
at org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:414)
at org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:94)
at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:929)
at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:895)
at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1595)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1733)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1708)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

Thanks
Anurag
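
One hedged way to narrow down a failure like this is to inspect the ring state before retrying the move, since the assertion is thrown inside the token-metadata/replication-strategy code. A sketch using the same host and JMX port as in the question (the commands below are standard nodetool subcommands; whether a retry succeeds depends on the cluster's actual state):

```shell
# Inspect current token assignments and node states (look for nodes
# that are not Up/Normal, or a datacenter with no assigned tokens).
/usr/cassandra/cassandra/bin/nodetool -h 10.170.195.204 -p 8080 ring

# Once every node shows as Up/Normal, retry the move.
/usr/cassandra/cassandra/bin/nodetool -h 10.170.195.204 -p 8080 move 2
```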


Re: running TPC-C on cassandra clusters

2011-05-12 Thread Jonathan Ellis
Not if you want the pausing/marking down fixes that were done more recently. :)

On Thu, May 12, 2011 at 8:39 PM, Xiaowei Wang xiaowei...@gmail.com wrote:
 Oh sorry, we use cassandra-0.7.4 already. Is that version fine?





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
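
For the loader-thread problem described in this thread, a common mitigation is to wrap each write in a retry-with-backoff loop so a client thread survives a temporary node outage instead of dying on the first error. A minimal sketch; `TransientError` and `write_fn` are illustrative stand-ins, not part of any Cassandra client API:

```python
import time


class TransientError(Exception):
    """Stands in for the timeout/unavailable errors a client raises
    while a node is temporarily down."""


def load_with_retry(write_fn, row, retries=5, backoff=1.0):
    """Attempt a single write up to `retries` times, sleeping with
    exponential backoff between failures, and re-raise only after the
    final attempt fails."""
    for attempt in range(retries):
        try:
            return write_fn(row)
        except TransientError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```

Each loader thread would call `load_with_retry(client_write, row)` instead of calling the client directly, so a brief node flap costs a few retries rather than a dead thread.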


Re: running TPC-C on cassandra clusters

2011-05-12 Thread Xiaowei Wang
Thanks!

2011/5/12 Jonathan Ellis jbel...@gmail.com

 Not if you want the pausing/marking down fixes that were done more
 recently. :)

 On Thu, May 12, 2011 at 8:39 PM, Xiaowei Wang xiaowei...@gmail.com
 wrote:
  Oh sorry, we use cassandra-0.7.4 already. Is the version fine?
 