date:20110512

Greetings,

 Just out of curiosity is this on the receiver or sender side?

Looks like sender side, although the 2 nodes were replicating to each
other so it's hard to tell.

 
 I have been wondering a bit if the hint playback could need some
 adjustment. 
 There are potentially quite big differences in how much is sent per
 throttle delay time depending on what your data looks like.
 
 Early 0.7 releases also built up hints very easily under load due to
 nodes quickly getting marked as down due to gossip sharing the same
 thread as many other operations.

Like I said, cassandra was updated to 0.7.5 (latest build as of today)
following the advice on IRC.  There was no change in behavior.

Best,

Gabriel

Re: Excessive allocation during hinted handoff


 And if you have 10 nodes, do all of them happen to send hints to the two
 with GC?

The 2 nodes are adjacent in token range. They are replicating to each other.

Other nodes have no data to replicate so there's no proof one way or
another.


Best,

Gabriel

Re: CounterColumn increments gone after restart

2011-05-12 Thread Utku Can Topçu

see the ticket https://issues.apache.org/jira/browse/CASSANDRA-2642 please

On Thu, May 12, 2011 at 3:28 PM, Utku Can Topçu u...@topcu.gen.tr wrote:

 Hi guys,

 I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the
 way it should be but:
 - I create a ColumnFamily named Counters
 - do a few increments on a column.
 - kill cassandra
 - start cassandra

 When I look at the counter column, the value is 1.

 See the following pastebin please: http://pastebin.com/9jYdDiRY

Re: Excessive allocation during hinted handoff

Greetings,

 Doesn't really look abnormal to me for a heavy write load situation
 which is what receiving hints is.

I would agree with you but this raises some questions about write
performance. Plus I've only seen this kind of behavior recently and only
on 2 adjacent nodes. So I have good reason to believe this is the
exception and not the rule.

Best,

Gabriel

Re: Excessive allocation during hinted handoff


 I'm assuming the two nodes are the ones receiving the HH after they were 
 down. 

Adjacent, so yes.

 
 Are there a lot of hints collected while they are down? You can check the 
 HintedHandOffManager MBean in JConsole.

There wasn't any downtime - that's something else that's weird.

 
 What does the TPStats look like on the nodes under pressure ? And how many 
 nodes are delivering hints to the nodes when they restart?

TPStats do show activity on the HH. I'll have some examples later if
the nodes decide to do this again.

 
 Finally hinted_handoff_throttle_delay_in_ms in conf/cassandra.yaml will let 
 you slow down the delivery rate if HH is indeed the problem. 

That's useful info. Thanks.

Best,

Gabriel
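
A rough sketch of why a fixed throttle delay can translate into very different delivery rates, as discussed above. It assumes the sender pauses once per delivered unit (a row, a page, whatever the unit happens to be); that model, the class name and the example sizes are illustrative assumptions, not Cassandra's actual delivery code.

```java
public class HintThrottleSketch {
    /**
     * Upper bound on bytes delivered per second if the sender sleeps
     * delayMs after each unit of unitBytes (transfer time ignored).
     */
    public static double bytesPerSecond(long unitBytes, long delayMs) {
        if (delayMs <= 0) {
            throw new IllegalArgumentException("delayMs must be positive");
        }
        return unitBytes * 1000.0 / delayMs;
    }

    public static void main(String[] args) {
        long delayMs = 50; // hypothetical hinted_handoff_throttle_delay_in_ms value
        // Same delay, very different rates depending on the size of each unit.
        System.out.println("small units: " + bytesPerSecond(200, delayMs) + " B/s");
        System.out.println("large units: " + bytesPerSecond(5L * 1024 * 1024, delayMs) + " B/s");
    }
}
```

With the same delay, the rate grows linearly with the unit size, which is why the data shape matters when picking a value.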

Re: Excessive allocation during hinted handoff

 What does the TPStats look like on the nodes under pressure ? And how many 
 nodes are delivering hints to the nodes when they restart?

$nodetool -h 127.0.0.1 tpstats
Pool Name  Active   Pending  Completed
ReadStage 1 11992475
RequestResponseStage  0 02247486
MutationStage 0 01631349
ReadRepairStage   0 0 583432
GossipStage   0 0 241324
AntiEntropyStage  0 0  0
MigrationStage0 0  0
MemtablePostFlusher   0 0 46
StreamStage   0 0  0
FlushWriter   0 0 46
MiscStage 0 0  0
FlushSorter   0 0  0
InternalResponseStage 0 0  0
HintedHandoff 1 5152


dstat -cmdln during the event:

total-cpu-usage --memory-usage- ---load-avg--- -dsk/total- -net/total-
usr sys idl wai hiq siq| used  buff  cach  free| 1m   5m  15m | read  writ| recv  send
 87   6   6   0   0   1|6890M 32.1M 1001M 42.8M|2.36 2.87 1.73|   0     0 |  75k  144k
 88  10   2   0   0   0|6889M 32.2M 1002M 41.6M|3.05 3.00 1.78|   0     0 |  60k  102k
 89   9   2   0   0   0|6890M 32.2M 1003M 41.0M|3.05 3.00 1.78|   0     0 |  38k   70k
 89  10   1   0   0   0|6890M 32.2M 1003M 40.7M|3.05 3.00 1.78|   0     0 |  26k   24k
 93   6   2   0   0   0|6890M 32.2M 1003M 40.9M|3.05 3.00 1.78|   0     0 |  37k   31k
 90   8   2   0   0   0|6890M 32.2M 1003M 39.9M|3.05 3.00 1.78|   0     0 |  67k   69k
 87   8   4   0   0   1|6890M 32.2M 1004M 38.7M|4.09 3.22 1.85|   0     0 | 123k  262k
 83  13   2   0   0   2|6890M 32.2M 1004M 38.3M|4.09 3.22 1.85|   0     0 | 445k   18M
 90   6   3   0   0   0|6890M 32.2M 1005M 38.2M|4.09 3.22 1.85|   0     0 |  72k   91k
 40   7  25  27   0   0|6890M 32.2M 1005M 37.8M|4.09 3.22 1.85|   0     0 | 246k 8034k
  0   0  59  41   0   0|6890M 32.2M 1005M 37.7M|4.09 3.22 1.85|   0     0 |  19k 6490B
  1   2  45  52   0   0|6891M 32.2M  999M 43.1M|4.00 3.21 1.86|   0     0 |  29k   18k
 72   8  15   3   0   1|6892M 32.2M  999M 41.6M|4.00 3.21 1.86|   0     0 | 431k   11M
 88   9   2   0   0   1|6907M 32.0M  985M 41.1M|4.00 3.21 1.86|   0     0 |  99k   77k
 88  10   1   0   0   1|6913M 31.9M  977M 44.1M|4.00 3.21 1.86|   0     0 | 112k  619k
 89   9   1   0   0   1|6892M 31.9M  977M 64.4M|4.00 3.21 1.86|   0     0 | 109k  369k
 90   8   1   0   0   0|6892M 31.9M  979M 62.5M|4.80 3.39 1.92|   0     0 | 130k   97k
 83  13   1   0   0   3|6893M 32.0M  981M 59.8M|4.80 3.39 1.92|   0     0 | 503k   18M
 78  11  10   0   0   0|6893M 32.0M  981M 59.5M|4.80 3.39 1.92|   0     0 | 102k  110k


The low cpu periods are due to major GC (JVM frozen).
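
For anyone wanting to confirm GC activity from inside a JVM rather than inferring it from CPU dips, the standard java.lang.management API exposes per-collector counts and accumulated times. The sketch below reports on the JVM it runs in; reading the same beans from a running Cassandra node would need a remote JMX connection, which is not shown. The class name is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcStatsSketch {
    /** One line per collector: name, collection count, accumulated time in ms. */
    public static String summarize() {
        StringBuilder sb = new StringBuilder();
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : beans) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize());
    }
}
```

Sampling these values twice and comparing the old-generation collector's time delta against the wall-clock interval gives a direct measure of how long the JVM was paused.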

 


Best,

Gabriel

Re: Cassandra causing very high load on CPU's 0.6.3

2011-05-12 Thread Ali Ahsan


On 05/12/2011 04:08 PM, Ali Ahsan wrote:

Hi All

I am experiencing a problem with my two Cassandra nodes (RF=2). CPU usage 
on both nodes is very high (load average: 9.47, 5.72, 5.11), and this is 
causing my application to time out. I have Xeons with 8 processors and 
16 GB of RAM, and an LVM setup for Cassandra. How can I trace the main 
cause of the load? I have no swap, and swappiness is also set to 0. I am 
running CentOS 5.5 64-bit. 

In addition, I am using OpenJDK rather than the Sun JDK. Will this be an issue?

Re: Cassandra causing very high load on CPU's 0.6.3

2011-05-12 Thread Sylvain Lebresne

On Thu, May 12, 2011 at 6:04 PM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
 On 05/12/2011 04:08 PM, Ali Ahsan wrote:

 Hi All

 I am experiencing a problem with my two Cassandra nodes (RF=2). CPU usage
 on both nodes is very high (load average: 9.47, 5.72, 5.11), and this is
 causing my application to time out. I have Xeons with 8 processors and
 16 GB of RAM, and an LVM setup for Cassandra. How can I trace the main
 cause of the load? I have no swap, and swappiness is also set to 0. I am
 running CentOS 5.5 64-bit.

 In addition, I am using OpenJDK rather than the Sun JDK. Will this be an issue?

It is indeed advised to use the Sun JDK, as OpenJDK is a bit behind as far
as bug fixes are concerned.

Moreover, 0.6.3 is pretty old now and we have fixed a number of issues
related to load spikes, so before investigating further the best advice I
can give you is to upgrade (either to 0.6.13 if you really feel like
staying on 0.6, or to 0.7.5).

--
Sylvain

Monitoring bytes read per cf

2011-05-12 Thread Daniel Doubleday

Hi all

got a question for folks with some code insight again.

To be able to better understand where our IO load is coming from we want to 
monitor the number of bytes read from disc per cf. (we love stats)

What I have done is wrapping the FileDataInput in SSTableReader to sum the 
bytes read in CFS. This will only record data file access but that would be 
good enough for us.

It seems to work fine but maybe someone here knows that this is not a good idea 


Cheers,
Daniel

Some code:

SSTableReader:

    // Reads the JVM system property, e.g. -Dcassandra.keepIOStats=true
    private static final boolean KEEP_IO_STATISICS =
        Boolean.getBoolean("cassandra.keepIOStats");

    public FileDataInput getFileDataInput(DecoratedKey decoratedKey, int bufferSize)
    {
        long position = getPosition(decoratedKey, Operator.EQ);
        if (position < 0)
            return null;

        FileDataInput segment = dfile.getSegment(position, bufferSize);
        // Wrap the segment only when IO statistics are enabled
        return (KEEP_IO_STATISICS) ? new MonitoringFileDataIInput(metadata, segment) : segment;
    }

with MonitoringFileDataIInput:

    public class MonitoringFileDataIInput implements FileDataInput, Closeable
    {
        private final FileDataInput fileDataInput;
        private final ColumnFamilyStore columnFamilyStore;

        public MonitoringFileDataIInput(CFMetaData cfMetaData, FileDataInput fileDataInput)
        {
            // Look up the store that owns the counter for this column family
            columnFamilyStore = Table.open(cfMetaData.tableName).getColumnFamilyStore(cfMetaData.cfId);
            this.fileDataInput = fileDataInput;
        }

        @Override
        public boolean readBoolean() throws IOException
        {
            // A boolean is read as a single byte
            columnFamilyStore.addBytesRead(1);
            return fileDataInput.readBoolean();
        }

        // ... etc

and ColumnFamilyStore:

    private final AtomicLong bytesRead = new AtomicLong(0L);

    @Override // ColumnFamilyStoreMBean
    public long getBytesRead()
    {
        return bytesRead.get();
    }

    public void addBytesRead(int num)
    {
        bytesRead.addAndGet(num);
    }
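
The same counting-wrapper idea can be tried outside Cassandra with only JDK classes. The sketch below is a stand-in, not the patch above: it counts at the byte-stream level with a FilterInputStream instead of overriding each FileDataInput method, and all class and method names in it are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

public class CountingReadSketch {
    /** Adds every byte actually read to a shared counter. */
    public static class CountingInputStream extends FilterInputStream {
        private final AtomicLong counter;

        public CountingInputStream(InputStream in, AtomicLong counter) {
            super(in);
            this.counter = counter;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) counter.incrementAndGet(); // -1 means end of stream
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) counter.addAndGet(n);
            return n;
        }
    }

    /** Reads a boolean, an int and a long through the wrapper; returns the byte count. */
    public static long demo() {
        AtomicLong bytesRead = new AtomicLong();
        byte[] data = new byte[13]; // 1 + 4 + 8 bytes
        try (DataInputStream in = new DataInputStream(
                new CountingInputStream(new ByteArrayInputStream(data), bytesRead))) {
            in.readBoolean();
            in.readInt();
            in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytesRead.get();
    }

    public static void main(String[] args) {
        System.out.println("bytes read: " + demo());
    }
}
```

Counting at the stream level means only two methods need overriding; the trade-off is that skipped bytes are not counted here, and the shared AtomicLong is touched on every read call.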

Re: Cassandra causing very high load on CPU's 0.6.3

https://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.6.13/CHANGES.txt

On Thu, May 12, 2011 at 11:56 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:

 It is indeed advised to use the Sun JDK, as OpenJDK is a bit behind as far
 as bug fixes are concerned.

 Moreover, 0.6.3 is pretty old now and we have fixed a number of issues
 related to load spikes, so before investigating further the best advice I
 can give you is to upgrade (either to 0.6.13 if you really feel like
 staying on 0.6, or to 0.7.5).

 Thanks, let me discuss that with my team. How many changes are there, and
 where are they needed?


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: network topology issue

2011-05-12 Thread Anurag Gujral

Thanks everyone for your responses.

On Thu, May 12, 2011 at 1:18 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 On Thu, May 12, 2011 at 1:58 AM, Anurag Gujral anurag.guj...@gmail.com
 wrote:
  Hi All,
  I am testing the network topology strategy in Cassandra. I am using two
  nodes, one in each of two different data centers.
  Since the nodes are in different DCs, I assigned token 0 to both nodes.
  I added both nodes as seeds in cassandra.yaml, and I am using
  PropertyFileSnitch as the endpoint snitch, where I have specified the colo
  details.

  I started the first node, and when I then restarted the second node I got
  an error that token 0 is already being used. Why am I getting this error?

 You cannot have two nodes with the same token, so you'll have to use 0 and
 1 for instance. It's true that with NTS you have to think of each
 datacenter as a separate ring, but there is still this restriction that
 each token must be different across the whole cluster.

  Second question: I already have Cassandra running in two different data
  centers. I want to add a new keyspace which uses the network topology
  strategy. In light of the above error, how can I accomplish this?

  Thanks
  Anurag
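
One way to picture the "separate ring per datacenter, but unique tokens cluster-wide" rule quoted above is to space tokens evenly within each datacenter and then shift each datacenter's tokens by a small offset. The sketch assumes RandomPartitioner's 0 to 2^127 token range and uses the datacenter index as the offset; the class and method names are hypothetical, and this is a common convention rather than anything prescribed in the thread.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenSketch {
    // Size of the RandomPartitioner token space (assumption: 2^127).
    private static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    /**
     * Evenly spaced tokens for one datacenter, shifted by dcIndex so that
     * no token is repeated across datacenters. Uniqueness holds as long as
     * dcIndex stays far below the spacing between neighbouring tokens.
     */
    public static List<BigInteger> tokensFor(int dcIndex, int nodesInDc) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodesInDc; i++) {
            BigInteger even = RING.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(nodesInDc));
            tokens.add(even.add(BigInteger.valueOf(dcIndex)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One node per datacenter, two datacenters: tokens 0 and 1.
        System.out.println("DC0: " + tokensFor(0, 1));
        System.out.println("DC1: " + tokensFor(1, 1));
    }
}
```

For the two-node setup in this thread the sketch yields tokens 0 and 1, matching the suggestion above.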

Commitlog Disk Full

2011-05-12 Thread Sanjeev Kulkarni

Hey guys,
I have an EC2 Debian cluster consisting of several nodes running 0.7.5 on
ephemeral disks.
These are fresh installs and not upgrades.
The commitlog is set to the smaller of the disks, which is around 10G in size,
and the datadir is set to the bigger disk.
The config file is basically the same as the one supplied by the default
installation.
Our applications write to the cluster. After about a day of writing we
started noticing the commitlog disk filling up. Soon we went over the disk
limit and writes started failing. At this point we stopped the cluster.
Over the course of the day we inserted around 25G of data. Our column
values are pretty small.
I understand that Cassandra periodically cleans up the commitlog directories
by generating sstables in datadir. Is there any way to speed up this
movement from commitlog to datadir?
Thanks!

Re: Commitlog Disk Full

2011-05-12 Thread Peter Schuller

 I understand that Cassandra periodically cleans up the commitlog directories
 by generating sstables in datadir. Is there any way to speed up this
 movement from commitlog to datadir?

commitlog_rotation_threshold_in_mb could cause problems if it was set
very very high, but with the default of 128mb it should not be an
issue.

I suspect the most likely reason is that you have a column family
whose memtable flush settings are extreme. A commit log segment cannot
be removed until the corresponding data has been flushed to an
sstable. For high-throughput memtables where you flush regularly this
should happen often. For idle or almost idle memtables you may be
waiting on the timeout criteria to trigger. So in general, having a
memtable with a long expiry time will have the potential to generate
commit logs of whatever size is implied by the write traffic during
that period.

The memtable setting in question is the memtable_flush_after
setting. Do you have that set to something very high on one of your
column families?

You can use describe keyspace name_of_keyspace in cassandra-cli to
check current settings.

-- 
/ Peter Schuller
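
The rule described in this reply, that a commit log segment can only be removed once every column family with data in it has been flushed, can be illustrated with a toy model. Everything below (class name, methods, the per-segment set of "dirty" column families) is a simplification invented for the illustration and is not Cassandra's CommitLog code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CommitLogSketch {
    // segment id -> column families with unflushed writes in that segment
    private final Map<Integer, Set<String>> dirty = new LinkedHashMap<>();
    private int current = 0;

    public CommitLogSketch() {
        dirty.put(current, new HashSet<>());
    }

    /** Records a write for the given column family in the active segment. */
    public void write(String cf) {
        dirty.get(current).add(cf);
    }

    /** Starts a new active segment (e.g. after the rotation threshold is hit). */
    public void rotate() {
        dirty.put(++current, new HashSet<>());
    }

    /**
     * Flushing a column family clears it from every segment; any inactive
     * segment with nothing left unflushed is discarded. Returns discarded ids.
     */
    public List<Integer> flush(String cf) {
        List<Integer> discarded = new ArrayList<>();
        Iterator<Map.Entry<Integer, Set<String>>> it = dirty.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Set<String>> e = it.next();
            e.getValue().remove(cf);
            if (e.getKey() != current && e.getValue().isEmpty()) {
                it.remove();
                discarded.add(e.getKey());
            }
        }
        return discarded;
    }

    public int segmentCount() {
        return dirty.size();
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        log.write("busy");
        log.write("idle");   // one write to a rarely flushed column family
        log.rotate();
        log.write("busy");
        log.rotate();
        log.write("busy");
        System.out.println("after flushing busy: discarded " + log.flush("busy"));
        System.out.println("after flushing idle: discarded " + log.flush("idle"));
    }
}
```

In the demo, the oldest segment survives repeated flushes of the busy column family and is only discarded once the idle one is flushed too, which is the situation a very long flush timeout can create.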

Hinted Handoff

2011-05-12 Thread Anurag Gujral

Hi All,
   I have two questions:
a) Is there a way to turn hinted handoff on and off per keyspace, rather
than for all keyspaces?
b) It looks like Cassandra stores hinted handoff data in one row. Is that true?
Does having one row for hinted handoff imply that, if nodes are down for a
longer period of time, not all the data which needs to be replicated will be
on the node which is alive?

Thanks
Anurag

running TPC-C on cassandra clusters

Hi all,

My partner and I are currently using a Cassandra cluster to run TPC-C. We first
use 2 EC2 nodes to load 20 warehouses. One (the client node) has 8 cores; the
other (the worker node) has 4 cores. During loading, either the client
node or the worker node will randomly go down (it cannot be detected) and then
come up again a short time later. If both nodes go down, the load
fails. If only one of them goes down, we can continue to load data.

The problem is that if we use multiple threads (we write multiprocess code), say 4
client threads, some of them may stop at the point where one of the nodes
first goes down, and the dead threads never come back. This not only
lengthens our loading time, but also affects the amount of data we can load.

So we need to figure out why the nodes keep going up and down and fix
this problem.

Thanks for any help!

Best,
Xiaowei

Re: Hinted Handoff

2011-05-12 Thread Sameer Farooqui

I'm not sure about your first question.

I believe the internal system keyspace holds the hinted handoff information.

In 0.6 and earlier, HintedHandoffManager.sendMessage used to read the entire
row into memory and then send the row back to the client in a single
message. As of 0.7, Cassandra pages within a single hinted row instead
(which improves performance for wide rows).
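
The difference between reading a whole row at once and paging within it can be sketched with a sorted map standing in for a wide row. This is only an illustration of the paging pattern; the class name, the page size and the use of a TreeMap are assumptions and say nothing about how HintedHandOffManager is actually written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class RowPagingSketch {
    /** Walks a wide row in slices of at most pageSize columns. */
    public static List<List<String>> pages(NavigableMap<String, String> row, int pageSize) {
        List<List<String>> out = new ArrayList<>();
        String start = null; // name of the last column already delivered
        while (true) {
            // Resume strictly after the last delivered column.
            SortedMap<String, String> rest = (start == null) ? row : row.tailMap(start, false);
            List<String> page = new ArrayList<>();
            for (String name : rest.keySet()) {
                if (page.size() == pageSize) break;
                page.add(name);
            }
            if (page.isEmpty()) break; // nothing left in the row
            out.add(page);
            start = page.get(page.size() - 1);
        }
        return out;
    }

    /** Pages a five-column row two columns at a time. */
    public static String demo() {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String name : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(name, "hint-" + name);
        }
        return pages(row, 2).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only one page of column names is held at a time, so memory use is bounded by the page size rather than by the width of the row.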




Re: running TPC-C on cassandra clusters

I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6)


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Crash when uploading large data sets

2011-05-12 Thread James Cipar

I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
unique data), to a cluster of 10 servers.  I'm using batch_mutate, and breaking 
the data up into chunks of about 10k records.  Each record is about 5KB, so a 
total of about 50MB per batch.  When I upload a smaller 2 GB data set, 
everything works fine.  When I upload the 20 GB data set, servers will 
occasionally crash.  Currently I have my client code automatically detect this 
and restart the server, but that is less than ideal.
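
To make the batch arithmetic concrete (10k records of about 5 KB is roughly 50 MB per batch_mutate call), here is a small sketch that groups records into batches under a byte budget, in case one wants to experiment with smaller batches. Whether batch size has anything to do with the crashes is not established here; the 1 MiB budget and all names are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitSketch {
    /**
     * Groups record sizes into batches whose total stays at or under
     * maxBatchBytes. A single record larger than the budget gets a
     * batch of its own.
     */
    public static List<List<Integer>> split(List<Integer> recordSizes, long maxBatchBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int size : recordSizes) {
            if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    /** Number of batches needed for `records` records of `recordBytes` each. */
    public static int batchCount(int records, int recordBytes, long maxBatchBytes) {
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            sizes.add(recordBytes);
        }
        return split(sizes, maxBatchBytes).size();
    }

    public static void main(String[] args) {
        // 10,000 records of 5 KiB each, at most 1 MiB per batch.
        System.out.println(batchCount(10_000, 5 * 1024, 1024L * 1024L) + " batches");
    }
}
```

With these numbers each batch holds 204 records, so the original 10k-record chunk becomes 50 smaller calls.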

I'm not sure what information to gather to determine what's going on here.  
Here is a sample of a log file from when a crash occurred.  The crash was 
immediately after the log entry tagged 2011-05-12 19:02:19,377.  Any idea 
what's going on here?  Any other info I can gather to try to debug this?







 INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464
 INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464
 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
 INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
 INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
 INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 7774142464
 INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 7774142464
 INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) Completed flushing /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 bytes)
 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) Discarding obsolete commit log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
 INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 7774142464
 INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
 INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
 INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) Logging initialized
 INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) Heap size: 7634681856/7635730432
 INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. Native methods will be disabled.
 INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading settings from file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
 INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
 INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
 INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
 INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
 INFO [main] 2011-05-12 19:02:21,342 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-2
 INFO [main] 2011-05-12 19:02:21,344 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-1
 INFO

Re: Crash when uploading large data sets

2011-05-12 Thread Sameer Farooqui

The key JVM options for Cassandra are in cassandra.in.sh.

What is your min and max heap size?

The default setting of max heap size is 1GB. How much RAM do your nodes
have? You may want to increase this setting. You can also set the -Xmx and
-Xms options to the same value to keep Java from having to manage heap
growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you
can get a lot more on 64-bit.

Try messing with some of the other settings in the cassandra.in.sh file.

You may not have DEBUG mode turned on for Cassandra and therefore may not be
getting the full details of what's going on when the server crashes. In the
cassandra-home/conf/log4j-server.properties file, set this line from the
default of INFO to DEBUG:

log4j.rootLogger=INFO,stdout,R


Also, you haven't configured JNA on this server. Here's some info about it
and how to configure it:

JNA provides Java programs easy access to native shared libraries without
writing anything but Java code.

Note from Cassandra developers for why JNA is needed:
Linux aggressively swaps out infrequently used memory to make more room
for its file system buffer cache. Unfortunately, modern generational garbage
collectors like the JVM's leave parts of its heap un-touched for relatively
large amounts of time, leading Linux to swap it out. When the JVM finally
goes to use or GC that memory, swap hell ensues.

Setting swappiness to zero can mitigate this behavior but does not eliminate
it entirely. Turning off swap entirely is effective. But to avoid surprising
people who don't know about this behavior, the best solution is to tell
Linux not to swap out the JVM, and that is what we do now with mlockall via
JNA.

Because of licensing issues, we can't distribute JNA with Cassandra, so you
must manually add it to the Cassandra lib/ directory or otherwise place it
on the classpath. If the JNA jar is not present, Cassandra will continue as
before.

Get JNA with:
cd ~
wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb

To install:
techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
(Reading database ... 44334 files and directories currently installed.)
Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
Unpacking replacement libjna-java ...
Setting up libjna-java (3.2.7-0~nmu.2) ...


The deb package will install the JNA jar file to /usr/share/java/jna.jar,
but Cassandra only loads it if it's on the classpath. The easy way to do
this is to create a symlink in your Cassandra lib directory (note:
replace /home/techlabs with your home dir location):
ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib

Research:
http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/


- Sameer



Re: Unable to add columns to empty row in Column family: Cassandra

2011-05-12 Thread Narendra Sharma

Can you share the code?

On Mon, May 2, 2011 at 11:34 PM, anuya joshi anu...@gmail.com wrote:

 Hello,

 I am using Cassandra for my application. My Cassandra client uses the Thrift
 APIs directly. The problem I am currently facing is as follows:

 1) I added a row and columns in it dynamically via the Thrift API client.
 2) Next, I used the command line client to delete the row, which actually
 deleted all the columns in it, leaving an empty row with the original row id.
 3) Now, I am trying to add columns dynamically, using the client program, into
 this empty row with the same row key.
 However, the columns are not being inserted.
 But when I tried from the command line client, it worked correctly.

 Any pointers on this would be of great use.

 Thanks in  advance,

 Regards,
 Anuya




-- 
Narendra Sharma
Solution Architect
http://www.persistentsys.com
http://narendrasharma.blogspot.com/

Re: Crash when uploading large data sets

2011-05-12 Thread James Cipar

It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my 
physical memory.  These are 15GB VMs, so that's 7.5GB for Cassandra.  I would 
have expected that to work, but I will override to 13 GB just to see what 
happens.

I've also got the JNA thing set up.  Do you think this would cause the crashes, 
or is it just a performance improvement?
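
A quick way to check which heap limits a JVM actually ended up with after changing MAX_HEAP_SIZE or -Xmx is to ask the Runtime. The sketch reports on the JVM it runs in, so it is only meaningful when launched with the same heap options as the server; the class name is made up for illustration.

```java
public class HeapCheckSketch {
    /** Max, currently committed and free heap, in whole megabytes. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "max=" + (rt.maxMemory() / mb) + "MB"
             + " total=" + (rt.totalMemory() / mb) + "MB"
             + " free=" + (rt.freeMemory() / mb) + "MB";
    }

    public static void main(String[] args) {
        // Run with the same heap flags as the server, e.g. java -Xms7G -Xmx7G HeapCheckSketch
        System.out.println(describe());
    }
}
```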



On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote:

 The key JVM options for Cassandra are in cassandra.in.sh.
 
 What is your min and max heap size?
 
 The default setting of max heap size is 1GB. How much RAM do your nodes have? 
 You may want to increase this setting. You can also set the -Xmx and -Xms 
 options to the same value to keep Java from having to manage heap growth. On 
 a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a 
 lot more on 64-bit.
 
 Try messing with some of the other settings in the cassandra.in.sh file.
 
 You may not have DEBUG mode turned on for Cassandra and therefore may not be 
 getting the full details of what's going on when the server crashes. In the 
 cassandra-home/conf/log4j-server.properties file, set this line from the 
 default of INFO to DEBUG:
 
 log4j.rootLogger=INFO,stdout,R
 
 
 Also, you haven't configured JNA on this server. Here's some info about it 
 and how to configure it:
 
 JNA provides Java programs easy access to native shared libraries without 
 writing anything but Java code.
 
 Note from Cassandra developers for why JNA is needed:
 Linux aggressively swaps out infrequently used memory to make more room for 
 its file system buffer cache. Unfortunately, modern generational garbage 
 collectors like the JVM's leave parts of its heap un-touched for relatively 
 large amounts of time, leading Linux to swap it out. When the JVM finally 
 goes to use or GC that memory, swap hell ensues.
 
 Setting swappiness to zero can mitigate this behavior but does not eliminate 
 it entirely. Turning off swap entirely is effective. But to avoid surprising 
 people who don't know about this behavior, the best solution is to tell Linux 
 not to swap out the JVM, and that is what we do now with mlockall via JNA.
 
 Because of licensing issues, we can't distribute JNA with Cassandra, so you 
 must manually add it to the Cassandra lib/ directory or otherwise place it on 
 the classpath. If the JNA jar is not present, Cassandra will continue as 
 before.
 
 Get JNA with: 
 cd ~
 wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb
 
 To install: 
 techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
 (Reading database ... 44334 files and directories currently installed.)
 Preparing to replace libjna-java 3.2.4-2 (using 
 libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
 Unpacking replacement libjna-java ...
 Setting up libjna-java (3.2.7-0~nmu.2) ...
 
 
 The deb package will install the JNA jar file to /usr/share/java/jna.jar, but 
 Cassandra only loads it if its in the class path. The easy way to do this is 
 just create a symlink into your Cassandra lib directory (note: replace 
 /home/techlabs with your home dir location):
 ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib
 
 Research:
 http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/
 
 
 - Sameer
 
 
 On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote:
 I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
 unique data), to a cluster of 10 servers.  I'm using batch_mutate, and 
 breaking the data up into chunks of about 10k records.  Each record is about 
 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data 
 set, everything works fine.  When I upload the 20 GB data set, servers will 
 occasionally crash.  Currently I have my client code automatically detect 
 this and restart the server, but that is less than ideal.
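To put numbers on the batch size described above: 10k records of about 5KB each is roughly 50MB per batch_mutate call. Below is a minimal sketch of that arithmetic, plus a hypothetical helper (chunk_records, not part of any Cassandra client) that groups records into smaller batches under a byte budget. It assumes 1 KB = 1000 bytes.

```python
from typing import Iterable, Iterator, List

RECORD_BYTES = 5_000        # "about 5KB" per record (assumes 1 KB = 1000 bytes)
RECORDS_PER_BATCH = 10_000  # "about 10k records" per batch


def batch_bytes(records: int = RECORDS_PER_BATCH, record_bytes: int = RECORD_BYTES) -> int:
    """Approximate payload size of one batch, in bytes."""
    return records * record_bytes


def chunk_records(records: Iterable[bytes], max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Hypothetical helper: group records into batches of at most max_batch_bytes.

    A single record larger than the budget is emitted as its own batch.
    """
    batch: List[bytes] = []
    size = 0
    for rec in records:
        # Close the current batch before it would exceed the budget.
        if batch and size + len(rec) > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(rec)
        size += len(rec)
    if batch:
        yield batch


if __name__ == "__main__":
    print(batch_bytes() / 1_000_000, "MB per batch")
```

Whether smaller batches would avoid the crashes is not established in this thread; the sketch only shows how the batch size could be lowered on the client side.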
 
 I'm not sure what information to gather to determine what's going on here.  
 Here is a sample of a log file from when a crash occurred.  The crash was 
 immediately after the log entry tagged 2011-05-12 19:02:19,377.  Any idea 
 what's going on here?  Any other info I can gather to try to debug this?
 
 
 
 
 
 
 
  INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) 
 GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) 
 GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
 7774142464
  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
 50) Creating new commitlog segment 
 /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
  INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 
 1115783 operations)
  INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158)

Re: Crash when uploading large data sets

2011-05-12 Thread James Cipar

Oh, forgot this detail: I have no swap configured, so swapping is not the
cause of the crash. Could it be that I'm running out of memory on a 15GB
machine? That seems unlikely. I grepped dmesg for oom and didn't see
anything from the oom killer, and I used the instructions from the following
web page and didn't see that the oom killer had killed anything.

http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer

jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case "killed process"
jcipar@172-19-149-62:~$
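The same check can be sketched in Python as a case-insensitive scan for "killed process", the phrase the grep above looks for. The function name is an illustrative assumption, and the log location varies by distribution.

```python
from typing import Iterable, List


def find_oom_kills(lines: Iterable[str]) -> List[str]:
    """Return log lines that contain 'killed process', ignoring case."""
    return [line.rstrip("\n") for line in lines if "killed process" in line.lower()]


if __name__ == "__main__":
    # /var/log/messages is the path used in the thread; reading it usually needs root.
    with open("/var/log/messages", errors="replace") as log:
        for hit in find_oom_kills(log):
            print(hit)
```

An empty result, as in the transcript above, suggests the OOM killer was not involved, although it only covers what is still present in that log file.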

Also, this is pretty subjective, so I can't say for sure until it finishes, but
this seems to be running *much* slower after setting the heap size and setting
up JNA.

On May 12, 2011, at 7:52 PM, James Cipar wrote:

It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my
physical memory. These are 15GB VMs, so that's 7.5GB for Cassandra. I would
have expected that to work, but I will override to 13 GB just to see what
happens.

I've also got the JNA thing set up. Do you think this would cause the
crashes, or is it just a performance improvement?

On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote:

The key JVM options for Cassandra are in cassandra.in.sh.

What is your min and max heap size?

The default setting of max heap size is 1GB. How much RAM do your nodes
have? You may want to increase this setting. You can also set the -Xmx and
-Xms options to the same value to keep Java from having to manage heap
growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you
can get a lot more on 64-bit.

Try messing with some of the other settings in the cassandra.in.sh file.

You may not have DEBUG mode turned on for Cassandra and therefore may not be
getting the full details of what's going on when the server crashes. In the
cassandra-home/conf/log4j-server.properties file, set this line from the
default of INFO to DEBUG:

log4j.rootLogger=INFO,stdout,R

Also, you haven't configured JNA on this server. Here's some info about it
and how to configure it:

JNA provides Java programs easy access to native shared libraries without
writing anything but Java code.

Note from Cassandra developers for why JNA is needed:
Linux aggressively swaps out infrequently used memory to make more room for
its file system buffer cache. Unfortunately, modern generational garbage
collectors like the JVM's leave parts of its heap un-touched for relatively
large amounts of time, leading Linux to swap it out. When the JVM finally
goes to use or GC that memory, swap hell ensues.

Setting swappiness to zero can mitigate this behavior but does not eliminate
it entirely. Turning off swap entirely is effective. But to avoid surprising
people who don't know about this behavior, the best solution is to tell
Linux not to swap out the JVM, and that is what we do now with mlockall via
JNA.

Because of licensing issues, we can't distribute JNA with Cassandra, so you
must manually add it to the Cassandra lib/ directory or otherwise place it
on the classpath. If the JNA jar is not present, Cassandra will continue as
before.

Get JNA with:
cd ~
wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb

To install:
techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
(Reading database ... 44334 files and directories currently installed.)
Preparing to replace libjna-java 3.2.4-2 (using
libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
Unpacking replacement libjna-java ...
Setting up libjna-java (3.2.7-0~nmu.2) ...

The deb package will install the JNA jar file to /usr/share/java/jna.jar,
but Cassandra only loads it if it's on the classpath. The easy way to do
this is to create a symlink in your Cassandra lib directory (note:
replace /home/techlabs with your home dir location):
ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib

Research:
http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/

- Sameer

On Thu, May 12, 2011 at 4:15 PM, James Cipar jci...@cmu.edu wrote:
I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB
unique data), to a cluster of 10 servers. I'm using batch_mutate, and
breaking the data up into chunks of about 10k records. Each record is about
5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data
set, everything works fine. When I upload the 20 GB data set, servers will
occasionally crash. Currently I have my client code automatically detect
this and restart the server, but that is less than ideal.

I'm not sure what information to gather to determine what's going on here.
Here is a sample of a log file from when a crash occurred. The crash was
immediately after the log entry tagged 2011-05-12 19:02:19,377. Any idea
what's going on here? Any other info I can

Re: Crash when uploading large data sets

If it's a jvm crash there should be a hs_err_pid.log file left around
in the directory you started Cassandra from.
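A small sketch for locating such a file. HotSpot names the fatal error log hs_err_pid<pid>.log by default and writes it to the JVM's working directory; the helper name and the directory argument below are assumptions for illustration.

```python
import glob
import os
from typing import List


def find_hs_err_logs(directory: str) -> List[str]:
    """Return hs_err_pid*.log files found in directory, newest first."""
    pattern = os.path.join(directory, "hs_err_pid*.log")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    # "." stands in for the directory Cassandra was started from.
    for path in find_hs_err_logs("."):
        print(path)
```

If nothing turns up, the process may have exited for a reason other than a JVM crash, or the JVM may have been started from a different directory.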

On Thu, May 12, 2011 at 6:15 PM, James Cipar jci...@cmu.edu wrote:
 I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
 unique data), to a cluster of 10 servers.  I'm using batch_mutate, and 
 breaking the data up into chunks of about 10k records.  Each record is about 
 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data 
 set, everything works fine.  When I upload the 20 GB data set, servers will 
 occasionally crash.  Currently I have my client code automatically detect 
 this and restart the server, but that is less than ideal.

 I'm not sure what information to gather to determine what's going on here.  
 Here is a sample of a log file from when a crash occurred.  The crash was 
 immediately after the log entry tagged 2011-05-12 19:02:19,377.  Any idea 
 what's going on here?  Any other info I can gather to try to debug this?







  INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) 
 GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) 
 GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
 7774142464
  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
 50) Creating new commitlog segment 
 /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
  INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 
 1115783 operations)
  INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) 
 Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
  INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) 
 GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 
 7774142464
  INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) 
 GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 
 7774142464
  INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) 
 Completed flushing 
 /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 
 bytes)
  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) 
 Discarding obsolete commit 
 log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
  INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) 
 GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 
 7774142464
  INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 
 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 
 operations)
  INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) 
 Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) 
 Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
  INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) 
 Logging initialized
  INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) 
 Heap size: 7634681856/7635730432
  INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. 
 Native methods will be disabled.
  INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) 
 Loading settings from 
 file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
  INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) 
 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
  INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
  INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
  INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
  INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening 
 /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
  INFO [main]
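The GCInspector entries in the log above follow a regular pattern, so heap occupancy over time can be pulled out with a short script. This is a sketch that assumes the message format stays exactly as shown in this log ("GC for <collector>: <n> ms, <n> reclaimed leaving <n> used; max is <n>"); the names GcEvent and parse_gc_line are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, (?P<reclaimed>-?\d+) reclaimed "
    r"leaving (?P<used>\d+) used; max is (?P<max>\d+)"
)


class GcEvent(NamedTuple):
    collector: str
    pause_ms: int
    reclaimed: int  # bytes; can be negative, as in the second entry of the log
    used: int       # bytes in use after the collection
    max_heap: int   # bytes

    @property
    def used_fraction(self) -> float:
        """Heap occupancy after the collection, between 0 and 1."""
        return self.used / self.max_heap


def parse_gc_line(text: str) -> Optional[GcEvent]:
    """Parse one GCInspector entry; wrapped lines are joined on whitespace first."""
    m = GC_LINE.search(" ".join(text.split()))
    if m is None:
        return None
    return GcEvent(
        m["collector"], int(m["ms"]), int(m["reclaimed"]), int(m["used"]), int(m["max"])
    )


if __name__ == "__main__":
    entry = (
        "GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is \n"
        " 7774142464"
    )
    event = parse_gc_line(entry)
    if event is not None:
        print(f"{event.collector}: {event.pause_ms} ms, {event.used_fraction:.1%} of heap used")
```

For the first entry in the log this works out to roughly 70% of the maximum heap in use after the collection.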

Re: Crash when uploading large data sets

2011-05-12 Thread Jeffrey Kesselman

Is this a 64-bit VM?

A 32bit Java VM with default c-heap settings can only actually use
about 2GB of Java Heap.

On Thu, May 12, 2011 at 8:08 PM, James Cipar jci...@cmu.edu wrote:
Oh, forgot this detail: I have no swap configured, so swapping is not the
cause of the crash. Could it be that I'm running out of memory on a 15GB
machine? That seems unlikely. I grepped dmesg for oom and didn't see
anything from the oom killer, and I used the instructions from the following
web page and didn't see that the oom killer had killed anything.

http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer

jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case "killed process"
jcipar@172-19-149-62:~$

Also, this is pretty subjective, so I can't say for sure until it finishes,
but this seems to be running *much* slower after setting the heap size and
setting up JNA.


Re: running TPC-C on cassandra clusters

Thanks Jonathan, but can you provide a link to the 0.7 svn branch?

2011/5/12 Jonathan Ellis jbel...@gmail.com

 I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6)

 On Thu, May 12, 2011 at 3:09 PM, Xiaowei Wang xiaowei...@gmail.com
 wrote:
  Hi all,
 
  My partner and I are currently using a Cassandra cluster to run TPC-C. We
  first use 2 EC2 nodes to load 20 warehouses. One (the client node) has 8
  cores, the other (the worker node) has 4 cores. During loading, either the
  client node or the worker node will randomly go down (it cannot be
  detected) and then come up again a short time later. If both nodes go
  down, the load fails. If only one of them goes down, we can continue to
  load data.
 
  The problem is that if we use multiple threads (we write multiprocess
  code), say 4 client threads, some of them might stop at the point where
  one of the nodes first goes down, and the dead threads never come back.
  This not only increases our loading time, but also affects the amount of
  data we can load.
 
  So we need to figure out why the nodes keep going up and down, and fix
  this problem.
 
  Thanks for any help!
 
  Best,
  Xiaowei
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
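For the loader threads in the quoted message that die when a node briefly goes down, one client-side mitigation is to retry the failed operation with a delay instead of letting the thread exit. A minimal sketch, assuming the client library raises an exception on a failed request; with_retries and the operation passed to it are hypothetical and not part of any Cassandra client API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on failure wait base_delay * 2**n seconds and retry.

    After the last failed attempt the original exception is re-raised.
    """
    for n in range(attempts):
        try:
            return op()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** n)
    raise ValueError("attempts must be at least 1")
```

This does not address why the nodes are being marked down; it only keeps a loader thread alive across a short outage.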

Re: running TPC-C on cassandra clusters

https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7

On Thu, May 12, 2011 at 8:33 PM, Xiaowei Wang xiaowei...@gmail.com wrote:
 Thanks Jonathan, but can you provide some links about 0.7 svn branch?

 2011/5/12 Jonathan Ellis jbel...@gmail.com

 I'd recommend trying the 0.7 svn branch (soon to be voted on as 0.7.6)






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: running TPC-C on cassandra clusters

Oh sorry, we use cassandra-0.7.4 already. Is the version fine?

2011/5/12 Jonathan Ellis jbel...@gmail.com

 https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7


assertion error in cassandra when doing nodetool move

2011-05-12 Thread Anurag Gujral

Hi All,
I ran the following command on one of my nodes to move the token
from 0 to 2:
/usr/cassandra/cassandra/bin/nodetool -h 10.170.195.204 -p 8080 move 2

I am getting the following assertion error, and I don't understand why this
is happening:
Exception in thread main java.lang.AssertionError
at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:389)
at org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:414)
at org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:94)
at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:929)
at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:895)
at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1595)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1733)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1708)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

Thanks
Anurag

Re: running TPC-C on cassandra clusters

Not if you want the pausing/marking down fixes that were done more recently. :)

On Thu, May 12, 2011 at 8:39 PM, Xiaowei Wang xiaowei...@gmail.com wrote:
 Oh sorry, we use cassandra-0.7.4 already. Is the version fine?






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: running TPC-C on cassandra clusters