Re: Multiple cursors

2013-05-27 Thread Vitalii Tymchyshyn
Sorry, that was T9 (autocorrect) at work. Of course, it was an async thrift client, not "a sync".
21 May 2013 11:16, "aaron morton" wrote:

> We were successfully using a sync thrift client. With it we could send
> multiple requests through the single connection and wait for answers.
>
> Can you provide an example ?
>
> With sync, the server thread that handles your client socket blocks waiting
> for the request to complete.  There is also state associated with the
> connection that, from memory, is essentially treated as request state.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/05/2013, at 9:57 PM, Vitalii Tymchyshyn  wrote:
>
> We were successfully using a sync thrift client. With it we could send
> multiple requests through the single connection and wait for answers.
17 May 2013 02:51, "aaron morton" wrote:
>
>> We don't have cursors in the RDBMS sense of things.
>>
>> If you are using thrift the recommendation is to use connection pooling
>> and re-use connections for different requests. Note that you can not
>> multiplex queries over the same thrift connection, you must wait for the
>> response before issuing another request. The native binary transport allows
>> multiplexing though.
>>
>> In general you should use one of the pre-built client libraries, as they
>> will take care of connection pooling etc. for you:
>> https://wiki.apache.org/cassandra/ClientOptions
>>
>> Cheers
>>
>>-
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 16/05/2013, at 9:03 AM, Sam Mandes  wrote:
>>
>> Hello All,
>>
>> Is using multiple cursors simultaneously on the same C* connection a good
>> practice?
>>
>> I have an internal API for a project running thrift, and I then need to query
>> something from C*. I do not want to create a new connection for every API
>> request. Thus, when my service initially starts I open a connection to C*
>> and with every request I create a new cursor.
>>
>> Thanks a lot
>>
>>
>>
>


Re: Multiple cursors

2013-05-18 Thread Vitalii Tymchyshyn
We were successfully using a sync thrift client. With it we could send
multiple requests through the single connection and wait for answers.
17 May 2013 02:51, "aaron morton" wrote:

> We don't have cursors in the RDBMS sense of things.
>
> If you are using thrift the recommendation is to use connection pooling
> and re-use connections for different requests. Note that you can not
> multiplex queries over the same thrift connection, you must wait for the
> response before issuing another request. The native binary transport allows
> multiplexing though.
>
> In general you should use one of the pre-built client libraries, as they
> will take care of connection pooling etc. for you:
> https://wiki.apache.org/cassandra/ClientOptions
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/05/2013, at 9:03 AM, Sam Mandes  wrote:
>
> Hello All,
>
> Is using multiple cursors simultaneously on the same C* connection a good
> practice?
>
> I have an internal API for a project running thrift, and I then need to query
> something from C*. I do not want to create a new connection for every API
> request. Thus, when my service initially starts I open a connection to C*
> and with every request I create a new cursor.
>
> Thanks a lot
>
>
>


Re: Any plans for read-before-write update operations in CQL3?

2013-04-04 Thread Vitalii Tymchyshyn
Well, a schema has just come to my mind that looks interesting, so I want
to share it:
1) Actions are introduced. Each action receives a unique ID at the coordinator
node. A client can ask for a block of IDs beforehand, to make actions
idempotent.
2) Actions are applied to a given row+column value. It's possible that a
special column family type should be created that supports actions.
3) Actions are stored for the grace period to ensure repair will work
correctly.
4) Along with all the actions for the grace period, the old value, the current
value and an old-value hash are stored.
5) The old value is the value with none of the currently stored actions
applied; the current value has all currently stored actions applied.
6) The old-value hash holds the number of actions applied, the time of the
last action applied and a hash of the IDs of all the applied actions (only
actions applied to the old value, of course).
7) The current value is updated on read, so there can be actions that are not
applied yet. On read, if there are unapplied actions, they are applied and
the current value / applied-actions information is updated.
8) Actions may or may not rely on order. If actions rely on order and an
out-of-order action has to be applied during an update, the value is
recalculated, starting from the old value.
9) During repair, the highest old value (based on the number of actions
applied, then the lowest by time) is selected. Then all actions older than,
or of the same time as, that old value are dropped as already applied. Newer
ones are merged into a union set.
10) During compaction, the old value is moved forward to (now - grace period).
The schema looks solid. The downside is that all the values for the grace
period must be stored. Maybe it should be combined with some auto-confirmation
mechanism in which the coordinator, after receiving acks for all the writes,
does a second round notifying replicas that the action is fully written. This
should work for hinted handoff too. Then the old value can be propagated up to
the last acked action.
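
To make points 4-7 a bit more concrete, here is a rough illustrative sketch
(in Java) of the per-cell state such a scheme would keep. Everything below is
hypothetical and made up purely for illustration; it is not an existing
Cassandra API, and point 8's recalculation-from-old-value path is omitted.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.function.LongBinaryOperator;

// Hypothetical model of the per-cell state from points 4-7 above.
class ActionedCell {
    long oldValue;       // value with none of the stored actions applied
    long currentValue;   // value with all already-applied actions folded in
    long lastAppliedTime;
    // Ingredients of the "old value hash": which actions have been applied.
    final Set<UUID> appliedActionIds = new HashSet<>();
    // Actions kept around for the grace period.
    final List<Action> storedActions = new ArrayList<>();

    static class Action {
        final UUID id;               // coordinator-assigned ID, enables idempotency
        final long time;
        final LongBinaryOperator op; // e.g. (v, a) -> v + a for an increment
        final long arg;

        Action(UUID id, long time, LongBinaryOperator op, long arg) {
            this.id = id;
            this.time = time;
            this.op = op;
            this.arg = arg;
        }
    }

    // Point 7: the current value is only brought up to date on read.
    long read() {
        for (Action a : storedActions) {
            if (appliedActionIds.add(a.id)) {   // duplicates are skipped, so replays are safe
                currentValue = a.op.applyAsLong(currentValue, a.arg);
                lastAppliedTime = Math.max(lastAppliedTime, a.time);
            }
        }
        return currentValue;
    }
}

The coordinator-assigned IDs are what make re-delivered mutations safe:
applying the same action twice is a no-op because the applied-ID set already
contains it.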

4 Apr 2013 04:59, "aaron morton" wrote:
>
> I would guess not.
>
>> I know this goes against keeping updates idempotent,
>
> There are also issues with consistency, i.e. is the read local or does it
> happen at the CL level?
> And it makes things go slower.
>
>> We currently do things like this in client code, but it would be great
>> to be able to do this on the server side to minimize the chance of race
>> conditions.
>
> Sometimes you can write the plus one into a new column and then apply the
> changes in the reading client thread.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/04/2013, at 12:48 AM, Drew Kutcharian  wrote:
>
>> Hi Guys,
>>
>> Are there any short/long term plans to support UPDATE operations that
>> require read-before-write, such as increment on a numeric non-counter
>> column?
>> i.e.
>>
>> UPDATE CF SET NON_COUNTER_NUMERIC_COLUMN = NON_COUNTER_NUMERIC_COLUMN + 1;
>>
>> UPDATE CF SET STRING_COLUMN = STRING_COLUMN + "postfix";
>>
>> etc.
>>
>> I know this goes against keeping updates idempotent, but there are times
>> you need to do these kinds of operations. We currently do things like this
>> in client code, but it would be great to be able to do this on the server
>> side to minimize the chance of race conditions.
>>
>> -- Drew
>
>


Re: Consistency level for system_auth keyspace

2013-03-07 Thread Vitalii Tymchyshyn
Why not WRITE.ALL READ.ONE? I don't think permissions are updated often and
READ.ONE provides maximum availability.


2013/3/4 aaron morton 

> In this case, it means that if there is a network split between the 2
> datacenters, it is impossible to get the quorum, and all connections will
> be rejected.
>
> Yes.
>
> Is there a reason why Cassandra uses the Quorum consistency level ?
>
> I would guess to ensure there is a single, cluster wide, set of
> permissions.
>
> Using LOCAL or one could result in some requests that are rejected being
> allowed on other nodes.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/03/2013, at 6:40 AM, Jean-Armel Luce  wrote:
>
> Hi,
>
>
> I am using Cassandra 1.2.2.
> There are 16 nodes in my cluster in 2 datacenters (8 nodes in each
> datacenter).
> I am using NetworkTopologyStrategy.
>
> For information, I set a RF = 6 (3 replicas in each datacenter)
>
> With 1.2.2, I am using the new authentication backend
> PasswordAuthenticator with the authorizer CassandraAuthorizer.
>
> In the documentation (
> http://www.datastax.com/docs/1.2/security/security_keyspace_replication#security-keyspace-replication),
> it is written that for all system_auth-related queries, Cassandra uses the
> QUORUM consistency level.
>
> In this case, it means that if there is a network split between the 2
> datacenters, it is impossible to get the quorum, and all connections will
> be rejected.
>
> Is there a reason why Cassandra uses the Quorum consistency level ?
> Maybe a local_quorum consistency level (or a one consistency level) could
> do the job ?
>
> Regards
> Jean Armel
>
>
>


-- 
Best regards,
 Vitalii Tymchyshyn


Re: How many BATCH inserts is too many?

2013-01-14 Thread Vitalii Tymchyshyn
Well, for me it was better to use async operations than batches. That way you
are not bitten by latency, but can control everything per-operation. You will
need to support a kind of "window" though, but this window can be quite low,
like 10-20 ops.
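
A minimal sketch of such a window, assuming a hypothetical async client
interface (AsyncClient and its insert() below are stand-ins for whatever async
driver you use, not a real API). The Semaphore caps the number of in-flight
operations at the window size while still letting you handle each mutation's
result individually:

import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// Hypothetical stand-in for an async client: insert() must invoke the callback
// (null on success, or the failure) once the server has replied.
interface AsyncClient {
    void insert(String key, String column, byte[] value, Consumer<Throwable> whenDone);
}

class WindowedWriter {
    private static final int WINDOW = 20;               // max in-flight operations
    private final Semaphore window = new Semaphore(WINDOW);
    private final AsyncClient client;

    WindowedWriter(AsyncClient client) {
        this.client = client;
    }

    // Blocks only when WINDOW operations are already pending, so latency is
    // hidden but every mutation can still be retried or logged on its own.
    void write(String key, String column, byte[] value) throws InterruptedException {
        window.acquire();
        client.insert(key, column, value, error -> {
            if (error != null) {
                // per-operation handling goes here (retry, log, count, ...)
            }
            window.release();
        });
    }
}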


2013/1/14 Wei Zhu 

> Another potential issue is when some failure happens to some of the
> mutations. Is atomic batches in 1.2 designed to resolve this?
>
> http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
>
> -Wei
>
> - Original Message -
> From: "aaron morton" 
> To: user@cassandra.apache.org
> Sent: Sunday, January 13, 2013 7:57:56 PM
> Subject: Re: How many BATCH inserts is too many?
>
> With regard to a large number of records in a batch mutation there are
> some potential issues.
>
>
> Each row becomes a task in the write thread pool on each replica. If a
> single client sends 1,000 rows in a mutation it will take time for the
> (default) 32 threads in the write pool to work through the mutations. While
> they are doing this other clients / requests will appear to be starved /
> stalled.
>
>
> There are also issues with the max message size in thrift and cql over
> thrift.
>
>
> IMHO as a rule of thumb don't go over a few hundred if you have a high
> number of concurrent writers.
>
>
> Cheers
>
>
>
>
>
>
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
>
> @aaronmorton
> http://www.thelastpickle.com
>
>
> On 14/01/2013, at 12:56 AM, Radim Kolar < h...@filez.com > wrote:
>
>
> do not use cassandra for implementing a queueing system with high
> throughput. It does not scale because of tombstone management. Use HornetQ;
> it's an amazingly fast broker, but it has quite slow persistence if you want to
> create queues significantly larger than your memory and use selectors for
> searching for specific messages in them.
>
> My point is: for implementing a queue, a message broker is what you want.
>
>
>


-- 
Best regards,
 Vitalii Tymchyshyn


Re: Failing operations & repair

2012-06-14 Thread Vitalii Tymchyshyn

Hello.

For sure. Here they are: 
http://www.slideshare.net/vittim1/practical-cassandra

The slides are in English.
I presented this some time ago at JEEConf and once more 
yesterday at a local developers' club.
There should be a video recording (in Russian) available at some point, but 
it's not up yet.


Best regards, Vitalii Tymchyshyn

13.06.12 02:27, crypto five wrote:
It would be really great to look at your slides. Do you have any plans 
to share your presentation?


On Sat, Jun 9, 2012 at 1:14 AM, Vitalii Tymchyshyn <mailto:tiv...@gmail.com> wrote:


Thanks a lot. I was not sure if coordinator somehow tries to
"roll-back" transactions that failed to reach it's consistency level.
(Yet I could not imagine a method to do this, without 2-phase
commit :) )


2012/6/8 aaron morton <mailto:aa...@thelastpickle.com>


I am making some cassandra presentations in Kyiv and would
like to check that I am telling people the truth :)

Thanks for spreading the word :)


1) Failed (from client-side view) operation may still be
applied to cluster

Yes.
If you fail with UnavailableException it's because from the
coordinators view of the cluster there is less than CL nodes
available. So retry. Somewhat similar story with
TimedOutException.


2) Coordinator does not try anything to "roll-back" operation
that failed because it was processed by fewer than the consistency
level number of nodes.

Correct.


3) Hinted handoff works only for successful operations.

HH will be stored if the coordinator proceeds with the request.
In 1.X HH is stored on the coordinator if a replica is down
when the request starts and if the node does not reply in
rpc_timeout.


4) Counters are not reliable because of (1)

If you get a TimedOutException when writing a counter you
should not re-send the request.


5) Read-repair may help to propagate an operation that
failed its consistency level, but was persisted to some nodes.

Yes. It works in the background, by default is only enabled on
10% of requests.
Note that RR is not the same as the Consistency Level for reads.
If you work at a CL > ONE the results from CL nodes are always
compared and differences resolved. RR is concerned with the
replicas not involved in the CL read.


6) Manual repair is still needed because of (2) and (3)

Manual repair is *the* way to achieve consistency of data on
disk. HH and RR are optimisations designed to reduce the
chance of a Digest Mismatch during a read with CL > ONE.
It is also essential for distributing Tombstones before they
are purged by compaction.

P.S. If some points apply only to some cassandra versions, I
will be happy to know this too.

Assume everything is for version 1.X

Thanks

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/06/2012, at 1:20 AM, Vitalii Tymchyshyn wrote:


Hello.

I am making some cassandra presentations in Kyiv and would
like to check that I am telling people the truth :)
Could the community tell me if the following points are true:
1) Failed (from client-side view) operation may still be
applied to cluster
2) Coordinator does not try anything to "roll-back" operation
that failed because it was processed by fewer than the consistency
level number of nodes.
3) Hinted handoff works only for successful operations.
4) Counters are not reliable because of (1)
5) Read-repair may help to propagate an operation that
failed its consistency level, but was persisted to some nodes.
6) Manual repair is still needed because of (2) and (3)

P.S. If some points apply only to some cassandra versions, I
    will be happy to know this too.
-- 
    Best regards,

     Vitalii Tymchyshyn





-- 
Best regards,

 Vitalii Tymchyshyn






Re: Cassandra dying when it gets many deletes

2012-04-24 Thread Vitalii Tymchyshyn

Hello.

For me " there are no dirty column families" in your message tells it's 
possibly the same problem.
The issue is that column families that gets full row deletes only do not 
get ANY SINGLE dirty byte accounted and so can't be picked by flusher. 
Any ratio can't help simply because it is multiplied by 0. Check your 
cfstats.


24.04.12 09:54, crypto five wrote:

Thank you Vitalii.

Looking at Jonathan's answer to your patch, I think it's probably 
not my case. I see that LiveRatio is calculated in my case, but 
calculations look strange:


WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 Memtable.java (line 181) 
setting live ratio to maximum of 64 instead of Infinity
 INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 Memtable.java (line 186) 
CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0 
(just-counted was 64.0).  calculation took 63355ms for 0 columns


Looking at the comment in the code ("If it gets higher than 64 
something is probably broken."), it looks like that's probably the problem.

Not sure how to investigate it.

2012/4/23 Віталій Тимчишин <mailto:tiv...@gmail.com>

See https://issues.apache.org/jira/browse/CASSANDRA-3741
I did post a fix there that helped me.


2012/4/24 crypto five <mailto:cryptof...@gmail.com>

Hi,

I have 50 million rows in a column family on a 4G RAM box. I
allocated 2GB to cassandra.
I have a program which traverses this CF and cleans some
data there; it generates about 20k delete statements per second.
After about 3 million deletions cassandra stops responding
to queries: it doesn't react to CLI, nodetool etc.
I see in the logs that it tries to free some memory but can't,
even if I wait a whole day.
Also I see the following in the logs:

INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333
StorageService.java (line 2647) Unable to reduce heap usage
since there are no dirty column families

When I look at a memory dump I see that memory goes to
ConcurrentSkipListMap(10%), HeapByteBuffer(13%),
DecoratedKey(6%), int[](6%), BigInteger(8.2%),
ConcurrentSkipListMap$HeadIndex(7.2%), ColumnFamily(6.5%),
ThreadSafeSortedColumns(13.7%), long[](5.9%).

What can I do to make cassandra stop dying?
Why it can't free the memory?
Any ideas?

Thank you.




-- 
Best regards,

 Vitalii Tymchyshyn






Re: Write performance compared to Postgresql

2012-04-03 Thread Vitalii Tymchyshyn
Note that having tons of TCP connections is not good. We are using an async 
client to issue multiple calls over a single connection at the same time. You 
can do the same.
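
As an illustration only (the AsyncConnection interface below is a made-up
stand-in, not a real driver API), this is the shape of issuing many calls over
one connection and waiting for them together, instead of paying one blocking
round-trip per insert:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical async client bound to a single TCP connection; insertAsync()
// returns as soon as the request has been written to the socket.
interface AsyncConnection {
    CompletableFuture<Void> insertAsync(String key, String column, byte[] value);
}

class PipelinedWrites {
    // Keep many requests in flight on the same connection, then wait for all
    // of them at once.
    static void writeAll(AsyncConnection conn, List<byte[]> logEntries) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        int i = 0;
        for (byte[] entry : logEntries) {
            pending.add(conn.insertAsync("log:" + (i++), "body", entry));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}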


Best regards, Vitalii Tymchyshyn.

03.04.12 16:18, Jeff Williams wrote:

Ok, so you think the write speed is limited by the client and protocol, rather 
than the cassandra backend? This sounds reasonable, and fits with our use case, 
as we will have several servers writing. However, a bit harder to test!

Jeff

On Apr 3, 2012, at 1:27 PM, Jake Luciani wrote:


Hi Jeff,

Writing serially over one connection will be slower. If you run many threads 
hitting the server at once you will see throughput improve.

Jake



On Apr 3, 2012, at 7:08 AM, Jeff Williams  wrote:


Hi,

I am looking at cassandra for a logging application. We currently log to a 
Postgresql database.

I set up 2 cassandra servers for testing. I did a benchmark where I had 100 
hashes representing log entries, read from a json file. I then looped over 
these to do 10,000 log inserts. I repeated the same writing to a postgresql 
instance on one of the cassandra servers. The script is attached. The cassandra 
writes appear to perform a lot worse. Is this expected?

jeff@transcoder01:~$ ruby cassandra-bm.rb
cassandra
3.17   0.48   3.65 ( 12.032212)
jeff@transcoder01:~$ ruby cassandra-bm.rb
postgres
2.14   0.33   2.47 (  7.002601)

Regards,
Jeff






Re: Max # of CFs

2012-03-21 Thread Vitalii Tymchyshyn
There is a forced flusher that kicks in when your heap becomes full. 
Look for log lines from GCInspector.
There is a bug that prevents flushing a memtable when it has only full-key 
delete mutations, see https://issues.apache.org/jira/browse/CASSANDRA-3741
For me it happened when we started to move to a new schema, so that old 
column families started to receive delete-only operations. An 
indication is when GCInspector can't flush anything but the system keyspace.


21.03.12 17:29, A J wrote:

I have increased index_interval. Will let you know if I see a difference.


My theory is that memtables are not getting flushed. If I manually
flush them, the heap consumption goes down drastically.

I think when memtable_total_space_in_mb is exceeded not enough
memtables are getting flushed. There are 5000 memtables (one for each
CF) but each memtable in itself is small. So flushing of one or two
memtables by Cassandra is not helping.

Question: How many memtables are flushed when
memtable_total_space_in_mb is exceeded ? Any way to flush all
memtables when the threshold is reached ?

Thanks.

On Wed, Mar 21, 2012 at 8:56 AM, Vitalii Tymchyshyn  wrote:

Hello.

> There is also a primary row index. Its space can be controlled with the
> index_interval setting. I don't know if you can look up its memory usage
> somewhere. If I were you, I'd take the jmap tool and examine a heap histogram
> first, a heap dump second.

Best regards, Vitalii Tymchyshyn

20.03.12 18:12, A J wrote:


I have both row cache and column cache disabled for all my CFs.

cfstats says "Bloom Filter Space Used: 1760" per CF. Assuming it is in
bytes, it is total of about 9MB of bloom filter size for 5K CFs; which
is not a lot.


On Tue, Mar 20, 2012 at 11:09 AM, Vitalii Tymchyshyn
  wrote:

Hello.

  From my experience it's unwise to make many column families for same
keys
because you will have bloom filters and row indexes multiplied. If you
have
5000, you should expect your heap requirements multiplied by same factor.
Also check your cache sizes. Default AFAIR is 10 keys per column
family.

20.03.12 16:05, A J wrote:


ok, the last thread says that 1.0+ onwards, thousands of CFs should
not be a problem.

But I am finding that all the allocated heap memory is getting consumed.
I started with 8GB heap and then on reading


http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
realized that minimum of 1MB per memtable is used by the per-memtable
arena allocator.
So with 5K CFs, 5GB will be used just by arena allocators.

But even on increasing the heap to 16GB, am finding that all the heap
is getting consumed. Is there a different formula for heap calculation
when you have thousands of CFs ?
Any other configuration that I need to change ?

Thanks.

On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ
  wrote:

This subject was already discussed, this may help you :


http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results

If you still got questions after reading this thread or some others
about
the same topic, do not hesitate asking again,

Alain


2012/3/19 A J

How many Column Families are one too many for Cassandra ?
I created a db with 5000 CFs (I can go into the reasons later) but the
latency seems to be very erratic now. Not sure if it is because of the
number of CFs.

Thanks.






Re: high level of MemtablePostFlusher pending events

2012-03-21 Thread Vitalii Tymchyshyn
To note: I still have the problem in a pre-beta 1.1 custom build that 
seems to have the fix. I am going to upgrade to 1.1 beta, check if the 
problem goes away, and file a bug if it still exists.
BTW: it would be great for cassandra to exit on any fatal errors, like 
assertion failures or OOMs.


14.03.12 09:55, aaron morton wrote:

Fixed in 1.0.3
https://issues.apache.org/jira/browse/CASSANDRA-3482

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 3:45 PM, David Hawthorne wrote:

5 node cluster running 1.0.2, doing about 1300 reads and 1300 
writes/sec into 3 column families in the same keyspace.  2 client 
machines, doing about the same amount of reads/writes, but one has an 
average response time in the 4-40ms range and the other in the 
200-800ms range.  Both running identical software, homebrew with 
hector-1.0-3 client.


Traffic was peaking out at 6k reads and 6k writes/sec, according to 
reporting from our software, and now it's topping out at 1300/sec 
each.  The cpus on the cassy boxes are bored.  None of the threads 
within cassandra are chewing more than 3% cpu.  Disk is only 10% full 
on the most loaded box.


MemtablePostFlusher   1   102 36

Not all servers have the same number of pending tasks.  They have 0, 
1, 17, 37, and 105.


It looks like it's stuck and not recovering, cuz it's been like this 
for an hour.  I've attached the end of the cassandra.log from the 
server with the most pending tasks.  There are some interesting 
exceptions in there.


As always, all help is always appreciated!  :p









Re: Max # of CFs

2012-03-21 Thread Vitalii Tymchyshyn

Hello.

There is also a primary row index. Its space can be controlled with the 
index_interval setting. I don't know if you can look up its memory usage 
somewhere. If I were you, I'd take the jmap tool and examine a heap histogram 
first, a heap dump second.
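
For reference, a hedged example of what raising the sampling interval could
look like in cassandra.yaml on 1.x (128 was the shipped default as far as I
recall, and the 512 below is only an example value; verify against your own
config before changing it):

# cassandra.yaml (1.x): only every Nth row key is sampled into the in-memory
# primary index; a larger interval uses less heap at some read-latency cost.
index_interval: 512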


Best regards, Vitalii Tymchyshyn

20.03.12 18:12, A J wrote:

I have both row cache and column cache disabled for all my CFs.

cfstats says "Bloom Filter Space Used: 1760" per CF. Assuming it is in
bytes, it is total of about 9MB of bloom filter size for 5K CFs; which
is not a lot.


On Tue, Mar 20, 2012 at 11:09 AM, Vitalii Tymchyshyn  wrote:

Hello.

 From my experience it's unwise to make many column families for same keys
because you will have bloom filters and row indexes multiplied. If you have
5000, you should expect your heap requirements multiplied by same factor.
Also check your cache sizes. Default AFAIR is 10 keys per column family.

20.03.12 16:05, A J wrote:


ok, the last thread says that 1.0+ onwards, thousands of CFs should
not be a problem.

But I am finding that all the allocated heap memory is getting consumed.
I started with 8GB heap and then on reading

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
realized that minimum of 1MB per memtable is used by the per-memtable
arena allocator.
So with 5K CFs, 5GB will be used just by arena allocators.

But even on increasing the heap to 16GB, am finding that all the heap
is getting consumed. Is there a different formula for heap calculation
when you have thousands of CFs ?
Any other configuration that I need to change ?

Thanks.

On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ
  wrote:

This subject was already discussed, this may help you :

http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results

If you still got questions after reading this thread or some others about
the same topic, do not hesitate asking again,

Alain


2012/3/19 A J

How many Column Families are one too many for Cassandra ?
I created a db with 5000 CFs (I can go into the reasons later) but the
latency seems to be very erratic now. Not sure if it is because of the
number of CFs.

Thanks.






Re: Max # of CFs

2012-03-20 Thread Vitalii Tymchyshyn

Hello.

From my experience it's unwise to make many column families for the same 
keys, because you will have bloom filters and row indexes multiplied. If 
you have 5000, you should expect your heap requirements to be multiplied by 
the same factor. Also check your cache sizes. Default AFAIR is 10 keys 
per column family.


20.03.12 16:05, A J wrote:

ok, the last thread says that 1.0+ onwards, thousands of CFs should
not be a problem.

But I am finding that all the allocated heap memory is getting consumed.
I started with 8GB heap and then on reading
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
realized that minimum of 1MB per memtable is used by the per-memtable
arena allocator.
So with 5K CFs, 5GB will be used just by arena allocators.

But even on increasing the heap to 16GB, am finding that all the heap
is getting consumed. Is there a different formula for heap calculation
when you have thousands of CFs ?
Any other configuration that I need to change ?

Thanks.

On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ  wrote:

This subject was already discussed, this may help you :
http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results

If you still got questions after reading this thread or some others about
the same topic, do not hesitate asking again,

Alain


2012/3/19 A J

How many Column Families are one too many for Cassandra ?
I created a db with 5000 CFs (I can go into the reasons later) but the
latency seems to be very erratic now. Not sure if it is because of the
number of CFs.

Thanks.






Re: Server crashed due to "OutOfMemoryError: Java heap space"

2012-02-28 Thread Vitalii Tymchyshyn

Hello.

Any messages about GC earlier in the logs? The Cassandra server monitors 
memory and starts complaining in advance if memory gets full.
Any chance you've got a full-key delete-only scenario for some column 
families? Cassandra has a bug that makes it unable to flush such memtables. 
I've filed a bug with a patch on the issue.


24.02.12 23:14, Feng Qu wrote:

Hello,

We have a 6-node ring running 0.8.6 on RHEL 6.1. The first node also 
runs OpsCenter community. This node has crashed a few times recently with 
"OutOfMemoryError: Java heap space" while several compactions on a few 
200-300 GB SSTables were running. We are using an 8GB Java heap on a host 
with 96GB RAM.


I would appreciate help figuring out the root cause and a solution.

Feng Qu





Re: 1.0.6 - High CPU troubleshooting

2012-01-26 Thread Vitalii Tymchyshyn
That's once in a few days, so I don't think it's too important. Especially 
since 0.77 is much better than the 0.99 I've seen sometimes :)


26.01.12 02:49, aaron morton wrote:

You are running into GC issues.

WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java 
(line 146) Heap is 0.7767292149986439 full.  You may need to reduce 
memtable and/or cache sizes.  Cassandra will now flush up to the two 
largest memtables to free up memory.  Adjust 
flush_largest_memtables_at threshold in cassandra.yaml if you don't 
want Cassandra to do this automatically


Can you reduce the size of the caches ?

As you are under low load, does it correlate with compaction or repair 
processes? Check nodetool compactionstats.


Do you have wide rows? Check the max row size with nodetool cfstats.

Also, if you have made any changes to the default memory and gc 
settings try reverting them.







Re: 1.0.6 - High CPU troubleshooting

2012-01-25 Thread Vitalii Tymchyshyn
According to the log, I don't see much time spent in GC. You can still 
check it with jstat or uncomment GC logging in cassandra-env.sh. Are you 
sure you've identified the thread correctly?
It's still possible that you have a memory spike where GCInspector simply 
has no chance to run between full GC rounds. Checking with jstat or 
adding GC logging may help to diagnose it.


25.01.12 17:24, Matthew Trinneer wrote:

Here is a snippet of what I'm getting out of system.log for GC.  Does anything 
there provide a clue?

  WARN [ScheduledTasks:1] 2012-01-22 12:53:42,804 GCInspector.java (line 146) 
Heap is 0.7767292149986439 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2012-01-22 12:54:57,685 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 240 ms for 1 collections, 111478936 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-22 15:12:21,710 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 1141 ms for 1 collections, 167667688 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-23 14:20:32,862 GCInspector.java (line 123) 
GC for ParNew: 205 ms for 1 collections, 2894546328 used; max is 8547991552
  INFO [ScheduledTasks:1] 2012-01-23 20:25:06,541 GCInspector.java (line 123) 
GC for ParNew: 240 ms for 1 collections, 4602331064 used; max is 8547991552
  INFO [ScheduledTasks:1] 2012-01-24 13:24:57,473 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 27869 ms for 1 collections, 6376733632 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-24 13:25:24,879 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 26306 ms for 1 collections, 6392079368 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-24 13:27:12,991 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 238 ms for 1 collections, 131710776 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-24 13:55:48,326 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 609 ms for 1 collections, 50380160 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-24 14:34:41,392 GCInspector.java (line 123) 
GC for ParNew: 325 ms for 1 collections, 1340375240 used; max is 8547991552
  INFO [ScheduledTasks:1] 2012-01-24 20:55:19,636 GCInspector.java (line 123) 
GC for ParNew: 233 ms for 1 collections, 6387236992 used; max is 8547991552
  INFO [ScheduledTasks:1] 2012-01-25 14:43:28,921 GCInspector.java (line 123) 
GC for ParNew: 337 ms for 1 collections, 7031219304 used; max is 8547991552
  INFO [ScheduledTasks:1] 2012-01-25 14:43:51,043 GCInspector.java (line 123) 
GC for ParNew: 211 ms for 1 collections, 7025723712 used; max is 8547991552
  INFO [ScheduledTasks:1] 2012-01-25 14:50:00,012 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 51534 ms for 2 collections, 6844998736 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-25 14:51:22,249 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 250 ms for 1 collections, 154848440 used; max is 
8547991552
  INFO [ScheduledTasks:1] 2012-01-25 14:57:46,519 GCInspector.java (line 123) 
GC for ParNew: 244 ms for 1 collections, 190838344 used; max is 8547991552
  INFO [ScheduledTasks:1] 2012-01-25 15:00:21,693 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 389 ms for 1 collections, 28748448 used; max is 
8547991552



On 2012-01-25, at 10:09 AM, Vitalii Tymchyshyn wrote:


Hello.

What's in the logs? It should output something like "Hey, you've got most of your memory used. 
I am going to flush some of the memtables". Sorry, I don't remember the exact wording, but it 
comes from GC, so it should be greppable by "GC".

25.01.12 16:26, Matthew Trinneer wrote:

Hello Community,

Am troubleshooting an issue with sudden and sustained high CPU on nodes in a 3 
node cluster.  This takes place when there is minimal load on the servers, and 
continues indefinitely until I stop and restart a node.  All nodes (3) seem to 
be affected by the same issue, however it doesn't occur simultaneously on them 
(although all potentially could be affected at the same time).

Here's what I've been doing to troubleshoot.

* vmstat - have run on all servers and there is *no* swap going on

* iostat - have run on all servers and there is no indication that the disk is 
in any way a bottleneck

* nodetool - nothing is backed up.  Not using all available heap (about 2/3rds).

* Have tracked it down to a specific thread (top -H).  When I find that 
thread's hex equivalent in a jstack dump I see the following

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x01d48800 
nid=0x4534 runnable


Which makes me think that perhaps it's something to do with GC configuration.   
Unfortunately I'm not able to determine why this might be happening.

Re: 1.0.6 - High CPU troubleshooting

2012-01-25 Thread Vitalii Tymchyshyn

Hello.

What's in the logs? It should output something like "Hey, you've got 
most of your memory used. I am going to flush some of the memtables". Sorry, 
I don't remember the exact wording, but it comes from GC, so it should be 
greppable by "GC".


25.01.12 16:26, Matthew Trinneer wrote:

Hello Community,

Am troubleshooting an issue with sudden and sustained high CPU on nodes in a 3 
node cluster.  This takes place when there is minimal load on the servers, and 
continues indefinitely until I stop and restart a node.  All nodes (3) seem to 
be affected by the same issue, however it doesn't occur simultaneously on them 
(although all potentially could be affected at the same time).

Here's what I've been doing to troubleshoot.

* vmstat - have run on all servers and there is *no* swap going on

* iostat - have run on all servers and there is no indication that the disk is 
in any way a bottleneck

* nodetool - nothing is backed up.  Not using all available heap (about 2/3rds).

* Have tracked it down to a specific thread (top -H).  When I find that 
thread's hex equivalent in a jstack dump I see the following

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x01d48800 
nid=0x4534 runnable


Which makes me think that perhaps it's something to do with GC configuration.   
Unfortunately I'm not able to determine why this might be happening.  Any 
suggestions on how to continue troubleshooting?

btw - here's my jvm (just in case)

java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)


Thanks!

Matt





Re: Should I throttle deletes?

2012-01-06 Thread Vitalii Tymchyshyn
Do you mean on writes? Yes, your timeouts must be set so that your write 
batch can complete before the timeout elapses. But this will lower the write 
load, so reads should not time out.


Best regards, Vitalii Tymchyshyn

06.01.12 17:37, Philippe wrote:


But you will then get timeouts.

On 6 Jan 2012 15:17, "Vitalii Tymchyshyn" <mailto:tiv...@gmail.com> wrote:


05.01.12 22:29, Philippe wrote:


Then I do have a question, what do people generally use as
the batch size?

I used to do batches from 500 to 2000 like you do.
After investigating issues such as the one you've encountered
I've moved to batches of 20 for writes and 256 for reads.
Everything is a lot smoother : no more timeouts.


    I'd rather reduce the mutation thread pool with the concurrent_writes
    setting. This will lower server load no matter how many clients
    are sending batches, and at the same time you still have good batching.

    Best regards, Vitalii Tymchyshyn





Re: is it bad to have lots of column families?

2012-01-06 Thread Vitalii Tymchyshyn
Yes, as far as I know. Note that it's not a full index, but a "sampling" 
one. See the index_interval configuration parameter and its description.
As for bloom filters, they are not configurable now, yet there is a ticket 
with a patch that should make them configurable.


05.01.12 22:45, Carlo Pires wrote:

Does the index for CFs have to fit in the node's memory?

2012/1/5 Віталій Тимчишин <mailto:tiv...@gmail.com>



2012/1/5 Michael Cetrulo <mailto:mail2sa...@gmail.com>

in a traditional database it's not a good idea to have
hundreds of tables, but is it also bad to have hundreds of
column families in cassandra? thank you.


As far as I can see, this may raise memory requirements for you,
since you need to have index/bloom filter for each column family
in memory.


--
Best regards,
 Vitalii Tymchyshyn


Re: Should I throttle deletes?

2012-01-06 Thread Vitalii Tymchyshyn

05.01.12 22:29, Philippe wrote:


Then I do have a question, what do people generally use as the
batch size?

I used to do batches from 500 to 2000 like you do.
After investigating issues such as the one you've encountered I've 
moved to batches of 20 for writes and 256 for reads. Everything is a 
lot smoother : no more timeouts.


I'd rather reduce the mutation thread pool with the concurrent_writes setting. 
This will lower server load no matter how many clients are sending 
batches, and at the same time you still have good batching.
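
A hedged sketch of that knob in cassandra.yaml (32 is the usual default
mentioned elsewhere in this digest; the lower value is only an example, not a
recommendation):

# cassandra.yaml: size of the mutation (write) thread pool on each node.
# Lowering it throttles how many mutations a node works on concurrently,
# no matter how many clients are sending batches.
concurrent_writes: 16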


Best regards, Vitalii Tymchyshyn


Re: Cassandra OOM

2012-01-04 Thread Vitalii Tymchyshyn

04.01.12 14:25, Radim Kolar wrote:

> So, what are cassandra's memory requirements? Is it 1% or 2% of disk data?
It depends on the number of rows you have. If you have a lot of rows then 
the primary memory eaters are index sampling data and bloom filters. I use 
an index sampling of 512 and bloom filters set to 4% to cut down the memory needed.
I've raised index sampling, and the bloom filter setting seems not to be on 
trunk yet. For me memtables are what's eating the heap :(


Best regards, Vitalii Tymchyshyn.


Re: Cassandra OOM

2012-01-04 Thread Vitalii Tymchyshyn

Hello.

BTW: it would be great for cassandra to shut down on Errors like OOM, 
because now I am not sure if the problem described in the previous email is 
the root cause, or whether some OOM error found in the log made some "writer" stop.


I am now looking at different OOMs in my cluster. Currently each node 
has up to 300G of data in ~10 column families. The previous heap size of 3G 
seems to be not enough; I am raising it to 5G. Looking at heap dumps, a 
lot of memory is taken by memtables, much more than 1/3 of the heap. At the 
same time, the logs say that there is nothing to flush since there are no 
dirty memtables. So, what are cassandra's memory requirements? Is it 1% or 
2% of disk data? Or maybe I am doing something wrong?


Best regards, Vitalii Tymchyshyn

03.01.12 20:58, aaron morton wrote:
The DynamicSnitch can result in fewer read operations being sent to a 
node, but as long as a node is marked as UP mutations are sent to all 
replicas. Nodes will shed load when they pull messages off the queue 
that have expired past rpc_timeout, but they will not feed back flow 
control to the other nodes, other than going down or performing slowly 
enough for the dynamic snitch to route reads around them.


There are also safety valves in there to reduce the size of the 
memtables and caches in response to low memory. Perhaps that process 
could also shed messages from thread pools with a high number of 
pending messages.


**But** going OOM with 2M+ mutations in the thread pool sounds like 
the server was going down anyway. Did you look into why all the 
messages were there ?


Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 11:18 PM, Віталій Тимчишин wrote:


Hello.

We have been using cassandra for some time in our project. Currently we are 
on 1.1 trunk (it was an accidental migration, but since it's hard to 
migrate back and it's performing nicely enough we are currently on 1.1).
During the New Year holidays one of the servers produced a number of 
OOM messages in the log.
According to the heap dump taken, most of the memory is taken by the 
MutationStage queue (over 2 million items).
So, I am curious now if cassandra has any flow control for messages? 
We are using Quorum for writes and it seems to me that one slow 
server may start getting more messages than it can consume. The 
writes will still succeed, performed by the other servers in the 
replication set.
If there is no flow control, it will eventually get OOM. Is that the 
case? Are there any plans to handle this?
BTW: a lot of memory (~half) is taken by Inet4Address objects, so 
making a cache of such objects would make this problem less likely.


--
Best regards,
 Vitalii Tymchyshyn