C* throws OOM error despite use of automatic paging
Hi - We have an ETL application that reads all rows from Cassandra (2.1.2), filters them and stores a small subset in an RDBMS. Our application is using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression that SELECT queries should not cause an OOM error on the C* nodes. However, even with just 16GB of data on each node, the C* nodes start throwing OOM errors as soon as the application starts iterating through the rows of a table. The application code looks something like this:

    Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf").setFetchSize(5000);
    ResultSet rs = session.execute(stmt);
    while (!rs.isExhausted()) {
        Row row = rs.one();
        process(row);
    }

Even after we reduced the page size to 1000, the C* nodes still crash. C* is running on m3.xlarge machines (4 cores, 15GB RAM). We manually increased the heap size to 8GB just to see how much heap C* consumes. Within 10-15 minutes, the heap usage climbs to 7.6GB. That does not make sense. Either automatic paging is not working or we are missing something.

Does anybody have insights as to what could be happening?

Thanks.

Mohammed
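[For reference, the fetch size can also be set once for the whole Cluster rather than per statement. A minimal sketch, assuming DataStax Java driver 2.1; the contact point and the "ks.cf" table are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class PagedRead {
        public static void main(String[] args) {
            // Default fetch size for every statement on this Cluster;
            // a per-statement setFetchSize() still overrides it.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1") // placeholder contact point
                    .withQueryOptions(new QueryOptions().setFetchSize(1000))
                    .build();
            Session session = cluster.connect();
            ResultSet rs = session.execute("SELECT x, y, z FROM ks.cf");
            // Iterating the ResultSet fetches the next page on demand,
            // so the client never holds more than ~one page of rows.
            for (Row row : rs) {
                // process(row); // placeholder for application logic
            }
            cluster.close();
        }
    }

Note this only bounds memory on the client; it does not by itself explain coordinator-side heap growth.]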
Re: High read latency after data volume increased
On Thu, Jan 8, 2015 at 6:38 PM, Roni Balthazar wrote:
> We downgraded to 2.1.1, but got the very same result. The read latency is still high, but we figured out that it happens only using a specific keyspace.

Note that downgrading is officially unsupported, but is probably safe between those two versions.

Enable tracing and paste the results for a high-latency query. Also, how much RAM is used for heap?

=Rob
Re: High read latency after data volume increased
Hi Robert,

We downgraded to 2.1.1, but got the very same result. The read latency is still high, but we figured out that it happens only using a specific keyspace. Please see the graphs below... Trying another keyspace with 600+ reads/sec, we are getting an acceptable ~30ms read latency.

Let me know if I need to provide more information.

Thanks,

Roni Balthazar

On Thu, Jan 8, 2015 at 5:23 PM, Robert Coli wrote:
> On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar wrote:
>> We are using C* 2.1.2 with 2 DCs. 30 nodes DC1 and 10 nodes DC2.
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
> 2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ...
>
> =Rob
Re: Rebooted cassandra node timing out all requests but recovers after a while
I will keep an eye out for that if it happens again. Times at this point are synchronized.

On Wed, Jan 7, 2015 at 10:37 PM, Duncan Sands wrote:
> Hi Anand,
>
> On 08/01/15 02:02, Anand Somani wrote:
>> Hi,
>>
>> We have a 3 node cluster (on VMs), e.g. host1, host2, host3. One of the VMs (host1) rebooted, and when host1 came up it would see the others as down, and the others (host2 and host3) saw it as down. So we restarted host2 and now the ring seems fine (everybody sees everybody as up).
>>
>> But now the clients time out talking to host1. We have not figured out what is causing it. There is nothing in the logs that indicates a problem. Looking for indicators/help on what debug/tracing to turn on to find out what could be causing it.
>>
>> This happens only when a VM reboots (not otherwise); also it seems to have recovered itself after some hours!! (or restarts) - not sure which.
>>
>> This is 1.2.15; we are using SSL and Cassandra authorizers.
>
> Perhaps time is not synchronized between the nodes to begin with, and eventually becomes synchronized.
>
> Ciao, Duncan.
Re: High read latency after data volume increased
On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar wrote:
> We are using C* 2.1.2 with 2 DCs. 30 nodes DC1 and 10 nodes DC2.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ...

=Rob
High read latency after data volume increased
Hi there,

We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2. While our data volume is increasing (34 TB now), we are running into some problems:

1) Read latency is around 1000 ms when running 600 reads/sec (DC1, CL.LOCAL_ONE). At the same time the load average is about 20-30 on all DC1 nodes (8-core CPU, 32 GB RAM), and C* starts timing out connections. In this scenario OpsCenter has issues as well: it resets all graph layouts back to the default on every refresh, and it doesn't return to normal after the load decreases. I only managed to restore OpsCenter's normal behavior by reinstalling it. For reference, we are using SATA HDDs on all nodes; running hdparm to check disk performance under this load, some nodes report very low read rates (under 10 MB/sec) while others are above 100 MB/sec. Under low load average this rate is above 250 MB/sec.

2) Repair takes at least 4-5 days to complete. The last repair was 20 days ago. Running repair under high load is bringing some nodes down with the exception: "JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Java heap space"

Any hints?

Regards,

Roni Balthazar
Re: incremental repairs
On Thu, Jan 8, 2015 at 12:28 AM, Marcus Eriksson wrote:
> But, if you are running 2.1 in production, I would recommend that you wait until 2.1.3 is out. https://issues.apache.org/jira/browse/CASSANDRA-8316 fixes a bunch of issues with incremental repairs.

There are other serious issues with 2.1.2; I +1 recommend no one run it in production. :D

=Rob
Re: incremental repairs
Hi Marcus,

thanks a lot for those pointers. Now further testing can begin - and I'll wait for 2.1.3. Right now repair times in production are really painful; maybe that will get better. At least I hope so :-)
Updated JMX metrics overview
I am adding some JMX metrics to our monitoring tool. Is there a good overview of all existing JMX metrics and their descriptions? http://wiki.apache.org/cassandra/Metrics has only a few metrics. In particular I have questions about these now:

- org.apache.cassandra.db type=StorageProxy TotalHints - is this the number of hints since the node was started, or a lifetime value?
- org.apache.cassandra.db type=StorageProxy ReadRepairRepairedBackground - is this the total number of background read repair requests that the node received since restart?
- org.apache.cassandra.db type=StorageProxy ReadRepairRepairedBlocking - same as above, but I assume that's the number of read repairs that have blocked a query (when does that even happen - aren't read repairs run in the background?)
- org.apache.cassandra.metrics type=Storage name=Exceptions Count - is this the total number of unhandled exceptions since node restart?
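[For what it's worth, all of these are plain JMX MBean attributes, so their values live in the node's JVM and reset whenever the node restarts - they are not lifetime values. A minimal sketch for reading the StorageProxy counters named above with the standard javax.management API; the host and the default JMX port 7199 are assumptions, and no JMX authentication is configured:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class StorageProxyMetrics {
        public static void main(String[] args) throws Exception {
            // Cassandra's JMX listener; 7199 is the default port.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName proxy =
                        new ObjectName("org.apache.cassandra.db:type=StorageProxy");
                // Cumulative counters since the JVM started; they do not
                // survive a node restart.
                for (String attr : new String[] {"TotalHints",
                        "ReadRepairRepairedBackground", "ReadRepairRepairedBlocking"}) {
                    System.out.println(attr + " = " + mbs.getAttribute(proxy, attr));
                }
            } finally {
                connector.close();
            }
        }
    }
]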
Re: How to bulkload into a specific data center?
Just noticed you'd sent this to the dev list as well; this is a question for the user list only, so please do not send questions of this type to the developer list.

On Thu, Jan 8, 2015 at 8:33 AM, Ryan Svihla wrote:
> The nature of replication factor is such that writes will go wherever there is replication. If you want responses to be faster, and not involve the REST data center in the Spark job's response path, I suggest using a CQL driver and the LOCAL_ONE or LOCAL_QUORUM consistency level (look at the Spark Cassandra connector here: https://github.com/datastax/spark-cassandra-connector ). While write traffic will still be replicated to the REST service data center, because you do want those results available, you will not be waiting on the remote data center to respond "successful".
>
> Final point: bulk loading sends a copy per replica across the wire, so let's say you have RF3 in each data center - that means bulk loading will send out 6 copies from that client at once, whereas normal mutations via Thrift or CQL writes go out to the other data center as 1 copy, which that node then forwards on to the other replicas. This means cross data center traffic in this case would be 3x more with the bulk loader than with a traditional CQL or Thrift based client.
>
> On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang wrote:
>> I set up two virtual data centers, one for analytics and one for REST service. The analytics data center sits on top of a Hadoop cluster. I want to bulk load my ETL results into the analytics data center so that the REST service won't have the heavy load. I'm using CQLTableInputFormat in my Spark application, and I gave the nodes in the analytics data center as the initial addresses.
>>
>> However, I found my jobs were connecting to the REST service data center.
>>
>> How can I specify the data center?

--
Thanks,
Ryan Svihla
Re: How to bulkload into a specific data center?
The nature of replication factor is such that writes will go wherever there is replication. If you want responses to be faster, and not involve the REST data center in the Spark job's response path, I suggest using a CQL driver and the LOCAL_ONE or LOCAL_QUORUM consistency level (look at the Spark Cassandra connector here: https://github.com/datastax/spark-cassandra-connector ). While write traffic will still be replicated to the REST service data center, because you do want those results available, you will not be waiting on the remote data center to respond "successful".

Final point: bulk loading sends a copy per replica across the wire, so let's say you have RF3 in each data center - that means bulk loading will send out 6 copies from that client at once, whereas normal mutations via Thrift or CQL writes go out to the other data center as 1 copy, which that node then forwards on to the other replicas. This means cross data center traffic in this case would be 3x more with the bulk loader than with a traditional CQL or Thrift based client.

On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang wrote:
> I set up two virtual data centers, one for analytics and one for REST service. The analytics data center sits on top of a Hadoop cluster. I want to bulk load my ETL results into the analytics data center so that the REST service won't have the heavy load. I'm using CQLTableInputFormat in my Spark application, and I gave the nodes in the analytics data center as the initial addresses.
>
> However, I found my jobs were connecting to the REST service data center.
>
> How can I specify the data center?

--
Thanks,
Ryan Svihla
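[As a concrete illustration of the LOCAL_* suggestion above: a minimal sketch, assuming DataStax Java driver 2.1. The DC name "analytics", the contact point, and the table are placeholders for whatever the actual NetworkTopologyStrategy setup uses:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

    public class AnalyticsDcWriter {
        public static void main(String[] args) {
            // Route requests only to coordinators in the "analytics" DC.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1") // placeholder: node in analytics DC
                    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("analytics"))
                    .build();
            Session session = cluster.connect();
            // LOCAL_QUORUM acks once a quorum of *local* replicas respond;
            // replication to the REST DC still happens, but we don't wait on it.
            Statement insert = new SimpleStatement(
                    "INSERT INTO ks.results (id, value) VALUES (1, 'x')")
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            session.execute(insert);
            cluster.close();
        }
    }
]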
Re: Why does C* repeatedly compact the same tables over and over?
> Are you using Leveled compaction strategy?

And if you're using Date Tiered compaction strategy on a table that isn't time-series data - for example, where deletes happen - you'll find it compacting over and over.

~mck
Re: Why does C* repeatedly compact the same tables over and over?
Are you using Leveled compaction strategy?

If you fall behind on compaction in leveled (and you will during bootstrap), by default Cassandra will fall back to size tiered compaction in level 0. This will create SSTables larger than sstable_size_in_mb, and those will be recompacted away into level 1. When level 1 gets full, those will be recompacted away into level 2. If level 2 gets full, into level 3, and so on.

Leveled compaction is more I/O intensive than size tiered compaction for reasons such as this. The same data can be compacted numerous times before a node settles down. The upside is that it puts a practical upper bound on the number of SSTables which can be involved in a read (not more than one per level, not counting false positives from bloom filters). Leveled compaction is essentially trading increased I/O at write time (including and especially bootstrap or repair) for decreased I/O at read time.

On Thu, Jan 8, 2015 at 6:12 AM, Robert Wille wrote:
> After bootstrapping a node, the node repeatedly compacts the same tables over and over, even though my cluster is completely idle. I've noticed the same behavior after extended periods of heavy writes. I realize that during bootstrapping (or extended periods of heavy writes) compaction could get seriously behind, but once a table has been compacted, I don't see the need to recompact it dozens more times.
>
> Possibly related, I often see OpsCenter report that nodes have a large number of pending tasks when the Pending column of the Thread Pool Stats doesn't reflect that.
>
> Robert
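[Not from the original thread, but for anyone experimenting: sstable_size_in_mb is set per table as part of the compaction options. A sketch of the ALTER issued through the same Java driver used elsewhere in this digest; the keyspace/table names and the 160MB target are made up:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SetLeveledCompaction {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // A larger target size means fewer SSTables per level, at the
            // cost of bigger individual compactions.
            session.execute("ALTER TABLE ks.cf WITH compaction = "
                    + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");
            cluster.close();
        }
    }
]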
Why does C* repeatedly compact the same tables over and over?
After bootstrapping a node, the node repeatedly compacts the same tables over and over, even though my cluster is completely idle. I've noticed the same behavior after extended periods of heavy writes. I realize that during bootstrapping (or extended periods of heavy writes) compaction could get seriously behind, but once a table has been compacted, I don't see the need to recompact it dozens more times.

Possibly related, I often see OpsCenter report that nodes have a large number of pending tasks when the Pending column of the Thread Pool Stats doesn't reflect that.

Robert
User audit in Cassandra
Hi,

Is there a way to enable user auditing or tracing if we have enabled PasswordAuthenticator in cassandra.yaml and set up the users as well? I noticed there are keyspaces system_auth and system_traces, but there is no way to find out which user initiated which session. Is there any way to find out? Also, is it recommended to enable tracing in production, e.g. to know how many sessions a user started?

Thanks

Ajay
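[Tracing can at least be turned on per query from the client side. A minimal sketch, assuming DataStax Java driver 2.1; the contact point and query are placeholders. This records a session in system_traces, but, as observed above, the trace does not tell you which authenticated user initiated it:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.QueryTrace;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class TracedQuery {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // enableTracing() asks the coordinator to record this query's
            // execution in the system_traces keyspace.
            ResultSet rs = session.execute(
                    new SimpleStatement("SELECT * FROM ks.cf LIMIT 10").enableTracing());
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            System.out.println("Trace session: " + trace.getTraceId()
                    + ", coordinator: " + trace.getCoordinator());
            cluster.close();
        }
    }
]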
Re: incremental repairs
Yes, you should reenable autocompaction.

/Marcus

On Thu, Jan 8, 2015 at 10:33 AM, Roland Etzenhammer <r.etzenham...@t-online.de> wrote:
> Hi Marcus,
>
> thanks for that quick reply. I did also look at:
>
> http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html
>
> which describes the same process. It's for 2.1.x, so I see that 2.1.2+ is not covered there. I did upgrade my test cluster to 2.1.2, and with your hint I took a look at sstablemetadata on a non-"migrated" node - there are indeed "Repaired at" entries on some sstables already. So if I got this right, in 2.1.2+ there is nothing to do to switch to incremental repairs (apart from running the repairs themselves).
>
> But one thing I see during testing is that there are many sstables with small sizes:
>
> - in total there are 5521 sstables on one node
> - 115 sstables are bigger than 1MB
> - 4949 sstables are smaller than 10kB
>
> I don't know where they came from - I found one piece of information saying this happened when Cassandra was low on heap, which happened to me while running tests (the suggested solution is to trigger compaction via JMX).
>
> Question for me: I did disable autocompaction on some nodes of our test cluster as the blog and docs said. Should/can I reenable autocompaction again with incremental repairs?
>
> Cheers,
> Roland
Re: incremental repairs
Hi Marcus,

thanks for that quick reply. I did also look at:

http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html

which describes the same process. It's for 2.1.x, so I see that 2.1.2+ is not covered there. I did upgrade my test cluster to 2.1.2, and with your hint I took a look at sstablemetadata on a non-"migrated" node - there are indeed "Repaired at" entries on some sstables already. So if I got this right, in 2.1.2+ there is nothing to do to switch to incremental repairs (apart from running the repairs themselves).

But one thing I see during testing is that there are many sstables with small sizes:

- in total there are 5521 sstables on one node
- 115 sstables are bigger than 1MB
- 4949 sstables are smaller than 10kB

I don't know where they came from - I found one piece of information saying this happened when Cassandra was low on heap, which happened to me while running tests (the suggested solution is to trigger compaction via JMX).

Question for me: I did disable autocompaction on some nodes of our test cluster as the blog and docs said. Should/can I reenable autocompaction again with incremental repairs?

Cheers,
Roland
Re: incremental repairs
If you are on 2.1.2+ (or using STCS) you don't need those steps (we should probably update the blog post). Now we keep separate levelings for the repaired/unrepaired data and move the sstables over after the first incremental repair.

But, if you are running 2.1 in production, I would recommend that you wait until 2.1.3 is out; https://issues.apache.org/jira/browse/CASSANDRA-8316 fixes a bunch of issues with incremental repairs.

-pr is sufficient; the same rules apply as before - if you run -pr you need to repair every node.

/Marcus

On Thu, Jan 8, 2015 at 9:16 AM, Roland Etzenhammer <r.etzenham...@t-online.de> wrote:
> Hi,
>
> I am currently trying to migrate my test cluster to incremental repairs. These are the steps I'm doing on every node:
>
> - touch marker
> - nodetool disableautocompaction
> - nodetool repair
> - cassandra stop
> - find all *Data*.db files older than the marker
> - invoke sstablerepairedset on those
> - cassandra start
>
> This is essentially what http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1 says. After all nodes have been migrated this way, I think I need to run my regular repairs more often, and they should be faster afterwards. But do I need to run "nodetool repair" or is "nodetool repair -pr" sufficient?
>
> And do I need to reenable autocompaction? Or do I need to compact myself?
>
> Thanks for any input,
> Roland
incremental repairs
Hi,

I am currently trying to migrate my test cluster to incremental repairs. These are the steps I'm doing on every node:

- touch marker
- nodetool disableautocompaction
- nodetool repair
- cassandra stop
- find all *Data*.db files older than the marker
- invoke sstablerepairedset on those
- cassandra start

This is essentially what http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1 says. After all nodes have been migrated this way, I think I need to run my regular repairs more often, and they should be faster afterwards. But do I need to run "nodetool repair" or is "nodetool repair -pr" sufficient?

And do I need to reenable autocompaction? Or do I need to compact myself?

Thanks for any input,
Roland