Re: Log application Queries

2018-05-28 Thread Horia Mocioi
Hello,

Another way to do it would be to create your own QueryHandler:

  *   create a class that implements the QueryHandler interface and make
Cassandra aware of it (see the sketch below)
  *   in that class you can maintain a map of the prepared queries (add to it
when the prepare method is called) and look the current query up when
getPrepared is called, using the MD5Digest id as the key
  *   when processPrepared is called you can replace the ? placeholders in the
query string with the values from QueryOptions (options.getValues()).
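
For the "make Cassandra aware of it" part, a minimal sketch of the wiring,
assuming a hypothetical handler class com.example.LoggingQueryHandler and
package-install paths (adjust both to your environment):

    # put the jar containing your QueryHandler implementation on the daemon classpath
    cp logging-query-handler.jar /usr/share/cassandra/lib/

    # point Cassandra at the handler class, e.g. by appending to cassandra-env.sh
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.custom_query_handler_class=com.example.LoggingQueryHandler"' \
        >> /etc/cassandra/cassandra-env.sh

    # restart the node; queries are then routed through your handler

The cassandra.custom_query_handler_class system property is what the daemon
reads at startup to pick up a custom QueryHandler.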

On fre, 2018-05-25 at 11:24 -0400, Nitan Kainth wrote:
Hi,

I would like to log all C* queries hitting the cluster. Could someone please
tell me how I can do it at the cluster level?
Will nodetool setlogginglevel work? If so, please share an example with the
logger/library name.

C* version 3.11


Re: cassandra concurrent read performance problem

2018-05-28 Thread Alain RODRIGUEZ
Hi,

Would you share some more context with us?

- What Cassandra version do you use?
- What is the data size per node?
- How much RAM does the hardware have?
- Does your client use paging?

A few ideas to explore (example commands follow the list):

- Try tracing the query to see what's taking time (and resources).
- From the tracing, the logs, the sstablemetadata tool or a monitoring
dashboard, do you see any tombstones?
- What is the percentage of GC pause per second? A 128 GB heap seems huge to
me, even with G1GC. Do you still have memory left for page caching? Check the
general logs, GC logs or dashboards as well. Reallocating 70 GB every minute
does not seem right. Maybe a smaller heap (more common) would give more
frequent but smaller pauses?
- Any pending/blocked threads? Check the thread pool monitoring charts or
'nodetool tpstats'. Running 'watch -d "nodetool tpstats"' will make the
evolution and any newly pending/blocked threads obvious (a Cassandra restart
resets those stats as well).
- What is the number of SSTables touched per read operation on the main
tables?
- Are the bloom filters efficient?
- Is the key cache efficient (hit ratio of 0.8-0.9+)?
- The logs should be reporting something during the 10 minutes the machines
were unresponsive; try: grep -e "WARN" -e "ERROR"
/var/log/cassandra/system.log
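
For reference, a few of the checks above as concrete commands. Keyspace and
table names are placeholders, and the command names are the 3.x ones (on 2.x
use cfhistograms/cfstats instead of tablehistograms/tablestats):

    nodetool tpstats                                # pending/blocked threads
    nodetool tablehistograms my_keyspace my_table   # SSTables touched per read
    nodetool tablestats my_keyspace.my_table        # bloom filter false ratio, partition sizes
    nodetool info | grep -i cache                   # key/row cache hit rates
    sstablemetadata /var/lib/cassandra/data/my_keyspace/my_table-*/*-big-Data.db | grep -i tombstone

And in cqlsh, 'TRACING ON;' before running the slow query shows where the time
goes.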

More than 200 MB per partition is quite big. Explore what can be improved
operationally, but you might ultimately have to reduce the partition size.
On the other hand, Cassandra keeps evolving to allow bigger partition sizes,
handling them more efficiently over time, so if you can work on the
operational side you might be able to keep this model.

If it is possible to experiment on a canary node and observe, I would
probably go down this path after identifying a likely origin of and solution
for this issue.

Other tips that might help here:
- Disabling 'dynamic snitching' has proved to improve performance (often
clearly visible at p99), mostly because the page cache (disk) is used more
effectively.
- Making sure that most of your partitions fit within the read block size
(buffer) you are using can also make reads more efficient (when data is
compressed, the chunk size determines the buffer size).

I hope this helps. I am curious about this one, please let us know what
you find out :).

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-05-26 14:21 GMT+01:00 onmstester onmstester :

> By reading 90 partitions concurrently (each larger than 200 MB), my
> single-node Apache Cassandra became unresponsive;
> no reads or writes worked for almost 10 minutes.
> I'm using these configs:
> memtable_allocation_type: offheap_buffers
> gc: G1GC
> heap: 128GB
> concurrent_reads: 128 (having more than 12 disks)
>
> There is not much pressure on my resources except for memory: the 70 GB
> eden space fills up and is reclaimed in less than a minute.
> CPU is about 20% while reads are failing, and iostat shows no significant
> load on the disks.
>


nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-28 Thread Steinmaurer, Thomas
Hello,

on a quite capable machine with 32 physical cores (64 vCPUs) we see sporadic
CPU usage of up to 50% caused by nodetool on this box, so I dug a bit further.
A few observations:

1) nodetool reuses the $MAX_HEAP_SIZE environment variable, so if we are
running Cassandra with e.g. Xmx31G, nodetool is started with Xmx31G as well
2) As -XX:ParallelGCThreads is not explicitly set at startup, it defaults to a
value dependent on the number of cores. In our case, with the machine above,
the number of parallel GC threads for the JVM is set to 43!
3) As a test, we adapted the nodetool startup script to produce a Java Flight
Recorder file on JVM exit, so each nodetool invocation leaves a JFR file we
can inspect. There we saw System.gc() calls (with no visible indication of
where they come from) and GC times over the entire JVM lifetime (e.g. ~1 min)
showing high CPU. This happened with both Xmx128M (apparently the default) and
Xmx31G.

After explicitly setting -XX:ParallelGCThreads=1 in the nodetool startup 
script, CPU usage spikes by nodetool are entirely gone.
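
As a rough illustration (the exact layout of bin/nodetool differs between
Cassandra versions, so treat this as a sketch rather than a patch), the idea
is to give the tool's java invocation its own small, fixed JVM settings
instead of inheriting the daemon's:

    # cap nodetool's own JVM rather than reusing $MAX_HEAP_SIZE / default GC threads
    "$JAVA" -Xmx128m -XX:ParallelGCThreads=1 ... org.apache.cassandra.tools.NodeTool "$@"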

Is this something that has already been addressed in Cassandra versions > 2.1,
or is it worth raising as some sort of RFC?

Thanks,
Thomas



Certified Cassandra for Enterprise use

2018-05-28 Thread Pranay akula
Is there any third party that provides security patches/releases for Apache
Cassandra?

For enterprise use, is there any third party that provides certified Apache
Cassandra packages?

Thanks
Pranay


Re: Certified Cassandra for Enterprise use

2018-05-28 Thread Rane, Sanjay
DataStax

From: Pranay akula 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, May 28, 2018 at 5:12 PM
To: "user@cassandra.apache.org" 
Subject: Certified Cassandra for Enterprise use

Is there any third party that provides security patches/releases for Apache
Cassandra?

For enterprise use, is there any third party that provides certified Apache
Cassandra packages?

Thanks
Pranay


Re: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-28 Thread kurt greaves
>
> 1) nodetool reuses the $MAX_HEAP_SIZE environment variable, so if we are
> running Cassandra with e.g. Xmx31G, nodetool is started with Xmx31G as well

This was fixed in 3.0.11/3.10 in CASSANDRA-12739. Not sure why it didn't make
it into 2.1/2.2.

> 2) As -XX:ParallelGCThreads is not explicitly set at startup, it defaults
> to a value dependent on the number of cores. In our case, with the machine
> above, the number of parallel GC threads for the JVM is set to 43!
> 3) As a test, we adapted the nodetool startup script to produce a Java
> Flight Recorder file on JVM exit, so each nodetool invocation leaves a JFR
> file we can inspect. There we saw System.gc() calls (with no visible
> indication of where they come from) and GC times over the entire JVM
> lifetime (e.g. ~1 min) showing high CPU. This happened with both Xmx128M
> (apparently the default) and Xmx31G.
>
> After explicitly setting -XX:ParallelGCThreads=1 in the nodetool startup
> script, CPU usage spikes by nodetool are entirely gone.
>
> Is this something that has already been addressed in Cassandra versions
> > 2.1, or is it worth raising as some sort of RFC?
>
Can you create a JIRA for this (and a patch, if you like)? We should be
explicitly setting this on nodetool invocations.


Re: Snapshot SSTable modified??

2018-05-28 Thread Elliott Sims
Unix timestamps are a bit odd.  "mtime/Modify" is file changes,
"ctime/Change/(sometimes called create)" is file metadata changes, and a
link count change is a metadata change.  This seems like an odd decision on
the part of GNU tar, but presumably there's a good reason for it.

When the original sstable is compacted away, it's removed and therefore the
link count on the snapshot file is decremented.  The file's contents
haven't changed so mtime is identical, but ctime does get updated.  BSDtar
doesn't seem to interpret link count changes as a file change, so it's
pretty effective as a workaround.
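
A quick way to see this behaviour on any Linux box (a hedged illustration
using GNU coreutils, not Cassandra-specific):

    echo data > f
    ln f f_link
    stat -c 'links=%h mtime=%y ctime=%z' f   # links=2
    rm f_link
    stat -c 'links=%h mtime=%y ctime=%z' f   # links=1; mtime unchanged, ctime updated

Since GNU tar looks at ctime (and size), the ctime bump alone is enough to
trigger "file changed as we read it".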



On Fri, May 25, 2018 at 8:00 PM, Max C  wrote:

> I looked at the source code for GNU tar, and it looks for a change in the
> create time or (more likely) a change in the size.
>
> This seems very strange to me — I would think that creating a snapshot
> would cause a flush and then once the SSTables are written, hardlinks would
> be created and the SSTables wouldn't be written to after that.
>
> Our solution is to wait 5 minutes and retry the tar if an error occurs.
> This isn't ideal - but it's the best I could come up with.  :-/
>
> Thanks Jeff & others for your responses.
>
> - Max
>
> On May 25, 2018, at 5:05pm, Elliott Sims  wrote:
>
> I've run across this problem before - it seems like GNU tar interprets
> changes in the link count as changes to the file, so if the file gets
> compacted mid-backup it freaks out even if the file contents are
> unchanged.  I worked around it by just using bsdtar instead.
>
> On Thu, May 24, 2018 at 6:08 AM, Nitan Kainth 
> wrote:
>
>> Jeff,
>>
>> Shouldn't a snapshot get a consistent state of the SSTables? A -tmp file
>> shouldn't impact the backup operation, right?
>>
>>
>> Regards,
>> Nitan K.
>> Cassandra and Oracle Architect/SME
>> Datastax Certified Cassandra expert
>> Oracle 10g Certified
>>
>> On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa  wrote:
>>
>>> In versions before 3.0, sstables were written with a -tmp filename and
>>> copied/moved to the final filename when complete. This changed in 3.0 - we
>>> write into the file with the final name, and have a journal/log to let us
>>> know when it's done/final/live.
>>>
>>> Therefore, you can no longer just watch for a -Data.db file to be
>>> created and uploaded - you have to watch the log to make sure the file is
>>> not still being written.
>>>
>>>
>>> On Wed, May 23, 2018 at 2:18 PM, Max C.  wrote:
>>>
 Hi Everyone,

 We’ve noticed a few times in the last few weeks that when we’re doing
 backups, tar has complained with messages like this:

 tar: /var/lib/cassandra/data/mars/test_instances_by_test_id-6a944
 0a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
 file changed as we read it

 Any idea what might be causing this?

 We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of
 our backup process:

 
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 SNAPSHOT_NAME="backup_$TIMESTAMP"
 nodetool snapshot -t "$SNAPSHOT_NAME"

 for KEYSPACE in $KEYSPACES; do   # $KEYSPACES: the keyspaces to back up
   # dump the schema to schema.cql (e.g. via cqlsh -e "DESCRIBE KEYSPACE $KEYSPACE")
   tar -czf "/file_server/backup_${HOSTNAME}_${KEYSPACE}_${TIMESTAMP}.tgz" \
       schema.cql "/var/lib/cassandra/data/$KEYSPACE"/*/snapshots/"$SNAPSHOT_NAME"
 done

 nodetool clearsnapshot -t "$SNAPSHOT_NAME"

 Thanks.

 - Max


>>>
>>
>
>


Re: Using K8s to Manage Cassandra in Production

2018-05-28 Thread Hassaan Pasha
Thank you everyone.
This thread has been really useful!

On Wed, May 23, 2018 at 8:59 PM, Ben Bromhead  wrote:

> Here are the expectations around compatibility levels:
> https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/csi-new-client-library-procedure.md#client-capabilities
> Though references to gold, silver, bronze etc. seem to have largely gone
> away... not sure what's going on there?
>
> For a full reference just browse through the repo;
> https://github.com/kubernetes-client/java/blob/master/kubernetes/README.md
> is a good place to start, as is
> https://github.com/kubernetes-client/java/tree/master/examples
>
> The Java driver doesn't have as many of the nice things in
> https://github.com/kubernetes/client-go/tree/master/tools, but it does
> have some good helper classes in the util package, so I guess we spent a
> little more time wiring things together?
>
> Code generation is done via the jsonschema2pojo Maven plugin, and we also
> just keep the raw CRD definition in a resource directory.
>
> On Wed, May 23, 2018 at 11:23 AM vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> Thanks! Do you have some pointers on the available features? I am more
>> worried about the lack of custom controller integration, for instance the
>> code generator...
>>
>> 2018-05-23 17:17 GMT+02:00 Ben Bromhead :
>>
>>> The official Kubernetes Java driver is actually pretty feature complete,
>>> if not exactly idiomatic Java...  it's only missing full examples to get it
>>> to GOLD compatibility levels iirc.
>>>
>>> A few reasons we went down the Java path:
>>>
>>>- Cassandra community engagement was the primary concern. If you are
>>>a developer in the Cassandra community you have a base level of Java
>>>knowledge, so it means if you want to work on the Kubernetes operator you
>>>only have to learn 1 thing, Kubernetes. If the operator was in Go,
>>>you would then have two things to learn, Go and Kubernetes :)
>>>- We actually wrote an initial PoC in Go (based off the etcd operator;
>>>you can find it here: https://github.com/benbromhead/cassandra-operator-old),
>>>but because it was in Go we ended up making architectural decisions simply
>>>because Go doesn't do JMX, so it felt like we were just fighting different
>>>ecosystems just to be part of the cool group.
>>>
>>> Some other less important points weighed the decision in Java's favour:
>>>
>>>- The folk at Instaclustr all know Java, and are productive in it
>>>from day 1. Go is fun and relatively simple, but not our forte.
>>>- Mature package management and generics in Java, versus the inability to
>>>write DRY code and a million "if err" statements in Go (:
>>>- Some other awesome operators/controllers are written in JVM-based
>>>languages. The Spark Kubernetes resource manager (which is a k8s controller)
>>>is written in Scala.
>>>
>>>
>>> On Wed, May 23, 2018 at 10:04 AM vincent gromakowski <
>>> vincent.gromakow...@gmail.com> wrote:
>>>
 Why did you choose Java for the operator implementation when everybody
 seems to use the Go client (probably for greater functionality)?

 2018-05-23 15:39 GMT+02:00 Ben Bromhead :

> You can get a good part of the way with StatefulSets, but as Tom mentioned
> there are still some issues with this, particularly around scaling up and down.
>
> We are working on an Operator for Apache Cassandra, you can find it
> here https://github.com/instaclustr/cassandra-operator. This is a
> joint project between Instaclustr, Pivotal and a few other folk.
>
> Currently it's a work in progress, but we would love any or all early
> feedback/PRs/issues etc. Our first GA release will target the following
> capabilities:
>
>- Safe scaling up and down (including decommissioning)
>- Backup/restore workflow (snapshots only initially)
>- Built in prometheus integration and discovery
>
> Other features like repair, better PV support, maybe even a nice
> dashboard will be on the way.
>
>
> On Wed, May 23, 2018 at 7:35 AM Tom Petracca 
> wrote:
>
>> Using a StatefulSet should get you pretty far, though it will likely be
>> less effective than a CoreOS-style “operator”. Some random points:
>>
>>- For scale-up: a node shouldn’t report “ready” until it’s in the
>>NORMAL state; this will prevent multiple nodes from bootstrapping at 
>> once.
>>- For scale-down: as of now there isn’t a mechanism to know if a
>>pod is getting decommissioned because you’ve permanently lowered 
>> replica
>>count, or because it’s just getting bounced/re-scheduled, thus knowing
>>whether or not to decommission is basically impossible. Relevant 
>> issue:
>>kubernetes/kubernetes#1462
>>
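
For the scale-up point above (only report "ready" once the node is NORMAL),
a hedged sketch of what an exec readiness check could run inside the
container, assuming nodetool is available there:

    # exits 0 only once the node reports Mode: NORMAL
    nodetool netstats 2>/dev/null | grep -q "Mode: NORMAL"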