Re: Materialised view for sets of UUID

2016-12-21 Thread Benjamin Roth
The question is what matters to you and how big the cardinality is.

1. MV updates are atomic
2. Updates on 2 tables are not. You'd require a logged batch to ensure
atomicity, and so write performance is also a little bit lower than
without batches.
3. If you have a handful of groups per user, collections are the way to go.
If you have thousands of memberships per user, you should consider MVs.
Collections are not made to store "tons of data".
4. MVs are not so super-production-stable. They work, but there are still
some issues. So if you have a good alternative, you probably don't want to
use MVs.
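Point 2 above can be illustrated with a toy Python sketch (plain dicts standing in for the two denormalised tables; all names are invented for illustration): two independent writes can fail half-way through, leaving the tables in disagreement, whereas a logged batch is applied as a unit or not at all.

```python
# Toy model: membership denormalised into two "tables" (plain dicts).
# Two independent writes can fail half-way; a logged batch would be
# applied as a unit or not at all.

users = {}   # user_id -> set of group_ids
groups = {}  # group_id -> set of user_ids

def write(table, key, value, fail=False):
    if fail:
        raise IOError("write dropped")
    table.setdefault(key, set()).add(value)

def add_membership_unbatched(user, group, second_write_fails=False):
    write(users, user, group)                            # succeeds
    write(groups, group, user, fail=second_write_fails)  # may fail alone

try:
    add_membership_unbatched("u1", "g1", second_write_fails=True)
except IOError:
    pass

# The two tables now disagree: the user row says u1 is in g1, but the
# group row never recorded the member.
print("u1 groups:", users.get("u1"), "| g1 members:", groups.get("g1"))
```

With a logged batch, Cassandra guarantees both mutations are eventually applied or neither is, at the cost of the extra batch-log write mentioned above.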


2016-12-22 7:31 GMT+01:00 Torsten Bronger :

> Hallöchen!
>
> In RDBMS terms, I have an n:m relationship between "users" and
> "groups".  I need to answer the questions "who's in that group" and
> "in which groups is he".  In my Cassandra DB, this looks like this:
>
> CREATE TABLE users (
>   id uuid PRIMARY KEY,
>   groups_member set<uuid>,
>   groups_admin set<uuid>,
>   groups_pending set<uuid>
> );
>
> CREATE TABLE groups (
>   id uuid PRIMARY KEY,
>   members set<uuid>,
>   admins set<uuid>,
>   pending set<uuid>
> );
>
> But someone suggested to me that I express the membership relation like
> this:
>
> CREATE TABLE group_status (
>   group uuid,
>   user uuid,
>   status text,  /* "member", "admin", "pending" */
>   PRIMARY KEY ((group, user))
> );
>
> CREATE MATERIALIZED VIEW group_status_group AS
>   SELECT user, status FROM group_status
> WHERE user IS NOT NULL AND status IS NOT NULL AND group IS NOT NULL
>   PRIMARY KEY (group, user);
>
> CREATE MATERIALIZED VIEW group_status_user AS
>   SELECT group, status FROM group_status
> WHERE user IS NOT NULL AND status IS NOT NULL AND group IS NOT NULL
>   PRIMARY KEY (user, group);
>
> The answer to "who's in that group" is here "SELECT * FROM
> group_status_group WHERE group = ".
>
>
> Let's analyse both, and please interrupt me if I write something
> wrong.
>
> Simplicity: Table layout is easier to understand in the first
> variant; however, code is simpler with the second variant, as you
> only need one update instead of a batch with up to six sets.  The
> second variant is easier to extend to further states.
>
> Consistency: Eventual consistency can be guaranteed in both cases.
>
> Performance: Read performance is much better for the first variant,
> because the second variant has to go through many rows to collect
> all non-deleted clustering key values.  Write performance is
> slightly better for the first variant because one table + two
> materialised views is more expensive than two tables.
>
> What would you prefer?
>
> Tschö,
> Torsten.
>
> --
> Torsten Bronger
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Materialised view for sets of UUID

2016-12-21 Thread Torsten Bronger
Hallöchen!

In RDBMS terms, I have an n:m relationship between "users" and
"groups".  I need to answer the questions "who's in that group" and
"in which groups is he".  In my Cassandra DB, this looks like this:

CREATE TABLE users (
  id uuid PRIMARY KEY,
  groups_member set<uuid>,
  groups_admin set<uuid>,
  groups_pending set<uuid>
);

CREATE TABLE groups (
  id uuid PRIMARY KEY,
  members set<uuid>,
  admins set<uuid>,
  pending set<uuid>
);

But someone suggested to me that I express the membership relation like
this:

CREATE TABLE group_status (
  group uuid,
  user uuid,
  status text,  /* "member", "admin", "pending" */
  PRIMARY KEY ((group, user))
);

CREATE MATERIALIZED VIEW group_status_group AS
  SELECT user, status FROM group_status
  WHERE user IS NOT NULL AND status IS NOT NULL AND group IS NOT NULL
  PRIMARY KEY (group, user);

CREATE MATERIALIZED VIEW group_status_user AS
  SELECT group, status FROM group_status
  WHERE user IS NOT NULL AND status IS NOT NULL AND group IS NOT NULL
  PRIMARY KEY (user, group);

The answer to "who's in that group" is here "SELECT * FROM
group_status_group WHERE group = ".


Let's analyse both, and please interrupt me if I write something
wrong.

Simplicity: Table layout is easier to understand in the first
variant; however, code is simpler with the second variant, as you
only need one update instead of a batch with up to six sets.  The
second variant is easier to extend to further states.

Consistency: Eventual consistency can be guaranteed in both cases.

Performance: Read performance is much better for the first variant,
because the second variant has to go through many rows to collect
all non-deleted clustering key values.  Write performance is
slightly better for the first variant because one table + two
materialised views is more expensive than two tables.

What would you prefer?

Tschö,
Torsten.

-- 
Torsten Bronger



RE: Handling Leap second delay

2016-12-21 Thread Amit Singh F
Hi ,

Attached conversation can be of some help to you.

Regards
Amit Singh

From: Sanjeev T [mailto:san...@gmail.com]
Sent: Wednesday, December 21, 2016 9:24 AM
To: user@cassandra.apache.org
Subject: Handling Leap second delay

Hi,

Can some of you share points on, the versions and handling leap second delay on 
Dec 31, 2016.

Regards
-Sanjeev

--- Begin Message ---
Based on most of what I've said previously, pretty much every way of avoiding 
your leap-second ordering issue is going to be a "hack", and there will be some 
amount of hope involved.


If the updates occur more than 300ms apart and you are confident your nodes 
have clocks that are within 150ms of each other, then I'd close my eyes and 
hope they all leap second at the same time within that 150ms.


If they are less than 300ms apart (I'm guessing you meant less than 300ms), 
then I would look to figure out what the smallest gap is between those two 
updates and make sure your nodes' clocks are close enough that the leap second 
will occur on all nodes within that gap.

If that's not good enough, you could just halt those scenarios for 2 seconds 
over the leap second and then resume them once you've confirmed all clocks have 
skipped.
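For what it's worth, the hazard under discussion can be sketched in a few lines. Cassandra resolves conflicting writes to a cell by timestamp (last write wins), so if the clock steps back a second between two updates, the stale value keeps the larger timestamp and survives. A toy model, with invented timestamps modelling a 1 s backward step:

```python
# Toy last-write-wins cell, resolved on write timestamp as Cassandra does.
cell = {"value": None, "ts": -1}

def write(value, ts):
    # Higher timestamp wins; the tie-breaking by value is omitted here.
    if ts > cell["ts"]:
        cell.update(value=value, ts=ts)

write("old", ts=1_483_228_800_500)   # written just before the leap second
# the clock steps back one second, so the *newer* write gets a smaller timestamp:
write("new", ts=1_483_228_799_600)

print(cell["value"])  # the stale value survives
```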


On Wed, 2 Nov 2016 at 18:13 Anuj Wadehra 
mailto:anujw_2...@yahoo.co.in>> wrote:


   Thanks Ben for taking out time for the detailed reply !!

   We don't need strict ordering for all operations, but we are looking at 
scenarios where 2 quick updates to the same column of the same row are 
possible. By quick updates, I mean >300 ms. Configuring NTP properly (as 
mentioned in some blogs in your link) should give fair relative accuracy 
between the Cassandra nodes. But a leap second takes the clock back by an 
ENTIRE second (huge) and the probability of an old write overwriting the new 
one increases drastically. So, we want to be proactive about things.

   I agree that you should avoid such scenarios with design (if possible).

   Good to know that you guys have set up your own NTP servers as per the 
recommendation. Curious: do you also do some monitoring around NTP?



   Thanks
   Anuj


  On Fri, 28 Oct, 2016 at 12:25 AM, Ben Bromhead

  mailto:b...@instaclustr.com>> wrote:
  If you need guaranteed strict ordering in a distributed system, I would 
not use Cassandra, Cassandra does not provide this out of the box. I would look 
to a system that uses lamport or vector clocks. Based on your description of 
how your systems runs at the moment (and how close your updates are together), 
you have either already experienced out of order updates or there is a real 
possibility you will in the future.

  Sorry to be so dire, but if you do require causal consistency / strict 
ordering, you are not getting it at the moment. Distributed systems theory is 
really tricky, even for people that are "experts" on distributed systems over 
unreliable networks (I would certainly not put myself in that category). People 
have made a very good name for themselves by showing that the vast majority of 
distributed databases have had bugs when it comes to their various consistency 
models and the claims these databases make.

  So make sure you really do need guaranteed causal consistency/strict 
ordering or if you can design around it (e.g. using conflict free replicated 
data types) or choose a system that is designed to provide it.

  Having said that... here are some hacky things you could do in Cassandra 
to try and get this behaviour, which I in no way endorse doing :)

  * Cassandra counters do leverage a logical clock per shard and you could 
hack something together with counters and lightweight transactions, but you 
would want to do your homework on counter accuracy before diving into 
it... as I don't know if the implementation is safe in the context of your 
question. Also this would probably require a significant rework of your 
application plus a significant performance hit. I would invite a counter guru 
to jump in here...

  * You can leverage the fact that timestamps are monotonic if you isolate 
writes to a single node for a single shard... but you then lose Cassandra's 
availability guarantees, e.g. a keyspace with an RF of 1 and a CL of ONE will 
get monotonic timestamps (if generated on the server side).

  * Continuing down the path of isolating writes to a single node for a 
given shard you could also isolate writes to the primary replica using your 
client driver during the leap second (make it a minute either side of the 
leap), but again you lose out on availability and you are probably already 
experiencing out-of-order writes given how close together your writes and updates are.



  A note on NTP: NTP is generally fine if you use it to keep the clocks 
synced between the Cassandra nodes. If you are interested in how we have 
implemented NTP at Instaclustr, see our blogpost on it 
https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/.



 

Re: Join_ring=false Use Cases

2016-12-21 Thread Anuj Wadehra
Thanks All !!
I think the intent of the JIRA 
https://issues.apache.org/jira/browse/CASSANDRA-6961 was primarily to deal with 
stale information after outages and to give an opportunity to repair the data 
before a node joins the cluster. If a node started with join_ring=false doesn't 
accept writes while the repair is happening, the purpose of the JIRA is 
defeated, as it will anyway lead to stale information. Seems to be a defect.

Thanks,
Anuj


On Wednesday, 21 December 2016 2:53 AM, kurt Greaves  
wrote:
 

 It seems that you're correct in saying that writes don't propagate to a node 
that has join_ring set to false, so I'd say this is a flaw. In reality I can't 
see many actual use cases in regards to node outages with the current 
implementation. The main usage I'd think would be to have additional 
coordinators for CPU heavy workloads.

It seems that to make it actually useful for repairs/outages, we'd need to have 
another option to turn on writes so that it behaves similarly to write survey 
mode (but on already-bootstrapped nodes).

Is there a reason we don't have this already? Or does it exist somewhere I'm 
not aware of? 

On 20 December 2016 at 17:40, Anuj Wadehra  wrote:

No responses yet :)
Any C* expert who could help on join_ring use case and the concern raised?
Thanks
Anuj 
 
 On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra wrote:  
 Hi,
I need to understand the use case of join_ring=false in case of node outages. 
As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you would want 
join_ring=false when you have to repair a node before bringing it back after 
some considerable outage. The problem I see with join_ring=false is that, 
unlike autobootstrap, the node will NOT accept writes while you are running 
repair on it. If a node was down for 5 hours and you bring it back with 
join_ring=false, repair the node for 7 hours and then make it join the ring, it 
will STILL have missed writes, because while the repair was running (7 hrs), 
writes only went to the other nodes. So, if you want to make sure that reads 
served by the restored node at CL ONE will return consistent data after the 
node has joined, you won't get that, as writes have been missed while the node 
is being repaired. And if you work with Read/Write CL=QUORUM, even if you bring 
back the node without join_ring=false, you would anyway get the desired 
consistency. So, how would join_ring=false provide any additional consistency 
in this case?
I can see join_ring=false being useful only when I am restoring from a snapshot 
or bootstrapping, and there are dropped mutations in my cluster which are not 
fixed by hinted handoff.
For example: 3 nodes A, B, C working at Read/Write CL QUORUM. Hinted Handoff = 3 hrs.
10 AM: Snapshot taken on all 3 nodes.
11 AM: Node B goes down for 4 hours.
3 PM: Node B comes up, but data is not repaired. So, 1 hr of dropped mutations (2-3 PM) not fixed via Hinted Handoff.
5 PM: Node A crashes.
6 PM: Node A restored from the 10 AM snapshot, started with join_ring=false, repaired, and then joined to the cluster.
In the above restore-snapshot example, updates from 2-3 PM were outside the 
hinted handoff window of 3 hours. Thus, node B won't get those updates. Node 
A's data for 2-3 PM is already lost. So, the 2-3 PM updates are only on one 
replica, i.e. node C, and the minimum consistency needed is QUORUM, so 
join_ring=false would help. But this is a very specific use case.
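The window arithmetic in this timeline can be checked with a small sketch. It uses a simplified model of hinted handoff: hints are stored only while the dead node has been down for less than the hint window (`max_hint_window_in_ms`, 3 hours by default); later writes are simply not hinted. The dates and helper name are invented for illustration:

```python
from datetime import datetime, timedelta

HINT_WINDOW = timedelta(hours=3)   # default max_hint_window_in_ms

def hinted(mutation_time, node_down_at):
    # Hints are stored only while the dead node has been down for less
    # than the hint window; mutations after that are simply not hinted.
    return node_down_at <= mutation_time < node_down_at + HINT_WINDOW

down = datetime(2016, 12, 21, 11, 0)                 # 11 AM: node B goes down
print(hinted(datetime(2016, 12, 21, 13, 0), down))   # 1 PM mutation: hinted
print(hinted(datetime(2016, 12, 21, 14, 30), down))  # 2:30 PM mutation: dropped
```

This matches the timeline above: mutations between 11 AM and 2 PM get hints, while the 2-3 PM hour falls outside the window.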
Thanks,
Anuj
  




   

Re: Python Upgrade to 2.7

2016-12-21 Thread Stefania Alborghetti
Python is missing the zlib module.

The solution to this problem depends on whether you've compiled Python from
source, or are using a distribution package.

Googling the error "can't decompress data; zlib not available" should
provide an answer on how to solve this. If not, send us more details on
your Python installation.
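One quick way to check whether the interpreter was actually built with zlib (or any other module) is to probe for it from Python itself; the helper name below is just for the sketch:

```python
import importlib.util

def has_stdlib_module(name):
    # find_spec returns None when the module can't be found, which for
    # zlib usually means Python was compiled without the zlib headers.
    return importlib.util.find_spec(name) is not None

print("zlib available:", has_stdlib_module("zlib"))
```

If this prints False on a source-built Python, installing the zlib development headers and recompiling is the usual fix.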

On Thu, Dec 22, 2016 at 6:51 AM, Jacob Shadix  wrote:

> I am running Cassandra 2.1.14. Upgraded to Python 2.7 from 2.6.6 and
> getting the following error with CQLSH.
> ---
>
> Python Cassandra driver not installed, or not on PYTHONPATH.
>
> You might try "pip install cassandra-driver".
>
> Python: /opt/isv/python27/bin/python
>
> Error: can't decompress data; zlib not available
>
> ---
>
> What am I missing?
> -- Jacob Shadix
>



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Processing time series data in order

2016-12-21 Thread Ali Akhtar
The batch size can be large, so in-memory ordering isn't an option,
unfortunately.

On Thu, Dec 22, 2016 at 7:09 AM, Jesse Hodges 
wrote:

> Depending on the expected max out of order window, why not order them in
> memory? Then you don't need to reread from Cassandra, in case of a problem
> you can reread data from Kafka.
>
> -Jesse
>
> > On Dec 21, 2016, at 7:24 PM, Ali Akhtar  wrote:
> >
> > - I'm receiving a batch of messages to a Kafka topic.
> >
> > Each message has a timestamp, however the messages can arrive / get
> processed out of order. I.e event 1's timestamp could've been a few seconds
> before event 2, and event 2 could still get processed before event 1.
> >
> > - I know the number of messages that are sent per batch.
> >
> > - I need to process the messages in order. The messages are basically
> providing the history of an item. I need to be able to track the history
> accurately (i.e, if an event occurred 3 times, i need to accurately log the
> dates of the first, 2nd, and 3rd time it occurred).
> >
> > The approach I'm considering is:
> >
> > - Creating a cassandra table which is ordered by the timestamp of the
> messages.
> >
> > - Once a batch of messages has arrived, writing them all to cassandra,
> counting on them being ordered by the timestamp even if they are processed
> out of order.
> >
> > - Then iterating over the messages in the cassandra table, to process
> them in order.
> >
> > However, I'm concerned about Cassandra's eventual consistency. Could it
> be that even though I wrote the messages, they are not there when I try to
> read them (which would be almost immediately after they are written)?
> >
> > Should I enforce consistency = ALL to make sure the messages will be
> available immediately after being written?
> >
> > Is there a better way to handle this thru either Kafka streams or
> Cassandra?
>


Re: Cassandra cluster performance

2016-12-21 Thread Ben Slater
Given you’re using replication factor 1 (so each piece of data is only
going to get written to one node), something definitely seems wrong. Some
questions/ideas:
- are there any errors in the Cassandra logs or are you seeing any errors
at the client?
- is your test data distributed across your partition key or is it possible
all your test data is going to a single partition?
- have you tried manually running a few inserts to see if you get any
errors?
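The second question can be illustrated with a toy sketch of how a partition key maps to a node (md5 standing in for Cassandra's Murmur3 partitioner, and a two-"node" ring invented for the example): distinct keys spread the load, while a constant key sends every row to one node.

```python
import hashlib
from collections import Counter

NODES = 2  # toy two-node "cluster"

def owner(partition_key: str) -> int:
    # Stand-in for the token ring: Cassandra hashes the partition key with
    # Murmur3; md5 is just a convenient stdlib substitute for this sketch.
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % NODES

spread = Counter(owner(f"user-{i}") for i in range(1000))  # distinct keys
skewed = Counter(owner("same-key") for _ in range(1000))   # constant key
print("distinct keys per node:", dict(spread))
print("constant key per node: ", dict(skewed))
```

If benchmark data all shares one partition key, one node coordinates and stores everything, which would explain a throughput collapse when moving to a cluster.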

Cheers
Ben


On Thu, 22 Dec 2016 at 11:48 Branislav Janosik -T (bjanosik - AAP3 INC at
Cisco)  wrote:

> Hi,
>
>
>
> - Consistency level is set to ONE
>
> -  Keyspace definition:
>
> "CREATE KEYSPACE IF NOT EXISTS onem2m " +
> "WITH replication = " +
> "{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";
>
>
>
> - yes, the client is on separate VM
>
> - In our project we use Cassandra API version 3.0.2 but the database 
> (cluster) is version 3.9
>
> - for 2node cluster:
>
>  first VM: 25 GB RAM, 16 CPUs
>
>  second VM: 16 GB RAM, 16 CPUs
>
>
>
>
>
>
>
> From: Ben Slater 
> Reply-To: "user@cassandra.apache.org" 
> Date: Wednesday, December 21, 2016 at 2:32 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: Cassandra cluster performance
>
>
>
> You would expect some drop when moving from a single node to multiple nodes,
> but on the face of it that feels extreme to me (although I’ve never personally
> tested the difference). Some questions that might help provide an answer:
>
> - what consistency level are you using for the test?
>
> - what is your keyspace definition (replication factor most importantly)?
>
> - where are you running your test client (is it a separate box to
> cassandra)?
>
> - what C* version?
>
> - what are specs (CPU, RAM) of the test servers?
>
>
>
> Cheers
>
> Ben
>
>
>
> On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at
> Cisco)  wrote:
>
> Hi all,
>
>
>
> I’m working on a project and we have Java benchmark test for testing the
> performance when using Cassandra database. Create operation on a single
> node Cassandra cluster is about 15K operations per second. The problem we
> have is that when I set up a cluster with 2 or more nodes (each on separate
> virtual machines and servers), the performance goes down to 1K ops/sec. I
> follow the official instructions on how to set up a multinode cluster – the
> only things I change in Cassandra.yaml file are: change seeds to IP address
> of one node, change listen and rpc address to IP address of the node and
> finally change endpoint snitch to GossipingPropertyFileSnitch. The
> replication factor is set to 1 when having 2-node cluster. I use only one
> datacenter. The cluster seems to be doing fine (I can see nodes
> communicating) and so is the CPU, RAM usage on the machines.
>
>
>
> Does anybody have any ideas? Any help would be very appreciated.
>
>
>
> Thanks!
>
>
>
>


Re: Not timing out some queries (Java driver)

2016-12-21 Thread Voytek Jarnot
cassandra.yaml has various timeouts such as read_request_timeout,
range_request_timeout, write_request_timeout, etc.  The driver does as well
(via Cluster -> Configuration -> SocketOptions -> setReadTimeoutMillis).

Not sure if you can (or would want to) set them to "forever", but it's a
starting point.
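An alternative to raising timeouts toward "forever" is a client-side retry loop that keeps re-submitting until the write is acknowledged. The sketch below is plain Python, not driver API, and every name in it is invented for illustration:

```python
import time

def execute_until_acked(op, base_delay=0.01, max_delay=1.0):
    # Keep retrying the operation until it succeeds; there is deliberately
    # no deadline, mirroring "block until all replicas acknowledge".
    delay = base_delay
    while True:
        try:
            return op()
        except TimeoutError:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped

# Fake query that times out twice, then succeeds.
attempts = {"n": 0}
def flaky_query():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("not all replicas answered")
    return "applied"

print(execute_until_acked(flaky_query))
```

One caveat: retrying is only safe for idempotent statements; a retried counter update or non-idempotent write can be applied twice.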

On Wed, Dec 21, 2016 at 7:10 PM, Ali Akhtar  wrote:

> I have some queries which need to be processed in a consistent manner. I'm
> setting the consistently level = ALL option on these queries.
>
> However, I've noticed that sometimes these queries fail because of a
> timeout (2 seconds).
>
> In my use case, for certain queries, I want them to never time out and
> block until they have been acknowledged by all nodes.
>
> Is that possible thru the Datastax Java driver, or another way?
>


Re: Processing time series data in order

2016-12-21 Thread Jesse Hodges
Depending on the expected max out of order window, why not order them in 
memory? Then you don't need to reread from Cassandra, in case of a problem you 
can reread data from Kafka. 

-Jesse 

> On Dec 21, 2016, at 7:24 PM, Ali Akhtar  wrote:
> 
> - I'm receiving a batch of messages to a Kafka topic.
> 
> Each message has a timestamp, however the messages can arrive / get processed 
> out of order. I.e event 1's timestamp could've been a few seconds before 
> event 2, and event 2 could still get processed before event 1.
> 
> - I know the number of messages that are sent per batch.
> 
> - I need to process the messages in order. The messages are basically 
> providing the history of an item. I need to be able to track the history 
> accurately (i.e, if an event occurred 3 times, i need to accurately log the 
> dates of the first, 2nd, and 3rd time it occurred).
> 
> The approach I'm considering is:
> 
> - Creating a cassandra table which is ordered by the timestamp of the 
> messages.
> 
> - Once a batch of messages has arrived, writing them all to cassandra, 
> counting on them being ordered by the timestamp even if they are processed 
> out of order.
> 
> - Then iterating over the messages in the cassandra table, to process them in 
> order.
> 
> However, I'm concerned about Cassandra's eventual consistency. Could it be 
> that even though I wrote the messages, they are not there when I try to read 
> them (which would be almost immediately after they are written)?
> 
> Should I enforce consistency = ALL to make sure the messages will be 
> available immediately after being written?
> 
> Is there a better way to handle this thru either Kafka streams or Cassandra?


Processing time series data in order

2016-12-21 Thread Ali Akhtar
- I'm receiving a batch of messages to a Kafka topic.

Each message has a timestamp, however the messages can arrive / get
processed out of order. I.e event 1's timestamp could've been a few seconds
before event 2, and event 2 could still get processed before event 1.

- I know the number of messages that are sent per batch.

- I need to process the messages in order. The messages are basically
providing the history of an item. I need to be able to track the history
accurately (i.e, if an event occurred 3 times, i need to accurately log the
dates of the first, 2nd, and 3rd time it occurred).

The approach I'm considering is:

- Creating a cassandra table which is ordered by the timestamp of the
messages.

- Once a batch of messages has arrived, writing them all to cassandra,
counting on them being ordered by the timestamp even if they are processed
out of order.

- Then iterating over the messages in the cassandra table, to process them
in order.

However, I'm concerned about Cassandra's eventual consistency. Could it be
that even though I wrote the messages, they are not there when I try to
read them (which would be almost immediately after they are written)?

Should I enforce consistency = ALL to make sure the messages will be
available immediately after being written?

Is there a better way to handle this thru either Kafka streams or Cassandra?
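The proposed table can be mimicked in miniature: keep rows sorted on their timestamp at write time (as a clustering key would), and a scan comes back in time order no matter the arrival order. A sketch with invented events:

```python
import bisect

# Toy stand-in for a table whose clustering key is the event timestamp:
# rows are kept sorted on insert, so a scan returns them in time order
# regardless of the order in which they arrived.
table = []  # list of (timestamp, payload), kept sorted

def write(ts, payload):
    bisect.insort(table, (ts, payload))

# Events arrive out of order...
write(2, "second"); write(1, "first"); write(3, "third")

# ...but iterate in timestamp order, like a clustering-key scan.
print([p for _, p in table])
```

In real Cassandra the ordering comes from the clustering key itself; the read-your-writes concern raised above is a separate matter of consistency levels (e.g. writing and reading at QUORUM so the read quorum overlaps the write quorum).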


Not timing out some queries (Java driver)

2016-12-21 Thread Ali Akhtar
I have some queries which need to be processed in a consistent manner. I'm
setting the consistently level = ALL option on these queries.

However, I've noticed that sometimes these queries fail because of a
timeout (2 seconds).

In my use case, for certain queries, I want them to never time out and
block until they have been acknowledged by all nodes.

Is that possible thru the Datastax Java driver, or another way?


Re: Cassandra cluster performance

2016-12-21 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi,

- Consistency level is set to ONE
-  Keyspace definition:

"CREATE KEYSPACE  IF NOT EXISTS  onem2m " +
"WITH replication = " +
"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";



- yes, the client is on separate VM

- In our project we use Cassandra API version 3.0.2 but the database (cluster) 
is version 3.9

- for 2node cluster:

 first VM: 25 GB RAM, 16 CPUs

 second VM: 16 GB RAM, 16 CPUs




From: Ben Slater 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, December 21, 2016 at 2:32 PM
To: "user@cassandra.apache.org" 
Subject: Re: Cassandra cluster performance

You would expect some drop when moving from a single node to multiple nodes, but 
on the face of it that feels extreme to me (although I’ve never personally tested 
the difference). Some questions that might help provide an answer:
- what consistency level are you using for the test?
- what is your keyspace definition (replication factor most importantly)?
- where are you running your test client (is it a separate box to cassandra)?
- what C* version?
- what are specs (CPU, RAM) of the test servers?

Cheers
Ben

On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) mailto:bjano...@cisco.com>> wrote:
Hi all,

I’m working on a project and we have Java benchmark test for testing the 
performance when using Cassandra database. Create operation on a single node 
Cassandra cluster is about 15K operations per second. The problem we have is 
that when I set up a cluster with 2 or more nodes (each on separate virtual 
machines and servers), the performance goes down to 1K ops/sec. I follow the 
official instructions on how to set up a multinode cluster – the only things I 
change in Cassandra.yaml file are: change seeds to IP address of one node, 
change listen and rpc address to IP address of the node and finally change 
endpoint snitch to GossipingPropertyFileSnitch. The replication factor is set 
to 1 when having 2-node cluster. I use only one datacenter. The cluster seems 
to be doing fine (I can see nodes communicating) and so is the CPU, RAM usage 
on the machines.

Does anybody have any ideas? Any help would be very appreciated.

Thanks!



Python Upgrade to 2.7

2016-12-21 Thread Jacob Shadix
I am running Cassandra 2.1.14. Upgraded to Python 2.7 from 2.6.6 and
getting the following error with CQLSH.
---

Python Cassandra driver not installed, or not on PYTHONPATH.

You might try "pip install cassandra-driver".

Python: /opt/isv/python27/bin/python

Error: can't decompress data; zlib not available

---

What am I missing?
-- Jacob Shadix


Re: Cassandra cluster performance

2016-12-21 Thread Ben Slater
You would expect some drop when moving from a single node to multiple nodes, but
on the face of it that feels extreme to me (although I’ve never personally tested
the difference). Some questions that might help provide an answer:
- what consistency level are you using for the test?
- what is your keyspace definition (replication factor most importantly)?
- where are you running your test client (is it a separate box to
cassandra)?
- what C* version?
- what are specs (CPU, RAM) of the test servers?

Cheers
Ben

On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at
Cisco)  wrote:

> Hi all,
>
>
>
> I’m working on a project and we have Java benchmark test for testing the
> performance when using Cassandra database. Create operation on a single
> node Cassandra cluster is about 15K operations per second. Problem we have
> is when I set up cluster with 2 or more nodes (each of them are on separate
> virtual machines and servers), the performance goes down to 1K ops/sec. I
> follow the official instructions on how to set up a multinode cluster – the
> only things I change in Cassandra.yaml file are: change seeds to IP address
> of one node, change listen and rpc address to IP address of the node and
> finally change endpoint snitch to GossipingPropertyFileSnitch. The
> replication factor is set to 1 when having 2-node cluster. I use only one
> datacenter. The cluster seems to be doing fine (I can see nodes
> communicating) and so is the CPU, RAM usage on the machines.
>
>
>
> Does anybody have any ideas? Any help would be very appreciated.
>
>
>
> Thanks!
>
>
>


Cassandra cluster performance

2016-12-21 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi all,

I’m working on a project and we have Java benchmark test for testing the 
performance when using Cassandra database. Create operation on a single node 
Cassandra cluster is about 15K operations per second. The problem we have is 
that when I set up a cluster with 2 or more nodes (each on separate virtual 
machines and servers), the performance goes down to 1K ops/sec. I follow the 
official instructions on how to set up a multinode cluster – the only things I 
change in Cassandra.yaml file are: change seeds to IP address of one node, 
change listen and rpc address to IP address of the node and finally change 
endpoint snitch to GossipingPropertyFileSnitch. The replication factor is set 
to 1 when having 2-node cluster. I use only one datacenter. The cluster seems 
to be doing fine (I can see nodes communicating) and so is the CPU, RAM usage 
on the machines.

Does anybody have any ideas? Any help would be very appreciated.

Thanks!



Re: Query on Cassandra clusters

2016-12-21 Thread Sumit Anvekar
Thank you Alain for the detailed explanation.

To answer your question on Java version, JVM settings and memory usage: we
are using 1.8.0_45, precisely
>java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

JVM settings are identical on all nodes (cassandra-env.sh is identical).

Further, when I say high memory usage: Cassandra is using heap
(-Xmx3767M) and off-heap of about 6GB out of the total system memory of
14.7 GB. Along with this there are other processes running on this system,
which brings the overall memory usage to >95%. This brings me to another
point: is heap memory + off-heap (the sum of the "Space used (total)"
values from nodetool cfstats) the total memory used by Cassandra on a
node?

Also, on the disk front, what is a good amount of empty space to leave
unused on the partition (should it be ~50%?) considering we use the
SizeTieredCompaction strategy?
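The ~50% rule of thumb comes from worst-case compaction space: a size-tiered compaction only removes its input SSTables after the merged output is fully written, so in the worst case (little overlap or expired data) you briefly need room for inputs plus output. A hedged back-of-envelope sketch, with invented SSTable sizes:

```python
def stcs_worst_case_temp_space(sstable_sizes_gb):
    # Inputs are deleted only after the merged output exists, so the
    # worst-case extra space is roughly the total size of the SSTables
    # being compacted together.
    return sum(sstable_sizes_gb)

tables = [120, 118, 115, 117]   # one tier of similar-size SSTables, in GB
print(stcs_worst_case_temp_space(tables), "GB of extra headroom, worst case")
```

In practice overwrites, tombstones and TTLs shrink the output, but sizing for the pessimistic case is why roughly half the disk is commonly kept free under STCS.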

On Wed, Dec 21, 2016 at 6:30 PM, Alain RODRIGUEZ  wrote:

> Hi Sumit,
>
> 1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
>> version 3.0.3 and then newer 5 nodes have 3.6.0 version.
>
>
> I strongly recommend to:
>
>
>- Stick with one version of Apache Cassandra per cluster.
>- Always be as close as possible from the last minor release of the
>Cassandra version in use.
>
>
> So you *really should* not be using 3.0.3 *AND* 3.6.0, but rather 3.0.10
> *OR* 3.7 (currently). Note that Cassandra 3.X (with X > 0) uses a tick-tock
> release cycle where odd numbers are bug-fix-only releases and even numbers
> introduce new features as well.
>
> Running multiple versions for a long period can induce errors; Cassandra
> is built to handle multiple versions only to give operators the time to
> run a rolling restart. No streaming (adding / removing / repairing nodes)
> should happen during this period. Also, I have seen in the past some cases
> where changing the schema was also an issue with multiple versions, leading
> to schema disagreements.
>
> Due to this scenario, a couple boxes are running very high on memory (95%
>> usage) whereas some of the older version nodes have just 60-70% memory
>> usage.
>
>
> Hard to say if this is related to the mutiple versions of Cassandra but it
> could. Are you sure nodes are using the same JVM / GC options
> (cassandra-env.sh) and Java version?
>
> Also, what is exactly "high on memory 95%"? Are we talking about heap or
> Native memory. Isn't the memory used as page cache (that would still be
> available for the system)?
>
> 2. To counter #1, I am planning to upgrade system configuration of the
>> nodes where there is higher memory usage. But the question is, will it be a
>> problem if we have a Cassandra cluster, where in a couple of nodes have
>> double the system configuration than other nodes in the cluster.
>>
>
> It is not a problem per se to have distinct configurations on distinct
> nodes. Cassandra does it very well, and it is frequently used to test some
> configuration change on a canary node, to prevent it from impacting the
> whole service.
>
> Yet, all the nodes should be doing the same work (unless you have some
> heterogeneous hardware and are using a distinct number of vnodes on each
> node). Keeping things homogeneous allows the operator to easily compare how
> nodes are doing, and it makes reasoning about Cassandra, as well as
> troubleshooting issues, much easier.
>
> So I would:
>
> - Fully upgrade / downgrade asap to a chosen version (3.X is known as
> being not yet stable, but going back to 3.0.X might be more painful)
> - Make sure nodes are well balanced and using the same number of ranges
> 'nodetool status '
> - Make sure the nodes are using the same Java version and JVM settings.
>
> Hope that helps,
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-12-21 8:22 GMT+01:00 Sumit Anvekar :
>
>> I have a couple questions.
>>
>> 1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
>> version 3.0.3 and the newer 5 nodes have version 3.6.0. It had been running
>> fine until recently I am seeing higher amount of data residing in newer
>> boxes. The configuration file (YAML file) is exactly same on all nodes
>> (except for the node host names). Wondering if the version has something to
>> do with this scenario. Due to this scenario, a couple boxes are running
>> very high on memory (95% usage) whereas some of the older version nodes
>> have just 60-70% memory usage.
>>
>> 2. To counter #1, I am planning to upgrade system configuration of the
>> nodes where there is higher memory usage. But the question is, will it be a
>> problem if we have a Cassandra cluster, where in a couple of nodes have
>> double the system configuration than other nodes in the cluster.
>>
>> Appreciate any comment on the same.
>>
>> Sumit.
>

Advice in upgrade plan from 1.2.18 to 2.2.8

2016-12-21 Thread Aiman Parvaiz
Hi everyone,
I have 2 C* DCs with 12 nodes in each running 1.2.18. I plan to upgrade them to 
2.2.latest and wanted to run by you experts my plan.


  1.  Install 2.0.latest on one node at a time, start and wait for it to join 
the ring.
  2.  Run upgradesstables on this node.
  3.  Repeat steps 1 and 2 on each node, installing Cassandra 2.0 in a rolling manner 
and running upgradesstables in parallel. (Please let me know if running 
upgradesstables in parallel is not right here. My cluster is not under much 
load really.)
  4.  Now I will have both my DCs running 2.0.latest.
  5.  Install cassandra 2.1.latest on one node at a time (same as above)
  6.  Do I need to run upgradesstables here again after the node has started 
and joined? (I think yes, but seek advice. 
https://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgrdCassandra.html)
  7.  Following the above pattern, I would install cassandra2.1 in a rolling 
manner across 2 DCs (depending on response to 6 I might or might not run 
upgradesstables)
  8.  At this point both DCs would have 2.1.latest and again in rolling manner 
I install 2.2.8.

My assumption is that while this upgrade is happening, C* would still be 
able to serve reads and writes, and that running different versions at various 
points in the upgrade process will not affect the apps reading/writing from C*.

Thanks



Openstack and Cassandra

2016-12-21 Thread Shalom Sagges
Hi Everyone,

I am looking into the option of deploying a Cassandra cluster on Openstack
nodes instead of physical nodes due to resource management considerations.

Does anyone have any insights regarding this?
Can this combination work properly?
Since the disks (HDDs) are part of one physical machine that divides its
capacity among various instances (not only Cassandra), will this affect
performance, especially when the commitlog directory will probably reside
with the data directory?

I'm at a loss here and don't have any answers for that matter.

Can anyone assist please?

Thanks!



Shalom Sagges
DBA
T: +972-74-700-4035
 
 We Create Meaningful Connections

-- 
This message may contain confidential and/or privileged information. 
If you are not the addressee or authorized to receive this on behalf of the 
addressee you must not use, copy, disclose or take action based on this 
message or any information herein. 
If you have received this message in error, please advise the sender 
immediately by reply email and delete this message. Thank you.


Re: High CPU on nodes

2016-12-21 Thread Nate McCall
https://issues.apache.org/jira/browse/CASSANDRA-6908

Disable DynamicSnitch by adding the following to cassandra.yaml (it is
not in the file by default):

dynamic_snitch: false
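For context, the hot method in the quoted stack trace below boils down to a per-request sort of candidate replicas by their current latency score, which is what burns CPU at high request volume. A rough sketch (simplified, names and score map assumed, not the actual Java implementation):

```python
# Simplified model of DynamicEndpointSnitch.sortByProximityWithScore:
# every range read re-sorts the replicas by dynamic latency score
# (lower score = healthier replica).
def sort_by_proximity_with_score(replicas, scores):
    """Order replicas by dynamic latency score, lowest first."""
    return sorted(replicas, key=lambda r: scores.get(r, 0.0))

scores = {"10.0.0.1": 0.9, "10.0.0.2": 0.1, "10.0.0.3": 0.4}
print(sort_by_proximity_with_score(["10.0.0.1", "10.0.0.2", "10.0.0.3"], scores))
# ['10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Disabling the dynamic snitch removes this per-request sorting cost entirely, at the price of losing latency-aware replica selection.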



On Wed, Dec 21, 2016 at 8:40 AM, Anubhav Kale 
wrote:

> CIL
>
>
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Saturday, December 17, 2016 5:18 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: High CPU on nodes
>
>
>
> Hi,
>
>
>
> What does 'nodetool netstats' looks like on those nodes?
>
>
>
> *Its not doing any streaming.*
>
>
>
> we have 30GB heap
>
>
>
> How is the JVM / GC doing? Are you using G1GC or CMS? This setting would
> be bad for CMS.
>
>
>
> *G1. GC is doing fine. I don’t see any long pauses beyond 200 ms.*
>
>
>
> You can use this tool to understand where the CPU is being used
> https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#
> ttop-command
> 
> .
>
>
>
> I hope that helps,
>
>
>
> C*heers,
>
> ---
>
> Alain Rodriguez - @arodream - al...@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
> 
>
>
>
>
>
>
>
> 2016-12-17 0:10 GMT+01:00 Anubhav Kale :
>
> Hello,
>
>
>
> I am trying to fight a high CPU problem on some of our nodes. Thread dumps
> show that it’s not GC threads (we have 30GB heap), iostat %iowait confirms
> it’s not disk (ranges between 0.3 – 0.9%). One of the ways in which the
> problem manifests is that the nodes can’t compact SSTables and it happens
> randomly. We run Cassandra 2.1.13 on Azure Premium Storage (network
> attached SSDs).
>
>
>
> One of the sample threads that was taking high CPU shows :
>
>
>
> "pool-13-thread-1" #3352
> 
>  prio=5 os_prio=0 tid=0x7f2275340bb0 nid=0x1b0b runnable
> [0x7f33ffaae000]
> java.lang.Thread.State: RUNNABLE
> at java.util.TimSort.gallopRight(TimSort.java:632)
> at java.util.TimSort.mergeLo(TimSort.java:739)
> at java.util.TimSort.mergeAt(TimSort.java:514)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.
> sortByProximityWithScore(DynamicEndpointSnitch.java:163)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.
> sortByProximityWithBadness(DynamicEndpointSnitch.java:200)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(
> DynamicEndpointSnitch.java:152)
> at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(
> StorageProxy.java:1581)
> at org.apache.cassandra.service.StorageProxy.getRangeSlice(
> StorageProxy.java:1739)
>
>
>
> Looking at code, I can’t figure out why things like this would require a
> high CPU and I don’t find any JIRAs relating this as well. So, what can I
> do next to troubleshoot this ?
>
>
>
> Thanks !
>
>
>



-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


RE: High CPU on nodes

2016-12-21 Thread Anubhav Kale
CIL

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Saturday, December 17, 2016 5:18 AM
To: user@cassandra.apache.org
Subject: Re: High CPU on nodes

Hi,

What does 'nodetool netstats' looks like on those nodes?

Its not doing any streaming.

we have 30GB heap

How is the JVM / GC doing? Are you using G1GC or CMS? This setting would be bad 
for CMS.

G1. GC is doing fine. I don’t see any long pauses beyond 200 ms.

You can use this tool to understand where the CPU is being used 
https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command.

I hope that helps,

C*heers,
---
Alain Rodriguez - @arodream - 
al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-12-17 0:10 GMT+01:00 Anubhav Kale 
mailto:anubhav.k...@microsoft.com>>:
Hello,

I am trying to fight a high CPU problem on some of our nodes. Thread dumps show 
that it’s not GC threads (we have 30GB heap), iostat %iowait confirms it’s not 
disk (ranges between 0.3 – 0.9%). One of the ways in which the problem 
manifests is that the nodes can’t compact SSTables and it happens randomly. We 
run Cassandra 2.1.13 on Azure Premium Storage (network attached SSDs).

One of the sample threads that was taking high CPU shows :

"pool-13-thread-1" 
#3352
 prio=5 os_prio=0 tid=0x7f2275340bb0 nid=0x1b0b runnable 
[0x7f33ffaae000]
java.lang.Thread.State: RUNNABLE
at java.util.TimSort.gallopRight(TimSort.java:632)
at java.util.TimSort.mergeLo(TimSort.java:739)
at java.util.TimSort.mergeAt(TimSort.java:514)
at java.util.TimSort.mergeCollapse(TimSort.java:441)
at java.util.TimSort.sort(TimSort.java:245)
at java.util.Arrays.sort(Arrays.java:1512)
at java.util.ArrayList.sort(ArrayList.java:1454)
at java.util.Collections.sort(Collections.java:175)
at 
org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:163)
at 
org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:200)
at 
org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:152)
at 
org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1581)
at 
org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1739)

Looking at code, I can’t figure out why things like this would require a high 
CPU and I don’t find any JIRAs relating this as well. So, what can I do next to 
troubleshoot this ?

Thanks !



Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Voytek Jarnot
Reading that article the only conclusion I can reach (unless I'm
misreading) is that all the stuff that was never free is still not free -
the change is that Oracle may actually be interested in the fact that some
are using non-free products for free.

Pretty much a non-story, it seems like.

On Tue, Dec 20, 2016 at 11:55 PM, Kant Kodali  wrote:

> Looking at this http://www.theregister.co.uk/2016/12/16/oracle_
> targets_java_users_non_compliance/?mt=1481919461669 I don't know why
> Cassandra recommends Oracle JVM?
>
> JVM is a great piece of software but I would like to stay away from Oracle
> as much as possible. Oracle is just horrible the way they are dealing with
> Java in General.
>
>
>


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Michael Shuler
On 12/21/2016 08:38 AM, Eric Evans wrote:
> I don't really have any opinions on Oracle per se, but Cassandra is a
> Free Software project and I would prefer that we not depend on
> commercial software, (and that's kind of what we have here, an
> implicit dependency).

Just a bit of clarification. The debian packages depend on OpenJDK as
the first preference, then the meta-package that may be satisfied by a
custom rolled Oracle-based deb:

  https://github.com/apache/cassandra/blob/trunk/debian/control#L14

I dug through the pseudo packages that RHEL/CentOS provide in the vendor
OpenJDK rpms and selected 'jre' when I committed:

  https://github.com/apache/cassandra/blob/trunk/redhat/cassandra.spec#L23

In both cases, an install of packaged deb or rpm will pull in OpenJDK by
default, unless the user goes out of his/her way to override this.

-- 
Kind regards,
Michael


Re: Choosing a compaction strategy (TWCS)

2016-12-21 Thread Voytek Jarnot
Just want to bump this thread if possible... having trouble ferreting out
the specifics of TWCS configuration, google's not being particularly
helpful.

If tombstone compactions are disabled by default in TWCS, does one enable
them by setting values for tombstone_compaction_interval and
tombstone_threshold?  Or am I way off - is there more to it?
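For what it's worth, a sketch of what enabling those subproperties might look like, roughly along the lines Jeff describes below (the table name and values are made up, and the exact subproperty set should be verified against your Cassandra version's documentation):

```sql
-- Illustrative only: TWCS with single-sstable tombstone compactions enabled,
-- using an 80% tombstone ratio and a 1-day (86400 s) interval between
-- attempts on the same sstable.
ALTER TABLE my_ks.events
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '7',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_threshold': '0.8',
    'tombstone_compaction_interval': '86400'
  };
```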



On Sat, Dec 17, 2016 at 11:08 AM, Voytek Jarnot 
wrote:

> Thanks again.
>
> I swear I'd look this up instead, but my google-fu is failing me
> completely ... That said, I presume that they're enabled by setting values
> for tombstone_compaction_interval and tombstone_threshold?  Or is there
> more to it?
>
> On Fri, Dec 16, 2016 at 10:41 PM, Jeff Jirsa 
> wrote:
>
>> With the caveat that tombstone compactions are disabled by default in
>> TWCS (and DTCS)
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Dec 16, 2016, at 8:34 PM, Voytek Jarnot 
>> wrote:
>>
>> Gotcha.  "never compacted" has an implicit asterisk referencing
>> tombstone_compaction_interval and tombstone_threshold, sounds like.  More
>> of a "never compacted" via strategy selection, but eligible for
>> tombstone-triggered compaction.
>>
>> On Fri, Dec 16, 2016 at 10:07 PM, Jeff Jirsa 
>> wrote:
>>
>>> Tombstone compaction subproperties can handle tombstone removal for you
>>> (you’ll set a ratio of tombstones worth compacting away – for example, 80%,
>>> and set an interval to prevent continuous compaction – for example, 24
>>> hours, and then anytime there’s no other work to do, if there’s an sstable
>>> over 24 hours old that’s at least 80% tombstones, it’ll compact it in a
>>> single sstable compaction).
>>>
>>>
>>>
>>> -  Jeff
>>>
>>>
>>>
>>> *From: *Voytek Jarnot 
>>> *Reply-To: *"user@cassandra.apache.org" 
>>> *Date: *Friday, December 16, 2016 at 7:34 PM
>>>
>>> *To: *"user@cassandra.apache.org" 
>>> *Subject: *Re: Choosing a compaction strategy (TWCS)
>>>
>>>
>>>
>>> Thanks again, Jeff.
>>>
>>>
>>>
>>> Thinking about this some more, I'm wondering if I'm overthinking or if
>>> there's a potential issue:
>>>
>>>
>>>
>>> If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
>>> some (relatively small percentage) of my records - am I going to be leaving
>>> tombstones around all over the place?  My noob-read on this is that TWCS
>>> will not compact tables comprised of records older than 7 days (
>>> https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dm
>>> lHowDataMaintain.html#dmlHowDataMaintain__twcs
>>> ),
>>> but Cassandra will not evict my tombstones until 7 days + consideration for
>>> gc_grace_seconds have passed ... resulting in no tombstone removal (?).
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa 
>>> wrote:
>>>
>>> The issue is that your partitions will likely be in 2 sstables instead
>>> of “theoretically” 1. In practice, they’re probably going to bleed into 2
>>> anyway (memTable flush to sstable isn’t going to happen exactly when the
>>> window expires, so it’ll bleed a bit anyway), so I bet no meaningful impact.
>>>
>>>
>>>
>>> -  Jeff
>>>
>>>
>>>
>>> *From: *Voytek Jarnot 
>>> *Reply-To: *"user@cassandra.apache.org" 
>>> *Date: *Friday, December 16, 2016 at 11:12 AM
>>>
>>>
>>> *To: *"user@cassandra.apache.org" 
>>> *Subject: *Re: Choosing a compaction strategy (TWCS)
>>>
>>>
>>>
>>> Thank you Jeff - always nice to hear straight from the source.
>>>
>>>
>>>
>>> Any issues you can see with 3 (my calendar-week bucket not aligning with
>>> the arbitrary 7-day window)? Or am I confused (I'd put money on this
>>> option, but I've been wrong once or twice before)?
>>>
>>>
>>>
>>> On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa 
>>> wrote:
>>>
>>> I skipped over the more important question  - loading data in. Two
>>> options:
>>>
>>> 1)   Load data in order through the normal writepath and use “USING
>>> TIMESTAMP” to set the timestamp, or
>>>
>>> 2)   Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables,
>>> then sstableloader them into the cluster.
>>>
>>>
>>>
>>> Either way, try not to mix writes of old data and new data in the
>>> “normal” write path  at the same time, even if you write “USING TIMESTAMP”,
>>> because it’ll get mixed in the memTable, and flushed into the same sstable
>>> – it won’t kill you, but if you can avoid it, avoid it.
>>>
>>>
>>>
>>> -  Jeff
>>>
>>>
>>>
>>>
>>>
>>> *From: *Jeff Jirsa 
>>> *Date: *Friday, December 16, 2016 at 10:47 AM
>>> *To: *"user@cassandra.apache.org" 
>>> *Subject: *Re: Choosing a compaction strategy (TWCS)
>>>
>>>
>>>
>>> With a 10 year retention, just ignore the target sstable count (I should
>>>

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Eric Evans
On Tue, Dec 20, 2016 at 11:55 PM, Kant Kodali  wrote:
> Looking at this
> http://www.theregister.co.uk/2016/12/16/oracle_targets_java_users_non_compliance/?mt=1481919461669
> I don't know why Cassandra recommends Oracle JVM?

The long answer probably dates back to before the Oracle JVM was as
closely coupled to OpenJDK as it is now.  There were times when you'd
have to find the sweet spot in a matrix of issues across
various versions of each, and Oracle often came out on top.

But that was in the past, OpenJDK is much more viable these days
(Cassandra works great with it IMO), and so the short answer is: It's
easier.  It's easier to test and reproduce with one version, and then
recommend that version to everyone.

> JVM is a great piece of software but I would like to stay away from Oracle
> as much as possible. Oracle is just horrible the way they are dealing with
> Java in General.

I don't really have any opinions on Oracle per se, but Cassandra is a
Free Software project and I would prefer that we not depend on
commercial software, (and that's kind of what we have here, an
implicit dependency).


-- 
Eric Evans
john.eric.ev...@gmail.com


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Edward Capriolo
On Wednesday, December 21, 2016, Kant Kodali  wrote:

> https://www.youtube.com/watch?v=9ei-rbULWoA
>
> On Wed, Dec 21, 2016 at 2:59 AM, Kant Kodali  > wrote:
>
>> https://www.elastic.co/guide/en/elasticsearch/guide/current/
>> _java_virtual_machine.html
>>
>> On Wed, Dec 21, 2016 at 2:58 AM, Kant Kodali > > wrote:
>>
>>> The fact is Oracle is horrible :)
>>>
>>>
>>> On Wed, Dec 21, 2016 at 2:54 AM, Brice Dutheil >> > wrote:
>>>
 Let's not debate opinion on the Oracle stewardship here, we certainly
 have different views that come from different experiences.

 Let's discuss facts instead :)

 -- Brice

 On Wed, Dec 21, 2016 at 11:34 AM, Kant Kodali >>> > wrote:

> yeah well I don't think Oracle is treating Java the way Google is
> treating Go and I am not a big fan of Go mainly because I understand the
> JVM is far more robust than anything that is out there.
>
> "Oracle just doesn't understand open source" These are the words from
> James Gosling himself
>
> I do think its better to stay away from Oracle as we never know when
> they would switch open source to closed source. Given their history of
> practices their statements are not credible.
>
> I am pretty sure the community would take care of OpenJDK.
>
>
>
>
>
> On Wed, Dec 21, 2016 at 2:04 AM, Brice Dutheil <
> brice.duth...@gmail.com
> > wrote:
>
>> The problem described in this article is different than what you have
>> on your servers, and I’ll add this article should be read with caution, as
>> The Register is known for sensationalism. The article itself has no
>> substantial proof or enough details. In my opinion this article is
>> clickbait.
>>
>> Anyway there are several points to think of instead of just switching to
>> OpenJDK :
>>
>>-
>>
>>There are technical differences between Oracle JDK and OpenJDK.
>>Where there are licensing issues, some libraries are closed source in Hotspot
>>(like font, rasterizer or cryptography) and OpenJDK uses open source
>>alternatives, which leads to different bugs or performance. I believe they
>>also have minor differences in the hotspot code to plug in stuff like Java
>>Mission Control, Flight Recorder or hotspot-specific options.
>>Also I believe that Oracle JDK is more tested or more up to date
>>than OpenJDK.
>>
>>So while OpenJDK is functionally the same as Oracle JDK, it may
>>not have the same performance, the same bugs or the same security fixes
>>(unless you are ready to test that with your production servers and your
>>production data).
>>
>>I don’t know if datastax have released the details of their
>>configuration when they test Cassandra.
>>-
>>
>>There’s also a question of support. OpenJDK is for the community.
>>Oracle can offer support but maybe only for Oracle JDK.
>>
>>Twitter uses OpenJDK, but they have their own JVM support team.
>>Not sure everyone can afford that.
>>
>> As a side note I’ll add that Oracle is paying talented engineers to
>> work on the JVM to make it great.
>>
>> Cheers,
>> ​
>>
>> -- Brice
>>
>> On Wed, Dec 21, 2016 at 6:55 AM, Kant Kodali > > wrote:
>>
>>> Looking at this http://www.theregister.co
>>> .uk/2016/12/16/oracle_targets_java_users_non_compliance/?mt=
>>> 1481919461669 I don't know why Cassandra recommends Oracle JVM?
>>>
>>> JVM is a great piece of software but I would like to stay away from
>>> Oracle as much as possible. Oracle is just horrible the way they are
>>> dealing with Java in General.
>>>
>>>
>>>
>>
>

>>>
>>
>
Generally a good decision is to balance between a platform you are familiar
with and a platform most commonly deployed in production.

I.e., even if I saw a talk from Facebook that says Cassandra is awesome on
Solaris x running on CoolThreads chips, if I was at a Windows/Intel
shop I might not pain myself with the burden.

Cassandra uses specific native/unsafe libraries not guaranteed to be
portable. E.g., once I was using a non-Sun JVM and the saved key caches would
not load.

As to Oracle not knowing open source, maybe not, but Sun had its own issues;
see the story about Apache Harmony and Sun being unwilling to certify the
Harmony JVM. What about Micro$oft and J++, or how Google managed to clone
Java and create the Android platform?

Let's not forget Sun's license, which has made it a total pain to get Java
ported and installed on Linux and BSDs alike. That non-curlable download
process.


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Query on Cassandra clusters

2016-12-21 Thread Alain RODRIGUEZ
Hi Sumit,

1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
> version 3.0.3 and then newer 5 nodes have 3.6.0 version.


I strongly recommend to:


   - Stick with one version of Apache Cassandra per cluster.
   - Always be as close as possible from the last minor release of the
   Cassandra version in use.


So you *really should* not be using 3.0.3 *AND* 3.6.0 but rather 3.0.10 *OR*
3.7 (currently). Note that Cassandra 3.X (with X > 0) uses a tick-tock
release cycle where odd numbers are bug-fix-only releases and even numbers
introduce new features as well.

Running multiple versions for a long period can induce errors; Cassandra is
built to handle multiple versions only long enough for operators to run
a rolling restart. No streaming (adding / removing / repairing nodes)
should happen during this period. Also, I have seen some cases in the past
where changing the schema with multiple versions in play led
to schema disagreements.

Due to this scenario, a couple boxes are running very high on memory (95%
> usage) whereas some of the older version nodes have just 60-70% memory
> usage.


Hard to say if this is related to the multiple versions of Cassandra, but it
could be. Are you sure the nodes are using the same JVM / GC options
(cassandra-env.sh) and Java version?

Also, what exactly is "high on memory (95%)"? Are we talking about heap or
native memory? Isn't some of that memory used as page cache (which would
still be available to the system)?

2. To counter #1, I am planning to upgrade system configuration of the
> nodes where there is higher memory usage. But the question is, will it be a
> problem if we have a Cassandra cluster, where in a couple of nodes have
> double the system configuration than other nodes in the cluster.
>

It is not a problem per se to have distinct configurations on distinct
nodes. Cassandra does it very well, and it is frequently used to test some
configuration change on a canary node, to prevent it from impacting the
whole service.

Yet, all the nodes should be doing the same work (unless you have some
heterogeneous hardware and are using a distinct number of vnodes on each
node). Keeping things homogeneous allows the operator to easily compare how
nodes are doing, and it makes reasoning about Cassandra, as well as
troubleshooting issues, much easier.

So I would:

- Fully upgrade / downgrade asap to a chosen version (3.X is known as being
not yet stable, but going back to 3.0.X might be more painful)
- Make sure nodes are well balanced and using the same number of ranges
'nodetool status '
- Make sure the nodes are using the same Java version and JVM settings.
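The balance check in the list above can be eyeballed straight from 'nodetool status' output. A minimal sketch (the column layout is assumed from typical 'nodetool status' runs and may differ by version):

```python
# Hypothetical sample of 'nodetool status' data lines:
# status/state, address, load, load unit, tokens, effective ownership, host id, rack
SAMPLE = """\
UN  10.0.0.1  120.5 GB  256  33.1%  aaaa  rack1
UN  10.0.0.2  118.2 GB  256  33.5%  bbbb  rack1
UN  10.0.0.3  119.9 GB  256  33.4%  cccc  rack1
"""

def ownership_spread(status_text):
    """Max minus min effective ownership (%) across Up/Normal nodes."""
    owns = [float(line.split()[5].rstrip("%"))
            for line in status_text.splitlines() if line.startswith("UN")]
    return max(owns) - min(owns)

print(round(ownership_spread(SAMPLE), 1))  # 0.4
```

A small spread suggests the ring is well balanced; a large one is worth investigating before blaming the version mix.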

Hope that helps,

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-12-21 8:22 GMT+01:00 Sumit Anvekar :

> I have a couple questions.
>
> 1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
> version 3.0.3 and the newer 5 nodes have version 3.6.0. It had been running
> fine until recently I am seeing higher amount of data residing in newer
> boxes. The configuration file (YAML file) is exactly same on all nodes
> (except for the node host names). Wondering if the version has something to
> do with this scenario. Due to this scenario, a couple boxes are running
> very high on memory (95% usage) whereas some of the older version nodes
> have just 60-70% memory usage.
>
> 2. To counter #1, I am planning to upgrade system configuration of the
> nodes where there is higher memory usage. But the question is, will it be a
> problem if we have a Cassandra cluster, where in a couple of nodes have
> double the system configuration than other nodes in the cluster.
>
> Appreciate any comment on the same.
>
> Sumit.
>


Re: iostat -like tool to parse 'nodetool cfstats'

2016-12-21 Thread Alain RODRIGUEZ
Hi Kevin,


> nodetool cfstats has some valuable data but what I would like is a 1
> minute delta.


And you are right that this would be useful. In many cases
variation is way more informative than an absolute value indeed. I have a
doubt regarding your approach, though.

I want to see IO throughput and load on C* for each table.
>

+1 on Kurt comment:

Anything in cfstats you should be able to retrieve through the metrics
> Mbeans. See https://cassandra.apache.org/doc/latest/operating/metrics.html


I do not use 'nodetool cfstats' that much. As soon as a monitoring system
is in place, I prefer using a chart, which makes things way more obvious, with
no need to build anything on top of nodetool cfstats. Plus you can mix the
chart with other relevant and related information. That's probably the
best way to go for production.
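That said, the iostat-style delta Kevin describes is simple to compute once two samples are in hand. A minimal sketch (the metric names are illustrative; real values would come from parsing 'nodetool cfstats' or from the metrics MBeans mentioned above):

```python
def cfstats_delta(prev, cur):
    """Per-counter difference between two samples taken e.g. one minute apart."""
    return {k: cur[k] - prev.get(k, 0) for k in cur}

# Two hypothetical samples of a table's counters, one minute apart.
t0 = {"read_count": 1000, "write_count": 5000}
t1 = {"read_count": 1600, "write_count": 5900}
print(cfstats_delta(t0, t1))  # {'read_count': 600, 'write_count': 900}
```

Dividing each delta by the sampling interval gives per-second rates, which is exactly what a chart-based monitoring system does for you.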

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-12-21 2:01 GMT+01:00 kurt Greaves :

> Anything in cfstats you should be able to retrieve through the metrics
> Mbeans. See https://cassandra.apache.org/doc/latest/operating/metrics.html
>
> On 20 December 2016 at 23:04, Richard L. Burton III 
> wrote:
>
>> I haven't seen anything like that myself. It would be nice to have
>> nodetool cfstats presented in a nicer format.
>>
>> If you plan to work on that, let me know. I would help contribute to it
>> next month.
>>
>> On Tue, Dec 20, 2016 at 5:59 PM, Kevin Burton  wrote:
>>
>>> nodetool cfstats has some valuable data but what I would like is a 1
>>> minute delta.
>>>
>>> Similar to iostat...
>>>
>>> It's easy to parse this but has anyone done it?
>>>
>>> I want to see IO throughput and load on C* for each table.
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>>
>>>
>>
>>
>> --
>> -Richard L. Burton III
>> @rburton
>>
>
>


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Kant Kodali
https://www.youtube.com/watch?v=9ei-rbULWoA

On Wed, Dec 21, 2016 at 2:59 AM, Kant Kodali  wrote:

> https://www.elastic.co/guide/en/elasticsearch/guide/
> current/_java_virtual_machine.html
>
> On Wed, Dec 21, 2016 at 2:58 AM, Kant Kodali  wrote:
>
>> The fact is Oracle is horrible :)
>>
>>
>> On Wed, Dec 21, 2016 at 2:54 AM, Brice Dutheil 
>> wrote:
>>
>>> Let's not debate opinion on the Oracle stewardship here, we certainly
>>> have different views that come from different experiences.
>>>
>>> Let's discuss facts instead :)
>>>
>>> -- Brice
>>>
>>> On Wed, Dec 21, 2016 at 11:34 AM, Kant Kodali  wrote:
>>>
 yeah well I don't think Oracle is treating Java the way Google is
 treating Go and I am not a big fan of Go mainly because I understand the
 JVM is far more robust than anything that is out there.

 "Oracle just doesn't understand open source" These are the words from
 James Gosling himself

 I do think its better to stay away from Oracle as we never know when
 they would switch open source to closed source. Given their history of
 practices their statements are not credible.

 I am pretty sure the community would take care of OpenJDK.





 On Wed, Dec 21, 2016 at 2:04 AM, Brice Dutheil >>> > wrote:

> The problem described in this article is different than what you have
> on your servers, and I’ll add this article should be read with caution, as
> The Register is known for sensationalism. The article itself has no
> substantial proof or enough details. In my opinion this article is
> clickbait.
>
> Anyway there are several points to think of instead of just switching to
> OpenJDK :
>
>-
>
>There are technical differences between Oracle JDK and OpenJDK.
>Where there are licensing issues, some libraries are closed source in Hotspot
>(like font, rasterizer or cryptography) and OpenJDK uses open source
>alternatives, which leads to different bugs or performance. I believe they
>also have minor differences in the hotspot code to plug in stuff like Java
>Mission Control, Flight Recorder or hotspot-specific options.
>Also I believe that Oracle JDK is more tested or more up to date
>than OpenJDK.
>
>So while OpenJDK is functionally the same as Oracle JDK, it may not
>have the same performance, the same bugs or the same security fixes
>(unless you are ready to test that with your production servers and your
>production data).
>
>I don’t know if datastax have released the details of their
>configuration when they test Cassandra.
>-
>
>There’s also a question of support. OpenJDK is for the community.
>Oracle can offer support but maybe only for Oracle JDK.
>
>Twitter uses OpenJDK, but they have their own JVM support team.
>Not sure everyone can afford that.
>
> As a side note I’ll add that Oracle is paying talented engineers to
> work on the JVM to make it great.
>
> Cheers,
> ​
>
> -- Brice
>
> On Wed, Dec 21, 2016 at 6:55 AM, Kant Kodali 
> wrote:
>
>> Looking at this http://www.theregister.co
>> .uk/2016/12/16/oracle_targets_java_users_non_compliance/?mt=
>> 1481919461669 I don't know why Cassandra recommends Oracle JVM?
>>
>> JVM is a great piece of software but I would like to stay away from
>> Oracle as much as possible. Oracle is just horrible the way they are
>> dealing with Java in General.
>>
>>
>>
>

>>>
>>
>


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Kant Kodali
https://www.elastic.co/guide/en/elasticsearch/guide/current/_java_virtual_machine.html

On Wed, Dec 21, 2016 at 2:58 AM, Kant Kodali  wrote:

> The fact is Oracle is horrible :)
>
>
> On Wed, Dec 21, 2016 at 2:54 AM, Brice Dutheil 
> wrote:
>
>> Let's not debate opinion on the Oracle stewardship here, we certainly
>> have different views that come from different experiences.
>>
>> Let's discuss facts instead :)
>>
>> -- Brice
>>
>> On Wed, Dec 21, 2016 at 11:34 AM, Kant Kodali  wrote:
>>
>>> yeah well I don't think Oracle is treating Java the way Google is
>>> treating Go and I am not a big fan of Go mainly because I understand the
>>> JVM is far more robust than anything that is out there.
>>>
>>> "Oracle just doesn't understand open source" These are the words from
>>> James Gosling himself
>>>
>>> I do think its better to stay away from Oracle as we never know when
>>> they would switch open source to closed source. Given their history of
>>> practices their statements are not credible.
>>>
>>> I am pretty sure the community would take care of OpenJDK.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Dec 21, 2016 at 2:04 AM, Brice Dutheil 
>>> wrote:
>>>
 The problem described in this article is different than what you have
 on your servers and I’ll add this article should be reaad with caution, as
 The Register is known for sensationalism. The article itself has no
 substantial proof or enough details. In my opinion this article is
 clickbait.

 Anyway there’s several point to think of instead of just swicthing to
 OpenJDK :

-

There is technical differences between Oracle JDK and openjdk.
Where there’s licensing issues some libraries are closed source in 
 Hotspot
like font, rasterizer or cryptography and OpenJDK use open source
alternatives which leads to different bugs or performance. I believe 
 they
also have minor differences in the hotspot code to plug in stuff like 
 Java
Mission Control or Flight Recorder or hotpost specific options.
Also I believe that Oracle JDK is more tested or more up to date
than OpenJDK.

So while OpenJDK is functionnaly the same as Oracle JDK it may not
have the same performance or the same bugs or the same security fixes.
(Unless are your ready to test that with your production servers and 
 your
production data).

I don’t know if datastax have released the details of their
configuration when they test Cassandra.
-

There’s also a question of support. OpeJDK is for the community.
Oracle can offer support but maybe only for Oracle JDK.

Twitter uses OpenJDK, but they have their own JVM support team. Not
sure everyone can afford that.

 As a side note I’ll add that Oracle is paying talented engineers to
 work on the JVM to make it great.

 Cheers,
 ​

 -- Brice

 On Wed, Dec 21, 2016 at 6:55 AM, Kant Kodali  wrote:

> Looking at this http://www.theregister.co
> .uk/2016/12/16/oracle_targets_java_users_non_compliance/?mt=
> 1481919461669 I don't know why Cassandra recommends Oracle JVM?
>
> JVM is a great piece of software but I would like to stay away from
> Oracle as much as possible. Oracle is just horrible the way they are
> dealing with Java in General.
>
>
>

>>>
>>
>


Re: Why does Cassandra recommend Oracle JVM instead of OpenJDK?

2016-12-21 Thread Kant Kodali
The fact is Oracle is horrible :)




Re: Why does Cassandra recommend Oracle JVM instead of OpenJDK?

2016-12-21 Thread Brice Dutheil
Let's not debate opinions on Oracle's stewardship here; we certainly have
different views that come from different experiences.

Let's discuss facts instead :)

-- Brice



Re: Why does Cassandra recommend Oracle JVM instead of OpenJDK?

2016-12-21 Thread Kant Kodali
Yeah, well, I don't think Oracle is treating Java the way Google is treating
Go, and I am not a big fan of Go, mainly because I understand the JVM is far
more robust than anything else out there.

"Oracle just doesn't understand open source." These are the words of James
Gosling himself.

I do think it's better to stay away from Oracle, as we never know when they
would switch from open source to closed source. Given their history of
practices, their statements are not credible.

I am pretty sure the community would take care of OpenJDK.







Re: Why does Cassandra recommend Oracle JVM instead of OpenJDK?

2016-12-21 Thread Brice Dutheil
The problem described in this article is different from what you have on
your servers, and I’ll add that this article should be read with caution, as The
Register is known for sensationalism. The article itself has no substantial
proof or enough detail. In my opinion this article is clickbait.

Anyway, there are several points to think about before just switching to
OpenJDK:

   -

   There are technical differences between Oracle JDK and OpenJDK. Where
   there are licensing issues, some libraries are closed source in HotSpot, such
   as the font rasterizer or cryptography, and OpenJDK uses open-source
   alternatives, which leads to different bugs or performance. I believe they
   also have minor differences in the HotSpot code to plug in things like Java
   Mission Control, Flight Recorder, or HotSpot-specific options.
   Also, I believe that Oracle JDK is more tested and more up to date than
   OpenJDK.

   So while OpenJDK is functionally the same as Oracle JDK, it may not have
   the same performance, the same bugs, or the same security fixes (unless
   you are ready to test that with your production servers and your
   production data).

   I don’t know if DataStax has released the details of their
   configuration when they test Cassandra.
   -

   There’s also a question of support. OpenJDK is for the community. Oracle
   can offer support, but maybe only for Oracle JDK.

   Twitter uses OpenJDK, but they have their own JVM support team. Not sure
   everyone can afford that.

As a side note, I’ll add that Oracle is paying talented engineers to work on
the JVM to make it great.

Cheers,

-- Brice

On Wed, Dec 21, 2016 at 6:55 AM, Kant Kodali  wrote:

> Looking at this http://www.theregister.co.uk/2016/12/16/oracle_
> targets_java_users_non_compliance/?mt=1481919461669 I don't know why
> Cassandra recommends Oracle JVM?
>
> JVM is a great piece of software but I would like to stay away from Oracle
> as much as possible. Oracle is just horrible the way they are dealing with
> Java in General.
>
>
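[Editor's note] The vendor differences discussed above can be checked directly: the JVM exposes its vendor and VM name as system properties. Below is a minimal sketch, assuming typical `java.vendor` / `java.vm.name` strings of that era (OpenJDK HotSpot builds usually report an "OpenJDK ... Server VM" name even though `java.vendor` may still say "Oracle Corporation"); the `JdkCheck` class and its matching rules are illustrative, not an official detection API.

```java
// JdkCheck.java -- prints which JDK/JVM flavor this process appears to run on.
public class JdkCheck {

    /**
     * Rough classification from vendor/VM-name strings. Checks "openjdk"
     * first, because OpenJDK builds often keep an Oracle vendor string.
     */
    static String classify(String vendor, String vmName) {
        String s = (vendor + " " + vmName).toLowerCase();
        if (s.contains("openjdk")) return "OpenJDK";
        if (s.contains("oracle"))  return "Oracle JDK";
        return "Unknown";
    }

    public static void main(String[] args) {
        String vendor = System.getProperty("java.vendor");
        String vmName = System.getProperty("java.vm.name");
        System.out.println("java.vendor  = " + vendor);
        System.out.println("java.vm.name = " + vmName);
        System.out.println("Looks like:    " + classify(vendor, vmName));
    }
}
```

Running this on each Cassandra node (or logging the same properties at startup) removes any guesswork about which JDK is actually in use.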
>


unsubscribe

2016-12-21 Thread Reddy Raja