Re: Turn off automatic granting

2021-06-09 Thread Jonathan Koppenhofer
Looks like this is already logged at CASSANDRA-11305
<https://issues.apache.org/jira/browse/CASSANDRA-11305>. I will comment
there. I'd be interested if others have feedback.

On Wed, Jun 9, 2021 at 9:32 AM Jonathan Koppenhofer 
wrote:

> Thanks!
>
> I'll put in a Jira to make this configurable. Maybe submit a patch if I
> can find time.
>
> On Tue, Jun 8, 2021, 6:49 PM Erick Ramirez 
> wrote:
>
>> There's definitely a case for separation of duties. For example, admin
>> roles who have DDL permissions should not have DML access. To achieve this,
>> you'll need to manage the permissions at a granular level and revoke
>> permissions from the role. Cheers!
>>
>>>
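
To make the granular approach concrete: a minimal sketch of separating DDL
from DML for a role, shown via the 4.x Java driver for consistency with the
driver discussion later in this digest. The keyspace and role names are
hypothetical:

import com.datastax.oss.driver.api.core.CqlSession;

public class SeparateDuties {
    public static void main(String[] args) {
        // Builds a session from the driver's configuration defaults.
        try (CqlSession session = CqlSession.builder().build()) {
            // DDL rights for a hypothetical schema-admin role...
            session.execute("GRANT CREATE ON KEYSPACE app_ks TO schema_admin");
            session.execute("GRANT ALTER ON KEYSPACE app_ks TO schema_admin");
            session.execute("GRANT DROP ON KEYSPACE app_ks TO schema_admin");
            // ...while explicitly stripping the DML permissions that
            // automatic granting hands to the creator of each new resource.
            session.execute("REVOKE SELECT ON KEYSPACE app_ks FROM schema_admin");
            session.execute("REVOKE MODIFY ON KEYSPACE app_ks FROM schema_admin");
        }
    }
}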


Re: Turn off automatic granting

2021-06-09 Thread Jonathan Koppenhofer
Thanks!

I'll put in a Jira to make this configurable. Maybe submit a patch if I can
find time.

On Tue, Jun 8, 2021, 6:49 PM Erick Ramirez 
wrote:

> There's definitely a case for separation of duties. For example, admin
> roles who have DDL permissions should not have DML access. To achieve this,
> you'll need to manage the permissions at a granular level and revoke
> permissions from the role. Cheers!
>
>>


Turn off automatic granting

2021-06-08 Thread Jonathan Koppenhofer
Hi,

In a highly managed environment, "automatic granting" (
https://cassandra.apache.org/doc/latest/cql/security.html#automatic-granting)
may not always be desirable. Is there any way to turn this off? Or what
have people done to work around cases where they don't want it?

Some use cases:
- We may have a user that can create schema, but we don't want that user to
be able to authorize others on that resource.
- The user already has keyspace permissions, and we don't want them
duplicated at the table level when they create a table.

Thanks
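
For context, automatic granting means the role that creates a resource is
automatically given all applicable permissions on it. Until that is
configurable, the workaround for the two use cases above is to revoke the
unwanted grants after creation. A minimal sketch, run as a separate admin
role, with a hypothetical schema_admin role and table:

import com.datastax.oss.driver.api.core.CqlSession;

public class AutoGrantWorkaround {
    public static void main(String[] args) {
        // Run as an admin role after the hypothetical schema_admin role
        // has created app_ks.events (and been auto-granted ALL on it).
        try (CqlSession session = CqlSession.builder().build()) {
            // Use case 1: the creator should not be able to grant others
            // access to the resource it just created.
            session.execute(
                "REVOKE AUTHORIZE ON TABLE app_ks.events FROM schema_admin");
            // Use case 2: schema_admin already holds keyspace-level
            // permissions, so the duplicated table-level grants can go.
            session.execute(
                "REVOKE ALL PERMISSIONS ON TABLE app_ks.events FROM schema_admin");
        }
    }
}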


Re: 4.0 best feature/fix?

2021-05-09 Thread Jonathan Koppenhofer
In addition to Jeff's answer... Certificate hot reloading... For an
organization that uses annual certs :)


On Fri, May 7, 2021, 8:47 AM Durity, Sean R 
wrote:

> There is not enough 4.0 chatter here. What feature or fix of the 4.0
> release is most important for your use case(s)/environment? What is working
> well so far? What needs more work? Is there anything that needs more
> explanation?
>
>
>
> Sean Durity
>
> Staff Systems Engineer – Cassandra
>
> #cassandra - for the latest news and updates
>


Re: What does the community think of the DataStax 4.x Java driver changes?

2020-10-29 Thread Jonathan Koppenhofer
We actually feel the same: we have edge cases where downgrading CL was
useful. We did end up writing this in application logic, as our code was
pretty well abstracted and centralized enough to do so.
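
For anyone hitting the same gap after the 3.x DowngradingConsistencyRetryPolicy
was removed: a minimal sketch of what such application-level logic can look
like with the 4.x driver. It catches only UnavailableException and assumes the
statement is safe to retry; a production version would also have to weigh
read/write timeouts and idempotency:

import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import com.datastax.oss.driver.api.core.servererrors.UnavailableException;

public class DowngradingExecutor {

    // Walk down a ladder of consistency levels, retrying on Unavailable.
    static ResultSet executeWithDowngrade(CqlSession session, String cql) {
        ConsistencyLevel[] ladder = {
            ConsistencyLevel.QUORUM,
            ConsistencyLevel.LOCAL_QUORUM,
            ConsistencyLevel.ONE
        };
        for (int i = 0; i < ladder.length; i++) {
            try {
                return session.execute(
                    SimpleStatement.newInstance(cql).setConsistencyLevel(ladder[i]));
            } catch (UnavailableException e) {
                if (i == ladder.length - 1) {
                    throw e; // weakest level also failed; give up
                }
                // Otherwise fall through and retry at the next weaker level.
            }
        }
        throw new AssertionError("unreachable");
    }
}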

I will also agree the driver seems overly prescriptive now with limited
ability to override. Good for protecting users from themselves, but bad for
edge cases.

Generally, we work with a lot of application teams, and most of them are
putting off the inevitable changes required to move to 4.x. In a
reasonably well written, moderately complex app, it took one of our best
developers 2 weeks to make the changes, and we were finding bugs for a
couple of releases after that. Not a positive experience.

We are now done with the migration, and things are working reasonably well.


On Thu, Oct 29, 2020, 9:06 AM Johnny Miller  wrote:

> Joshua - thanks for the update, I have found the ASF slack channel
> (#cassandra-cep-drivers-donation) and google doc (
> https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#).
> Will be watching it closely.
>
> In terms of the functional changes brought into the driver with 4.x:
> downgrading CL has always been a controversial feature, but as for the
> failover to remote DC being removed - I am curious to understand why?
>
> Thanks,
>
> Johnny
>
> On Thu, 29 Oct 2020 at 13:53, Joshua McKenzie 
> wrote:
>
>> That's an immense amount of incredibly useful feedback Johnny. Thanks for
>> taking the time and energy to write all this up.
>>
>> I work with some of the engineers who authored these changes in the
>> driver and have brought this thread to their attention. The authors have
>> offered the driver as a CEP donation to the C* project so we will have one
>> in tree which should give a clear path to fixing some of these API issues
>> as well as the loss of functionality on a major.
>>
>>
>> On Thu, Oct 29, 2020 at 8:37 AM, Johnny Miller 
>> wrote:
>>
>>> Hi Everybody,
>>>
>>>
>>> We wanted to reach out to the community around the v4 changes in the
>>> DataStax Java driver and gauge people's opinions on some of the changes.
>>> DataStax have done a tremendous job over the years on the Cassandra drivers
>>> and contributing to this community. However, we are currently struggling to
>>> adopt the latest driver due to these changes.
>>>
>>>
>>> We have been working on a project to upgrade an application from v3 to
>>> v4.9 of the driver and have encountered major changes between these
>>> versions.
>>>
>>>
>>> We have observed that the latest version of the driver contains much more
>>> DataStax Enterprise (DSE) specific code. This is not surprising, as
>>> DataStax have been generous enough to build it for the Cassandra community.
>>>
>>>
>>> From our understanding, the DSE-specific code must be included even if
>>> you don't use or require it. For example, in the CqlSessionBuilder
>>> class, which is the main entry point into the driver, there are APIs
>>> relating directly to DataStax Enterprise non-OSS functionality, their
>>> cloud DBaaS, etc., e.g.
>>>
>>>
>>> - withApplicationName (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withApplicationName-java.lang.String-
>>> )
>>>
>>> - withApplicationVersion (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withApplicationVersion-java.lang.String-
>>> )
>>>
>>> - withCloudProxyAddress (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withCloudProxyAddress-java.net.InetSocketAddress-
>>> )
>>>
>>> - withCloudSecureConnectBundle (
>>> https://docs.datastax.com/en/drivers/java/4.9/com/datastax/oss/driver/api/core/session/SessionBuilder.html#withCloudSecureConnectBundle-java.io.InputStream-
>>> )
>>>
>>>
>>> plus more.
>>>
>>>
>>> All of these are sitting under the com.datastax.oss package - not the
>>> com.datastax.dse package.
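
For readers following along: none of those entry points must actually be
called, even though they ship in the jar; a plain OSS session can stick to
the core builder methods. A minimal sketch, with a hypothetical contact
point and datacenter name:

import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

public class OssOnlySession {
    public static void main(String[] args) {
        // Uses only the core builder methods; no DSE/cloud-specific calls.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                .withLocalDatacenter("dc1")
                .build()) {
            String version = session
                .execute("SELECT release_version FROM system.local")
                .one()
                .getString("release_version");
            System.out.println("Connected to Cassandra " + version);
        }
    }
}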
>>>
>>>
>>> Additionally the reference.conf for the default driver settings contains
>>> a large number of DSE specific options:
>>>
>>>
>>> https://github.com/datastax/java-driver/blob/4.9.0/core/src/main/resources/reference.conf
>>>
>>>
>>> We would like to have seen this implemented in a subclass of
>>> CqlSessionBuilder, e.g. a DataStaxCqlSessionBuilder, and the conf split
>>> into two separate config files.
>>>
>>>
>>> Additionally, the structure of the library is such that it bundles
>>> all of the DSE driver code with the non-DSE driver code, e.g. graph,
>>> search, etc. We would also like to have seen DataStax implement these as
>>> separate libs, with the DataStax-specific lib depending on an OSS-only
>>> lib for the shared functionality.
>>>
>>>
>>> It would be great to be able to only take in the dependencies and code
>>> needed for Apache Cassandra and not the commercial products around it.
>>>
>>>
>>> However, the above observations are trivial compared to the two core
>>> features of 

Re: Consequences of dropping Materialized views

2020-02-18 Thread Jonathan Koppenhofer
Forensics are gone at this point, so I can't verify exact errors, but
wanted to mention we had seen something similar to corroborate your
experience and warn others.

The version would have been 3.0.15 or 3.11.3 as that is what we were
deploying on our clusters at the time. I think it was more likely 3.0.15.

So sorry for the "vagueness" :(

On Tue, Feb 18, 2020, 8:54 PM Surbhi Gupta  wrote:

> Jonathan,
> As per https://issues.apache.org/jira/browse/CASSANDRA-13696, the issue
> (Digest mismatch Exception if hints file has UnknownColumnFamily) is fixed
> in 3.0.15. Did you still face this issue on 3.0.15?
>
> Thanks
> Surbhi
>
> On Tue, 18 Feb 2020 at 17:40, Jonathan Koppenhofer 
> wrote:
>
>> I believe we had something similar happen on 3.0.15 a while back. We had
>> a cluster that created mass havoc by creating MVs on a large existing
>> dataset. We thought we had stabilized the cluster, but saw similar issues
>> as you when we dropped the MVs.
>>
>> We interpreted our errors to mean that we should not attempt to write to
>> base tables while also dropping downstream materialized views. We
>> essentially had the application team stop their app, then drop the views
>> one by one with some pause in between. That seemed to work fine, but yes,
>> be careful with everything MV-related.
>>
>> We now disallow the use of MVs globally.
>>
>> On Tue, Feb 18, 2020, 8:27 PM Surbhi Gupta 
>> wrote:
>>
>>> We are on Cassandra 3.11, using G1GC with a 16GB heap.
>>>
>>> So we had to drop 7 MVs in production. As soon as we dropped the first
>>> Materialized View, our cluster became unstable and the app started giving
>>> 100% errors. What we noticed:
>>> 1. As soon as the MV was dropped, the cluster became unstable and nodes
>>> were showing as down from each other.
>>> 2. We saw the below warnings in system.log, which is understood:
>>>  WARN [MessagingService-Incoming-/10.X.X.X] 2020-02-18 14:21:47,115
>>> IncomingTcpConnection.java:103 - UnknownColumnFamilyException reading from
>>> socket; closing
>>>
>>> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table 
>>> for cfId 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1. If a table was just created, 
>>> this is likely due to the schema not being fully propagated.  Please wait 
>>> for schema agreement on table creation.
>>>
>>> 3. We noticed below errors as well:
>>>
>>> ERROR [MutationStage-9] 2020-02-18 14:21:47,267 Keyspace.java:593 - 
>>> Attempting to mutate non-existant table 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1
>>>
>>> 4. We noticed messages like below:
>>>
>>> WARN  [BatchlogTasks:1] 2020-02-18 14:21:53,786 BatchlogManager.java:252 - 
>>> Skipped batch replay of a19c6480-5294-11ea-9e09-3948c59ad0f5 due to {}
>>>
>>> 5. Hints file corrupted:
>>>
>>> WARN  [HintsDispatcher:6737] 2020-02-18 14:22:24,932 HintsReader.java:237 - 
>>> Failed to read a hint for /10.X.X.X: f75e58e8-c255-4318-a553-06487b6bbe8c - 
>>> table with id 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1 is unknown in file 
>>> f75e58e8-c255-4318-a553-06487b6bbe8c-1582060925656-1.hints
>>> ERROR [HintsDispatcher:6737] 2020-02-18 14:22:24,933 
>>> HintsDispatchExecutor.java:243 - Failed to dispatch hints file 
>>> f75e58e8-c255-4318-a553-06487b6bbe8c-1582060925656-1.hints: file is 
>>> corrupted ({})
>>> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch 
>>> exception
>>>
>>>
>>> 6. And then Cassandra shut down:
>>>
>>> ERROR [HintsDispatcher:6737] 2020-02-18 14:22:24,937 
>>> StorageService.java:430 - Stopping gossiper
>>> WARN  [HintsDispatcher:6737] 2020-02-18 14:22:24,937 
>>> StorageService.java:321 - Stopping gossip by operator request
>>> INFO  [HintsDispatcher:6737] 2020-02-18 14:22:24,937 Gossiper.java:1530 - 
>>> Announcing shutdown
>>>
>>>
>>> Any views? Below are the issues I
>>>
>>> https://support.datastax.com/hc/en-us/articles/36368126-Hints-file-with-unknown-CFID-can-cause-nodes-to-fail
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-13696
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6822
>>>
>>> https://support.datastax.com/hc/en-us/articles/36368126-Hints-file-with-unknown-CFID-can-cause-nodes-to-fail
>>>
>>>
>>>
>>>
>>>
>>> On Wed, 12 Feb 2020 at 19:10, Surbhi Gupta 
>>> wrote:
>>>
>>>> Thanks Eric ...

Re: Consequences of dropping Materialized views

2020-02-18 Thread Jonathan Koppenhofer
I believe we had something similar happen on 3.0.15 a while back. We had a
cluster that created mass havoc by creating MVs on a large existing
dataset. We thought we had stabilized the cluster, but saw similar issues
as you when we dropped the MVs.

We interpreted our errors to mean that we should not attempt to write to
base tables while also dropping downstream materialized views. We
essentially had the application team stop their app, then drop the views
one by one with some pause in between. That seemed to work fine, but yes,
be careful with everything MV-related.

We now disallow the use of MVs globally.
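
For anyone needing to do the same, a hedged sketch of the "one by one with a
pause" approach, assuming application writes to the base tables are already
stopped; the view names are hypothetical:

import java.util.List;
import com.datastax.oss.driver.api.core.CqlSession;

public class DropViewsOneByOne {
    public static void main(String[] args) throws InterruptedException {
        List<String> views = List.of("app_ks.mv_by_user", "app_ks.mv_by_date");
        try (CqlSession session = CqlSession.builder().build()) {
            for (String view : views) {
                session.execute("DROP MATERIALIZED VIEW IF EXISTS " + view);
                // Wait for schema agreement before touching the next view.
                while (!session.checkSchemaAgreement()) {
                    Thread.sleep(1_000);
                }
                // Extra settling time between drops, as described above.
                Thread.sleep(60_000);
            }
        }
    }
}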

On Tue, Feb 18, 2020, 8:27 PM Surbhi Gupta  wrote:

> We are on Cassandra 3.11, using G1GC with a 16GB heap.
>
> So we had to drop 7 MVs in production. As soon as we dropped the first
> Materialized View, our cluster became unstable and the app started giving
> 100% errors. What we noticed:
> 1. As soon as the MV was dropped, the cluster became unstable and nodes
> were showing as down from each other.
> 2. We saw the below warnings in system.log, which is understood:
>  WARN [MessagingService-Incoming-/10.X.X.X] 2020-02-18 14:21:47,115
> IncomingTcpConnection.java:103 - UnknownColumnFamilyException reading from
> socket; closing
>
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1. If a table was just created, this 
> is likely due to the schema not being fully propagated.  Please wait for 
> schema agreement on table creation.
>
> 3. We noticed below errors as well:
>
> ERROR [MutationStage-9] 2020-02-18 14:21:47,267 Keyspace.java:593 - 
> Attempting to mutate non-existant table 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1
>
> 4. We noticed messages like below:
>
> WARN  [BatchlogTasks:1] 2020-02-18 14:21:53,786 BatchlogManager.java:252 - 
> Skipped batch replay of a19c6480-5294-11ea-9e09-3948c59ad0f5 due to {}
>
> 5. Hints file corrupted:
>
> WARN  [HintsDispatcher:6737] 2020-02-18 14:22:24,932 HintsReader.java:237 - 
> Failed to read a hint for /10.X.X.X: f75e58e8-c255-4318-a553-06487b6bbe8c - 
> table with id 7bb4b2d2-8622-11e8-a44b-a92f2dd93ff1 is unknown in file 
> f75e58e8-c255-4318-a553-06487b6bbe8c-1582060925656-1.hints
> ERROR [HintsDispatcher:6737] 2020-02-18 14:22:24,933 
> HintsDispatchExecutor.java:243 - Failed to dispatch hints file 
> f75e58e8-c255-4318-a553-06487b6bbe8c-1582060925656-1.hints: file is corrupted 
> ({})
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch 
> exception
>
>
> 6. And then Cassandra shut down:
>
> ERROR [HintsDispatcher:6737] 2020-02-18 14:22:24,937 StorageService.java:430 
> - Stopping gossiper
> WARN  [HintsDispatcher:6737] 2020-02-18 14:22:24,937 StorageService.java:321 
> - Stopping gossip by operator request
> INFO  [HintsDispatcher:6737] 2020-02-18 14:22:24,937 Gossiper.java:1530 - 
> Announcing shutdown
>
>
> Any views? Below are the issues I
>
> https://support.datastax.com/hc/en-us/articles/36368126-Hints-file-with-unknown-CFID-can-cause-nodes-to-fail
>
> https://issues.apache.org/jira/browse/CASSANDRA-13696
>
> https://issues.apache.org/jira/browse/CASSANDRA-6822
>
> https://support.datastax.com/hc/en-us/articles/36368126-Hints-file-with-unknown-CFID-can-cause-nodes-to-fail
>
>
>
>
>
> On Wed, 12 Feb 2020 at 19:10, Surbhi Gupta 
> wrote:
>
>> Thanks Eric ...
>> This is helpful...
>>
>>
>> On Wed, 12 Feb 2020 at 17:46, Erick Ramirez 
>> wrote:
>>
>>> There shouldn't be any negative impact from dropping MVs and there's
>>> certainly no risk to the base table if that is your concern. All it will do
>>> is remove all the data in the respective views plus drop any pending view
>>> mutations from the batch log. If anything, you should see some performance
>>> gain since updates to the base table will only trigger 4 view updates
>>> instead of the previous 11. Cheers!
>>>
>>> Erick Ramirez  |  Developer Relations
>>>
>>> erick.rami...@datastax.com | datastax.com 
>>>
>>> On Thu, 13 Feb 2020 at 04:26, Surbhi Gupta 
>>> wrote:
>>>
 Hi,

 So the application team created 11 materialized views on a base table in
 production, and we need to drop 7 of them as they are not in use.
 Wanted to understand the impact of dropping the materialized views.
 We are on Cassandra 3.11.1 , multi datacenter with replication factor
 of 3 in each datacenter.
 We are using LOCAL_QUORUM for write consistency and LOCAL_ONE for read
 consistency.

 Any thoughts or suggestion to keep in mind before dropping the
 Materialized views.

 Thanks
 Surbhi






Re: Multiple C* instances on same machine

2019-09-20 Thread Jonathan Koppenhofer
We do this without containers quite successfully. As a precaution, we...
- have dedicated disks per instance.
- have lots of network bandwidth, but also keep the default throughput
throttles. Also monitor the network closely.
- share CPU completely. Use Cassandra settings to limit CPU use
(concurrent threads, compaction throughput, etc.) and monitor closely.
- have plenty of memory on top of the JVM allocation.
- never put 2 nodes from the same cluster on a single machine.
- use VIPs so each instance gets its own IP address.
- use sane OS defaults as documented by Amy Tobey.

That said, using cgroups or containers would provide better isolation (but
worse bursting) when available.


On Fri, Sep 20, 2019, 8:42 PM Sandeep Nethi  wrote:

> Hi Nitan,
>
> You shouldn’t have any issues if you set things up properly.
>
> A few possible issues (each can become a bottleneck):
>
> * CPU allocation (instances can compete)
> * Disk throughput & IOPS
> * Port allocations
> * Network throughput
> * Consistency issues
>
> And we have workarounds for all of the above:
>
> * CPU: Use the JVM options file to limit the number of CPU cores for each
> instance.
> * DISK: If possible, allocate dedicated disks for each instance.
> * NETWORK & PORTS: Have a secondary NIC (or as many as the number of
> instances). This gives you the flexibility to run each Cassandra instance
> on the same ports with better networking operations.
> * RACK: Having multiple instances on one machine can lead to consistency
> problems when that machine goes down, unless racks are properly defined.
> So rack configuration is very important when going with this kind of setup.
>
> Hope this helps!
>
> Thanks,
> Sandeep.Nethi
>
>
>
> On Sat, 21 Sep 2019 at 12:20 PM, Nitan Kainth 
> wrote:
>
>> I am looking for possible issues doing this setup without containers.
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Sep 20, 2019, at 5:22 PM, Elliott Sims  wrote:
>>
>> A container of some sort gives you better isolation and less risk of a
>> mistake that could cause the instances to conflict in some way.  Might be
>> better for balancing resources between them as well, though using cgroups
>> directly can also accomplish that.
>>
>> On Fri, Sep 20, 2019, 8:27 AM Nitan Kainth  wrote:
>>
>>> Hi There,
>>>
>>> Any feedback on the pros/cons of having multiple instances of C* on the
>>> same machine without a Docker/container solution?
>>>
>>> The plan is to change the ports and run multiple C* processes, so we can
>>> isolate two applications as two different clusters.
>>>
>>


Re: DSE keyspaces in Apache cluster

2019-07-24 Thread Jonathan Koppenhofer
To clarify... you have 2 datacenters with Datastax, and you want to expand
to a third DC with open source Cassandra? Is this a temporary migration
strategy? I would not want to run in this state for very long.

For Datastax, you should reach out to their support for questions. However,
speaking from experience, these keyspaces are required and will be created
upon restart of DSE nodes. Even worse, they use custom replication
strategies (EverywhereStrategy) that will prevent open source from working,
or the nodes from even starting. You can update EverywhereStrategy to
NetworkTopologyStrategy, but be forewarned it will reset back to
EverywhereStrategy when you simply restart a Datastax node.
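
For reference, the strategy update described above is a plain ALTER KEYSPACE;
a minimal sketch with hypothetical datacenter names and replication factors
(and remember it will revert the next time a DSE node restarts):

import com.datastax.oss.driver.api.core.CqlSession;

public class ReplaceEverywhereStrategy {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Repeat for dse_perf, dse_security, dse_leases as needed.
            session.execute(
                "ALTER KEYSPACE dse_system WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}");
        }
    }
}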

Long story short, use Datastax support for info on their product, and be
very careful. Also note I don't have specific experience with your exact
scenario.

(My experience with Datastax is version 4.6-5.0)

On Wed, Jul 24, 2019, 10:34 AM Rahul Reddy  wrote:

> Hello,
>
> I have 2 data centers with DSE Cassandra and added a new DC with Apache
> Cassandra. The dse_perf, dse_system, dse_security, and dse_leases
> keyspaces were created as well. Can we delete them?
>


Re: AbstractLocalAwareExecutorService Exception During Upgrade

2019-06-05 Thread Jonathan Koppenhofer
Not sure about why repair is running, but we are also seeing the same
merkle tree issue in a mixed version cluster in which we have intentionally
started a repair against 2 upgraded DCs. We are currently researching, and
can post back if we find the issue, but also would appreciate it if someone
has a suggestion. We have also run a local repair in an upgraded DC in this
same mixed version cluster without issue.

We are going 2.1.x to 3.0.x... and yes, we know you are not supposed to run
repairs in mixed version clusters, so don't do it :) This is kind of a
special circumstance where other things have gone wrong.

Thanks

On Wed, Jun 5, 2019, 5:23 PM shalom sagges  wrote:

> If anyone has any idea on what might cause this issue, it'd be great.
>
> I don't understand what could trigger this exception.
> But what I really can't understand is why repairs started to run suddenly
> :-\
> There's no cron job running, no active repair process, no Validation
> compactions, Reaper is turned off  I see repair running only in the
> logs.
>
> Thanks!
>
>
> On Wed, Jun 5, 2019 at 2:32 PM shalom sagges 
> wrote:
>
>> Hi All,
>>
>> I'm having a bad situation where after upgrading 2 nodes (binaries only)
>> from 2.1.21 to 3.11.4 I'm getting a lot of warnings as follows:
>>
>> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread
>> Thread[ReadStage-5,5,main]: {}
>> java.lang.ArrayIndexOutOfBoundsException: null
>>
>>
>> I also see errors on repairs but no repair is running at all. I verified
>> this with ps -ef command and nodetool compactionstats. The error I see is:
>> Failed creating a merkle tree for [repair
>> #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4
>> (see log for details)
>>
>> I saw repair errors on data tables as well.
>> nodetool status shows all are UN and nodetool describecluster shows two
>> schema versions as expected.
>>
>>
>> After the warnings appeared, clients started to get timed out read/write
>> queries.
>> Restarting the 2 nodes solved the clients' connection issues, but the
>> warnings are still being generated in the logs.
>>
>> Did anyone encounter such an issue and knows what this means?
>>
>> Thanks!
>>
>>


Re: Sstableloader

2019-05-29 Thread Jonathan Koppenhofer
Has anyone tried to do a DC switch as a means to migrate from Datastax to
OSS? This would be the safest route, as the ability to revert back to
Datastax is easy. However, I'm curious how the dse_system keyspace would be
replicated to OSS using their custom EverywhereStrategy. You may have to
change that to NetworkTopologyStrategy before firing up OSS nodes. Also,
keep in mind if you restart any DSE nodes, it will revert that keyspace
back to EverywhereStrategy.

I also posted a means to migrate in place on this mailing list a few months
back (thanks for help from others on the mailing list), but it is a little
more involved and risky. Let me know if you can't find it, and I'll dig it
up.

Finally, DSE 5.0 is equivalent to open source 3.0.x. I recommend you go to
OSS 3.0, then up to 3.11.

On Wed, May 29, 2019, 5:56 PM Nitan Kainth  wrote:

> If cassandra version is same, it should work
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On May 28, 2019, at 4:21 PM, Rahul Reddy  wrote:
>
> Hello,
>
> Does sstableloader work between Datastax and Apache Cassandra? I'm trying
> to migrate DSE 5.0.7 to Apache 3.11.1.
>
>


Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jonathan Koppenhofer
We do multiple nodes per host as a standard practice. In our case, we never
put 2 nodes from a single cluster on the same host, though as mentioned
before, you could potentially get away with that if you properly use rack
awareness; just be careful of load.

We also do NOT use any other layer of segregation such as docker or VMs, we
just have multiple IPs per host, and bind each IP to a distinct node. We
have looked at VMs and Containers, but they either add abstraction
complexity or some kind of performance penalty.

As for system resources, we dedicate individual SSDs to each node, but CPU,
memory, and network are shared. We are spoiled by good network and
beefy memory, so the only place we have to be careful is CPU. As such, we
pick fairly conservative cassandra.yaml settings and monitor CPU usage. If
workloads get hot on a particular host, we have some flexibility to move
things around.

In any case, it sounds like you will be fine running 1 node per host. With
that many resources, be sure to tune your nodes to make use of them.

Good luck.

On Thu, Apr 18, 2019, 2:49 PM William R 
wrote:

> hi,
>
> Thank you for your answers. Starting with the most important point, from
> your answers I understand that
>
> "it is OK to go more than 1 TB in disk usage"
>
> so if I am going to use 50% of the disk capacity I will end up having
> around 3 TB per node, in which case I will not need to use a Docker
> solution, which is a very good use case for us.
>
> The goal of my setup is to store large data volumes on every node (~3 TB,
> i.e. 50% usage of the HD) with the current hardware that we possess. I
> consider the high availability standard since we are going to have 2 DCs
> with RF 3.
>
> I also have to note that Datastax also recommends using no more than 500
> GB - 1 TB per node.
>
> Cheers,
>
> Vasilis
>
>
> Sent with ProtonMail  Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Thursday, April 18, 2019 6:56 PM, Jacques-Henri Berthemet <
> jacques-henri.berthe...@genesys.com> wrote:
>
> So how much data can you safely fit per node using SSDs with Cassandra
> 3.11? How much free space do you need on your disks?
>
> There should be some recommendations on node sizes on:
>
> http://cassandra.apache.org/doc/latest/operating/hardware.html
>
>
>
> --
>
> *From:* Jon Haddad 
> *Sent:* Thursday, April 18, 2019 6:43:15 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] multiple Cassandra instances per server,
> possible?
>
> Agreed with Jeff here.  The whole "community recommends no more than
> 1TB" has been around, and inaccurate, for a long time.
>
> The biggest issue with dense nodes is how long it takes to replace
> them.  4.0 should help with that under certain circumstances.
>
>
> On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa  wrote:
> >
> > Agreed that you can go larger than 1T on ssd
> >
> > You can do this safely with both instances in the same cluster if you
> guarantee two replicas aren’t on the same machine. Cassandra provides a
> primitive to do this - rack awareness through the network topology snitch.
> >
> > The limitation (until 4.0) is that you’ll need two IPs per machine as
> > both instances have to run on the same port.
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > On Apr 18, 2019, at 6:45 AM, Durity, Sean R 
> wrote:
> >
> > What is the data problem that you are trying to solve with Cassandra? Is
> it high availability? Low latency queries? Large data volumes? High
> concurrent users? I would design the solution to fit the problem(s) you are
> solving.
> >
> >
> >
> > For example, if high availability is the goal, I would be very cautious
> about 2 nodes/machine. If you need the full amount of the disk – you *can*
> have larger nodes than 1 TB. I agree that administration tasks (like
> adding/removing nodes, etc.) are more painful with large nodes – but not
> impossible. For large amounts of data, I like nodes that have about 2.5 – 3
> TB of usable SSD disk.
> >
> >
> >
> > It is possible that your nodes might be under-utilized, especially at
> first. But if the hardware is already available, you have to use what you
> have.
> >
> >
> >
> > We have done multiple nodes on a single physical machine, but they were
> > two separate clusters (for the same application). In that case, we had a
> > different install location and different ports for one of the clusters.
> >
> >
> >
> > Sean Durity
> >
> >
> >
> > 

Re: Five Questions for Cassandra Users

2019-03-28 Thread Jonathan Koppenhofer
I think it would also be interesting to hear how people are handling
automation (which to me is different from AI) and config management.

For us it is a combo of custom Java workflows and Saltstack.

On Thu, Mar 28, 2019, 5:03 AM Kenneth Brotman 
wrote:

> I’m looking to get a better feel for how people use Cassandra in
> practice.  I thought others would benefit as well so may I ask you the
> following five questions:
>
>
>
> 1.   Do the same people where you work operate the cluster and write
> the code to develop the application?
>
>
>
> 2.   Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?
>
>
>
> 3.   Do you have a log stack that allows you to see the logs for all
> the nodes together?
>
>
>
> 4.   Do you regularly repair your clusters - such as by using Reaper?
>
>
>
> 5.   Do you use artificial intelligence to help manage your clusters?
>
>
>
>
>
> Thank you for taking your time to share this information!
>
>
>
> Kenneth Brotman
>


Re: Five Questions for Cassandra Users

2019-03-28 Thread Jonathan Koppenhofer
1. PaaS model. We have a team responsible for the deployment and
self-service tooling, as well as SMEs for both development and Cassandra
operations. End users consume the service, and are responsible for app
development and operations. Larger apps have separate teams for this;
smaller apps have a single team for both.

2. Homegrown, with a custom agent piping stats to a Cassandra cluster.
Grafana with a custom HTTP reader to read metrics from the homegrown API.
If it had existed when we first did this, we probably would have gone with
Prometheus.

3. Yes. ELK and/or Splunk

4. We used a homegrown repair mechanism before 2.2. Now we use Reaper. PaaS
consumers are responsible for configuring repairs.

5. No. We need to get better here, but "real" AI seems to be a new trend we
have seen talked about on this list.


On Thu, Mar 28, 2019, 5:03 AM Kenneth Brotman 
wrote:

> I’m looking to get a better feel for how people use Cassandra in
> practice.  I thought others would benefit as well so may I ask you the
> following five questions:
>
>
>
> 1.   Do the same people where you work operate the cluster and write
> the code to develop the application?
>
>
>
> 2.   Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?
>
>
>
> 3.   Do you have a log stack that allows you to see the logs for all
> the nodes together?
>
>
>
> 4.   Do you regularly repair your clusters - such as by using Reaper?
>
>
>
> 5.   Do you use artificial intelligence to help manage your clusters?
>
>
>
>
>
> Thank you for taking your time to share this information!
>
>
>
> Kenneth Brotman
>


Re: Migrating from DSE5.1.2 to Opensource cassandra

2018-12-06 Thread Jonathan Koppenhofer
Just to add a few additional notes on the in-place replacement:
* We had to remove system.local and system.peers.
* Since we remove those system tables, you also have to put
replace_address_first_boot in cassandra-env with the same IP address.
* We also temporarily add the node as a seed to prevent it from
bootstrapping.
* Don't forget to switch your config back to "normal" after the node
is back up and running.
* Probably unrelated to this process, but even after drain when we
originally stopped the node, we noticed DSE did not clean up the commitlogs
even though the logs said those files were drained. So we had to forcefully
remove the commitlogs before bringing the node back up.

Finally... Be sure you test this pretty well. We did this on many clusters,
but your mileage may vary depending on version of DSE and the features you
use.



On Thu, Dec 6, 2018, 1:23 AM Brooke Thorley wrote:

> Jonathan's high level process for in place conversion looks right.
>
> To answer your original question about versioning, the DSE release notes
> list the equivalent Cassandra version as 3.11.0:
>
> DataStax Enterprise 5.1.2 - DataStax Enterprise 5.1.10: Apache Cassandra™
> 3.11.0 (updated)
>
>
> Kind Regards,
> *Brooke Thorley*
> *VP Technical Operations & Customer Services*
> supp...@instaclustr.com | support.instaclustr.com
>
>
> On Thu, 6 Dec 2018 at 17:19, Dor Laor  wrote:
>
>> An alternative approach is to form another new cluster and leave the
>> original cluster alive (many times it's a must since it needs to be 24x7
>> online). Double-write to the two clusters and later migrate the data,
>> either by taking a snapshot and passing those files to the new cluster or
>> with sstableloader. With this procedure, you'll need to have the same
>> token range ownership.
>>
>> Another solution is to migrate using Spark, which will do a full table
>> scan. We have generic code that does it and we can open source it. This
>> way the new cluster can be of any size, and speed is also good with large
>> amounts of data (100s of TB). This process is also restartable, as it
>> takes days to transfer that much data.
>>
>> Good luck
>>
>> On Tue, Dec 4, 2018 at 9:04 PM dinesh.jo...@yahoo.com.INVALID
>>  wrote:
>>
>>> Thanks, nice summary of the overall process.
>>>
>>> Dinesh
>>>
>>>
>>> On Tuesday, December 4, 2018, 9:38:47 PM EST, Jonathan Koppenhofer <
>>> j...@koppedomain.com> wrote:
>>>
>>>
>>> Unfortunately, we found this to be a little tricky. We did migrations
>>> from DSE 4.8 and 5.0 to OSS 3.0.x, so you may run into additional issues. I
>>> will also say your best option may be to install a fresh cluster and stream
>>> the data. This wasn't feasible for us at the size and scale in the time
>>> frames and infrastructure restrictions we had. I will have to review my
>>> notes for more detail, but off the top of my head, for an in place
>>> migration...
>>>
>>> Pre-upgrade
>>> * Be sure you are not using any Enterprise features like Search or
>>> Graph. Not only are there no equivalent features in open source, but
>>> these features require proprietary classes to be in the classpath, or
>>> Cassandra will not even start up.
>>> * By default, I think DSE uses their own custom authenticators,
>>> authorizors, and such. Make sure what you are doing has an open source
>>> equivalent.
>>> * The DSE system keyspaces use custom replication strategies. Convert
>>> these to NTS before upgrade.
>>> * Otherwise, follow the same processes you would do before an upgrade
>>> (repair, snapshot, etc)
>>>
>>> Upgrade
>>> * The easy part is just replacing the binaries as you would in normal
>>> up

Re: Migrating from DSE5.1.2 to Opensource cassandra

2018-12-04 Thread Jonathan Koppenhofer
Unfortunately, we found this to be a little tricky. We did migrations from
DSE 4.8 and 5.0 to OSS 3.0.x, so you may run into additional issues. I will
also say your best option may be to install a fresh cluster and stream the
data. This wasn't feasible for us at our size and scale, given the time
frames and infrastructure restrictions we had. I will have to review my
notes for more detail, but off the top of my head, for an in-place
migration...

Pre-upgrade
* Be sure you are not using any Enterprise features like Search or Graph.
Not only are there no equivalent features in open source, but these
features require proprietary classes to be in the classpath, or Cassandra
will not even start up.
* By default, I think DSE uses its own custom authenticators,
authorizers, and such. Make sure what you are doing has an open source
equivalent.
* The DSE system keyspaces use custom replication strategies. Convert these
to NTS before upgrade.
* Otherwise, follow the same processes you would do before an upgrade
(repair, snapshot, etc)

Upgrade
* The easy part is just replacing the binaries as you would in a normal
upgrade. Drain and stop the existing node first. You can also do this same
process in a rolling fashion to maintain availability. In our case, we were
doing an in-place upgrade and reusing the same IPs.
* DSE unfortunately creates a custom column in a system table that requires
you to remove one (or more) system tables (peers?) to be able to start the
node. You delete these system tables by removing the sstables on disk while
the node is down. This is a bit of a headache if using vnodes. As we are
using vnodes, it required us to manually specify num_tokens, and the
specific tokens the node was responsible for, in cassandra.yaml (a sketch
of capturing those tokens follows below). You have to do this before you
start the node. If not using vnodes, this is simpler, but we used vnodes.
Again, I'll double-check my notes. Once the node is up, you can revert to
your normal vnodes/num_tokens settings.
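
As referenced above, a sketch of capturing a node's tokens so they can be
pinned as num_tokens/initial_token in cassandra.yaml before the swap. It
reads system.local on the node being replaced; connection details are left
to the driver defaults, and the 4.x Java driver is used here only for
consistency with the rest of this digest:

import java.util.Set;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class CaptureTokens {
    public static void main(String[] args) {
        // Connect directly to the node being replaced and read its own
        // token list from the system.local table.
        try (CqlSession session = CqlSession.builder().build()) {
            Row row = session.execute("SELECT tokens FROM system.local").one();
            Set<String> tokens = row.getSet("tokens", String.class);
            // These values can be pinned in cassandra.yaml before restart:
            System.out.println("num_tokens: " + tokens.size());
            System.out.println("initial_token: " + String.join(",", tokens));
        }
    }
}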

Post upgrade:
* Drop DSE system tables.
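
The post-upgrade cleanup itself is plain CQL; a minimal sketch using the DSE
keyspace names that appear elsewhere in this digest (verify your own list
before dropping anything):

import java.util.List;
import com.datastax.oss.driver.api.core.CqlSession;

public class DropDseKeyspaces {
    public static void main(String[] args) {
        // DSE-only keyspaces observed in this digest; confirm none are
        // still needed before dropping.
        List<String> dseKeyspaces =
            List.of("dse_system", "dse_perf", "dse_security", "dse_leases");
        try (CqlSession session = CqlSession.builder().build()) {
            for (String ks : dseKeyspaces) {
                session.execute("DROP KEYSPACE IF EXISTS " + ks);
            }
        }
    }
}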

I'll revert with more detail if needed.

On Tue, Dec 4, 2018, 5:46 PM Nandakishore Tokala <
nandakishore.tok...@gmail.com> wrote:

> HI All,
>
> We are migrating from DSE to open source Cassandra. If anyone has recently
> migrated, can you please share your experience, the steps you followed,
> and the challenges you faced?
>
> We want to migrate to the equivalent compatible version in open source.
> Can you give us the version number (even with the minor version) for DSE
> 5.1.2?
>
> 5.1 | DSE production-certified 3.10 + enhancements | 3.4 + enhancements | big m
>
> --
> Thanks & Regards,
> Nanda Kishore
>