RE: Check out new features in K8ssandra and Mission Control

2024-02-28 Thread Durity, Sean R via user
The k8ssandra requirement is a major blocker.


Sean R. Durity


From: Christopher Bradford 
Sent: Tuesday, February 27, 2024 9:49 PM
To: user@cassandra.apache.org
Cc: Christopher Bradford 
Subject: [EXTERNAL] Re: Check out new features in K8ssandra and Mission Control

Hey Jon,

* What aspects of Mission Control are dependent on using K8ssandra?

Mission Control bundles in K8ssandra for the core automation workflows 
(lifecycle management, cluster operations, Medusa & Reaper). In fact, we 
include the K8ssandraSpec in the top-level MissionControlCluster resource 
verbatim.

 * Can Mission Control work without K8ssandra?

Not at this time; K8ssandra powers a significant portion of the C* side of the 
stack. Mission Control provides additional functionality (web interface, 
certificate coordination, observability stack, etc.) and applies some 
conventions to how K8ssandra objects are created / templated out, but the 
actual K8ssandra operator present in MC is the same one available via the 
Helm charts.

* Is mission control open source?

Not at this time. While the majority of the Kubernetes operators are open 
source as part of K8ssandra, there are some pieces which are closed source. I 
expect some of the components may move from closed source into K8ssandra over 
time.

* I'm not familiar with Vector - does it require an agent?

Vector [vector.dev] is a pretty neat project. We run a few of their components 
as part of the stack. There is a DaemonSet which runs on each worker to collect 
host-level metrics and scrape logs being emitted by containers, a sidecar for 
collecting logs from the C* container, and an aggregator which performs some 
filtering and transformation before pushing to an object store.

* Is Reaper deployed separately or integrated in?

Reaper is deployed as part of the cluster creation workflow. It is spun up and 
configured to connect to the cluster automatically.

~Chris

Christopher Bradford



On Tue, Feb 27, 2024 at 6:55 PM Jon Haddad <j...@jonhaddad.com> wrote:
Hey Chris - this looks pretty interesting!  It looks like there's a lot of 
functionality in here.

* What aspects of Mission Control are dependent on using K8ssandra?
* Can Mission Control work without K8ssandra?
* Is mission control open source?
* I'm not familiar with Vector - does it require an agent?
* Is Reaper deployed separately or integrated in?

Thanks!  Looking forward to trying this out.
Jon


On Tue, Feb 27, 2024 at 7:07 AM Christopher Bradford <bradfor...@gmail.com> wrote:

Hey C* folks,


I'm excited to share that the DataStax team has just released Mission Control 
[datastax.com], a new operations platform for running Apache Cassandra and 
DataStax Enterprise. Built around the open source core of K8ssandra 
[k8ssandra.io], we've been hard at work expanding multi-region capabilities. 
If you haven't seen some of the new features coming in, here are some 
highlights:


  *   Management API support in Reaper - no more JMX credentials, YAY
  *   Additional support for TLS across the stack - including operator to node, 
Reaper to management API, etc.
  *   Updated metrics pipeline - removal of collectd from nodes, Vector for 
monitoring log files (goodbye tail -f)
  *   Deterministic node selection for cluster operations
  *   Top-level management tasks in the control plane (no more forced 
connections to data planes to trigger a restart)


On top of this Mission Control offers:


  *   A single web interface to monitor and manage your clusters wherever 
they're deployed
  *   Automatic management of internode and operator-to-node certificates - 
this includes integration with third-party CAs and rotation of all 
certificates, keys, and various Java stores
  *   Centralized metrics and logs aggregation, querying, and storage, with the 
capability to split the pipeline, allowing streams to be exported to other 
observability tools within your environment
  *   Per-node configuration (this is an edge case, but still something we 
wanted to make possible)


While building Mission Control, K8ssandra has seen a number of releases with 
quite a few contributions from the community. From Helm chart updates to 
operator tweaks, we want to send out a huge THANK YOU

RE: Big Data Question

2023-08-18 Thread Durity, Sean R via user
Cost of availability is a fair question at some level of the discussion. In my 
experience, high availability is one of the top 2 or 3 reasons why Cassandra is 
chosen as the data solution. So, if I am given a Cassandra use case to build 
out, I would normally assume high availability is needed, even in a single data 
center scenario. Otherwise, there are other data options.


Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra



From: daemeon reiydelle 
Sent: Thursday, August 17, 2023 7:38 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Big Data Question

I started to respond, then realized I and the other OP posters are not thinking 
the same: What is the business case for availability, data 
loss/reload/recoverability? You all argue for higher availability and damn the 
cost. But no one asked "can you lose access, for 20 minutes, to a portion of 
the data, 10 times a year, on a 250-node cluster in AWS, if it is not lost"? 
Can you lose access 1-2 times a year for the cost of a 500-node cluster holding 
the same data?

Then we can discuss 32/64g JVM and SSDs.
Arthur C. Clarke famously said that "technology sufficiently advanced is 
indistinguishable from magic." Magic is coming, and it's coming for all of 
us

Daemeon Reiydelle
email: daeme...@gmail.com
LI: https://www.linkedin.com/in/daemeonreiydelle/
San Francisco 1.415.501.0198 / Skype daemeon.c.m.reiydelle


On Thu, Aug 17, 2023 at 1:53 PM Joe Obernberger <joseph.obernber...@gmail.com> wrote:
Was assuming reaper did incremental?  That was probably a bad assumption.

nodetool repair -pr
I know it well now!

:)
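
A minimal sketch of the relevant invocations (the keyspace name is a 
placeholder; flags per nodetool's documented options):

    # primary-range repair, run on each node in turn
    nodetool repair -pr my_keyspace
    # incremental repair (the default since 2.2; skips already-repaired data)
    nodetool repair my_keyspace
    # occasional full repair is still needed from time to time
    nodetool repair -full my_keyspace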

-Joe

On 8/17/2023 4:47 PM, Bowen Song via user wrote:
> I don't have experience with Cassandra on Kubernetes, so I can't
> comment on that.
>
> For repairs, may I interest you with incremental repairs? It will make
> repairs hell of a lot faster. Of course, occasional full repair is
> still needed, but that's another story.
>
>
> On 17/08/2023 21:36, Joe Obernberger wrote:
>> Thank you.  Enjoying this conversation.
>> Agree on blade servers, where each blade has a small number of SSDs.
>> Yeh/Nah to a kubernetes approach assuming fast persistent storage?  I
>> think that might be easier to manage.
>>
>> In my current benchmarks, the performance is excellent, but the
>> repairs are painful.  I come from the Hadoop world where it was all
>> about large servers with lots of disk.
>> Relatively small number of tables, but some have a high number of
>> rows, 10bil + - we use spark to run across all the data.
>>
>> -Joe
>>
>> On 8/17/2023 12:13 PM, Bowen Song via user wrote:
>>> The optimal node size largely depends on the table schema and
>>> read/write pattern. In some cases 500 GB per node is too large, but
>>> in some other cases 10TB per node works totally fine. It's hard to
>>> estimate that without benchmarking.
>>>
>>> Again, just pointing out the obvious, you did not count the off-heap
>>> memory and page cache. 1TB of RAM for 24GB heap * 40 instances is
>>> definitely not enough. You'll most likely need between 1.5 and 2 TB
>>> memory for 40x 24GB heap nodes. You may be better off with blade
>>> servers than single server with gigantic memory and disk sizes.
>>>
>>>
>>> On 17/08/2023 15:46, Joe Obernberger wrote:
 Thanks for this - yeah - duh - forgot about replication in my example!
 So - is 2TBytes per Cassandra instance advisable?  Better to use
 more/less?  Modern 2u servers can be had with 24 3.8TBtyte SSDs; so
 assume 80Tbytes per server, you could do:
 (1024*3)/80 = 39 servers, but you'd have to run 40 instances of
 Cassandra on each server; maybe 24G of heap per instance, so a
 server with 1TByte of RAM would work.
 Is this what folks would do?

 -Joe

 On 8/17/2023 9:13 AM, Bowen Song via user wrote:
> Just pointing out the obvious, for 1PB of data on nodes with 2TB
> disk each, you will need far more than 500 nodes.
>
> 1, it is unwise to run Cassandra with replication factor 1. It
> usually makes sense to use RF=3, so 1PB data will cost 3PB of
> storage space, minimal of 1500 such nodes.
>
> 2, depending on the compaction strategy you use and the write
> access pattern, there's a disk space amplification to consider.
> For example, with STCS, the disk usage can be many times of the
> actual live data size.
>
> 3, you will need some extra free disk space as temporary space for
> running compactions.
>
> 4, the data 

RE: Big Data Question

2023-08-17 Thread Durity, Sean R via user
For a variety of reasons, we have clusters with 5 TB of disk per host as a 
“standard.” In our larger data clusters, it does take longer to add/remove 
nodes or do things like upgradesstables after an upgrade. These nodes have 3+TB 
of actual data on the drive. But, we were able to shrink the node count from 
our days of using 1 or 2 TB of disk. Lots of potential cost tradeoffs to 
consider – licensing/support, server cost, maintenance time, more or less 
servers to have failures, number of (expensive?!) switch ports used, etc.

NOTE: this is 3.x experience, not 4.x with faster streaming.

Sean R. Durity



From: Joe Obernberger 
Sent: Thursday, August 17, 2023 10:46 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Big Data Question

Thanks for this - yeah - duh - forgot about replication in my example!

So - is 2TBytes per Cassandra instance advisable?  Better to use more/less?  
Modern 2u servers can be had with 24 3.8TByte SSDs; so assume 80TBytes per 
server, you could do: (1024*3)/80 = 39 servers, but you'd have to run 40 
instances of Cassandra on each server; maybe 24G of heap per instance, so a 
server with 1TByte of RAM would work.

Is this what folks would do?

-Joe



On 8/17/2023 9:13 AM, Bowen Song via user wrote:
> Just pointing out the obvious, for 1PB of data on nodes with 2TB disk
> each, you will need far more than 500 nodes.
>
> 1, it is unwise to run Cassandra with replication factor 1. It usually
> makes sense to use RF=3, so 1PB data will cost 3PB of storage space,
> minimal of 1500 such nodes.
>
> 2, depending on the compaction strategy you use and the write access
> pattern, there's a disk space amplification to consider. For example,
> with STCS, the disk usage can be many times of the actual live data size.
>
> 3, you will need some extra free disk space as temporary space for
> running compactions.
>
> 4, the data is rarely going to be perfectly evenly distributed among
> all nodes, and you need to take that into consideration and size the
> nodes based on the node with the most data.
>
> 5, enough of bad news, here's a good one. Compression will save you (a
> lot) of disk space!
>
> With all the above considered, you probably will end up with a lot
> more than the 500 nodes you initially thought. Your choice of
> compaction strategy and compression ratio can dramatically affect this
> calculation.
>
>
> On 16/08/2023 16:33, Joe Obernberger wrote:
>> General question on how to configure Cassandra.  Say I have 1PByte of
>> data to store.  The general rule of thumb is that each node (or at
>> least instance of Cassandra) shouldn't handle more than 2TBytes of
>> disk.  That means 500 instances of Cassandra.
>>
>> Assuming you have very fast persistent storage (such as a NetApp,
>> PorterWorx etc.), would using Kubernetes or some orchestration layer
>> to handle those nodes be a viable approach? Perhaps the worker nodes
>> would have enough RAM to run 4 instances (pods) of Cassandra, you
>> would need 125 servers.
>> Another approach is to build your servers with 5 (or more) SSD
>> devices - one for OS, four for each instance of Cassandra running on
>> that server.  Then build some scripts/ansible/puppet that would
>> manage Cassandra start/stops, and other maintenance items.
>>
>> Where I think this runs into problems is with repairs, or
>> sstablescrubs that can take days to run on a single instance. How is
>> that handled 'in the real world'?  With seed nodes, how many would
>> you have in such a configuration?
>> Thanks for any thoughts!
>>
>> -Joe





RE: Cassandra p95 latencies

2023-08-11 Thread Durity, Sean R via user
I would expect single digit ms latency on reads and writes. However, we have 
not done any performance testing on Apache Cassandra 4.x.

Sean R. Durity


From: Shaurya Gupta 
Sent: Friday, August 11, 2023 1:16 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra p95 latencies

The queries are rightly designed, as I already explained. 40 ms is way too high 
compared to what I have seen with other DBs, and many times with Cassandra 3.x 
versions.
CPU consumed, as I mentioned, is not high; it is around 20%.

On Thu, Aug 10, 2023 at 5:14 PM MyWorld <timeplus.1...@gmail.com> wrote:
Hi,
P95 should not be a problem if rightly designed. Levelled compaction strategy 
further reduces this; however, it consumes some resources. For reads, caching 
is also helpful.
Can you check your CPU iowait, as it could be the reason for the delay?
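
For example, something like this (a generic Linux sketch; watch the %iowait 
and await columns):

    # extended device stats, one report every 2 seconds, 5 reports
    iostat -x 2 5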

Regards,
Ashish

On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta <shaurya.n...@gmail.com> wrote:
Hi community

What is the expected P95 latency for Cassandra read and write queries executed 
with LOCAL_QUORUM over a table with 3 replicas? The queries are done using the 
partition + clustering key, and the row size in bytes is not too much, maybe 
1-2 KB maximum. Assuming CPU is not a crunch?

We observe those to be 40 ms P95 for reads and the same for writes. This looks 
very high compared to what we expected. We are using Cassandra 4.0.

Any documentation / numbers will be helpful.

Thanks
--
Shaurya Gupta



--
Shaurya Gupta






RE: Survey about the parsing of the tooling's output

2023-07-10 Thread Durity, Sean R via user
We also parse the output from nodetool info and nodetool status and (to a 
lesser degree) nodetool netstats. We have basically made info and status more 
operator-friendly in a multi-cluster environment. (And we added a usable 
return value to our info command that we can use to evaluate the node's 
health.) While changes to the output wouldn't be significantly difficult to 
adapt to, there is the cost multiplier of deploying to hundreds of nodes across 
multiple clusters and all the testing and approvals that are required. I would 
agree with "only on major releases" as a rule to follow.

Zero desire to get JSON or YAML outputs – no, thank you. CQL/virtual tables is 
a good, additional goal. Other databases have had this kind of feature for a 
long time.

Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra



From: Bowen Song via user 
Sent: Monday, July 10, 2023 7:25 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Survey about the parsing of the tooling's output

We parse the output of the following nodetool sub-commands in our custom 
scripts:

  *   status
  *   netstats
  *   tpstats
  *   ring
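
For illustration, the kind of one-liner we mean (a minimal sketch, not our 
actual scripts; column positions follow the current nodetool status layout):

    # print the address of every down node
    nodetool status | awk '$1 == "DN" {print $2}'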

We don't mind the output format change between major releases as long as all 
the following are true:

  1.  major releases are not too frequent
e.g. no more frequent than once every couple of years
  2.  the changes are clearly documented in the CHANGES.txt and mentioned in 
the NEWS.txt
e.g. clearly specify that "someStatistic:" in "nodetool somecommand" is renamed 
to "Some Statistic:"
  3.  the functionality is not lost
e.g. remove a value from the output with no obvious alternative
  4.  it doesn't become a lot harder to parse
e.g. split a value into multiple values with different units, and the new 
values need to be added up together to get the original one

We have Ansible playbooks, shell scripts, Python scripts, etc. parsing the 
output, and to the best of my knowledge, all of them are trivial to rework for 
minor cosmetic changes like the one given in the example.

Parsing JSON or YAML in vanilla POSIX shell (i.e. without tools such as jq 
installed) can be much harder; we would rather not have to deal with that. For 
Ansible and Python scripts, it's a nonissue, but given the fact that we are 
already parsing the default output and it works fine, we are unlikely to change 
them to use JSON or YAML instead, unless the pain of dealing with breaking 
changes is too much and too often.

Querying via CQL is harder, and we would rather not do that, for the reasons 
below:

  *   it requires Cassandra credentials, instead of the credential-less 
nodetool command on localhost
  *   for shell scripts, the cqlsh command output is harder to parse than the 
nodetool command, because its output is a human-friendly table with header, 
dynamic indentations, field separators, etc., which makes it a less attractive 
candidate than nodetool
  *   for Ansible and Python scripts, using the CQL interface would require 
extra modules/libraries. The extra installation steps required make the scripts 
themselves less portable between different servers/environments, so we may 
still prefer the more portable nodetool approach where localhost access is 
possible


On 10/07/2023 10:35, Miklosovic, Stefan wrote:

Hi Cassandra users,



I am a Cassandra developer and we in Cassandra project would love to know if 
there are users out there for whom the output of the tooling, like, nodetool, 
is important when it comes to parsing it.



We are elaborating on the consequences when nodetool's output for various 
commands is changed - we are not completely sure if users are parsing this 
output in some manner in their custom scripts so us changing the output would 
break their scripts which are parsing it.



Additionally, how big of a problem the output change would be if it was 
happening only between major Cassandra versions? E.g. 4.0 -> 5.0 or 5.0 -> 6.0 
only. In other words, there would be a guarantee that no breaking changes in 
minor versions would ever occur. Only in majors.



Is somebody out there who is relying on the output of some particular nodetool 
commands (or any command in tools/bin) in production? How often do you rely on 
the parsing of nodetool's output and how much work it would be for you to 
rework some minor changes? For example, when the tool output prints 
"someStatistic: 10" and we would rework it to "Some Statistic: 10".



Would you be OK if the output changed but you would have a way how to get e.g. 
JSON or YAML output instead by some flag on nodetool command so it would be 
irrelevant what the default output would be?



It would be appreciated a lot if you gave us more feedback on this. I 
understand that not all questi

RE: Is cleanup is required if cluster topology changes

2023-05-05 Thread Durity, Sean R via user
I run clean-up in parallel, not serially, since it is a node-only kind of 
operation. And I only run in the impacted DC. With only 300 GB on a node, 
clean-up should not take very long. Check your compactionthroughput.

I ran clean-up in parallel on 53 nodes with over 3 TB of data each. It took 
like 6-8 hours. (And many nodes were done much earlier than that.) I restrict 
clean-up to one compactionthread, but I double the compactionthroughput for the 
duration of the cleanup. This protects against two large sstables being 
compacted at the same time and running out of disk space.
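
A sketch of that sequence (the numbers and keyspace name are illustrative):

    # temporarily double compaction throughput (MB/s) for the cleanup
    nodetool setcompactionthroughput 128
    # run cleanup with a single compaction thread, one keyspace at a time
    nodetool cleanup -j 1 my_keyspace
    # restore the normal setting afterwards
    nodetool setcompactionthroughput 64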

Sean Durity
From: manish khandelwal 
Sent: Friday, May 5, 2023 4:52 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Is cleanup is required if cluster topology changes

You can replace the node directly; why add a node and then decommission 
another node? Just replace the node with the new node and your topology remains 
the same, so there is no need to run cleanup.

On Fri, May 5, 2023 at 10:26 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

We use STCS, and our experience with cleanup is that it takes a long time to 
run in a 100-node cluster. We would like to replace one node every day for 
various purposes in our fleet.

If we run cleanup after each node replacement, then it might take, say, 15 days 
to complete, and that hinders our node replacement frequency.

Do you see any other options?

Jaydeep

On Thu, May 4, 2023 at 9:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
You should 100% trigger cleanup each time, or you'll almost certainly resurrect 
data sooner or later.
If you're using leveled compaction it's especially cheap. STCS and TWCS are 
worse, but if you're really scaling that often, I'd be considering LCS and 
running cleanup just before or just after each scaling.


On May 4, 2023, at 9:25 PM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

Thanks, Jeff!
But in our environment we replace nodes quite often for various optimization 
purposes, etc.; say, almost 1 node per day (node addition followed by node 
decommission, which of course changes the topology), and we have a cluster of 
100 nodes with 300GB per node. If we have to run cleanup on 100 nodes after 
every replacement, then it could take forever.
What is the recommendation until we get this fixed in Cassandra itself as part 
of compaction (w/o externally triggering cleanup)?

Jaydeep

On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa <jji...@gmail.com> wrote:
Cleanup is fast and cheap and basically a no-op if you haven't changed the ring.
After Cassandra has transactional cluster metadata to make ring changes 
strongly consistent, Cassandra should do this in every compaction. But until 
then it's left for operators to run when they're sure the state of the ring is 
correct.




On May 4, 2023, at 7:41 PM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

Isn't this considered a kind of bug in Cassandra because as we know cleanup is 
a lengthy and unreliable operation, so relying on the cleanup means higher 
chances of data resurrection?
Do you think we should discard the unowned token-ranges as part of the regular 
compaction itself? What are the pitfalls of doing this as part of compaction 
itself?

Jaydeep

On Thu, May 4, 2023 at 7:25 PM guo Maxwell <cclive1...@gmail.com> wrote:
Compaction will just merge duplicate data and remove deleted data on this node. 
If you add or remove one node from the cluster, I think cleanup is needed. If 
cleanup failed, I think we should look into the reason.

On Fri, May 5, 2023 at 06:37, Runtian Liu <curly...@gmail.com> wrote:
Hi all,

Is cleanup the sole method to remove data that does not belong to a specific 
node? In a cluster, where nodes are added or decommissioned from time to time, 
failure to run cleanup may lead to data resurrection issues, as deleted data 
may remain on the node that lost ownership of certain partitions. Or is it true 
that normal compactions can also handle data removal for nodes that no longer 
have ownership of certain data?

Thanks,
Runtian


--
you are the apple of my eye !




RE: Cleanup

2023-02-17 Thread Durity, Sean R via user
Cleanup, by itself, uses all the compactors available. So, it is important to 
see if you have the disk space for multiple large cleanup compactions running 
at the same time. We have a utility to do cleanup more intelligently – it 
temporarily doubles compaction throughput, operates on a single keyspace, sorts 
by table size ascending, and runs only 1 thread (-j 1) at a time to protect 
against the multiple large compactions at the same time issue. It also verifies 
that there is enough disk space to handle the largest sstable for the table 
about to be cleaned up.

It works very well in the use cases where we have a stair step arrangement of 
table sizes. We recover space from smaller tables and work up to the largest 
ones with whatever extra space we have acquired.


Sean R. Durity

From: Dipan Shah 
Sent: Friday, February 17, 2023 2:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cleanup

Hi Marc,

Changes done using "nodetool setcompactionthroughput" will only be applicable 
till Cassandra service restart.

The throughput value will revert back to the settings inside cassandra.yaml 
post service restart.

On Fri, Feb 17, 2023 at 1:04 PM Marc Hoppins <marc.hopp...@eset.com> wrote:
…and if it is altered via nodetool, is it altered until manually changed or 
until service restart, so it must be manually put back?



From: Aaron Ploetz <aaronplo...@gmail.com>
Sent: Thursday, February 16, 2023 4:50 PM
To: user@cassandra.apache.org
Subject: Re: Cleanup

EXTERNAL
So if I remember right, setting compaction_throughput_per_mb to zero 
effectively disables throttling, which means cleanup and compaction will run as 
fast as the instance will allow.  For normal use, I'd recommend capping that at 
8 or 16.
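
(For reference, in cassandra.yaml up through 4.0 the setting is spelled like 
this; 0 means unthrottled:)

    compaction_throughput_mb_per_sec: 16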

Aaron


On Thu, Feb 16, 2023 at 9:43 AM Marc Hoppins <marc.hopp...@eset.com> wrote:
Compaction_throughput_per_mb is 0 in cassandra.yaml. Is setting it in nodetool 
going to provide any increase?

From: Durity, Sean R via user <user@cassandra.apache.org>
Sent: Thursday, February 16, 2023 4:20 PM
To: user@cassandra.apache.org
Subject: RE: Cleanup

EXTERNAL
Clean-up is constrained/throttled by compactionthroughput. If your system can 
handle it, you can increase that throughput (nodetool setcompactionthroughput) 
for the clean-up in order to reduce the total time.

It is a node-isolated operation, not cluster-involved. I often run clean up on 
all nodes in a DC at the same time. Think of it as compaction and consider your 
cluster performance/workload/timelines accordingly.

Sean R. Durity

From: manish khandelwal <manishkhandelwa...@gmail.com>
Sent: Thursday, February 16, 2023 5:05 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cleanup

There is no advantage to running cleanup if no new nodes are introduced, so 
cleanup time should remain the same when adding new nodes.

Cleanup is local to a node, so network bandwidth should have no effect on 
reducing cleanup time.

Don't ignore cleanup, as it can leave your disks occupied without any use.

You should plan to run cleanup in a lean period (low traffic). You can also 
use the keyspace and table-name suboptions to plan it in such a way that I/O 
pressure is not too high.


Regards
Manish

On Thu, Feb 16, 2023 at 3:12 PM Marc Hoppins <marc.hopp...@eset.com> wrote:
Hulloa all,

I read a thing re. adding new nodes where the recommendation was to run cleanup 
on the nodes after adding a new node to remove redundant token ranges.

I timed this way back when we only had ~20G of data per node and it took 
approx. 5 mins per node.  After adding a node on Tuesday, I figured I’d run 
cleanup.

Per node, it is taking 6+ hours now as we have 2-2.5T per node.

Should we be running cleanup regularly regardless of whether or not new nodes 
have been added?  Would it reduce cleanup times for when we do add new nodes?
If we double the network bandwidth can we effectively reduce this lengthy 
cleanup?
Maybe just ignore cleanup entirely?
I appreciate that cleanup will increase the load but running cleanup on one 
node at a time seems impractical.  How many simultaneous nodes (per rack) 
should we limit cleanup to?

More experienced suggestions would be most appreciated.

Marc




--

Thanks,

Dipan Shah

Data Engineer





RE: Cleanup

2023-02-16 Thread Durity, Sean R via user
Clean-up is constrained/throttled by compactionthroughput. If your system can 
handle it, you can increase that throughput (nodetool setcompactionthroughput) 
for the clean-up in order to reduce the total time.

It is a node-isolated operation, not cluster-involved. I often run clean up on 
all nodes in a DC at the same time. Think of it as compaction and consider your 
cluster performance/workload/timelines accordingly.

Sean R. Durity

From: manish khandelwal 
Sent: Thursday, February 16, 2023 5:05 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cleanup

There is no advantage to running cleanup if no new nodes are introduced, so 
cleanup time should remain the same when adding new nodes.

Cleanup is local to a node, so network bandwidth should have no effect on 
reducing cleanup time.

Don't ignore cleanup, as it can leave your disks occupied without any use.

You should plan to run cleanup in a lean period (low traffic). You can also 
use the keyspace and table-name suboptions to plan it in such a way that I/O 
pressure is not too high.


Regards
Manish

On Thu, Feb 16, 2023 at 3:12 PM Marc Hoppins <marc.hopp...@eset.com> wrote:
Hulloa all,

I read a thing re. adding new nodes where the recommendation was to run cleanup 
on the nodes after adding a new node to remove redundant token ranges.

I timed this way back when we only had ~20G of data per node and it took 
approx. 5 mins per node.  After adding a node on Tuesday, I figured I’d run 
cleanup.

Per node, it is taking 6+ hours now as we have 2-2.5T per node.

Should we be running cleanup regularly regardless of whether or not new nodes 
have been added?  Would it reduce cleanup times for when we do add new nodes?
If we double the network bandwidth can we effectively reduce this lengthy 
cleanup?
Maybe just ignore cleanup entirely?
I appreciate that cleanup will increase the load but running cleanup on one 
node at a time seems impractical.  How many simultaneous nodes (per rack) 
should we limit cleanup to?

More experienced suggestions would be most appreciated.

Marc




RE: Startup fails - 4.1.0

2023-02-03 Thread Durity, Sean R via user
In most cases, I would delete the corrupt commit log file and restart. Then run 
repairs on that node. I have seen cases where multiple files are corrupted and 
it is easier to remove all commit log files to get the node restarted.
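
A sketch of that procedure (the segment name comes from the error below; 
assumes a systemd-managed install with default paths):

    # with cassandra stopped, move the corrupt segment aside
    sudo mv /var/lib/cassandra/commitlog/CommitLog-7-1674161126167.log /tmp/
    # restart the node, then repair it to cover the lost mutations
    sudo systemctl start cassandra
    nodetool repair -pr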

Sean R. Durity
From: Joe Obernberger 
Sent: Friday, February 3, 2023 3:15 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Startup fails - 4.1.0

Hi all - cluster had a power outage and one of the nodes in a 14-node cluster 
isn't starting with:

DEBUG [MemtableFlushWriter:1] 2023-02-03 13:52:45,468 ColumnFamilyStore.java:1329 - Flushed to [BigTableReader(path='/data/2/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8407-big-Data.db'), BigTableReader(path='/data/3/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8408-big-Data.db'), BigTableReader(path='/data/4/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8409-big-Data.db'), BigTableReader(path='/data/5/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8410-big-Data.db'), BigTableReader(path='/data/6/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8411-big-Data.db'), BigTableReader(path='/data/8/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8412-big-Data.db'), BigTableReader(path='/data/9/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8413-big-Data.db')] (7 sstables, 92.858MiB), biggest 15.420MiB, smallest 10.307MiB
INFO  [main] 2023-02-03 13:52:45,621 CommitLogReader.java:257 - Finished reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126163.log
DEBUG [main] 2023-02-03 13:52:45,622 CommitLogReader.java:266 - Reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126164.log (CL version 7, messaging version 12, compression null)
INFO  [main] 2023-02-03 13:52:46,811 CommitLogReader.java:257 - Finished reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126164.log
DEBUG [main] 2023-02-03 13:52:46,811 CommitLogReader.java:266 - Reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126165.log (CL version 7, messaging version 12, compression null)
INFO  [main] 2023-02-03 13:52:47,985 CommitLogReader.java:257 - Finished reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126165.log
DEBUG [main] 2023-02-03 13:52:47,986 CommitLogReader.java:266 - Reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126166.log (CL version 7, messaging version 12, compression null)
INFO  [main] 2023-02-03 13:52:49,282 CommitLogReader.java:257 - Finished reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126166.log
DEBUG [main] 2023-02-03 13:52:49,283 CommitLogReader.java:266 - Reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126167.log (CL version 7, messaging version 12, compression null)
ERROR [main] 2023-02-03 13:52:49,651 JVMStabilityInspector.java:196 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Mutation checksum failure at 11231154 in Next section at 11230925 in CommitLog-7-1674161126167.log
        at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:387)
        at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:244)
        at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:147)
        at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:191)
        at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:200)
        at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:181)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:357)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:752)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:876)

How to proceed?

Thank you!

-Joe







RE: Failed disks - correct procedure

2023-01-17 Thread Durity, Sean R via user
For physical hardware when disks fail, I do a removenode, wait for the drive to 
be replaced, reinstall Cassandra, and then bootstrap the node back in (and run 
clean-up across the DC).

All of our disks are presented as one file system for data, which is not what 
the original question was asking.
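
A sketch of that sequence (the host ID is a placeholder):

    # from any live node: remove the dead node from the ring
    nodetool removenode <host-id-of-dead-node>
    # after the drive is replaced and Cassandra reinstalled, the node
    # bootstraps back in on startup; then reclaim stale ranges across the DC
    nodetool cleanup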

Sean R. Durity
From: Marc Hoppins 
Sent: Tuesday, January 17, 2023 3:57 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Failed disks - correct procedure

HI all,

I was pondering this very situation.

We have a node with a crapped-out disk (not the first time). Removenode vs 
repairnode: in regard to time, there is going to be little difference twixt 
replacing a dead node and removing then re-installing a node. There is going 
to be a bunch of reads/writes and verifications (or similar) which is going to 
take a similar amount of time... or do I read that wrong?

For myself, I just go with removenode and then rejoin after the HDD has been 
replaced. Usually the fix exceeds the wait time and the node is then out of 
the system anyway.



-----Original Message-----
From: Joe Obernberger <joseph.obernber...@gmail.com>
Sent: Monday, January 16, 2023 6:31 PM
To: Jeff Jirsa <jji...@gmail.com>; user@cassandra.apache.org
Subject: Re: Failed disks - correct procedure

EXTERNAL



I'm using 4.1.0-1.
I've been doing a lot of truncates lately before the drive failed (research 
project).  Current drives have about 100GBytes of data each, although the 
actual amount of data in Cassandra is much less (because of truncates and 
snapshots).  The cluster is not homogeneous; some nodes have more drives than 
others.



nodetool status -r
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                     Load       Tokens  Owns  Host ID                               Rack
UN  nyx.querymasters.com        7.9 GiB    250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  6.34 GiB   200     ?     274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
UN  aion.querymasters.com       6.31 GiB   200     ?     59150c47-274a-46fb-9d5e-bed468d36797  rack1
UN  calypso.querymasters.com    6.26 GiB   200     ?     e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    7.1 GiB    200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  kratos.querymasters.com     6.36 GiB   200     ?     0d9509cc-2f23-4117-a883-469a1be54baf  rack1
UN  charon.querymasters.com     6.35 GiB   200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com       6.4 GiB    200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  ursula.querymasters.com     6.24 GiB   200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com       6.28 GiB   200     ?     b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com      3.78 GiB   120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com     6.24 GiB   200     ?     b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  paradigm7.querymasters.com  16.25 GiB  500     ?     1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
UN  aether.querymasters.com     6.36 GiB   200     ?     352fd049-32f8-4be8-9275-68b145ac2832  rack1
UN  athena.querymasters.com     15.85 GiB  500     ?     b088a8e6-42f3-4331-a583-47ef5149598f  rack1

-Joe



On 1/16/2023 12:23 PM, Jeff Jirsa wrote:
> Prior to cassandra-6696 you'd have to treat one missing disk as a
> failed machine, wipe all the data and re-stream it, as a tombstone for
> a given value may be on one disk and data on another (effectively
> redirecting data)
>
> So the answer has to be version dependent, too - which version were you using?
>
>> On Jan 16, 2023, at 9:08 AM, Tolbert, Andy <x...@andrewtolbert.com> wrote:
>>
>> Hi Joe,
>>
>> Reading it back I realized I misunderstood that part of your email,
>> so you must be using data_file_directories with 16 drives?  That's a
>> lot of drives!  I imagine this may happen from time to time given
>> that disks like to fail.
>>
>> That's a bit of an interesting scenario that I would have to think
>> about.  If you brought the node up without the bad drive, repairs are
>> probably going to do a ton of repair overstreaming if you aren't
>> using 4.0 (https://issues.apache.org/jira/browse/CASSANDRA-3200)

RE: Best compaction strategy for rarely used data

2022-12-30 Thread Durity, Sean R via user
Yes, clean-up will reduce the disk space on the existing nodes by re-writing 
only the data that the node now owns into new sstables.


Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra

From: Lapo Luchini 
Sent: Friday, December 30, 2022 4:12 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data

On 2022-12-29 21:54, Durity, Sean R via user wrote:

> At some point you will end up with large sstables (like 1 TB) that won't
> compact because there are not 4 similar-sized ones able to be compacted

Yes, that's exactly what's happening.

I'll see maybe just one more compaction, since the biggest sstable is
already more than 20% of residual free space.

> For me, the backup strategy shouldn't drive the rest.

Mhh, yes, that makes sense.

> And if your data is ever-growing
> and never deleted, you will be adding nodes to handle the extra data as
> time goes by (and running clean-up on the existing nodes).

What will happen when adding new nodes, as you say, though?
If I have a 1TB sstable with 250GB of data that will be no longer useful
(as a new node will be the new owner) will that sstable be reduced to
750GB by "cleanup" or will it retain old data?

Thanks,

--
Lapo Luchini
l...@lapo.it


RE: Best compaction strategy for rarely used data

2022-12-29 Thread Durity, Sean R via user
If there isn’t a TTL and timestamp on the data, I’m not sure the benefits of 
TWCS for this use case. I would stick with size-tiered. At some point you will 
end up with large sstables (like 1 TB) that won’t compact because there are not 
4 similar-sized ones able to be compacted (assuming default parameters for 
STCS). And if your data is ever-growing and never deleted, you will be adding 
nodes to handle the extra data as time goes by (and running clean-up on the 
existing nodes). For me, the backup strategy shouldn’t drive the rest.
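
For reference, a sketch of where those STCS knobs live (the table name is a 
placeholder; min_threshold is the "4 similar-sized sstables" default mentioned 
above):

    ALTER TABLE my_ks.my_table
    WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                       'min_threshold': 4,
                       'max_threshold': 32};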


Sean R. Durity

From: Paul Chandler 
Sent: Thursday, December 29, 2022 4:51 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data

Hi Lapo

Take a look at TWCS, I think that could help your use case:
https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

Regards

Paul Chandler
Sent from my iPhone

On 29 Dec 2022, at 08:55, Lapo Luchini <l...@lapo.it> wrote:
Hi, I have a table which gets (a lot of) data that is written once and very 
rarely read (it is used for data that is mandatory for regulatory reasons), and 
almost never deleted.

I'm using the default STCS as at the time I didn't know any better, but 
SSTable sizes are getting huge, which is a problem both because they are 
approaching the size of the available disk and because I'm using a 
snapshot-based system to back up the node (and thus compacting a huge SSTable 
into an even bigger one generates a lot of traffic for mostly-old data).

I'm thinking about switching to LCS (mainly to solve the size issue), but I 
read that it is "optimized for read heavy workloads […] not a good choice for 
immutable time series data". Given that I don't really care about write nor 
read speed, but would like SSTables size to have a upper limit, would this 
strategy still be the best?

PS: Googling around a strategy called "incremental compaction" (ICS) keeps 
getting in results, but that's only available in ScyllaDB, right?

--
Lapo Luchini
l...@lapo.it




RE: Cassandra 4.0.7 - issue - service not starting

2022-12-08 Thread Durity, Sean R via user
I have seen this when there is a tab character in the yaml file. Yaml is (too) 
picky on these things.
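
A quick way to hunt for that (a sketch; GNU grep, path per the error below):

    # show any line containing a literal tab character, with line numbers
    grep -nP '\t' /etc/cassandra/default.conf/cassandra.yaml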

Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra

From: Amit Patel via user 
Sent: Thursday, December 8, 2022 11:38 AM
To: Arvydas Jonusonis ; user@cassandra.apache.org
Subject: [EXTERNAL] RE: Cassandra 4.0.7 - issue - service not starting

Hi Arvydas,

CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.deserializeLargeSubset (Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/Columns;I)Lorg/apache/cassandra/db/Columns;
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubset (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;ILorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubsetSize (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;I)I
CompilerOracle: dontinline org/apache/cassandra/db/commitlog/AbstractCommitLogSegmentManager.advanceAllocatingFrom (Lorg/apache/cassandra/db/commitlog/CommitLogSegment;)V
CompilerOracle: dontinline org/apache/cassandra/db/transform/BaseIterator.tryGetMoreContents ()Z
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stop ()V
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stopInPartition ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.doFlush (I)V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeSlow (JI)V
CompilerOracle: dontinline org/apache/cassandra/io/util/RebufferingInputStream.readPrimitiveSlowly (I)J
CompilerOracle: exclude org/apache/cassandra/utils/JVMStabilityInspector.forceHeapSpaceOomMaybe (Ljava/lang/OutOfMemoryError;)V
CompilerOracle: inline org/apache/cassandra/db/rows/UnfilteredSerializer.serializeRowBody (Lorg/apache/cassandra/db/rows/Row;ILorg/apache/cassandra/db/rows/SerializationHelper;Lorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/net/FrameDecoderWith8bHeader.decode (Ljava/util/Collection;Lorg/apache/cassandra/net/ShareableBytes;I)V
CompilerOracle: inline org/apache/cassandra/service/reads/repair/RowIteratorMergeListener.applyToPartition (ILjava/util/function/Consumer;)V
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.selectBoundary (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;II)I
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.strictnessOfLessThan (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;)I
CompilerOracle: inline org/apache/cassandra/utils/BloomFilter.indexes (Lorg/apache/cassandra/utils/IFilter/FilterKey;)[J
CompilerOracle: inline org/apache/cassandra/utils/BloomFilter.setIndexes (JJIJ[J)V
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/memory/BufferPool$LocalPool.tryGetInternal (IZ)Ljava/nio/ByteBuffer;
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.encodeUnsignedVInt (JI)[B
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.encodeUnsignedVInt (JI[B)V
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.writeUnsignedVInt (JLjava/io/DataOutput;)V
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.writeUnsignedVInt (JLjava/nio/ByteBuffer;)V
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.writeVInt (JLjava/io/DataOutput;)V
INFO  [main] 2022-12-08 16:21:02,915 YamlConfigurationLoader.java:97 - Configuration location: file:/etc/cassandra/default.conf/cassandra.yaml
Exception (org.apache.cassandra.exceptions.ConfigurationException) encountered during startup: Invalid yaml: file:/etc/cassandra/default.conf/cassandra.yaml
Error: Can't construct a java obje

RE: Cassandra Summit CFP update

2022-11-30 Thread Durity, Sean R via user
Does it need to be strictly Apache Cassandra? Or is something built on/working 
with DataStax Enterprise allowed? I would think if it doesn’t depend on 
DSE-only technology, it could still apply to a general Cassandra audience.


Sean R. Durity

From: Patrick McFadin 
Sent: Tuesday, November 29, 2022 3:53 PM
To: dev ; user@cassandra.apache.org
Subject: [EXTERNAL] Cassandra Summit CFP update

Hi everyone,

An update on the current CFP process for Cassandra Summit.

There are currently 23 talk submissions, which is far behind what we need. Two 
days of tracks mean we need 60 approved talks. Ideally, we need over 100 
submitted to ensure we have a good pool of quality talks. We already have quite 
a few vendor pitches that have nothing to do with Cassandra. Think of it as 
like CFP spam.

https://events.linuxfoundation.org/cassandra-summit/program/cfp/

The deadline is December 11th. That is 12 days! If you are assuming that will 
get pushed out, don't. We have a tight schedule before March 13th. Speakers 
must be notified of talk acceptance by the beginning of January to book travel 
in time. The full schedule will be published by mid-January.

That being said, I have talked to quite a few people that are working on a 
submission. Thank you for being willing to create a talk! How can I help you 
get it completed? Again, here is my Calendly link if you need to talk it over: 
https://calendly.com/patrick-mcfadin/15-minute-cassandra-summit-cfp-consult

This is our conference! Let's make it a festival of the database we love and 
the things we build with it.

One more thing. We need sponsors! If your employer can, this is a great 
opportunity to get your brand out in front of people building the future.

I'll be back. Go submit a talk. You'll be happy you did!

Patrick





RE: Query drivertimeout PT2S

2022-11-09 Thread Durity, Sean R via user
From the subject, this looks like a client-side timeout (thrown by the 
driver). I have seen situations where the client/driver timeout of 2 seconds 
is a shorter timeout than on the server side (10 seconds). So, the server 
doesn't really note any problem. Unless this is a very remote client and you 
suspect network-related latency, I would start by looking at the query that 
generates the timeout and the schema of the table. Make sure that you are 
querying WITHIN a partition and not ACROSS partitions. There are plenty of 
other potential problems, but you would need to give us more data to guide 
those discussions.
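
If it is the driver-side default, it can be raised - a sketch for the DataStax 
Java driver 4.x (in application.conf; the 5-second value is illustrative):

    datastax-java-driver {
      # default is 2 seconds, which surfaces as PT2S in DriverTimeoutException
      basic.request.timeout = 5 seconds
    }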

Sean R. Durity

From: Bowen Song via user 
Sent: Tuesday, November 8, 2022 1:53 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Query drivertimeout PT2S

This is a mailing list for Apache Cassandra, and that's not the same as the 
DataStax Enterprise Cassandra you are using. We may still be able to help here 
if you could provide more details, such as the queries, table schema, system 
stats (CPU, RAM, disk I/O, network, and so on), logs, table stats, etc., but if 
it's a DSE Cassandra-specific issue, you may have better luck contacting 
DataStax directly or posting it on the DataStax Community 
[community.datastax.com].
On 08/11/2022 14:58, Shagun Bakliwal wrote:
Hi All,

My application has been frequently getting timeout errors for 2 weeks now. I'm 
using DataStax Cassandra 4.14.

Can someone help me here?

Thanks,
Shagun




RE: Questions on the count and multiple index behaviour in cassandra

2022-09-29 Thread Durity, Sean R via user
Aggregate queries (like count(*) ) are fine *within* a reasonably sized 
partition (under 100 MB in size). However, Cassandra  is not the right tool if 
you want to do aggregate queries *across* partitions (unless you break up the 
work with something like Spark). Choosing the right partition key and values IS 
the goal of Cassandra data modeling. (Clustering keys are used for ordering 
data within a partition.)

Good:
Select count(*) from my_table where my_partition_key = ‘1’; --and the partition 
is 100 MB or less

Not good:
Select count(*) from my_table;

Are the counts the actual workload or just a measure of the completion of the 
load? You want to model the data to satisfy the queries for the workload. 
Queries should be very simple; getting the data model right is the hard work. 
Make sure that Cassandra fits the use case you have.
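
If the work must be broken up without Spark, one hedged sketch is to add a 
bucket to the partition key and sum the per-bucket counts client-side (names 
and the bucket count are illustrative, not from the original question):

CREATE TABLE my_table (
    my_partition_key text,
    bucket int,          -- e.g. 0-9, assigned at write time
    id uuid,
    data text,
    PRIMARY KEY ((my_partition_key, bucket), id)
);

-- one small count per bucket, summed by the application:
SELECT count(*) FROM my_table WHERE my_partition_key = '1' AND bucket = 0;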


Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra

From: Karthik K 
Sent: Wednesday, September 28, 2022 8:48 AM
To: user@cassandra.apache.org
Cc: rsesha...@altimetrik.com
Subject: [EXTERNAL] Re: Questions on the count and multiple index behaviour in 
cassandra

Hi Stéphane Alleaume,
Thanks for your quick response. I have attached the table stats from running 
the nodetool cfstats command to get the size.
If I am correct, the partition size must be 464 MB. However, when I exported 
the data as CSV the size was 1510 MB.

1) If we segment this 464 MB of data into more partitions, say, with each 
partition sizing <100 MB, will the count(*) query work effectively?
2) What will be the approximate response in seconds if we run a select count(*) 
against 1 million records?
3) Though Elasticsearch is a future option, we want to dig more into Cassandra 
to achieve this. Do we have any workaround using data modelling?

Thanks & Regards,
Karthikeyan K

On Wed, Sep 28, 2022 at 5:31 PM Stéphane Alleaume 
<crystallo...@gmail.com> wrote:
Hi

1) How much is your partition size in MB? It should be less than 100 MB 
(ideally less, in fact).

2) Could you plug Elasticsearch or Solr search in front?

Kind regards
Stephane





On Wed, Sep 28, 2022, 13:46, Karthik K 
<mailidofkarthike...@gmail.com> wrote:
Hi,

We have two doubts on cassandra 3.11 features:

1) Need to get counts of row from a cassandra table.
We have 3 node clusters with Apache Cassandra 3.11 version.

We loaded a table in Cassandra with 9 lakh (900,000) records. We have around 91 
columns in this table. Most of the records have text as the datatype.
All these 9 lakh records were part of a single partition key.

When we tried a select count(*) query with that partition key, the query was 
timing out.

However, we were able to retrieve counts through multiple calls by fetching only
1 lakh (100,000) records in each call. The only disadvantage here is the time 
taken, which is around 1 minute and 3 seconds.

Is there any other approach to get the row count faster in Cassandra? Do we 
need to change the data modelling approach to achieve this? Suggestions are welcome


2) How to data model in cassandra to support usage of multiple filters.
 We may also need the count of rows for this multiple filter query.

Thanks & Regards,
Karthikeyan







RE: Adding nodes

2022-07-12 Thread Durity, Sean R via user
In my experience C* is not cheaper storage than HDFS. If that is the goal, it 
may be painful.

Each Cassandra DC has at least one full copy of the data set. For production 
data that I care about (that my app teams care about), we use RF=3 in each 
Cassandra DC. And I only use 1 Cassandra rack per DC. Adding racks can 
theoretically help with availability in a data center - if you can match the 
actual physical layout and points of failure. However, it can also unbalance 
the data and cause other admin tasks (adding nodes, etc) to be much more 
complicated. I only use racks in the cloud (to match availability zone within a 
region).

To me, using only RF=2 and RF=1 would mean a high risk of losing data in an 
active cluster. And your DR DC (RF=1) is not very available. Any node down 
means that data is unavailable. If it happens to be credentials or permissions 
for the app user ID, the whole DC is useless.


Sean R. Durity


From: Marc Hoppins 
Sent: Tuesday, July 12, 2022 8:49 AM
To: user@cassandra.apache.org; Bowen Song 
Subject: [EXTERNAL] RE: Adding nodes

The data guys want 2 copies of data in DC1 and that data to be replicated 
offsite to DC2 for 1 copy (DR purposes)

If this setup doesn't achieve this, what does?

At least HBASE was simple enough in that everything could be configured as a 
giant blob of storage with HDFS taking care of keeping (at least) 1 copy out of 
the local system and 1 copy remotely

From: Bowen Song via user <user@cassandra.apache.org>
Sent: Tuesday, July 12, 2022 12:29 PM
To: user@cassandra.apache.org
Subject: Re: Adding nodes

EXTERNAL

For RF=2 in your DC1, you will not be able to achieve both strong consistency 
and tolerance of a single node failure within that DC. You may want to think 
twice before proceeding with that.

For RF=1 in your DC2, you will not be able to run DC-local repairs within 
that DC. You may also want to think twice about it.

The rack setting specifies the logical rack of a node, and it affects how 
replicas are stored within the DC, but not how many replicas are in that DC. The 
RF affects how many copies of data are in the DC, but not how they are stored. The 
rack and RF together work out how many copies to store and where to store them 
within a DC.

In practice, RF should ideally be a whole multiple of the number of racks within 
the DC to ensure even distribution of the replicas among the nodes within 
the DC. That's why rack=1 will always work, and 1*rack=RF, 2*rack=RF, 
3*rack=RF, etc. will also work.
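
As a sketch of how the per-DC RF is declared (the keyspace name and numbers 
are illustrative), the replication settings live on the keyspace, while racks 
come from the snitch:

ALTER KEYSPACE my_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
-- with RF=3 per DC, 1 or 3 logical racks per DC keeps replicas evenly spread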


On 12/07/2022 10:34, Marc Hoppins wrote:
The data guys' plan is for table/keyspace NTS DC1=2 and DC2=1 across the 
board. Which leads me to... what is the point of having 
cassandra-rackdc.properties RACK settings anyway? If you can specify the local 
replication with the DC, having RACK specified elsewhere (whether it is a 
logical or physical rack) seems to be adding confusion to the pot.

From: Bowen Song via user 

Sent: Tuesday, July 12, 2022 11:23 AM
To: user@cassandra.apache.org
Subject: Re: Adding nodes

EXTERNAL

I think you are misinterpreting many concepts here. For a starter, a physical 
rack in a physical DC is not (does not have to be) a logical rack in a logical 
DC in Cassandra; and the allocate_tokens_for_local_replication_factor has 
nothing to do with replication factor (other than using it as an input), but 
has everything to do with token allocation.

You need to plan for number of logical (not physical) racks per DC, either 
number of rack = 1, and RF = any, or number of rack = RF within that DC. It's 
not impossible to add (or remove) a rack from an existing DC, but it's much 
better to plan ahead.


On 12/07/2022 07:33, Marc Hoppins wrote:
There is likely going to be 2 racks in each DC.

Adding the new node decided to quit after 12 hours. The node was overloaded and GC 
pauses caused the bootstrap to fail. I begin to see the pattern here. If 
replication is only within the same datacentre, and one starts off with only 
one rack, then all data is within that rack; adding a new rack... but one can 
only add one node at a time... will cause a surge of replication onto the one 
new node, as this is now a failover point. I noticed when checking netstats on 
the joining node that it was getting data from 12 sources. This led me to the 
conclusion that ALL the streaming data was coming from every node in the same 
datacentre. I checked this by running netstats on other nodes in the second 
datacentre and they were all quiescent. So, unlike HBASE/HDFS where we can 
spread the replication across sites, it seems that it is not a thing for this 
software. Or do I have that wrong?

Now, obviously, this is the second successive failure with adding a new node. 
ALL of the new nodes I need to add are in a new rack.

# Replica factor is explicitly set, regardless of keyspace or datacenter.
# This is the replica factor wi

RE: Guardrails in Cassandra 4.1 Alpha

2022-06-23 Thread Durity, Sean R
I'm not afraid to admit that I LOVE this feature. Exactly what a data engine 
should be able to do - stop bad behavior.

Sean R. Durity

From: Aaron Ploetz 
Sent: Thursday, June 23, 2022 3:22 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Guardrails in Cassandra 4.1 Alpha

Ahh...yes, my default "aaron" user is indeed a SUPERUSER.

Ok, so I created a new, non-superuser and tried again...

> SELECT * FROm stackoverflow.movies WHERE title='Sneakers (1992)' ALLOW 
> FILTERING;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Guardrail 
allow_filtering violated: Querying with ALLOW FILTERING is not allowed"

Thank you for the quick response, Andres!

On Thu, Jun 23, 2022 at 2:14 PM Andrés de la Peña 
<adelap...@apache.org> wrote:
Hi Aaron,

Guardrails are not applied to superusers. The default user is a superuser, so 
to see guardrails in action you need to create and use a user that is not a 
superuser.

You can do that by setting, for example, these properties on cassandra.yaml:

authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer

Then you can login with cqlsh using the default superuser and create a regular 
user with the adequate permissions. For example:

bin/cqlsh -u cassandra -p cassandra
> CREATE USER test WITH PASSWORD 'test';
> GRANT SELECT ON ALL KEYSPACES TO test;
bin/cqlsh -u test -p test
> SELECT * FROM stackoverflow.movies WHERE title='Sneakers (1992)' ALLOW 
> FILTERING;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Guardrail 
allow_filtering violated: Querying with ALLOW FILTERING is not allowed"

Finally, that particular guardrail isn't applied to system tables, so it would 
still allow filtering on the system.local and system_views.settings tables, but 
not in stackoverflow.movies.

I hope this helps.

On Thu, 23 Jun 2022 at 19:51, Aaron Ploetz 
<aaronplo...@gmail.com> wrote:
So I'm trying to test out the guardrails in 4.1-alpha.  I've set 
allow_filtering_enabled: false, but it doesn't seem to care (I can still use 
it).

> SELECT release_version FROM system.local;
 release_version
-
 4.1-alpha1-SNAPSHOT

(1 rows)

> SELECT * FROM system_views.settings WHERE name='allow_filtering_enabled';
 name| value
-+---
 allow_filtering_enabled | false

(1 rows)

> SELECT * FROm stackoverflow.movies WHERE title='Sneakers (1992)' ALLOW 
> FILTERING;
 id   | genre  | title
--++-
 1396 | Crime|Drama|Sci-Fi | Sneakers (1992)

(1 rows)

Is there like some main "guardrails enabled" setting that I missed?

Thanks,

Aaron





RE: Seed List

2022-06-23 Thread Durity, Sean R
It can work to use host names. We have done it for temporary clusters where 
there is at least a theoretical possibility of an ip address change. I don't 
know all the trade-offs of using host names, since we don't do that for 
production.
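
For reference, a minimal cassandra.yaml sketch of a host-name seed list (the 
FQDNs are illustrative):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "cass-seed1.example.com,cass-seed2.example.com"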


Sean R. Durity



-Original Message-
From: Marc Hoppins  
Sent: Thursday, June 23, 2022 3:33 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Seed List

Hi guys,

Documentation (for almost everything) uses IP addresses for seeds. Is it 
possible to use the FQDN instead for the seeds (cassandra.yaml)? It is far 
easier to read/use names.

Thanks

M


RE: Configuration for new(expanding) cluster and new admins.

2022-06-16 Thread Durity, Sean R
I have run clusters with different disk size nodes by using different number of 
num_tokens. I used the basic math of just increasing the num_tokens by the same 
percentage as change in disk size. (So, if my "normal" node was 8 tokens, one 
with double the disk space would be 16.)

One thing to watch/consider: the (number of tokens) * (number of nodes) 
product determines how many ranges repairs have to process - the higher it is, 
the harder repairs work


Sean R. Durity



-Original Message-
From: Marc Hoppins  
Sent: Wednesday, June 15, 2022 3:34 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Configuration for new(expanding) cluster and new admins.

Hi all,

Say we have 2 datacentres with 12 nodes in each. All hardware is the same.

4-core, 2 x HDD (eg, 4TiB)

num_tokens = 16 as a start point

If a plan is to gradually increase the nodes per DC, and new hardware will have 
more of everything, especially storage, I assume I increase the num_tokens 
value.  Should I have started with a lower value?

What would be considered as a good adjustment for:

Any increase in number of HDD for any node?

Any increase in capacity per HDD for any node?

Is there any direct correlation between new token count and the proportional 
increase in either quantity of devices or total capacity, or is any adjustment 
purely arbitrary just to differentiate between varied nodes?

Thanks

M

RE: Topology vs RackDC

2022-06-02 Thread Durity, Sean R
I agree; it does depend. Our ansible could not infer the DC name from the 
hostname or ip address of our on-prem hardware. That’s especially true when we 
are migrating to new hardware or OS and we are adding logical DCs with 
different names. I suppose it could be embedded in the ansible host file (but 
you are still maintaining that master file), but we don’t organize our hosts 
file that way. We are rarely adding a few nodes here or there, so the penalty 
of a rolling restart is minimal for us.

Sean R. Durity


From: Bowen Song 
Sent: Thursday, June 2, 2022 12:25 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Topology vs RackDC


It really depends on how you manage your nodes. With automation tools, like 
Ansible, it's much easier to manage the rackdc file per node. The "master list" 
doesn't need to exist, because the file is written once and will never get 
updated. The automation tool will create nodes based on the required DC/rack, 
and write that information to the rackdc file during the node provisioning 
process. It's much faster to add nodes to a large cluster with the rackdc file - 
no rolling restart required.
On 02/06/2022 14:46, Durity, Sean R wrote:
I agree with Marc. We use the cassandra-topology.properties file (and 
PropertyFileSnitch) for our deployments. Having a file different on every node 
has never made sense to me. There would still have to be some master file 
somewhere from which to generate that individual node file. There is the 
(slight) penalty that a change in topology requires the distribution of a new 
file and a rolling restart.

Long live the PropertyFileSnitch! 😉

Sean R. Durity
From: Paulo Motta <pauloricard...@gmail.com>
Sent: Thursday, June 2, 2022 8:59 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Topology vs RackDC

I think the topology file is better for static clusters, while rackdc is better 
for dynamic clusters where users can add/remove hosts without needing to update 
the topology file on all hosts.

On Thu, 2 Jun 2022 at 09:13 Marc Hoppins 
<marc.hopp...@eset.com> wrote:
Hi all,

Why is RACKDC preferred for production over TOPOLOGY?

Surely one common file is far simpler to distribute than dealing with the 
mucky-muck of various configs for each host depending on whether they are in 
one rack or another and/or one datacentre or another? It is also fairly 
self-documenting of the setup, with the entire cluster there in one file.

From what I read in the documentation, regardless of which snitch one 
implements, cassandra-topology.properties will get read, either as a primary or 
as a backup... so why not just use topology for ALL cases?

Thanks

Marc






RE: Fetch all data from Cassandra 3.4.4

2022-05-31 Thread Durity, Sean R
A select with no where clause is not a good access pattern for Cassandra, 
regardless of driver version. It will not scale for large data sets or a large 
number of nodes.

Ideally you want to select from a single partition for each query. So, 
depending on the size of the rows, one answer may be to create a partition to 
hold the 25,000 rows. This is assuming the rows are relatively small (under 100 
MB total for the partition) and that you are often dealing with the whole 
partition or a subset. Of course, this strategy could produce a hot spot on the 
cluster if there were more nodes.
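
A minimal sketch of that single-partition settings-table approach (names are 
illustrative; this assumes the full set of rows stays well under 100 MB):

CREATE TABLE app_settings (
    bucket text,         -- constant value, e.g. 'all', so every row shares one partition
    name text,
    value text,
    PRIMARY KEY ((bucket), name)
);

SELECT name, value FROM app_settings WHERE bucket = 'all';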

Others might chime in with Spark-related answers for working through large data 
sets. If it is only 25,000 rows, that really isn't large, but it is an answer 
to the general problem of analytics-type queries (needing all rows).


Sean R. Durity



From: Bochkarev, Peter 
Sent: Monday, May 30, 2022 7:37 AM
To: user@cassandra.apache.org
Cc: Thondavada, Saiprasad ; Pikalev, Sergey 
; Yaroslavskiy, Vladimir 

Subject: [EXTERNAL] RE: Fetch all data from Cassandra 3.4.4


Hi guys!

We use Cassandra 3.4.4. We have 2 nodes with full replication.
We have an issue. We use the old Java driver 
com.datastax.cassandra:cassandra-driver-core and can't simply do an upgrade.
We need to fetch all data from a table, but our driver returns 23,000 records. 
The latest Java driver (com.datastax.oss:java-driver-core) fetches 25,000 
records.
Could we do something to fix the issue other than going to the latest driver?

Also we use Spring framework 1.5.2 in our app.
Request: select * from my_table;




RE: about the performance of select * from tbl

2022-04-26 Thread Durity, Sean R
If the number of rows is known and bounded and would be under 100 MB in size, I 
would suggest adding an artificial partition key so that all rows are in one 
partition. I recommend this technique for something like an application 
settings table that is retrieved infrequently (like on app start-up) but needs 
all rows at once. If it is often accessed, this strategy could create hot spots 
or potential availability concerns.

If this is more about analytics and the row count is unbounded, I would pursue 
something like Spark OR re-design the model so that you do have some kind of 
partition (and maybe clustering) keys. I’m always telling app teams that more 
in-parallel queries are a very good option for Cassandra.

My bottom line is this: the BEST way to scale Cassandra is NOT tuning queries, 
but designing the tables to easily answer what you need with proper 
partitioning.
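
For completeness, full scans that cannot be remodeled are usually broken into 
token-range slices that each land on one replica set; a hedged sketch (the 
table, key name, and range bounds are illustrative, and this is roughly what 
Spark connectors do under the hood):

SELECT * FROM tbl
WHERE token(pk) > -9223372036854775808 AND token(pk) <= -4611686018427387904;
-- repeat for the remaining token ranges, issued in parallel by the client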


Sean R. Durity
From: Joe Obernberger 
Sent: Tuesday, April 26, 2022 1:10 PM
To: user@cassandra.apache.org; 18624049226 <18624049...@163.com>
Subject: [EXTERNAL] Re: about the performance of select * from tbl


This would be a good use case for Spark + Cassandra.

-Joe
On 4/26/2022 8:48 AM, 18624049226 wrote:

We have a business scenario. We must execute the following statement:

select * from tbl;

This CQL has no WHERE condition.

What I want to ask is: if this table holds one million rows or more, what 
methods or parameters can improve the performance of this CQL?








RE: Cassandra Management tools?

2022-02-28 Thread Durity, Sean R
I have used my own bash scripts with ssh connections to the nodes to automate 
everything from upgrades, node down monitoring, metrics or log collection, and 
rolling restarts. We are moving toward ansible (our infrastructure team is 
standardizing on its use). Rolling restart isn’t too bad in ansible. I haven’t 
done the automated upgrade, yet. Ansible is much more verbose in output and not 
as clean for understanding what was done – at least so far.


Sean R. Durity

From: Adam Scott 
Sent: Monday, February 28, 2022 6:32 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra Management tools?

I use pssh -i -h hosts nodetool  for one-offs.

Rolling restart is tricky to automate, but haven't had to yet. If I were to, I 
would be sure to do a test connect and query to confirm the node is up before 
going to the next one.
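
For example, a simple per-node liveness probe between restarts might be (host 
name is illustrative):

cqlsh node1.example.com -e "SELECT release_version FROM system.local;"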

For automation I use python fabric.

I too, would be curious what others use.

Thanks,
Adam


On Mon, Feb 28, 2022 at 1:59 PM Joe Obernberger 
<joseph.obernber...@gmail.com> wrote:
Hi all - curious what tools are folks using to manage large Cassandra
clusters?  For example, to do tasks such as nodetool cleanup after a
node or nodes are added to the cluster, or simply rolling start/stops
after an update to the config or a new version?
We've used puppet before; is that what other folks are using?
Thanks for any suggestions.

-Joe




Migration between Apache 4.x and DSE 6+?

2022-01-18 Thread Durity, Sean R
Has anyone been able to add Apache Cassandra 4.x nodes to a new DC within a DSE 
6+ cluster (or vice versa) in order to migrate from one to the other with no 
downtime? I was able to do this prior to DSE 6/Cassandra 4.0, but that was 
before the internals rewrite (and different sstable format?) of DSE 6.


Sean R. Durity




RE: about memory problem in write heavy system..

2022-01-11 Thread Durity, Sean R
In my experience, the 50% overhead for compaction/upgrade is for the worst case 
scenario systems – where the data is primarily one table and uses size-tiered 
compaction. (I have one of those.) What I really look at is if there is enough 
space to execute upgradesstables on the largest sstable. Granted, it is not fun 
to deal with tight space on a Cassandra cluster.

Sean R. Durity

From: Bowen Song 
Sent: Tuesday, January 11, 2022 6:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: about memory problem in write heavy system..


You don't really need 50% of free disk space available if you don't keep 
backups and snapshots on the same server. The DataStax guide recommends 50% 
free space because it recommends you take a snapshot (which is implemented 
as filesystem hardlinks) before upgrading. If you don't have 50% free disk space 
before upgrading Cassandra, you can choose to keep the backup files elsewhere, 
or not make a backup at all. The latter is of course not recommended for a 
production system.
On 11/01/2022 01:36, Eunsu Kim wrote:
Thank you Bowen.

As can be seen from the chart, the memory of existing nodes has increased since 
new nodes were added. And I stopped writing to a specific table; write 
throughput decreased by about 15%, and memory usage began to decrease.
I'm not sure whether this resolved naturally or because of the reduced writes.
What is certain is that the addition of new nodes increased the native 
memory usage of some existing nodes.

After reading the 3.x to 4.x migration guide of DataStax, it seems that more 
than 50% of disk availability is required for upgrade. This is likely to be a 
major obstacle to upgrading the cluster in operation.


Many thanks.


On Jan 10, 2022, 8:53 PM, Bowen Song <bo...@bso.ng> wrote:

Anything special about the table you stopped writing to? I'm wondering how 
you determined that table was the cause of the memory usage increase.
> For the latest version (3.11.11) upgrade, can the two versions coexist in the 
> cluster for a while?
>
> Can the 4.x version coexist as well?

Yes and yes. It is expected that two different versions of Cassandra will be 
running in the same cluster at the same time while upgrading. This process is 
often called a zero-downtime upgrade or rolling upgrade. You can perform such an 
upgrade from 3.11.4 to 3.11.11 or directly to 4.0.1; both are supported. 
Surprisingly, I can't find any documentation related to this on the 
cassandra.apache.org website (if you found it, please send me a link). Some 
other sites have brief guides on this process, such as DataStax and Instaclustr, 
and you should always read the release notes, which include breaking changes 
and new features, before you perform an upgrade.
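
A hedged outline of the usual per-node steps (simplified; always check the 
release notes for version-specific caveats):

nodetool drain              # flush memtables and stop accepting writes
# stop Cassandra, install the new binaries, start Cassandra
nodetool upgradesstables    # rewrite sstables into the new on-disk format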

On 10/01/2022 00:18, Eunsu Kim wrote:
Thank you for your response

Fortunately, memory usage came back down over the weekend. I stopped writing 
to a specific table last Friday.

<pasted graphic 2.png>


For the latest version (3.11.11) upgrade, can the two versions coexist in the 
cluster for a while?

Can the 4.x version coexist as well?


On Jan 8, 2022, 1:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:

3.11.4 is a very old release, with lots of known bugs. It's possible the memory 
is related to that.

If you bounce one of the old nodes, where does the memory end up?


On Thu, Jan 6, 2022 at 3:44 PM Eunsu Kim 
<eunsu.bil...@gmail.com> wrote:

Looking at the memory usage chart, it seems that the physical memory usage of 
the existing node has increased since the new node was added with 
auto_bootstrap=false.

<pasted graphic 1.png>




On Fri, Jan 7, 2022 at 1:11 AM Eunsu Kim 
<eunsu.bil...@gmail.com> wrote:
Hi,

I have a Cassandra cluster (3.11.4) that does heavy write work (14k~16k write 
throughput per second per node).

The nodes are physical machines in a data center. There are 30 nodes. Each node 
has three data disks mounted.


A few days ago, a QueryTimeout problem occurred due to Full GC.
So, referring to this blog 
(https://thelastpickle.com/blog/2018/04/11/gc-tuning.html)

RE: Separating storage and processing

2021-11-15 Thread Durity, Sean R
We have apps like this, also. For straight Cassandra, I think it is just the 
nature of how it works. DataStax provides some interesting solutions in 
different directions: BigNode (for handling 10-20 TB nodes) or Astra 
(cloud-based/container-driven solution that DOES separate read, write, and 
storage into separately scaled aspects of Cassandra). I suppose that you could 
do some similar work on your own with k8cassandra and StarGate.

Sean Durity – Staff Systems Engineer, Cassandra

From: onmstester onmstester 
Sent: Monday, November 15, 2021 12:56 AM
To: user 
Subject: [EXTERNAL] Separating storage and processing

Hi,
In our Cassandra cluster, because of big rows in the input data/data model with 
a TTL of several months, we ended up using almost 80% of storage (5 TB per 
node) but having less than 20% CPU usage, almost all of which goes to writing 
rows to memtables and compacting sstables, so a lot of CPU capacity is wasted.
I wonder if there is anything we can do to solve this problem using Cassandra, 
or should we migrate from Cassandra to something that separates storage and 
processing (currently I'm not aware of anything as stable as Cassandra)?


Sent using Zoho Mail






RE: One big giant cluster or several smaller ones?

2021-11-15 Thread Durity, Sean R
For memory's sake, you do not want “too many” tables in a single cluster (~200 is 
a reasonable rule of thumb). But I don’t see a major concern with a few very 
large tables in the same cluster. The client side, at least in Java, could get 
large (memory-wise) holding a Cluster object for multiple clusters.

I agree with Jeff: a cluster per app is the cleanest separation we have seen. 
Multi-tenant leads to many more potential problems. Multi-cluster per app seems 
unnecessarily complex.



Sean Durity – Staff Systems Engineer, Cassandra

From: S G 
Sent: Saturday, November 13, 2021 9:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: One big giant cluster or several smaller ones?

I think 1 cluster per large table should be preferred, rather than per 
application.
Example, what if there is a large application that requires several big tables, 
each many 10s of tera-bytes in size?
Is it still recommended to have 1 cluster for that app ?


On Fri, Nov 12, 2021 at 2:01 PM Jeff Jirsa 
<jji...@gmail.com> wrote:
Oh sorry - a cluster per application makes sense. Sharding within an 
application makes sense to avoid very very very large clusters (think: 
~thousand nodes). 1 cluster per app/use case.

On Fri, Nov 12, 2021 at 1:39 PM S G 
<sg.online.em...@gmail.com> wrote:
Thanks Jeff.
Any side-effect on the client config from small clusters perspective?

Like several smaller clusters means more CassandraClient objects on the client 
side but I guess number of connections shall remain the same as number of 
physical nodes will most likely remain the same only. So I think client side 
would not see any major issue.


On Fri, Nov 12, 2021 at 11:46 AM Jeff Jirsa 
<jji...@gmail.com> wrote:
Most people are better served building multiple clusters and spending their 
engineering time optimizing for maintaining multiple clusters, vs spending 
their engineering time learning how to work around the sharp edges that make 
large shared clusters hard.

Large multi-tenant clusters give you less waste and a bit more elasticity (one 
tenant can burst and use spare capacity that would typically be left for the 
other tenants). However, one bad use case / table can ruin everything (one bad 
read that generates GC hits all use cases), and eventually certain 
mechanisms/subsystems don't scale past certain points (e.g. schema - large 
schemas and large clusters are much harder than small schemas and small 
clusters)




On Fri, Nov 12, 2021 at 11:31 AM S G 
<sg.online.em...@gmail.com> wrote:
Hello,

Is there any case where we would prefer one big giant cluster (with multiple 
large tables) over several smaller clusters?
Apart from some management overhead of multiple Cassandra Clients, it seems 
several smaller clusters are always better than a big one:

  1.  Avoids SPOF for all tables
  2.  Helps debugging (less noise from all tables in the logs)
  3.  Traffic spikes on one table do not affect others if they are in different 
clusters.
  4.  We can scale tables independently of each other - so colder data can be 
in a smaller cluster (more data/node) while hotter data can be on a bigger 
cluster (less data/node)

It does not mean that every table should be in its own cluster.
But large ones can be moved to their own dedicated clusters (like those more 
than a few terabytes).
And smaller ones can be clubbed together in one or few clusters.

Please share any recommendations for the above from actual production 
experiences.
Thanks for helping !





RE: R/W timeouts VS number of tables in keyspace

2021-07-20 Thread Durity, Sean R
Each table in the cluster will have a memtable. This is why you do not want to 
fracture the memory into 900+ slices. The rule of thumb I have followed is to 
stay in the low hundreds (maybe 200) tables for the whole cluster. I would be 
requiring the hard refactoring (or moving tables to different clusters) 
immediately, since you really need to reduce by at least 700 tables. You are 
seeing the memory impacts.
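
As a quick sanity check, the current table count can be read from the schema 
tables:

-- includes system keyspaces, so subtract those from the total:
SELECT count(*) FROM system_schema.tables;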

In addition, in my experience, CMS is much harder to tune. G1GC works well in 
my use cases without much tuning (or Java-guru level knowledge). However, I 
don’t think that you will be able to engineer around the 900+ tables, no matter 
which GC you use.

Sean Durity – Staff Systems Engineer, Cassandra

From: Luca Rondanini 
Sent: Monday, July 19, 2021 11:34 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] R/W timeouts VS number of tables in keyspace

Hi all,

I have a keyspace with almost 900 tables.

Lately I started receiving lots of w/r timeouts (eg 
com.datastax.driver.core.exceptions.Read/WriteTimeoutException: Cassandra 
timeout during write query at consistency LOCAL_ONE (1 replica were required 
but only 0 acknowledged the write).

I'm even experiencing nodes crashing.

In the logs I get many warnings like:

WARN  [Service Thread] GCInspector.java:282 - ConcurrentMarkSweep GC in 
4025ms.  CMS Old Gen: 2141569800 -> 2116170568; Par Eden Space: 167772160 -> 0; 
Par Survivor Space: 20971520 -> 0

WARN  [GossipTasks:1] FailureDetector.java:288 - Not marking nodes down due 
to local pause of 5038005208 > 5000000000
I know 900 tables is a design error for C*, but before a super painful 
refactoring I'd like to rule out any configuration problems. Any suggestions?

Thanks a lot,
Luca








RE: Storing user activity logs

2021-07-20 Thread Durity, Sean R
Yes, use the time-bucketing approach and choose a bucket-size (included in the 
partition key) that is granular enough to keep partitions to about 100 MB in 
size. (Unbounded partitions WILL destroy your cluster.) If your queries *need* 
to retrieve all user activity over a certain period, then, yes, multiple 
queries may be required. Partition key queries (of small partitions) are very 
fast and can be done asynchronously. That is the right way to use Cassandra for 
a time series of data.
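
A hedged sketch of a re-keyed table for the original poster's case below (the 
month bucket is an assumption; pick a granularity that keeps partitions near 
100 MB):

CREATE TABLE user_act_log_v2 (
    userid bigint,
    month text,          -- e.g. '2021-07', part of the partition key
    datetime bigint,
    sno uuid,
    -- other columns
    PRIMARY KEY ((userid, month), datetime, sno)
) WITH default_time_to_live = 15552000;   -- 6 months, in seconds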

Sean Durity – Staff Systems Engineer, Cassandra

From: manish khandelwal 
Sent: Monday, July 19, 2021 11:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Storing user activity logs

I concur with Elliott's view. The only way you can reduce partition size is by 
tweaking your partition key. Here, with user_id as the partition key, partition 
size depends on the activity of the user. For a superactive user it can become 
large in no time. After changing the key, migration of old data to the new 
table will also be required; please keep that in mind.

Regards
Manish

On Tue, Jul 20, 2021 at 2:54 AM Elliott Sims 
<elli...@backblaze.com> wrote:
Your partition key determines your partition size.  Reducing retention sounds 
like it would help some in your case, but really you'd have to split it up 
somehow.  If it fits your query pattern, you could potentially have a compound 
key of userid+datetime, or some other time-based split.  You could also just 
split each user's rows into subsets with some sort of indirect mapping, though 
that can get messy pretty fast.

On Mon, Jul 19, 2021 at 9:01 AM MyWorld 
<timeplus.1...@gmail.com> wrote:
Hi all,

We are currently storing our user activity log in Cassandra with below 
architecture.

CREATE TABLE user_act_log (
    userid bigint,
    datetime bigint,
    sno uuid,
    -- some more columns
    PRIMARY KEY ((userid), datetime, sno)   -- partition key: userid; clustering: datetime, sno
) WITH default_time_to_live = 15552000;     -- TTL of 6 months

Over time our table data has grown to around 500 GB, and we notice from the 
table histograms that our max partition size has also grown to a tremendous 
size (nearly 1 GB).

So, please help me out: what should be the right architecture for this use case?

I am currently thinking of changing the compaction strategy from size-tiered to 
time window with a 30-day window. But will this improve the partition size?

Should we use any other db for such use case?








4.0 best feature/fix?

2021-05-07 Thread Durity, Sean R
There is not enough 4.0 chatter here. What feature or fix of the 4.0 release is 
most important for your use case(s)/environment? What is working well so far? 
What needs more work? Is there anything that needs more explanation?

Sean Durity
Staff Systems Engineer - Cassandra
#cassandra - for the latest news and updates







RE: Cassandra 3.11 cqlsh doesn't work with latest JDK

2021-04-30 Thread Durity, Sean R
Try adding this into the SSL section of your cqlshrc file:
version = SSLv23
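
For context, a minimal cqlshrc [ssl] section sketch (the paths are 
illustrative):

[ssl]
version = SSLv23
certfile = /etc/cassandra/certs/rootca.pem
validate = true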


Sean Durity

From: Maxim Parkachov 
Sent: Friday, April 30, 2021 8:57 AM
To: user@cassandra.apache.org; d...@cassandra.apache.org
Subject: [EXTERNAL] Cassandra 3.11 cqlsh doesn't work with latest JDK

Hi everyone,

I have Apache Cassandra 3.11.6 with SSL encryption, CentOS Linux release 7.9, 
python 2.7.5. JDK and python are coming from operating system.

I have updated today operating system and with that I've got new JDK

$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

Now when I try to connect to my local instance of Cassandra with cqlsh I'm 
getting error:

$ cqlsh --ssl -u cassandra -p cassandra
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(1, 
u"Tried connecting to [('127.0.0.1', 9142)]. Last error: [SSL: 
WRONG_VERSION_NUMBER] wrong version number (_ssl.c:618)")})

Apparently, the latest release of JDK *_292 disabled TLS1.0 and TLS1.1.

Is this a known issue? Is there something I could do to quickly remedy the 
situation?

Thanks in advance,
Maxim.






RE: Huge single-node DCs (?)

2021-04-09 Thread Durity, Sean R
DataStax Enterprise has a new-ish feature set called Big Node that is supposed 
to help with using much denser nodes. We are going to be doing some testing 
with that for a similar use case with ever-growing disk needs, but no real 
increase in read or write volume. At some point it may become available in the 
open source version, too.


Sean Durity – Staff Systems Engineer, Cassandra

From: Elliott Sims 
Sent: Thursday, April 8, 2021 6:36 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Huge single-node DCs (?)

I'm not sure I'd suggest building a single DIY Backblaze pod.  The SATA port 
multipliers are a pain both from a supply chain and systems management 
perspective.  Can be worth it when you're amortizing that across a lot of 
servers and can exert some leverage over wholesale suppliers, but less so for a 
one-off.  There's a lot more whitebox/OEM/etc options for high-density storage 
servers these days from Seagate, Dell, HP, Supermicro, etc that are worth a 
look.

I'd agree with this (both examples) sounding like a poor fit for Cassandra.  
Seems like you could always just spin up a bunch of Cassandra VMs in the ESX 
cluster instead of one big one, but something like MySQL or PostgreSQL might 
suit your needs better.  Or even some sort of flatfile archive with something 
like Parquet if it's more being kept "just in case" with no need for quick 
random access.

For the 10PB example, it may be time to look at something like Hadoop, or maybe 
Ceph.

On Thu, Apr 8, 2021 at 10:39 AM Bowen Song <bo...@bso.ng> 
wrote:

This is off-topic. But if your goal is to maximise storage density and also 
ensuring data durability and availability, this is what you should be looking 
at:

  *   hardware: https://www.backblaze.com/blog/open-source-data-storage-server/
  *   architecture and software: 
https://www.backblaze.com/blog/vault-cloud-storage-architecture/


On 08/04/2021 17:50, Joe Obernberger wrote:
I am also curious on this question.  Say your use case is to store 10PBytes of 
data in a new server room / data-center with new equipment, what makes the most 
sense?  If your database is primarily write with little read, I think you'd 
want to maximize disk space per rack space.  So you may opt for a 2u server 
with 24 3.5" disks at 16TBytes each for a node with 384TBytes of disk - so ~27 
servers for 10PBytes.

Cassandra doesn't seem to be the good choice for that configuration; the rule 
of thumb that I'm hearing is ~2Tbytes per node, in which case we'd need over 
5000 servers.  This seems really unreasonable.

-Joe

On 4/8/2021 9:56 AM, Lapo Luchini wrote:

Hi, one project I wrote is using Cassandra to back the huge amount of data it 
needs (data is written only once and read very rarely, but needs to be 
accessible for years, so the storage needs become huge in time and I chose 
Cassandra mainly for its horizontal scalability regarding disk size) and a 
client of mine needs to install that on his hosts.

Problem is, while I usually use a cluster of 6 "smallish" nodes (which can grow 
in time), he only has big ESX servers with huge disk space (which is already 
RAID-6 redundant) but wouldn't have the possibility to have 3+ nodes per DC.

This is out of my usual experience with Cassandra and, as far as I read around, 
out of most use-cases found on the website or this mailing list, so the 
question is:
does it make sense to use Cassandra with a big (let's talk 6TB today, up to 
20TB in a few years) single-node DataCenter, and another single-node DataCenter 
(to act as disaster recovery)?

Thanks in advance for any suggestion or comment!




RE: Changing num_tokens and migrating to 4.0

2021-03-22 Thread Durity, Sean R
I have a cluster (almost 200 nodes) with a variety of disk sizes and use 
different numbers of tokens so that the machines can use the disk they have. It 
is a very handy feature! While I agree that a node with larger disk may handle 
more requests, that may not be enough to impact CPU or memory. I rarely see 
Cassandra CPU-bound for my use cases. These are primarily write use cases with 
a low number of clients with far fewer reads. There is just a lot of data to 
keep.
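
For example (values are illustrative; num_tokens must be set before a node 
first bootstraps):

# cassandra.yaml on a node with double the baseline disk
num_tokens: 16    # baseline nodes run 8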

Sean Durity

From: Alex Ott 
Sent: Saturday, March 20, 2021 1:01 PM
To: user 
Subject: [EXTERNAL] Re: Changing num_tokens and migrating to 4.0

If the nodes are almost the same except for the disk space, then giving them more 
tokens may make the situation worse - they will get more requests than other 
nodes, and won't have the resources to process them.
In Cassandra the disk size isn't the main "success" factor - it's memory, 
CPU, disk type (SSD), etc.

On Sat, Mar 20, 2021 at 5:26 PM Lapo Luchini 
<l...@lapo.it> wrote:
Hi, thanks for suggestions!
I'll definitely migrate to 4.0 after all this is done, then.

I fear the old prod DC can't suffer losing a node right now (a few nodes
have the disk 70% full), but I can maybe find a third node for the new
DC right away.

BTW the new nodes have got 3× the disk space, but are not so much
different regarding CPU and RAM: does it make any sense to give them a
bit more num_tokens (maybe 20-30 instead of 16) than the rest of the old
DC hosts, or do "asymmetrical" clusters lead to problems?

No real need to do that anyways, moving from 6 nodes to (eventually) 8
should be enough lessen the load on the disks, and before more space is
needed I will probably have more nodes.

Lapo

On 2021-03-20 16:23, Alex Ott wrote:
> I personally maybe would go following way (need to calculate how many
> joins/decommissions will be at the end):
>
>   * Decommission one node from prod DC
>   * Form new DC from two new machines and decommissioned one.
>   * Rebuild DC from existing one, make sure that repair finished, etc.
>   * Switch traffic
>   * Remove old DC
>   * Add nodes from old DC one by one into new DC
>





--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)





RE: Cassandra video tutorials for administrators.

2021-03-18 Thread Durity, Sean R
+1 for data modeling. If an admin can spend the day helping app teams get the 
model right BEFORE hitting production, those are the best days (and prevent the 
bad days of trying to engineer around a bad model/migrate data to new 
tables/etc)

I also find good value in understanding the availability guarantees of 
Cassandra and the underlying VMs/hardware for each application. However, app 
teams do not usually spend the necessary time to understand how to construct 
their connections to take full advantage of the powerful cluster they have. 
Learning about the connection policies of the various drivers is important. 
Then “encourage” the app team to actually test their availability in lower life 
cycles by simulating various failure scenarios.

Then there is monitoring – a beast of a subject… Trying to figure out what is 
actionable is a life-long journey. 😉

Fortunately, my “How to Become the Lord of the Rings” talk from Cassandra 
Summit 2019 is not available. You can avoid my ugly mug.

Sean Durity

From: Patrick McFadin 
Sent: Wednesday, March 17, 2021 9:34 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra video tutorials for administrators.

Hi Justine,

Welcome to the community! There is quite an extensive playlist here from an 
older DataStax Academy course on Cassandra admin: 
https://www.youtube.com/playlist?list=PL2g2h-wyI4SrHMlHBJVe_or_Ryek2THgQ 

An exact daily task list type video I haven't seen. In my experience the 
fundamental things to know for the day to day ops are running repairs and 
backups. More one off but useful topics are things like adding or removing 
nodes from the cluster or even restoring a backup. To be helpful with 
developers using the system, it's good to get a foundation in data modeling. 
You'll find that a lot of issues come down to just a bad data model and knowing 
what that looks like can save a lot of time. Here's a playlist for that: 
https://www.youtube.com/playlist?list=PL2g2h-wyI4SqIigskyJNAeL2vSTJZU_Qp 

Please feel free to ask questions as they pop up.

Patrick

On Wed, Mar 17, 2021 at 6:24 PM 
<justine...@biblicaltext.com> wrote:
Hi Elliott,

Watching it now; this video is super super helpful, thanks for sharing. I was 
however thinking more about day to day maintenance issues; maybe that topic is 
not quite as sexy for a YouTube video 😀   Maybe I should make one once I have 
a bit more experience under my belt.

On Mar 18, 2021, at 6:27 AM, Elliott Sims 
<elli...@backblaze.com> wrote:

I'm a big fan of this one about LWTs: 
https://www.youtube.com/watch?v=wcxQM3ZN20c
Not only if you want to understand LWTs, but also to get a better understanding 
of the sometimes-unintuitive consistency promises made and not made for non-LWT 
queries.

On Tue, Mar 16, 2021 at 11:53 PM 
<justine...@biblicaltext.com> wrote:
I know there is a lot of useful information out there, including on YouTube. I 
am looking for recommendations for good introductory (but detailed) videos, 
created by people who have Cassandra cluster management experience, that 
outline all the day to day activities someone who is managing a cluster would 
understand and/or be doing.

I believe I have a reasonably good grasp of Cassandra, but I am in that “I 
don’t know what I don’t know phase” where there might be things that I am 
unaware I should understand.


—
Justine So
DevOps Engineer
https://biblicaltext.com/




RE: No node was available to execute query error

2021-03-16 Thread Durity, Sean R
Sometimes time bucketing can be used to create manageable partition sizes. How 
much data is attached to a day, week, or minute? Could you use a partition and 
clustering key like: ((source, time_bucket), timestamp)?

Then your application logic can iterate through time buckets to pull out the 
data in scalable chunks:
Select column1, column2 from my_table where source = ‘PRIME SOURCE’ and 
time_bucket = ‘2021-03-15’;
Select column1, column2 from my_table where source = ‘PRIME SOURCE’ and 
time_bucket = ‘2021-03-16’
…
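
A minimal sketch of that model (column types are illustrative):

CREATE TABLE my_table (
    source text,
    time_bucket date,
    ts timestamp,
    column1 text,
    column2 text,
    PRIMARY KEY ((source, time_bucket), ts)
);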

Also, there are implementations of Spark that will create the proper, single 
partition queries for large data sets. DataStax Analytics is one example (spark 
runs on each node).


Sean Durity – Staff Systems Engineer, Cassandra

From: Bowen Song 
Sent: Monday, March 15, 2021 5:27 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: No node was available to execute query error


There are different approaches, depending on the application's logic. Roughly 
speaking, there are two distinct scenarios:

  1.  Your application knows all the partition keys of the required data in 
advance, either by reading them from another data source (e.g.: another 
Cassandra table, other database, a file, or an API), or can reconstruct the 
partition keys from other known information (e.g.: sequential numbers, date 
time in a known range, etc.).
  2.  Your application needs all (or nearly all) rows from a given table, so 
you can use range requests to read everything out from that table.
However, before you choose the second option and create a table for each 
"source" value, I must warn you that creating hundreds of tables in Cassandra 
is a bad idea.
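
For scenario 2, a hedged sketch of what token-range reads look like in CQL (table and key names are hypothetical; the full Murmur3 token space runs from -2^63 to 2^63 - 1 and would be split into many more slices than shown):

SELECT id, payload FROM my_keyspace.my_table
    WHERE token(id) > -9223372036854775808 AND token(id) <= -4611686018427387904;
SELECT id, payload FROM my_keyspace.my_table
    WHERE token(id) > -4611686018427387904 AND token(id) <= 0;
-- ... and so on until the whole ring is covered

Drivers (and tools like Spark) typically generate these slices for you, aligned to the token ranges each node owns.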

Ask yourself a question, what is really required to 'do something'? Do you 
really need all data each time? Is it possible to make 'do something' 
incremental, so you'll only need some data each time?


On 15/03/2021 19:33, Joe Obernberger wrote:

Thank you.
What is the best way to iterate over a very large number of rows in Cassandra?  
I know the datastax driver lets Java page through blocks of n records, but is that the 
best way?

-joe
On 3/15/2021 1:42 PM, Bowen Song wrote:

I personally try to avoid using secondary indexes, especially in large clusters.

SI is not scalable: because a SI query doesn't have the partition key 
information, Cassandra must send it to nearly all nodes in a DC to get the 
answer. Thus, the more nodes you have in a cluster, the slower and more 
expensive it is to run a SI query. Creating a SI on a table can also indirectly 
create large partitions in the index tables.


On 15/03/2021 17:27, Joe Obernberger wrote:

Great stuff - thank you.  I've spent the morning here redesigning with smaller 
partitions.

If I have a large number of unique IDs that I want to regularly 'do something' 
with, would it make sense to have a table where a UUID is the partition key, 
and to create a secondary index on a field (call it source) that I want to select 
on, where the number of UUIDs per source might be very large (billions)?
So - select * from table where source=?
The number of unique source values is small - maybe 1000.
Whereas each source may have billions of UUIDs.
-Joe


On 3/15/2021 11:18 AM, Bowen Song wrote:

To be clear, this

CREATE TABLE ... PRIMARY KEY (k1, k2);

is the same as:

CREATE TABLE ... PRIMARY KEY ((k1), k2);

but they are NOT the same as:

CREATE TABLE ... PRIMARY KEY ((k1, k2));

The first two statements create a table with a partition key k1 and a 
clustering key k2. The 3rd statement creates a composite partition key from k1 
and k2, therefore both k1 and k2 are partition keys for this table.



Your example "create table xyz (uuid text, source text, primary key (source, 
uuid));" uses the same syntax as the first statement, which creates the table 
xyz with a partition key source, and a clustering key uuid (which, BTW, is a 
non-reserved keyword).
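
For contrast, a hypothetical composite-partition-key version of that same table would be:

CREATE TABLE xyz (
    uuid text,
    source text,
    PRIMARY KEY ((source, uuid))
);

With this definition every partition holds at most one row, and every read or write must supply both source and uuid.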



A partition in Cassandra is solely determined by the partition key(s), and the 
clustering key(s) have nothing to do with it. The size of a compacted partition 
is determined by the number of rows in the partition and the size of each row. 
If the table doesn't have a clustering key, each partition will have at most 
one row. The row size is the serialized size of all data in that row, including 
tombstones.



You can reduce the partition size for a table by either reducing the serialized 
data size or adding more columns to the (composite) partition keys. But please 
be aware, you will have to provide ALL partition key values when you read from 
or write to this table (other than range, SI or MV queries), therefore you will 
need to consider the queries before designing the table schema. For 
scalability, you will need predictable partition size that does not grow over 
time, or have an actionable plan to re-partition the table when the partition 
size exceeds a certain threshold. Picking the threshold is more of an art than 
science; generally speaking it should stay below a few hundred megabytes.

RE: underutilized servers

2021-03-05 Thread Durity, Sean R
Are there specific queries that are slow? Partition-key queries should have 
read latencies in the single digits of ms (or faster). If that is not what you 
are seeing, I would first review the data model and queries to make sure that 
the data is modeled properly for Cassandra. Without metrics, I would start at 
16-20 GB of RAM for Cassandra on each node (or 31 GB if you can get 64 GB per 
host).

Since these are VMs, is there any chance they are competing for resources on 
the same physical host? In my (limited) VM experience, VMs can be 10x slower 
than physical hosts with local SSDs. (They don't have to be slower, but it can 
be harder to get visibility into the actual bottlenecks.)

I would also look to see what consistency level is being used with the queries. 
In most cases LOCAL_QUORUM or LOCAL_ONE is preferred.

Does the app use prepared statements that are only prepared once per app 
invocation? Any LWT/"if exists" in your code?


Sean Durity

From: Attila Wind 
Sent: Friday, March 5, 2021 9:48 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] underutilized servers


Hi guys,

I have a DevOps related question - hope someone here could give some 
ideas/pointers...

We are running a 3-node Cassandra cluster.
Recently we realized we have performance issues, and based on the investigation 
we did, it seems our bottleneck is the Cassandra cluster. The application layer 
is waiting a lot on Cassandra ops. So queries are running slow on the Cassandra 
side, yet according to our monitoring the Cassandra servers still have 
lots of free resources...

The Cassandra machines are virtual machines (we own the physical hosts too) 
built with KVM - with 6 CPU cores (3 physical) and 32GB RAM dedicated to each.
We are using the Ubuntu Linux 18.04 distro - the same version everywhere (on the 
physical and virtual hosts).
We are running Cassandra 4.0-alpha4

What we see is

  *   CPU load is around 20-25% - so we have lots of spare capacity
  *   iowait is around 2-5% - so disk bandwidth should be fine
  *   network load is around 50% of the full available bandwidth
  *   loadavg is max around 4 - 4.5 but typically around 3 (with 6 cores, a load 
of 6 would represent 100%)

and still, query performance is slow ... and we do not understand what could be 
holding Cassandra back from fully utilizing the server resources...

We are clearly missing something!
Anyone any idea / tip?

thanks!
--
Attila Wind

http://www.linkedin.com/in/attilaw 
[linkedin.com]
Mobile: +49 176 43556932






RE: Cassandra timeouts 3.11.6

2021-01-26 Thread Durity, Sean R (US)
I would be looking at the queries in the application to see if there are any 
cross-partition queries (ALLOW FILTERING or IN clauses across partitions). This 
looks like queries that work fine with small scale, but are hitting timeouts 
when the data size has increased.
Also see if anyone has ad-hoc access to the cluster and is running some of 
those same kind of bad queries.
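
For example, queries shaped like these hypothetical ones are the kind to look for (table and column names are made up):

SELECT * FROM shop.orders WHERE status = 'OPEN' ALLOW FILTERING;  -- filters across many partitions
SELECT * FROM shop.orders WHERE order_id IN (101, 102, 103);      -- fans out; imagine hundreds of keys here

Both tend to work at small scale and then start timing out as the data grows.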

Sean Durity

From: MyWorld 
Sent: Tuesday, January 26, 2021 8:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Cassandra timeouts 3.11.6

Hi,
We have a cluster of 4 nodes all in one DC (apache cass version : 3.11.6).
Things were working fine till last month, when all of a sudden we started facing 
operation timeouts at the client end intermittently.

We have prometheus+grafana configured for monitoring.
On checking we found the following points:
1. Read/write latency at the coordinator level increases at the same time on one or 
multiple nodes.
2. jvm_threads_current increases at the same time on one or multiple nodes.
3. Cassandra hints storage increases at the same time on one or multiple nodes.
4. Increase in client request + connection timeouts, with dropped messages at 
times.
5. Increase in connected native clients count on all nodes

Things already checked :
1. No change in read/write requests.
2. No major change in table level read/write latency.
3. No spike on DB load or CPU utilization.
4. Memory usage is also normal.
5. GC Pauses are also normal.
6. No packet loss between nodes at network level

On checking detailed logs, we found the following types of messages during the 
timeouts.
1. HintsDispatcher : Dispatching hints from one node to other
2.  READ messages were dropped in last 5000 ms: 0 internal and 1 cross node. 
Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 14831 
ms
3. StatusLogger messages.

Please suggest possible reasons for the same and action items.

Regards,
Ashish





RE: Node Size

2021-01-20 Thread Durity, Sean R
This is a great way to think through the problem and solution. I will add that 
part of my calculation on failure time is how long does it take to actually 
replace a drive and/or a server with (however many) drives? We pay for very 
fast vendor SLAs. However, in reality, there is quite a bit more activity 
before any of those SLAs kick in and the hardware is actually ready for 
use by Cassandra. So, I calculate my needed capacity and preferred node sizes 
with those factors included. (This is for on-prem hardware, not a 
cloud-there’s-always-a-spare model.)


Sean Durity

From: Jeff Jirsa 
Sent: Wednesday, January 20, 2021 11:59 AM
To: cassandra 
Subject: [EXTERNAL] Re: Node Size


Not going to give a number other than to say that 1TB/instance is probably 
super super super conservative in 2021. The modern number is likely 
considerably higher. But let's look at this from first principles. There's 
basically two things to worry about here:

1) Can you get enough CPU/memory to support a query load over that much data, 
and
2) When that machine fails, what happens?

Let's set aside 1, because you can certainly find some query pattern that 
works, e.g. write-only with time window compaction or something where there's 
very little actual work to maintain state.

So focusing on 2, a few philosophical notes:

2.a) For each range, cassandra streams from one replica. That means if you use 
a single token and RF=3, you're probably streaming from 3 hosts at a time
2.b) In cassandra 0.whatever to 3.11, streaming during replacement presumed 
that you would only send a portion of each data file to the new node, so it 
deserialized and reserialized most of the contents, even if the whole file was 
being sent (in LCS, sending the whole file is COMMON; in TWCS / STCS, it's less 
common)
2.c) Each data file doing the partial-file streaming ser/deser uses exactly one 
core/thread on the receiving side. Adding extra CPU doesn't speed up streaming 
when you have to serialize/deserialize.
2.d) The more disks you put into a system, the more likely it is that any disk 
on a host fails, so your frequency of failure will go up with more disks.

What's that mean?

The time it takes to rebuild a failed node depends on:
- Whether or not you're using vnodes (recalling that Joey at Netflix did some 
fun math that says lots of vnodes makes your chance of outage/dataloss go up 
very very quickly)
- Whether or not you're using LCS (recalling that LCS is super IO intensive 
compared to other compaction strategies)
- Whether or not you're running RAID on the host

Vnodes means more streaming sources, but also increases your chance of an 
outage with concurrent host failures.
LCS means streaming is faster, but also requires a lot more IO to maintain
RAID is ... well, RAID. You're still doing the same type of rebuild operation 
there, and losing capacity, so ... don't do that, probably.

If you are clever enough to run more than one cassandra instance on the host, 
you protect yourself from the "bad" vnode behaviors (likelihood of an outage 
with 2 hosts down, ability to do simultaneous hosts joining/leaving/moving, 
etc), but it requires multiple IPs and a lot more effort.

So, how much data can you put onto a machine? Calculate your failure rate. 
Calculate your rebuild time. Figure out your chances of two failures in that 
same window, and the cost to your business of an outage/data loss if that were 
to happen. Keep adjusting fill sizes / ratios until you get numbers that work 
for you.
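
To make that concrete, a rough back-of-envelope sketch (all numbers below are made-up assumptions, not recommendations). With N nodes, a per-node failure rate λ, and a rebuild time T, the chance of a second failure while a rebuild is in flight is roughly:

    P ≈ (N - 1) × λ × T

For example, N = 100 nodes, a 5% annual failure rate per node (λ ≈ 0.05 / 8760 ≈ 5.7e-6 per hour), and T = 24 hours of rebuild gives P ≈ 99 × 5.7e-6 × 24 ≈ 1.4% per failure event. Bigger disks stretch T, which pushes that number up.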



On Wed, Jan 20, 2021 at 7:59 AM Joe Obernberger 
mailto:joseph.obernber...@gmail.com>> wrote:

Thank you Sean and Yakir.  Is 4.x the same?

So if you were to build a 1PByte system, you would want 512-1024 nodes?  
Doesn't seem space efficient vs say 48TByte nodes where you would need ~21 
machines.
What would you do to build a 1PByte configuration?  I know there are a lot of - 
it depends - on that question, but say it was a write heavy, light read setup.  
Thank you!

-Joe
On 1/20/2021 10:06 AM, Durity, Sean R wrote:
Yakir is correct. While it is feasible to have large disk nodes, the practical 
aspect of managing them is an issue. With the current technology, I do not 
build nodes with more than about 3.5 TB of disk available. I prefer 1-2 TB, but 
costs/number of nodes can change the considerations.

Putting more than 1 node of Cassandra on a given host is also possible, but you 
will want to consider your availability if that hardware goes down. Losing 2 or 
more nodes with one failure is usually not good.

NOTE: DataStax has some new features for supporting much larger disks and 
alleviating many of the admin pains associated with it. I don’t have personal 
experience with it, yet, but I will be testing it soon. In my understanding it 
is for use cases with massive needs for disk, but low to moderate throughput 
(ie, where node expansion is only for disk, not additional traffic).

RE: Node Size

2021-01-20 Thread Durity, Sean R
Yakir is correct. While it is feasible to have large disk nodes, the practical 
aspect of managing them is an issue. With the current technology, I do not 
build nodes with more than about 3.5 TB of disk available. I prefer 1-2 TB, but 
costs/number of nodes can change the considerations.

Putting more than 1 node of Cassandra on a given host is also possible, but you 
will want to consider your availability if that hardware goes down. Losing 2 or 
more nodes with one failure is usually not good.

NOTE: DataStax has some new features for supporting much larger disks and 
alleviating many of the admin pains associated with it. I don’t have personal 
experience with it, yet, but I will be testing it soon. In my understanding it 
is for use cases with massive needs for disk, but low to moderate throughput 
(ie, where node expansion is only for disk, not additional traffic).

Sean Durity

From: Yakir Gibraltar 
Sent: Wednesday, January 20, 2021 9:21 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Node Size

It is possible to use large nodes and it will work; the problems with large nodes 
will be:

  *   Maintenance like join/remove nodes will take more time.
  *   Larger heap
  *   etc.

On Wed, Jan 20, 2021 at 3:54 PM Joe Obernberger 
mailto:joseph.obernber...@gmail.com>> wrote:
Anyone know where I could find out more information on this?
Thanks!

-Joe

On 1/13/2021 8:42 AM, Joe Obernberger wrote:
> Reading the documentation on Cassandra 3.x there is recommendations
> that node size should be ~1TByte of data.  Modern servers can have 24
> SSDs, each at 2TBytes in size for data.  Is that a bad idea for
> Cassandra?  Does 4.0beta4 handle larger nodes?
> We have machines that have 16, 8TBytes SATA drives - would that be a
> bad server for Cassandra?  Would it make sense to run multiple copies
> of Cassandra on the same node in that case?
>
> Thanks!
>
> -Joe
>



--
Best regards,
Yakir Gibraltar





RE: unable to restore data from copied data directory

2021-01-04 Thread Durity, Sean R
This may not answer all your questions, but maybe it will help move you further 
along:
- you could copy the data (not system) folders *IF* the clusters match in 
topology. This would include the clusters having the same token range 
assignment(s). And you would have to copy the folders from one original node to 
the exact matching node in the second cluster. [To learn more, read about how 
Cassandra distributes data across the cluster. It will take effort to have 
exact matching clusters]
- If you cannot make an exact match in topology, investigate something like 
dsbulk for moving data in and out of clusters with whatever topology they have. 
This is a much more portable solution.
- I know that teams also do disk snapshots on cloud platforms as one back-up 
solution. They can attach that disk snapshot to a new VM (configured the same 
as the previous one) as needed. I don’t know all the particulars of this 
approach, though.

Sean Durity – Staff Systems Engineer, Cassandra

From: Manu Chadha 
Sent: Saturday, January 2, 2021 4:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: unable to restore data from copied data directory

Thanks. Shall I copy only the system-schema folder? I tried copying all the folders 
and ran into the following issues:


  1.  C* didn't start because the cluster name by default is Test Cluster while 
the tables seem to refer to the k8ssandra cluster: "Saved cluster name k8ssandra != 
configured name Test Cluster"
  2.  Then I got this error – “Cannot start node if snitch's data center 
(datacenter1) differs from previous data center (dc1). Please fix the snitch 
configuration, decommission and rebootstrap this node or use the flag 
-Dcassandra.ignore_dc=true.”
  3.  At one point I also got error about no. of tokens (cannot change the 
number of tokens from 257 to 256).

It seems it is not straightforward to just copy the folders. Any advice 
please?


From: Jeff Jirsa
Sent: 02 January 2021 20:57
To: user@cassandra.apache.org
Subject: Re: unable to restore data from copied data directory



On Jan 2, 2021, at 7:30 AM, Manu Chadha 
mailto:manu.cha...@hotmail.com>> wrote:

Hi

Can I just copy the keyspace folders into a new Cassandra installation as a backup 
and restore strategy? I am trying to do that but it isn't working.

I am using `K8ssandra` to run my single node C* cluster. I am experimenting 
with data backup and restore. Though K8ssandra uses medusa for data backup and 
restore, I couldn't use it, so I thought to test by simply copying/pasting the data 
directory. But I don't see my data after restore. There could be mistakes in my 
approach so I am not really sure where to look. For example

  1.  K8ssandra uses Kubernetes’ persistent Volume Claims. Does that mean that 
the data is actually stored somewhere else and not in data directories of 
keyspaces?
  2.  Is there a way to look into the files in data directories of keyspaces to 
check what data is there. Maybe the data isn’t backed up properly.

The steps I did to copy the data are:
GKE cluster-> default-pool  -> found node running k8ssandra-dc1-default-sts-0 
container
Go to VM instances -> SSH to the node which is running 
k8ssandra-dc1-default-sts-0 container
Once SSHed, ran  “docker exec -it 
k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0
 /bin/bash”
I noticed that the container has Cassandra :
/opt/cassandra
./opt/cassandra/bin/cassandra
./opt/cassandra/javadoc/org/apache/cassandra
./var/lib/cassandra
./var/log/cassandra

cd opt/cassandra/data/data. There were directories for each keyspace. I assume 
that when taking backups we can take a copy of this data directory. Then once 
we need to restore, we can simply copy them back to new node’s data directory.

Note that I couldn't run nodetool inside the container (nodetool flush or 
nodetool refresh) due to a JMX issue. I don't know how important it is to run those 
commands. There is no traffic running on the systems though.

I copied data directory from OUTSIDE container (from the node) using “docker cp 
container name:src_path dest_path” (eg. docker cp 
k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0:/opt/cassandra/data/data
 backup/)

Then to transfer the backup directory to cloudshell (the console on web 
browser), I used “gcloud compute scp --recurse 
gke-k8ssandra-cluster-default-pool-1b1cc22a-rd6t:~/backup/data 
~/K8ssandra_data_backup”
Then I copied from cloudshell to my laptop/workstation, using cloudshell 
editor. This downloaded a tar of the backup (using a download link).

Then I downloaded a new .gz of C* 3.11.6 on my laptop. After unzipping it, I 
noticed that it hasn't got a data directory.

RE: how to choose tombstone_failure_threshold value if I want to delete billions of entries?

2020-11-20 Thread Durity, Sean R
Tombstone_failure_threshold is only for reads. If the tombstones are in 
different partitions, and you aren’t doing cross-partition reads, you shouldn’t 
need to adjust that value.

If disk space recovery is the goal, it depends on how available you need the 
data to be. The faster way is probably to unload the 2 billion you want to 
keep, truncate the table, reload the 2 billion. But you might have some data 
unavailable during the reload. Can the app tolerate that? Dsbulk can make this 
much faster than previous methods.

The tombstone + compaction method will take a while, and could get tricky if 
some nodes are close to the limit for compaction to actually occur. You would 
want to adjust gc_grace to a low (but acceptable) time and probably turn on 
unchecked_tombstone_compaction with a low tombstone threshold (0.1 or lower?). 
You would probably still need to force a major compaction to get rid of data 
where the tombstones are in different sstables than the original data (assuming 
size-tiered). This is all much more tedious, error-prone, and requires some 
attention to each node. If a node can’t compact, you might have to wipe it and 
rebuild/re-add it to the cluster.
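
As a hedged sketch only (keyspace/table names are hypothetical and the exact thresholds need testing against your repair schedule), the table-level settings described above would look something like:

ALTER TABLE my_ks.big_table
    WITH gc_grace_seconds = 3600
    AND compaction = {'class': 'SizeTieredCompactionStrategy',
                      'unchecked_tombstone_compaction': 'true',
                      'tombstone_threshold': '0.1'};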


Sean Durity

From: Pushpendra Rajpoot 
Sent: Friday, November 20, 2020 10:34 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] how to choose tombstone_failure_threshold value if I want 
to delete billions of entries?

Hi Team,

I have a table with approx 15 billion entries and I want to delete approx 13 
billion entries from it. I cannot write 13 billion tombstones in one go since 
there is a disk space crunch.

I am planning to delete data in chunks, so I will be creating 400 million 
tombstones in one go.

Now, I have 2 questions:

1. What is the optimal value of tombstone_failure_threshold for the above 
scenario?
2. What is the best way to delete 13 billion entries in my case?

Regards,
Pushpendra





RE: local read from coordinator

2020-11-11 Thread Durity, Sean R
I appreciate the update to my understanding of the read path! Thanks, Jeff.


Sean Durity

From: Jeff Jirsa 
Sent: Wednesday, November 11, 2020 10:33 AM
To: cassandra 
Subject: [EXTERNAL] Re: local read from coordinator

What you describe is true for writes but not reads.

The read only gets sent to enough nodes to meet the consistency level, 
unless/until one of two things happen:
- You trigger probabilistic read repair, in which case it's sent to all nodes 
(or all nodes in a DC), or
- One of the chosen replicas is too slow to respond, in which case speculative 
retry triggers and extra read commands are sent.

EVEN IF it was sent to all hosts in a DC and it responded to the client when a 
sufficient CL was met (on reads), that doesn't ever guarantee that the 
coordinator is the first to respond (or is guaranteed to respond, could hit 
tombstone overwhelming or have very slow reads due to compaction / IO / gc / 
slow disks / bad memory / etc).
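
Relatedly, the speculative retry behavior mentioned above is tunable per table; a minimal sketch (hypothetical keyspace/table name):

ALTER TABLE my_ks.my_table WITH speculative_retry = '99PERCENTILE';

Other accepted values include 'ALWAYS', 'NONE', and fixed latencies like '50ms'.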

On Wed, Nov 11, 2020 at 6:36 AM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Doesn’t the read get sent to all nodes that own the data in parallel (from the 
coordinator)? And the first one that is able to respond wins (for LOCAL_ONE). 
That was my understanding.

Sean Durity

From: Jeff Jirsa mailto:jji...@gmail.com>>
Sent: Wednesday, November 11, 2020 9:24 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: local read from coordinator


This isn’t necessarily true and cassandra has no coordinator-only consistency 
level to force this behavior

(The snitch is going to pick the best option for local_one reads and any 
compactions or latency deviations from load will make it likely that another 
replica is chosen in practice)

On Nov 11, 2020, at 3:46 AM, Alex Ott 
mailto:alex...@gmail.com>> wrote:

if you force the routing key, then the replica that owns the data will be selected 
as coordinator

On Wed, Nov 11, 2020 at 12:35 PM onmstester onmstester 
mailto:onmstes...@zoho.com.invalid>> wrote:
Thanks,

But I'm OK with the coordinator part; actually I was looking for a kind of read CL 
that forces reading from the coordinator only, with no other connections to other 
nodes!





 Forwarded message 
From: Alex Ott mailto:alex...@gmail.com>>
To: "user"mailto:user@cassandra.apache.org>>
Date: Wed, 11 Nov 2020 11:28:56 +0330
Subject: Re: local read from coordinator
 Forwarded message 

token-aware policy doesn't work for token range queries (at least in the Java 
driver 3.x).  You need to force the driver to do the reading using a specific 
token as a routing key.  Here is Java implementation of the token range 
scanning algorithm that Spark uses: 
https://github.com/alexott/cassandra-dse-playground/blob/master/driver-1.x/src/main/java/com/datastax/alexott/demos/TokenRangesScan.java
 
[github.com]

I'm not aware if Python driver is able to set routing key explicitly, but 
whitelist policy should help



On Wed, Nov 11, 2020 at 7:03 AM Erick Ramirez 
mailto:erick.rami...@datastax.com>> wrote:
Yes, use a token-aware policy so the driver will pick a coordinator where the 
token (partition) exists. Cheers!


--
With best wishes,Alex Ott
http://alexott.net/ 
[alexott.net]
Twitter: alexott_en (English), alexott (Russian)




--
With best wishes,Alex Ott
http://alexott.net/ 
[alexott.net]
Twitter: alexott_en (English), alexott (Russian)




RE: local read from coordinator

2020-11-11 Thread Durity, Sean R
Doesn’t the read get sent to all nodes that own the data in parallel (from the 
coordinator)? And the first one that is able to respond wins (for LOCAL_ONE). 
That was my understanding.

Sean Durity

From: Jeff Jirsa 
Sent: Wednesday, November 11, 2020 9:24 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: local read from coordinator


This isn’t necessarily true and cassandra has no coordinator-only consistency 
level to force this behavior

(The snitch is going to pick the best option for local_one reads and any 
compactions or latency deviations from load will make it likely that another 
replica is chosen in practice)


On Nov 11, 2020, at 3:46 AM, Alex Ott 
mailto:alex...@gmail.com>> wrote:

if you force the routing key, then the replica that owns the data will be selected 
as coordinator

On Wed, Nov 11, 2020 at 12:35 PM onmstester onmstester 
mailto:onmstes...@zoho.com.invalid>> wrote:
Thanks,

But I'm OK with the coordinator part; actually I was looking for a kind of read CL 
that forces reading from the coordinator only, with no other connections to other 
nodes!





 Forwarded message 
From: Alex Ott mailto:alex...@gmail.com>>
To: "user"mailto:user@cassandra.apache.org>>
Date: Wed, 11 Nov 2020 11:28:56 +0330
Subject: Re: local read from coordinator
 Forwarded message 

token-aware policy doesn't work for token range queries (at least in the Java 
driver 3.x).  You need to force the driver to do the reading using a specific 
token as a routing key.  Here is Java implementation of the token range 
scanning algorithm that Spark uses: 
https://github.com/alexott/cassandra-dse-playground/blob/master/driver-1.x/src/main/java/com/datastax/alexott/demos/TokenRangesScan.java
 
[github.com]

I'm not aware if Python driver is able to set routing key explicitly, but 
whitelist policy should help



On Wed, Nov 11, 2020 at 7:03 AM Erick Ramirez 
mailto:erick.rami...@datastax.com>> wrote:
Yes, use a token-aware policy so the driver will pick a coordinator where the 
token (partition) exists. Cheers!


--
With best wishes,Alex Ott
http://alexott.net/ 
[alexott.net]
Twitter: alexott_en (English), alexott (Russian)




--
With best wishes,Alex Ott
http://alexott.net/ 
[alexott.net]
Twitter: alexott_en (English), alexott (Russian)





RE: Last stored value metadata table

2020-11-10 Thread Durity, Sean R
Lots of updates to the same rows/columns could theoretically impact read 
performance. One way to help counter that would be to use the 
LeveledCompactionStrategy to keep the table optimized for reads. It could keep 
your nodes busier with compaction – so test it out.
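
A minimal sketch of that change, assuming the last_values table discussed earlier in this thread (keyspace name hypothetical):

ALTER TABLE my_ks.last_values WITH compaction = {'class': 'LeveledCompactionStrategy'};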


Sean Durity

From: Gábor Auth 
Sent: Tuesday, November 10, 2020 11:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Last stored value metadata table

Hi,

On Tue, Nov 10, 2020 at 5:29 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Updates do not create tombstones. Deletes create tombstones. The above scenario 
would not create any tombstones. For a full solution, though, I would probably 
suggest a TTL on the data so that old/unchanged data eventually gets removed 
(if that is desirable). TTLs can create tombstones, but should not be a major 
problem if expired data is relatively infrequent.

Okay, there are no tombstones (I misused the term), but every updated `value` 
sits in memory and on disk until the next compaction... Does that 
degrade read performance?

--
Bye,
Auth Gábor (https://iotguru.cloud 
[iotguru.cloud])





RE: Last stored value metadata table

2020-11-10 Thread Durity, Sean R

Hi,

On Tue, Nov 10, 2020 at 3:18 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
My answer would depend on how many “names” you expect. If it is a relatively 
small and constrained list (under a few hundred thousand), I would start with 
something like:

At the moment, the number of names is more than 10,000 but not more than 100,000.

Create table last_values (
arbitrary_partition text, -- use an app name or something static to define the 
partition
name text,
value text,
last_upd_ts timestamp,
primary key (arbitrary_partition, name));

What is the purpose of the partition key?

--- This keeps the data in one partition so that you can retrieve all of it in 
one query (as you requested). If the partition key is just “name,” then you 
would need a query for each name:
select value, last_upd_ts from last_values where name = ‘name1’; //10,000+ 
queries and you have to know all the names

Since it is a single partition, you want to keep the partition size under 100 
MB (rule of thumb). That is why knowing the size/bounds of the data is 
important.

(NOTE: every insert would just overwrite the last value. You only keep the last 
one.)

This is the behavior that I want. :)

I’m assuming that your data arrives in time series order, so that it is easy to 
just insert the last value into last_values. If you have to read before write, 
that would be a Cassandra anti-pattern that needs a different solution. (Based 
on how regular the data points are, I would look at something time-series 
related with a short TTL.)

Okay, but as far as I know, this is the scenario where every update of 
`last_values` generates two tombstones, because of the updates to the `value` and 
`last_upd_ts` fields. Maybe I have it wrong?

--- Updates do not create tombstones. Deletes create tombstones. The above 
scenario would not create any tombstones. For a full solution, though, I would 
probably suggest a TTL on the data so that old/unchanged data eventually gets 
removed (if that is desirable). TTLs can create tombstones, but should not be a 
major problem if expired data is relatively infrequent.


--
Bye,
Auth Gábor (https://iotguru.cloud 
[iotguru.cloud])





RE: Last stored value metadata table

2020-11-10 Thread Durity, Sean R
My answer would depend on how many “names” you expect. If it is a relatively 
small and constrained list (under a few hundred thousand), I would start with 
something like:

Create table last_values (
arbitrary_partition text, -- use an app name or something static to define the 
partition
name text,
value text,
last_upd_ts timestamp,
primary key (arbitrary_partition, name));

(NOTE: every insert would just overwrite the last value. You only keep the last 
one.)
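
For example, two hypothetical writes for the same name (the second simply replaces the first):

INSERT INTO last_values (arbitrary_partition, name, value, last_upd_ts)
    VALUES ('my_app_name', 'sensor-42', '20.9', toTimestamp(now()));
INSERT INTO last_values (arbitrary_partition, name, value, last_upd_ts)
    VALUES ('my_app_name', 'sensor-42', '21.5', toTimestamp(now()));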

Then your query is easy:
Select name, value, last_upd_ts from last_values where arbitrary_partition = 
'my_app_name';

If the list of names is unbounded/large, then I would be asking, does the query 
really need every name/value pair? What other way could they grouped together 
in a reasonable partition? I would use that instead of the arbitrary_partition 
above and run multiple queries (one for each partition) if a massive list is 
actually required.

I’m assuming that your data arrives in time series order, so that it is easy to 
just insert the last value into last_values. If you have to read before write, 
that would be a Cassandra anti-pattern that needs a different solution. (Based 
on how regular the data points are, I would look at something time-series 
related with a short TTL.)


Sean Durity

From: Gábor Auth 
Sent: Tuesday, November 10, 2020 2:39 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Last stored value metadata table

Hi,

Short story: storing time series of measurements (key(name, timestamp), value).

The problem: get the list of the last `value` of every `name`.

Is there a Cassandra friendly solution to store the last value of every `name` 
in a separate metadata table? It will come with a lot of tombstones... any 
other solution? :)

--
Bye,
Auth Gábor





RE: data modeling qu: use a Map datatype, or just simple rows... ?

2020-10-01 Thread Durity, Sean R
I’m a little late on this one, but I would choose approach 1. It is much more 
comprehensible to anyone who comes afterwards. And it should easily scale in 
Cassandra to whatever volume you have. I think I would call the table 
recent_users to make it very clear the purpose of the table. It is also 
extensible if you want to add new features like last_page_visited or 
last_purchase_date – just add new columns. Data expiration is automatic with 
the default TTL on the table.

Approach 2 could have tombstone issues within the map. And it would be hard to 
extend for new features. I think clean-up would require a separate process, 
too. I don’t think you can expire rows within a map column using TTL.

Sean Durity

From: Rahul Singh 
Sent: Saturday, September 19, 2020 10:41 AM
To: user@cassandra.apache.org; Attila Wind 
Subject: [EXTERNAL] Re: data modeling qu: use a Map datatype, or just simple 
rows... ?

Not necessarily. A deterministic hash randomizes a key that may be susceptible 
to "clustering" and that may also need to be used in other, non-Cassandra systems.

This way records can be accessed in both systems while leveraging the 
partitioner in Cassandra without pitfalls.

The same can be done with natural string keys like “email.”

Best regards,
Rahul Singh

From: Sagar Jambhulkar 
mailto:sagar.jambhul...@gmail.com>>
Sent: Saturday, September 19, 2020 6:45:25 AM
To: user@cassandra.apache.org 
mailto:user@cassandra.apache.org>>; Attila Wind 
mailto:attila.wind@swf.technology>>
Subject: Re: data modeling qu: use a Map datatype, or just simple rows... ?

I don't really see a difference between the two options. Won't the partitioner run 
on the user id and create a hash for you? Unless your hash function is better than 
the partitioner's.

On Fri, 18 Sep 2020, 21:33 Attila Wind, 
mailto:attilaw@swf.technology>> wrote:
Hey guys,
I'm curious about your experiences regarding a data modeling question we are 
facing.
At the moment we see 2 major different approaches in terms of how to build the 
tables.
But I've been googling around for days with no luck finding any useful 
material explaining how a Map (as a collection datatype) works on the 
storage engine, and what could surprise us later if we use it. So I decided to ask 
this question... (If someone has some nice pointers here, that is also much 
appreciated!)
So
To describe the problem in a simplified form
So
To describe the problem in a simplified form
 • Imagine you have users (everyone is identified with a UUID),
 • and we want to answer a simple question: "have we seen this guy before?"
 • we "just" want to be able to answer this question for a limited time - let's 
say for 3 months
 • but... there are lots and lots of users we run into... many millions 
each day...
 • and only ~15-20% of them are returning users - so many of them we 
might see just once
We are thinking about something like a big big Map, in a form of
userId => lastSeenTimestamp

Obviously if we would have something like that then answering the above 
question is simply:
if(map.get(userId) != null)  => TRUE - we have seen the guy before
Regarding the 2 major modelling approaches I mentioned above

Approach 1
Just simply use a table, something like this

CREATE TABLE IF NOT EXISTS users (
    user_id    varchar,
    last_seen  int,    -- a UNIX timestamp is enough, that's why int
    PRIMARY KEY (user_id)
) WITH default_time_to_live = <3 months of seconds>;
Approach 2
 to do not produce that much rows, "cluster" the guys a bit together (into 1 
row) so
introduce a hashing function over the userId, producing a value btw [0; 1]
and go with a table like
CREATE TABLE IF NOT EXISTS users (
    user_id_hash  int,
    users_seen    map<varchar, int>,    -- this is a userId => last timestamp map
    PRIMARY KEY (user_id_hash)
) WITH default_time_to_live = <3 months of seconds>;    -- yes, it's clearly 
not a good enough way ...

In theory:
 • on a WRITE path both representation gives us a way to do the write without 
the need of read
 • even the READ path is pretty efficient in both cases
 • Approach 2 is definitely worse when we come to the cleanup - "remove info if 
older than 3 months"
 • Approach 2 might affect the balance of the cluster more - that's clear 
(however not that much, due to the "law of large numbers" and enough 
random factors)
And what we are struggling with is: what do you think,
which approach would be better over time? Which will slow down the cluster less, 
considering compaction etc?
As far as we can see the real question is:
which hurts more?
 • many more rows, but very small rows (regarding data size), or
 • many fewer rows, but much bigger rows (regarding data size)
?
Any thoughts, comments, pointers to some related case studies, articles, etc is 
highly appreciated!! :-)
thanks!
--
Attila Wind

http://www.linkedin.com/in/attilaw 
[linkedin.com]

RE: Restore a table with dropped columns to a new cluster fails

2020-07-24 Thread Durity, Sean R
I would use dsbulk to unload and load. Then the schemas don’t really matter. 
You define which fields in the resulting file are loaded into which columns. 
You also won’t have the limitations and slowness of COPY TO/FROM.


Sean Durity

From: Mitch Gitman 
Sent: Friday, July 24, 2020 2:22 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Restore a table with dropped columns to a new cluster 
fails

I'm reviving this thread because I'm looking for a non-hacky way to migrate 
data from one cluster to another using nodetool snapshot and sstableloader 
without having to preserve dropped columns in the new schema. In my view, 
that's just cruft and confusion that keeps building.

The best idea I can come up with is to do the following in the source cluster:

  1.  Use the cqlsh COPY TO command to export the data in the table.
  2.  Drop the table.
  3.  Re-create the table.
  4.  Use the cqlsh COPY FROM command to import the data into the new incarnation 
of the table.

This approach is predicated on two assumptions:

  *   The re-created table has no knowledge of the history of the old table by 
the same name.
  *   The amount of data in the table doesn't exceed what the COPY command can 
handle.

If the dropped columns exist in the table in an environment where there's a lot 
of data, then we'd have to use some other mechanism to capture and reload the 
data.
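
In cqlsh, the export/import steps above might look like this sketch (keyspace, table, and file names are hypothetical):

COPY my_ks.my_table TO 'my_table_export.csv' WITH HEADER = true;
-- drop and re-create the table here, then:
COPY my_ks.my_table FROM 'my_table_export.csv' WITH HEADER = true;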

If you see something wrong about this approach or you have a better way to do 
it, I'd be glad to hear from you.

On Tue, Feb 19, 2019 at 11:31 AM Jeff Jirsa 
mailto:jji...@gmail.com>> wrote:
You can also manually add the dropped column to the appropriate table to 
eliminate the issue. Has to be done by a human, a new cluster would have no way 
of learning about a dropped column, and the missing metadata cannot be inferred.


On Tue, Feb 19, 2019 at 10:58 AM Elliott Sims 
mailto:elli...@backblaze.com>> wrote:
When a snapshot is taken, it includes a "schema.cql" file.  That should be 
sufficient to restore whatever you need to restore.  I'd argue that neither 
automatically resurrecting a dropped table nor silently failing to restore it 
is a good behavior, so it's not unreasonable to have the user re-create the 
table then choose if they want to re-drop it.

On Tue, Feb 19, 2019 at 7:28 AM Hannu Kröger 
mailto:hkro...@gmail.com>> wrote:
Hi,

I would like to bring this issue to your attention.

Link to the ticket:
https://issues.apache.org/jira/browse/CASSANDRA-14336 
[issues.apache.org]

Basically if a table contains dropped columns and you try to restore a snapshot 
to a new cluster, that will fail because of an error like 
"java.lang.RuntimeException: Unknown column XXX during deserialization”.

I feel this is quite serious problem for backup and restore functionality of 
Cassandra. You cannot restore a backup to a new cluster if columns have been 
dropped.

There have been other similar tickets that have been apparently closed but 
based on my test with 3.11.4, the issue still persists.

Best Regards,
Hannu Kröger





RE: Running Large Clusters in Production

2020-07-13 Thread Durity, Sean R
I’m curious – is the scaling needed for the amount of data, the amount of user 
connections, throughput or what? I have a 200ish cluster, but it is primarily a 
disk space issue. When I can have (and administer) nodes with large disks, the 
cluster size will shrink.


Sean Durity

From: Isaac Reath (BLOOMBERG/ 919 3RD A) 
Sent: Monday, July 13, 2020 10:35 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running Large Clusters in Production

Thanks for the info Jeff, all very helpful!
From: user@cassandra.apache.org At: 07/11/20 
12:30:36
To: user@cassandra.apache.org
Subject: Re: Running Large Clusters in Production

Gossip related stuff eventually becomes the issue

For example, when a new host joins the cluster (or replaces a failed host), the 
new bootstrapping tokens go into a “pending range” set. Writes then merge 
pending ranges with final ranges, and the data structures involved here weren’t 
necessarily designed for hundreds of thousands of ranges, so it’s likely they 
stop behaving at some point 
(https://issues.apache.org/jira/browse/CASSANDRA-6345 
[issues.apache.org]
 , https://issues.apache.org/jira/browse/CASSANDRA-6127 
[issues.apache.org]
   as an example, but there have been others)

Unrelated to vnodes, until cassandra 4.0, the internode messaging requires 
basically 6 threads per instance - 3 for ingress and 3 for egress, to every 
other host in the cluster. The full mesh gets pretty expensive, it was 
rewritten in 4.0 and that thousand number may go up quite a bit after that.
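
As a rough sketch of why that matters, derived from the 6-threads-per-peer figure above: with N instances, each host runs on the order of 6 × (N - 1) messaging threads pre-4.0, so N = 1000 means roughly 6 × 999 ≈ 6,000 threads per host just for internode messaging.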


On Jul 11, 2020, at 9:16 AM, Isaac Reath (BLOOMBERG/ 919 3RD A) 
mailto:ire...@bloomberg.net>> wrote:

Thank you John and Jeff, I was leaning towards sharding and this really helps 
support that opinion. Would you mind explaining a bit more what about vnodes 
caused those issues?
From: user@cassandra.apache.org At: 07/10/20 
19:06:27
To: user@cassandra.apache.org
Cc: Isaac Reath (BLOOMBERG/ 919 3RD A )
Subject: Re: Running Large Clusters in Production

I worked on a handful of large clusters (> 200 nodes) using vnodes, and there 
were some serious issues with both performance and availability.  We had to put 
in a LOT of work to fix the problems.

I agree with Jeff - it's way better to manage multiple clusters than a really 
large one.


On Fri, Jul 10, 2020 at 2:49 PM Jeff Jirsa 
mailto:jji...@gmail.com>> wrote:
1000 instances are fine if you're not using vnodes.

I'm not sure what the limit is if you're using vnodes.

If you might get to 1000, shard early before you get there. Running 8x100 host 
clusters will be easier than one 800 host cluster.


On Fri, Jul 10, 2020 at 2:19 PM Isaac Reath (BLOOMBERG/ 919 3RD A) 
mailto:ire...@bloomberg.net>> wrote:
Hi All,

I’m currently dealing with a use case that is running on around 200 nodes, due 
to growth of their product as well as onboarding additional data sources, we 
are looking at having to expand that to around 700 nodes, and potentially 
beyond to 1000+. To that end I have a couple of questions:

1) For those who have experienced managing clusters at that scale, what types 
of operational challenges have you run into that you might not see when 
operating 100 node clusters? A couple that come to mind: version (especially 
major version) upgrades become a lot riskier, as it is no longer 
feasible to do a blue/green style deployment of the database, and backup & 
restore operations seem far more error prone as well for the same reason 
(having to do an in-place restore instead of being able to spin up a new 
cluster to restore to).

2) Is there a cluster size beyond which sharding across multiple clusters 
becomes the recommended approach?

Thanks,
Isaac







RE: Safely enabling internode TLS encryption on live cassandra cluster

2020-07-06 Thread Durity, Sean R
I plan downtime for changes to security settings like this. I could not come up 
with a way to avoid degraded access, inconsistent data, or something else bad. 
The foundational issue is that unencrypted nodes cannot communicate with 
encrypted ones.

I depend on Cassandra’s high availability for many things, but I always caution 
my teams that security-related changes will usually require an outage. When I 
can have an outage window, this kind of change is very quick.

Sean Durity

From: Egan Neuhengen 
Sent: Monday, July 6, 2020 12:50 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Safely enabling internode TLS encryption on live cassandra 
cluster

Hello,

We are trying to come up with a safe way to turn on internode (NOT 
client-server) TLS encryption on a cassandra cluster with two datacenters, 
anywhere from 3 to 20 nodes in each DC, 3+ racks in each DC. Cassandra version 
is 3.11.6, OS is CentOS 7. We have full control over cassandra configuration 
and operation, and a decent amount of control over client driver configuration. 
We're looking for a way to enable internode TLS with no period of time in which 
clients cannot connect to the cluster or clients can connect but receive 
inconsistent or incorrect data results.

Our understanding is that in 3.11, the Cassandra internode TLS encryption 
configuration (server_encryption_options::internode_encryption) can be set to 
none, all, dc, or rack. "none" means the node will only send and receive 
unencrypted data; each other value means the node only sends and receives 
encrypted data within the given scope. An "optional" setting only appears in 
the unreleased 4.0. The problem we run into is that no matter which scope we 
use, we end up
with a period of time in which two different parts of the cluster won't be able 
to talk to each other, and so clients might get different answers depending on 
which part they talk to. In this scenario, clients can be shifted to talk to 
only one DC for a limited time, but cannot transition directly from only 
communicating with one DC to only communicating to the other; some period of 
time must be spent communicating to both, however small, between those two 
states.
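
For reference, here is a minimal cassandra.yaml sketch of the setting in 
question (3.11 syntax; keystore paths and passwords are placeholders):

server_encryption_options:
    internode_encryption: dc        # one of: none | all | dc | rack
    keystore: /etc/cassandra/conf/server-keystore.jks
    keystore_password: changeit
    truststore: /etc/cassandra/conf/server-truststore.jks
    truststore_password: changeit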

Is there a way to do this while avoiding downtime and wrong-answer problems?





RE: Cassandra upgrade from 3.11.3 -> 3.11.6

2020-06-24 Thread Durity, Sean R
That seems like a lot of unnecessary streaming operations to me. I think 
someone said that streaming works between these 2 versions. But I would not use 
this approach. Why not an in-place upgrade?


Sean Durity

From: Jai Bheemsen Rao Dhanwada 
Sent: Wednesday, June 24, 2020 11:36 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra upgrade from 3.11.3 -> 3.11.6

Thank you all for the suggestions.

I am not trying to scale up the cluster for capacity. For the upgrade process, 
instead of an in-place upgrade, I am planning to add nodes with 3.11.6 and 
then decommission the nodes with 3.11.3.

On Wednesday, June 24, 2020, Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Streaming operations (repair/bootstrap) with different file versions is usually 
a problem. Running a mixed version cluster is fine – for the time you are doing 
the upgrade. I would not stay on mixed versions for any longer than that. It 
takes more time, but I separate out the admin tasks so that I can reason what 
should happen. I would either scale up or upgrade (depending on which is more 
urgent), then do the other.


Sean Durity

From: manish khandelwal 
mailto:manishkhandelwa...@gmail.com>>
Sent: Wednesday, June 24, 2020 5:52 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Cassandra upgrade from 3.11.3 -> 3.11.6

Rightly said by Surbhi, it is not good to scale with mixed versions as 
debugging issues will be very difficult.
Better to upgrade first and then scale.

Regards

On Wed, Jun 24, 2020 at 11:20 AM Surbhi Gupta 
mailto:surbhi.gupt...@gmail.com>> wrote:
In case of any issue, it gets very difficult to debug when we have multiple 
versions.

On Tue, 23 Jun 2020 at 22:23, Jürgen Albersdorfer 
mailto:jalbersdor...@gmail.com>> wrote:
Hi, I would say „It depends“ - as it always does. I have had a 21 Node Cluster 
running in Production in one DC with versions ranging from 3.11.1 to 3.11.6 
without having had any single issue for over a year. I just upgraded all nodes 
to 3.11.6 for the sake of consistency.
Sent from my iPhone

Am 24.06.2020 um 02:56 schrieb Surbhi Gupta 
mailto:surbhi.gupt...@gmail.com>>:


Hi ,

We recently upgraded from 3.11.0 to 3.11.5. There is an SSTable format change 
starting with 3.11.4.
We also had to expand the cluster, and we discussed whether to expand first 
and then upgrade. In the end we upgraded first and then expanded.
From our experience, it is not advisable to add new nodes on a higher version.
There are many bugs which got fixed from 3.11.3 to 3.11.6.

Thanks
Surbhi

On Tue, Jun 23, 2020 at 5:04 PM Jai Bheemsen Rao Dhanwada 
mailto:jaibheem...@gmail.com>> wrote:
Hello,

I am trying to upgrade from 3.11.3 to 3.11.6.
Can I add new nodes with the 3.11.6  version to the cluster running with 3.11.3?
Also, I see the SSTable format changed from mc-* to md-*, does this cause any 
issues?





RE: Cassandra upgrade from 3.11.3 -> 3.11.6

2020-06-24 Thread Durity, Sean R
Streaming operations (repair/bootstrap) with different file versions is usually 
a problem. Running a mixed version cluster is fine – for the time you are doing 
the upgrade. I would not stay on mixed versions for any longer than that. It 
takes more time, but I separate out the admin tasks so that I can reason what 
should happen. I would either scale up or upgrade (depending on which is more 
urgent), then do the other.


Sean Durity

From: manish khandelwal 
Sent: Wednesday, June 24, 2020 5:52 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra upgrade from 3.11.3 -> 3.11.6

Rightly said by Surbhi, it is not good to scale with mixed versions as 
debugging issues will be very difficult.
Better to upgrade first and then scale.

Regards

On Wed, Jun 24, 2020 at 11:20 AM Surbhi Gupta 
mailto:surbhi.gupt...@gmail.com>> wrote:
In case of any issue, it gets very difficult to debug when we have multiple 
versions.

On Tue, 23 Jun 2020 at 22:23, Jürgen Albersdorfer 
mailto:jalbersdor...@gmail.com>> wrote:
Hi, I would say „It depends“ - as it always does. I have had a 21 Node Cluster 
running in Production in one DC with versions ranging from 3.11.1 to 3.11.6 
without having had any single issue for over a year. I just upgraded all nodes 
to 3.11.6 for the sake of consistency.
Sent from my iPhone


Am 24.06.2020 um 02:56 schrieb Surbhi Gupta 
mailto:surbhi.gupt...@gmail.com>>:


Hi ,

We recently upgraded from 3.11.0 to 3.11.5. There is an SSTable format change 
starting with 3.11.4.
We also had to expand the cluster, and we discussed whether to expand first 
and then upgrade. In the end we upgraded first and then expanded.
From our experience, it is not advisable to add new nodes on a higher version.
There are many bugs which got fixed from 3.11.3 to 3.11.6.

Thanks
Surbhi

On Tue, Jun 23, 2020 at 5:04 PM Jai Bheemsen Rao Dhanwada 
mailto:jaibheem...@gmail.com>> wrote:
Hello,

I am trying to upgrade from 3.11.3 to 3.11.6.
Can I add new nodes with the 3.11.6  version to the cluster running with 3.11.3?
Also, I see the SSTable format changed from mc-* to md-*, does this cause any 
issues?






RE: Cassandra Bootstrap Sequence

2020-06-02 Thread Durity, Sean R
As I understand it, Cassandra clusters should be limited to a number of tables 
in the low hundreds (under 200), at most. What you are seeing is the carving up 
of memtables for each of those 3,000. I try to limit my clusters to roughly 100 
tables.


Sean Durity

From: Jai Bheemsen Rao Dhanwada 
Sent: Tuesday, June 2, 2020 10:48 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra Bootstrap Sequence

3000 tables

On Tuesday, June 2, 2020, Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
How many total tables in the cluster?


Sean Durity

From: Jai Bheemsen Rao Dhanwada 
mailto:jaibheem...@gmail.com>>
Sent: Monday, June 1, 2020 8:36 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Cassandra Bootstrap Sequence

Thanks Erick,

I see the tasks below being run most often. I don't quite understand what 
exactly these scheduled tasks are for. Is there a way to reduce the boot-up 
time, or do I have to live with this delay?

$ zgrep "CompactionStrategyManager.java:380 - Recreating compaction strategy" 
debug.log*  | wc -l
3249
$ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" 
debug.log*  | wc -l
6293
$ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log*  | wc -l
6308
$ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries" 
debug.log*  | wc -l
3249





On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez 
mailto:erick.rami...@datastax.com>> wrote:
There are quite a lot of steps that take place during the startup sequence 
between these two lines:

INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
using native Epoll event loop

For the most part, it's taken up by CompactionStrategyManager and 
DiskBoundaryManager. If you check debug.log, you'll see that it's mostly 
updating disk boundaries. The length of time it takes is proportional to the 
number of tables in the cluster.

Have a look at this section [1] of CassandraDaemon if you're interested in the 
details of the startup sequence. Cheers!

[1] 
https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435
 
[github.com]<https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPHQ!dt_R3xGLIK4vc3FdekacgZnl6PDJVAqW_c-yBaIAmQsoVKp7SoW7VeM3gc7VSLx2KgcKBSE$>





RE: Impact of enabling authentication on performance

2020-06-02 Thread Durity, Sean R
To flesh this out a bit, I set roles_validity_in_ms and 
permissions_validity_in_ms to 600000 (10 minutes). The default of 2000 is far 
too often for my use cases. Usually I set the RF for system_auth to 3 per DC. 
On a larger, busier cluster I have set it to 6 per DC. NOTE: if you set the 
validity higher, it may take up to that amount of time (usually less) before a 
change in password or table permissions is picked up.
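
For reference, a minimal cassandra.yaml sketch of those cache settings (the 
10-minute value is from this message; the comments note the defaults):

# auth cache validity; the 3.11 default for both is 2000 ms
roles_validity_in_ms: 600000          # 10 minutes
permissions_validity_in_ms: 600000    # 10 minutes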


Sean Durity

-Original Message-
From: Jeff Jirsa 
Sent: Tuesday, June 2, 2020 2:39 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Impact of enabling authentication on performance

Set the Auth cache to a long validity

Don’t go crazy with RF of system auth

Drop bcrypt rounds if you see massive cpu spikes on reconnect storms


> On Jun 1, 2020, at 11:26 PM, Gil Ganz  wrote:
>
> 
> Hi
> I have a production 3.11.6 cluster in which I might want to enable
> authentication. I'm trying to understand what the performance impact will
> be, if any.
> I understand each use case might be different; I'm trying to understand if
> there is a common percentage hit people usually see, or if someone has
> looked into this.
> Gil

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org






RE: Cassandra Bootstrap Sequence

2020-06-02 Thread Durity, Sean R
How many total tables in the cluster?


Sean Durity

From: Jai Bheemsen Rao Dhanwada 
Sent: Monday, June 1, 2020 8:36 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra Bootstrap Sequence

Thanks Erick,

I see the tasks below being run most often. I don't quite understand what 
exactly these scheduled tasks are for. Is there a way to reduce the boot-up 
time, or do I have to live with this delay?

$ zgrep "CompactionStrategyManager.java:380 - Recreating compaction strategy" 
debug.log*  | wc -l
3249
$ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" 
debug.log*  | wc -l
6293
$ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log*  | wc -l
6308
$ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries" 
debug.log*  | wc -l
3249





On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez 
mailto:erick.rami...@datastax.com>> wrote:
There are quite a lot of steps that take place during the startup sequence 
between these two lines:

INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
using native Epoll event loop

For the most part, it's taken up by CompactionStrategyManager and 
DiskBoundaryManager. If you check debug.log, you'll see that it's mostly 
updating disk boundaries. The length of time it takes is proportional to the 
number of tables in the cluster.

Have a look at this section [1] of CassandraDaemon if you're interested in the 
details of the startup sequence. Cheers!

[1] 
https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435
 
[github.com]





RE: Issues, understanding how CQL works

2020-04-22 Thread Durity, Sean R
I thought this might be a one-time request. I think my first
approach would be to use something like dsbulk to unload the data and then 
reload it into a table designed for the query you want to do (as long as you 
have adequate disk space). I think like a DBA/admin first. Dsbulk creates csv 
files, so you could move that data to any kind of database, if you chose.
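
As a rough sketch of that unload/reload approach (the source table comes from 
the thread below; the target table name is hypothetical and would be keyed for 
the new query):

$ dsbulk unload -k tagdata -t central -url /tmp/central_export
$ dsbulk load -k tagdata -t central_by_insertdate -url /tmp/central_export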

An alternative approach would be to use a driver that supports paging (I think 
this would be most of them) and write a program to walk the data set and output 
what you need in whatever format you need.

Or, since this is a single node scenario, you could try sstable2json to export 
the sstables (files on disk) into JSON, if that is a more workable format for 
you.

Sean Durity – Staff Systems Engineer, Cassandra

-Original Message-
From: Marc Richter 
Sent: Wednesday, April 22, 2020 6:22 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Issues, understanding how CQL works

Hi Jeff,

thank you for your exhaustive and verbose answer!
Also, a very big "Thank you!" to all the other replyers; I hope you
understand that I summarize all your feedback in this single answer.

From what I understand from your answers, Cassandra seems to be optimized to 
store (and read) data in exactly the way the data structure has been designed 
for. That makes it very inflexible, but as a trade-off it does that single job 
very effectively.

I also understand, the more I dig into Cassandra, that the team I am 
supporting is using Cassandra kind of wrong; for example, they have only one 
node and so use neither the load-balancing nor the redundancy capabilities 
Cassandra offers.
Thus, a maybe-relevant side note: all the data resides on just one single 
node, so we know which node the data is on (I know that Cassandra internally 
applies the same hashing voodoo as if there were 1k nodes, but maybe this is 
important anyway).

Anyway: I do not really care if a query or effort to find this information is 
sub-optimal or very "expensive" in terms of efficiency or system load, since 
this isn't something I need to extract on a regular basis, but only once. It 
doesn't need to be optimal or efficient; I also do not care if it blocks the 
node for several hours, since Cassandra is only working on this single 
request. I really need this info (the most recent "insertdate") only once.
Considering this, is there a way to do that?

> Because you didn't provide a signalid and monthyear, it doesn't know
> which machine in your cluster to use to start the query.

I know this already; thanks for confirming that I got this correct! But what 
do I do if I do not know all "signalid"s? How do I learn them?

Is it maybe possible to get a full list of all "signalid"s? Or is it possible 
to "re-arrange" the data in the cluster, or something else that enables me to 
learn the most recent "insertdate"? I really do not care if I need to do some 
expensive copy-all-data move, but I do not know what is possible or how to do 
it.
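
For what it's worth, CQL can enumerate partition keys without scanning every 
row, which gives one way to walk the data set (a sketch against the table 
definition quoted below):

SELECT DISTINCT signalid, monthyear FROM tagdata.central;

-- then, for each partition returned above:
SELECT max(insertdate) FROM tagdata.central
 WHERE signalid = ? AND monthyear = ?;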

Best regards,
Marc Richter

On 21.04.20 19:20, Jeff Jirsa wrote:
>
>
> On Tue, Apr 21, 2020 at 6:20 AM Marc Richter wrote:
>
> Hi everyone,
>
> I'm very new to Cassandra. I have, however, some experience with SQL.
>
>
> The biggest thing to remember is that Cassandra is designed to scale out
> to massive clusters - like thousands of instances. To do that, you can't
> assume it's ever ok to read all of the data, because that doesn't scale.
> So cassandra takes shortcuts / optimizations to make it possible to
> ADDRESS all of that data, but not SCAN it.
>
>
> I need to extract some information from a Cassandra database that has
> the following table definition:
>
> CREATE TABLE tagdata.central (
> signalid int,
> monthyear int,
> fromtime bigint,
> totime bigint,
> avg decimal,
> insertdate bigint,
> max decimal,
> min decimal,
> readings text,
> PRIMARY KEY (( signalid, monthyear ), fromtime, totime)
> )
>
>
> What your primary key REALLY MEANS is:
>
> The database on reads and writes will hash(signalid+monthyear) to find
> which hosts have the data, then
>
> In each data file, the data for a given (signalid,monthyear) is stored
> sorted by fromtime and totime
>
> The database is already of round about 260 GB in size.
> I now need to know what is the most recent entry in it; the correct
> column to learn this would be "insertdate".
>
> In SQL I would do something like this:
>
> SELECT insertdate FROM tagdata.central
> ORDER BY insertdate DESC LIMIT 1;
>
> In CQL, however, I just can't get it to work.
>
> What I have tried already is this:
>
> SELECT insertdate FROM "tagdata.central"
> ORDER BY insertdate DESC LIMIT 1;
>
>
> Because you didn't provide a signalid and monthyear, it doesn't know
> which machine in your cluster to use to start the query.

RE: Multi DC replication between different Cassandra versions

2020-04-16 Thread Durity, Sean R
I agree – do not aim for a mixed version as normal. Mixed versions are fine 
during an upgrade process, but the goal is to complete the upgrade as soon as 
possible.

As for other parts of your plan, the Kafka Connector is a “sink-only,” which 
means that it can only insert into Cassandra. It doesn’t go the other way.

I usually suggest that if the data is needed in two (or more) places, that the 
application write to a queue. Then, let the queue feed all the downstream 
destinations.


Sean Durity – Staff Systems Engineer, Cassandra

From: Christopher Bradford 
Sent: Thursday, April 16, 2020 1:13 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Multi DC replication between different Cassandra 
versions

It’s worth noting there can be issues with streaming between different versions 
of C*. Note this excerpt from
https://thelastpickle.com/blog/2019/02/26/data-center-switch.html 
[thelastpickle.com]

Note that with an upgrade it’s important to keep in mind that streaming in a 
cluster running mixed versions of Cassandra is not recommended

Emphasis mine. With the approach you’re suggesting streaming would be involved 
both during bootstrap and repair. Would it be possible to upgrade to a more 
recent release prior to pursuing this course of action?

On Thu, Apr 16, 2020 at 1:02 AM Erick Ramirez 
mailto:erick.rami...@datastax.com>> wrote:
I don't mean any disrespect but let me offer you some friendly advice -- don't 
do it to yourself. I think you would have a very hard time finding someone who
would recommend implementing a solution that involves mixed versions. If you 
run into issues, it would be hell trying to unscramble that egg.

On top of that, Cassandra 3.0.9 is an ancient version released 4 years ago 
(September 2016). There are several pages of fixes deployed since then. So in 
the nicest possible way, what you're planning to do is not a good idea. I 
personally wouldn't do it. Cheers!
--

Christopher Bradford






RE: Table not updating

2020-03-24 Thread Durity, Sean R
Oh, I see it was clock drift in this case. Glad you found that out.

Sean Durity

From: Durity, Sean R 
Sent: Tuesday, March 24, 2020 2:10 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Table not updating

I’m wondering about nulls. They are written as tombstones. So, it is an 
interesting question for a prepared statement where you are not binding all the 
variables. The driver or framework might be doing something you don’t expect.

Sean Durity

From: Sebastian Estevez 
mailto:sebastian.este...@datastax.com>>
Sent: Monday, March 23, 2020 9:02 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Table not updating

I have seen cases where folks thought they were writing successfully to the 
database but were really hitting timeouts due to an unhandled future in their 
loading program. This may very well not be your issue but it's common enough 
that I thought I would mention it.

Hope you get to the bottom of it!


All the best,





Sebastián Estévez


On Mon, Mar 23, 2020 at 8:50 PM Jeff Jirsa 
mailto:jji...@gmail.com>> wrote:
You need to see what's in that place, it could be:

1) Delete in the future (viewable with SELECT WRITETIME(column) ...). This 
could be clock skew or using the wrong resolution timestamps (millis vs micros)
2) Some form of corruption if you don't have compression + crc_check_chance. 
It's possible (but unlikely) that you can have a really broken data file that 
simulates a deletion marker. You may be able to find this with sstable2json 
(older versions) or sstabledump (3.0+)

sstabledump your data files that have the key (nodetool getendpoints, nodetool 
getsstables, sstabledump), look for something unusual.
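
A sketch of those checks (keyspace, table, key, and file names are 
placeholders):

$ cqlsh -e "SELECT WRITETIME(some_column) FROM my_ks.my_table WHERE pk = 'some-key';"
$ nodetool getendpoints my_ks my_table some-key
$ nodetool getsstables my_ks my_table some-key
$ sstabledump /var/lib/cassandra/data/my_ks/my_table-*/mc-42-big-Data.db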



On Mon, Mar 23, 2020 at 4:00 PM Oliver Herrmann 
mailto:o.herrmann...@gmail.com>> wrote:
Hello,
we are facing a strange issue in one of our Cassandra clusters.
We are using prepared statements to update a table with consistency 
LOCAL_QUORUM. When updating some tables, it happens very often that data 
values are not written to the database. When verifying the table using cqlsh 
(with consistency ALL), the row does not exist.
When using the prepared statements, we do not bind values to all placeholders 
for data columns, but I think this should not be a problem, right?
I checked system.log and debug.log for any hints but nothing is written into 
these log files.
It's only happening in one specific cluster. When running the same software in 
other clusters everything is working fine.

We are using Cassandra server version 3.11.1 and DataStax cpp driver 2.13.0.

Any idea how to analyze/fix this problem?
Regards
Oliver






RE: Table not updating

2020-03-24 Thread Durity, Sean R
I’m wondering about nulls. They are written as tombstones. So, it is an 
interesting question for a prepared statement where you are not binding all the 
variables. The driver or framework might be doing something you don’t expect.

Sean Durity

From: Sebastian Estevez 
Sent: Monday, March 23, 2020 9:02 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Table not updating

I have seen cases where folks thought they were writing successfully to the 
database but were really hitting timeouts due to an unhandled future in their 
loading program. This may very well not be your issue but it's common enough 
that I thought I would mention it.

Hope you get to the bottom of it!


All the best,





Sebastián Estévez


On Mon, Mar 23, 2020 at 8:50 PM Jeff Jirsa 
mailto:jji...@gmail.com>> wrote:
You need to see what's in that place, it could be:

1) Delete in the future (viewable with SELECT WRITETIME(column) ...). This 
could be clock skew or using the wrong resolution timestamps (millis vs micros)
2) Some form of corruption if you don't have compression + crc_check_chance. 
It's possible (but unlikely) that you can have a really broken data file that 
simulates a deletion marker. You may be able to find this with sstable2json 
(older versions) or sstabledump (3.0+)

sstabledump your data files that have the key (nodetool getendpoints, nodetool 
getsstables, sstabledump), look for something unusual.



On Mon, Mar 23, 2020 at 4:00 PM Oliver Herrmann 
mailto:o.herrmann...@gmail.com>> wrote:
Hello,
we are facing a strange issue in one of our Cassandra clusters.
We are using prepared statements to update a table with consistency 
LOCAL_QUORUM. When updating some tables, it happens very often that data 
values are not written to the database. When verifying the table using cqlsh 
(with consistency ALL), the row does not exist.
When using the prepared statements, we do not bind values to all placeholders 
for data columns, but I think this should not be a problem, right?
I checked system.log and debug.log for any hints but nothing is written into 
these log files.
It's only happening in one specific cluster. When running the same software in 
other clusters everything is working fine.

We are using Cassandra server version 3.11.1 and DataStax cpp driver 2.13.0.

Any idea how to analyze/fix this problem?
Regards
Oliver






RE: [EXTERNAL] Re: Performance of Data Types used for Primary keys

2020-03-06 Thread Durity, Sean R
I agree. Cassandra already hashes the partition key to a numeric token.
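
To illustrate (hypothetical keyspace and table; token() is the hash Cassandra 
routes requests by):

SELECT token(user_id), user_id FROM my_ks.users LIMIT 3;

Whether user_id is text or bigint, the partitioner reduces it to the same 
fixed-size numeric token, so the routing cost is identical.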

Sean Durity

From: Jon Haddad 
Sent: Friday, March 6, 2020 9:29 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Performance of Data Types used for Primary keys

It's not going to matter at all.

On Fri, Mar 6, 2020, 2:15 AM Hanauer, Arnulf, Vodacom South Africa (External) 
mailto:arnulf.hana...@vcontractor.co.za>> 
wrote:
Hi Cassandra folks,

Is there any difference in the performance of general operations when using a 
TEXT-based primary key versus a BIGINT primary key?

Our use case requires low-latency reads. Currently the primary key is 
TEXT-based, but the data could work as BIGINT. We are trying to optimise where 
possible.
Any experiences that could point to a winner?


Kind regards
Arnulf Hanauer









"This e-mail is sent on the Terms and Conditions that can be accessed by 
Clicking on this link https://webmail.vodacom.co.za/tc/default.html 
[vodacom.co.za]
 "





RE: [EXTERNAL] Cassandra 3.11.X upgrades

2020-03-04 Thread Durity, Sean R
I agree – a back out becomes practically very challenging after the second node 
is upgraded, because the new data is written in the new disk format. To satisfy 
the “you must have a backout” rules, I just say that after node 1, I could stop 
that node, wipe the data, downgrade the binaries, and replace that node back to 
the original version (and yes, there could still be consistency problems with 
that). There is no going back after node 2. And I have never needed to try and 
go back, either. Test well in non-prod, and be ready to tackle any production 
problems to keep going forward.

Sean Durity

From: Erick Ramirez 
Sent: Tuesday, March 3, 2020 11:35 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Cassandra 3.11.X upgrades

Should upgradesstables not be run after every node is upgraded? If we need to 
roll back, then we will not be able to downgrade the SSTables to the older 
version.

You can choose to (a) upgrade the SSTables one node at a time as you complete 
the binary upgrade, or (b) upgrade the binaries on all nodes then perform the 
SSTables upgrade in one hit. The choice is up to you but beware of the 
following caveats:
- there's a performance hit when C* reads "n - 1" versions of SSTables, so the 
sooner you do it the better
- upgrading SSTables one node at a time is preferable due to the considerable 
performance hit; alternatively, schedule the upgrade during low-traffic periods
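
Putting option (a) together as a per-node sketch (the service commands and 
drain step are assumptions; upgradesstables is the command under discussion):

$ nodetool drain                 # flush memtables; no commit log replay on restart
$ sudo systemctl stop cassandra
# ... upgrade the Cassandra binaries/packages here ...
$ sudo systemctl start cassandra
$ nodetool upgradesstables       # rewrite SSTables into the current format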

The idea of a rollback isn't what you're accustomed to. There is no concept of 
"downgrade" in Cassandra. If you decide to rollback or backout of an upgrade 
implementation, it means that you have to restore your cluster from backups so 
be aware of that too. This is because once you've performed an upgrade, things 
like schema and system tables are generally no longer backward-compatible. 
Also, new incoming mutations are written in the new format which again is not 
backward-compatible. To cut to the chase -- the decision to upgrade the 
SSTables has zero bearing on rollback. Cheers!






RE: [EXTERNAL] Re: IN OPERATOR VS BATCH QUERY

2020-02-21 Thread Durity, Sean R
Batches are for atomicity, not performance.

I would do single deletes with a prepared statement. An IN clause causes extra 
work for the coordinator because multiple partitions are being impacted. So, 
the coordinator has to coordinate all nodes involved in those writes (up to the 
whole cluster). Availability and performance are compromised for multiple 
partition operations. I do not allow them.

Also – TTL at insert (or update) is a much better solution than large purge 
strategies. As someone who spent a month wrangling hundreds of billions of 
deletes, I am an ardent preacher of TTL during design time.
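
A sketch of the difference (the key_value table is from this thread; the 
keyspace, value column, and TTL are assumptions):

-- multi-partition IN: the coordinator must wait on many replica sets
DELETE FROM my_ks.key_value WHERE id IN (1, 2, 3, 4);

-- preferred: one single-partition delete per key, via a prepared statement
DELETE FROM my_ks.key_value WHERE id = ?;

-- better still: expire data at write time so no purge is needed
INSERT INTO my_ks.key_value (id, value) VALUES (?, ?) USING TTL 2592000;  -- 30 days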

Sean Durity

From: Attila Wind 
Sent: Friday, February 21, 2020 2:52 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: IN OPERATOR VS BATCH QUERY

Hi Sergio,

AFAIK you use batches when you want an "all or nothing" approach from 
Cassandra - turning multiple statements into one atomic operation.

One very typical use case for this is when you have denormalized data in 
multiple tables (optimized for different queries) but you need to modify all of 
them the same way as they were just one entity.

This means that if any of your delete statements failed for whatever reason, 
then all of your delete statements would be rolled back.

I think you don't want that overhead here, for sure...

We are not there yet with our development but we will need similar "cleanup" 
functionality soon.
I was also thinking about the IN operator for similar cases, but I am curious 
whether anyone here has a better idea...
Why does the IN operator blow up the coordinator? I do not entirely get it...

Thanks
Attila

Sergio mailto:lapostadiser...@gmail.com>> ezt írta 
(időpont: 2020. febr. 21., P 3:44):
The current approach is delete from key_value where id = whatever, performed 
asynchronously from the client. I was thinking of at least reducing the 
network round-trips between client and coordinator with that batch approach. :)

In any case, I would test whether or not it improves things. So when do you 
use batches, then?

Best,

Sergio

On Thu, Feb 20, 2020, 6:18 PM Erick Ramirez 
mailto:erick.rami...@datastax.com>> wrote:
Batches aren't really meant for optimisation in the same way as RDBMS. If 
anything, it will just put pressure on the coordinator having to fire off 
multiple requests to lots of replicas. The IN operator falls into the same 
category and I personally wouldn't use it with more than 2 or 3 partitions 
because then the coordinator will suffer from the same problem.

If it were me, I'd just issue single-partition deletes and throttle it to a 
"reasonable" throughput that your cluster can handle. The word "reasonable" is 
in quotes because only you can determine that magic number for your cluster 
through testing. Cheers!





RE: [EXTERNAL] Re: Null values in sasi indexed column

2020-02-21 Thread Durity, Sean R
I would consider building a lookup table instead. Something like:
CREATE TABLE new_lookup (
    new_lookup_partition text,
    existing_key text,
    PRIMARY KEY (new_lookup_partition)
);
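
Reads then become two cheap single-partition lookups (a sketch; how 
existing_key maps into your main table depends on your schema):

SELECT existing_key FROM new_lookup WHERE new_lookup_partition = 'some-value';
-- then query the main table by that existing_key as usual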

For me, these are easier to understand and reason through for Cassandra 
performance and availability. I would use this approach for up to 3 or 4 
different lookup patterns. If it got to be more than that, I would be using DSE 
Search/SOLR.

Just be warned, I have seen teams asking for these kinds of options just 
because they are guessing at the access patterns they want.
access patterns, I encourage them to use other technologies. Otherwise the pain 
will be great.


Sean Durity

From: Erick Ramirez 
Sent: Wednesday, February 19, 2020 6:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Null values in sasi indexed column

Rahul, in my opinion SASI is an experimental feature and isn't ready for 
primetime yet. It has some advantages over secondary indexes but if it were me, 
I'd stick with native secondary indexes. But test, test and test so you can 
make an informed decision on what works for your use case. Cheers!


Erick Ramirez  |  Developer Relations

erick.rami...@datastax.com | datastax.com 
[datastax.com]





RE: Mechanism to Bulk Export from Cassandra on daily Basis

2020-02-21 Thread Durity, Sean R
I would also push for something besides a full refresh, if at all possible. It 
feels like a waste of resources to me – and not predictably scalable. 
Suggestions: use a queue to send writes to both systems. If the downstream 
system doesn’t handle TTL, perhaps set an expiration date and a purge query on 
the downstream target.

If you have to do the full refresh, perhaps a Spark job would be a decent 
solution. I would probably create a separate DC (with a lower replication 
factor and smaller number of nodes) just to handle the analytical/unload kind 
of workload (if the other functions of the cluster might be impacted by the 
unload).

DSBulk from DataStax is very fast and scriptable, too.
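
For the full-refresh case, a minimal DSBulk sketch (keyspace, table, and 
export path are placeholders):

$ dsbulk unload -k my_ks -t my_table -url /data/export/my_table_$(date +%F)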

Sean Durity – Staff Systems Engineer, Cassandra

From: JOHN, BIBIN 
Sent: Wednesday, February 19, 2020 5:25 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Mechanism to Bulk Export from Cassandra on daily Basis

Thank you for the suggestion. A full refresh is the current design because 
with deltas we cannot identify what got deleted, so downstream systems prefer 
full data every day.


Thanks
Bibin John

From: Reid Pinchback 
mailto:rpinchb...@tripadvisor.com>>
Sent: Wednesday, February 19, 2020 3:14 PM
To: user@cassandra.apache.org
Subject: Re: Mechanism to Bulk Export from Cassandra on daily Basis

To the question of ‘best approach’, so far the comments have been about 
alternatives in tools.

Another axis you might want to consider is from the data model viewpoint.  So, 
for example, let’s say you have 600M rows.  You want to do a daily transfer of 
data for some reason.  First question that comes to mind is, do you need all 
the data every day?  Usually that would only be the case if all of the data is 
at risk of changing.

Generally the way I’d cut down the pain on something like this is to figure out 
if the data model currently does, or could be made to, only mutate in a limited 
subset.  Then maybe all you are transferring are the daily changes.  Systems 
based on catching up to daily changes will usually be pulling single-digit 
percentages of data volume compared to the entire storage footprint.  That’s 
not only a lot less data to pull, it’s also a lot less impact on the ongoing 
operations of the cluster while you are pulling that data.

R

From: "JOHN, BIBIN" mailto:bj9...@att.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Wednesday, February 19, 2020 at 1:13 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Mechanism to Bulk Export from Cassandra on daily Basis

Message from External Sender
Team,
We have a requirement to bulk export data from Cassandra on a daily basis. The 
table contains close to 600M records and the cluster has 12 nodes. What is the 
best approach to do this?


Thanks
Bibin John





RE: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-13 Thread Durity, Sean R
+1 on nodetool drain. I added that to our upgrade automation and it really 
helps with post-upgrade start-up time.

Sean Durity

From: Erick Ramirez 
Sent: Wednesday, February 12, 2020 10:29 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Cassandra 3.11.X upgrades

Yes to the steps. The only thing I would add is to run a nodetool drain before 
shutting C* down so all mutations are flushed to SSTables and there won't be 
any commit logs to replay on startup.

Also, the usual "backup your cluster and configuration files" boilerplate 
applies. 😁





RE: [EXTERNAL] Re: Cassandra Encyrption between DC

2020-02-13 Thread Durity, Sean R
I will just add-on that I usually reserve security changes as the primary 
exception where app downtime may be necessary with Cassandra. (DSE has some 
Transitional tools that are useful, though.) Sometimes a short outage is 
preferred over a longer, more-complicated attempt to keep the app up. And, in 
many cases, there is no way to guarantee availability when making 
security-related changes (new cipher suites, adding encryption, turning on 
authentication, etc.). It is better to try and have those implemented from the 
beginning, where possible.


Sean Durity

From: Erick Ramirez 
Sent: Wednesday, February 12, 2020 9:02 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra Encyrption between DC

I've just seen your questions on ASF Slack and didn't immediately make the 
connection that this post in the mailing list is one and the same. I understand 
what you're doing now -- you have an existing DC with no encryption and you 
want to add a new DC with encryption enabled but don't want the downtime 
associated with enabling encryption on the existing DC.

As driftx, exlt, myself & co pointed out, there isn't a "transitional path" of 
implementing it without downtime in the current (released) versions of C*. 
Cheers!





RE: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-12 Thread Durity, Sean R
Ah - I should have looked it up! Thank you for fixing my mistake.

Sean Durity

-Original Message-
From: Michael Shuler 
Sent: Wednesday, February 12, 2020 3:17 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Cassandra 3.11.X upgrades

On 2/12/20 12:58 PM, Durity, Sean R wrote:
> Check the readme.txt for any upgrade notes

Just a quick correction:

NEWS.txt (upgrade (and other important) notes)
CHANGES.txt (changelog with JIRAs)

This is why we list links to these two files in the release announcements.

--
Kind regards,
Michael







RE: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-12 Thread Durity, Sean R
>>A while ago, on my first cluster

Understatement used so effectively. Jon is a master.



On Wed, Feb 12, 2020 at 11:02 AM Sergio 
mailto:lapostadiser...@gmail.com>> wrote:
Thanks for your reply!

So unless the sstable format has changed, I can avoid doing that.

Correct?

Best,

Sergio

On Wed, Feb 12, 2020, 10:58 AM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Check the readme.txt for any upgrade notes, but the basic procedure is to:

  *   Verify that nodetool upgradesstables has completed successfully on all 
nodes from any previous upgrade
  *   Turn off repairs and any other streaming operations (add/remove nodes)
  *   Stop an un-upgraded node (seeds first, preferably)
  *   Install new binaries and configs on the down node
  *   Restart that node and make sure it comes up clean (it will function 
normally in the cluster – even with mixed versions)
  *   Repeat for all nodes
  *   Run upgradesstables on each node (as many at a time as your load will 
allow). Minor upgrades usually don’t require this step (only if the sstable 
format has changed), but it is good to check.
  *   NOTE: in most cases applications can keep running and will not notice 
much impact – unless the cluster is overloaded and a single node down causes 
impact. (A rough per-node sketch of the middle steps follows.)
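
A rough per-node sketch of the middle steps, assuming a yum-managed package
install; the package name, target version, and service name are placeholders:

    # One node at a time (seeds first):
    nodetool drain                        # flush memtables, stop traffic
    sudo systemctl stop cassandra
    sudo yum install -y cassandra-3.11.5  # hypothetical target version
    # Merge your cassandra.yaml / JVM option customizations into the new
    # config here before starting.
    sudo systemctl start cassandra
    nodetool status                       # confirm the node rejoins as UN
    # Only after ALL nodes run the new version, and only if the sstable
    # format changed:
    nodetool upgradesstables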



Sean Durity – Staff Systems Engineer, Cassandra

From: Sergio mailto:lapostadiser...@gmail.com>>
Sent: Wednesday, February 12, 2020 11:36 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Cassandra 3.11.X upgrades

Hi guys!

How do you usually upgrade your cluster for minor version upgrades?

I tried to add a node with 3.11.5 version to a test cluster with 3.11.4 nodes.

Is there any restriction?

Best,

Sergio





RE: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-12 Thread Durity, Sean R
Check the readme.txt for any upgrade notes, but the basic procedure is to:

  *   Verify that nodetool upgradesstables has completed successfully on all 
nodes from any previous upgrade
  *   Turn off repairs and any other streaming operations (add/remove nodes)
  *   Stop an un-upgraded node (seeds first, preferably)
  *   Install new binaries and configs on the down node
  *   Restart that node and make sure it comes up clean (it will function 
normally in the cluster – even with mixed versions)
  *   Repeat for all nodes
  *   Run upgradesstables on each node (as many at a time as your load will 
allow). Minor upgrades usually don’t require this step (only if the sstable 
format has changed), but it is good to check.
  *   NOTE: in most cases applications can keep running and will not notice 
much impact – unless the cluster is overloaded and a single node down causes 
impact.



Sean Durity – Staff Systems Engineer, Cassandra

From: Sergio 
Sent: Wednesday, February 12, 2020 11:36 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Cassandra 3.11.X upgrades

Hi guys!

How do you usually upgrade your cluster for minor version upgrades?

I tried to add a node with 3.11.5 version to a test cluster with 3.11.4 nodes.

Is there any restriction?

Best,

Sergio





RE: Connection reset by peer

2020-02-12 Thread Durity, Sean R
This looks like an error between your client and the cluster. Is the other ip 
address your client app? I have typically seen this when there are network 
issues between the client and the cluster. Cassandra driver connections are 
typically very long-lived. If something like a switch or firewall times out the 
connection you can get errors like this. Tcp_keepalive settings on the cluster 
nodes can help. See here: 
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configRecommendedSettings.html
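
A minimal sketch of the kind of TCP keepalive tuning that page describes (the
values shown are illustrative assumptions; check the linked recommended
settings for your environment):

    # Start probing idle connections after 60s, probe every 10s,
    # and drop the connection after 3 failed probes.
    sudo sysctl -w net.ipv4.tcp_keepalive_time=60
    sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
    sudo sysctl -w net.ipv4.tcp_keepalive_probes=3
    # Add the same keys to /etc/sysctl.conf to persist across reboots.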



Sean Durity

From: Hanauer, Arnulf, Vodacom South Africa (External) 

Sent: Wednesday, February 12, 2020 7:06 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Connection reset by peer


Hi Cassandra folks,

We are getting a lot of these errors and transactions are timing out, and I was 
wondering if this can be caused by Cassandra itself or if this is purely a Linux 
network issue. The client job reports the Cassandra node as down after this 
occurs, but I suspect this is due to the connection failure - I need some 
clarification as to where to look for a solution.


INFO  [epollEventLoopGroup-2-10] 2020-02-12 11:53:42,748 Message.java:623 - 
Unexpected exception during request; channel = [id: 0x8a3e6831, 
L:/10.132.65.152:9042 - R:/10.132.11.15:48020]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]

INFO  [epollEventLoopGroup-2-15] 2020-02-12 11:42:46,871 Message.java:623 - 
Unexpected exception during request; channel = [id: 0xa071f1c8, 
L:/10.132.65.152:9042 - R:/10.132.11.15:45134]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]


Source and Destination IP addresses are in the same DC (LAN).

I did recycle all the Cassandra services on all the nodes in both clusters but 
the problem remains.

The only change made recently was adding replicas in the second DC for 
the keyspace that is being written to when these messages occur (we have not had 
a chance to run a full repair yet to sync the replicas).


FYI:
Cassandra 3.11.2
5-node cluster in each of 2 DCs


Kind regards
Arnulf Hanauer









"This e-mail is sent on the Terms and Conditions that can be accessed by 
Clicking on this link https://webmail.vodacom.co.za/tc/default.html 
[vodacom.co.za]
 "



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
Reid is right. You build the tables to easily answer the queries you want. So, 
start with the query! I inferred a query for you based on what you mentioned. 
If my inference is wrong, the table structure is likely wrong, too.

So, what kind of query do you want to run?

(NOTE: a select count(*) that is not restricted to within a single partition is 
a very bad option. Don’t do that)

The query for my table below is simply:
select user_count [, other columns] from users_by_day where app_date = ? and 
hour = ? and minute = ?
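
A minimal sketch of the per-minute write and the report read, assuming the
users_by_day table described below lives in a hypothetical keyspace my_ks (the
one-week TTL is illustrative):

    # Written by the app once per minute:
    cqlsh -e "INSERT INTO my_ks.users_by_day (app_date, hour, minute, user_count)
              VALUES ('2020-02-06', 14, 35, 1234) USING TTL 604800;"
    # Report for one specific minute - a single-partition, single-row read:
    cqlsh -e "SELECT user_count FROM my_ks.users_by_day
              WHERE app_date = '2020-02-06' AND hour = 14 AND minute = 35;"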


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
want you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel mailto:abd786...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

This is a schema similar to what we have. They want to get the number of users 
connected (a concurrent count) every, say, 1-5 minutes.
I am thinking: will a simple select have performance issues, or should we go for 
materialized views?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel mailto:abd786...@gmail.com>>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of users connected; the app team needs the number of active users 
connected, say, every 1 to 5 minutes.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run select query to fetch every minute to grab data from 
cassandra for reporting purpose, if no then whats the alternative?


RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
From reports on this mailing list, I do not allow materialized views.


Sean Durity

From: Reid Pinchback 
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you 
are looking for. That’s kind of the data model philosophy that you want to 
shoot for as much as feasible with C*.

The point of Sean’s table isn’t the similarity to yours, it is how he has it 
keyed because it suits a partition structure much better aligned with what you 
want to request.  So I’d say yes, if a materialized view is how you want to 
achieve a denormalized state where the query model directly supports giving you 
want you want to query for, that sounds like an appropriate option to consider. 
 You might want a composite partition key for having an efficient selection of 
narrow time ranges.

From: Abdul Patel mailto:abd786...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

This is a schema similar to what we have. They want to get the number of users 
connected (a concurrent count) every, say, 1-5 minutes.
I am thinking: will a simple select have performance issues, or should we go for 
materialized views?

CREATE TABLE  usr_session (
userid bigint,
session_usr text,
last_access_time timestamp,
login_time timestamp,
status int,
PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC)


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel mailto:abd786...@gmail.com>>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of users connected; the app team needs the number of active users 
connected, say, every 1 to 5 minutes.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run select query to fetch every minute to grab data from 
cassandra for reporting purpose, if no then whats the alternative?





RE: [EXTERNAL] Re: Running select against cassandra

2020-02-06 Thread Durity, Sean R
Do you only need the current count or do you want to keep the historical counts 
also? By active users, does that mean some kind of user that the application 
tracks (as opposed to the Cassandra user connected to the cluster)?

I would consider a table like this for tracking active users through time:

Create table users_by_day (
app_date date,
hour integer,
minute integer,
user_count integer,
longest_login_user text,
longest_login_seconds integer,
last_login datetime,
last_login_user text )
primary key (app_date, hour, minute);

Then, your reporting can easily select full days or a specific, one-minute 
slice. Of course, the app would need to have a timer and write out the data. I 
would also suggest a TTL on the data so that you only keep what you need (a 
week, a year, whatever). Of course, if your reporting requires different 
granularities, you could consider a different time bucket for the table (by 
hour, by week, etc.)


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel 
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of users connected; the app team needs the number of active users 
connected, say, every 1 to 5 minutes.
The timeout at the app end is 120ms.



On Thursday, February 6, 2020, Michael Shuler 
mailto:mich...@pbandjelly.org>> wrote:
You'll have to be more specific. What is your table schema and what is the 
SELECT query? What is the normal response time?

As a basic guide for your general question, if the query is something sort of 
irrelevant that should be stored some other way, like a total row count, or 
most any SELECT that requires ALLOW FILTERING, you're doing it wrong and should 
re-evaluate your data model.

1 query per minute is a minuscule fraction of the basic capacity of queries per 
minute that a Cassandra cluster should be able to handle with good data 
modeling and table-relevant query. All depends on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run select query to fetch every minute to grab data from 
cassandra for reporting purpose, if no then whats the alternative?







RE: [EXTERNAL] How to reduce vnodes without downtime

2020-01-31 Thread Durity, Sean R
These are good clarifications and expansions.

Sean Durity

From: Anthony Grasso 
Sent: Thursday, January 30, 2020 7:25 PM
To: user 
Subject: Re: [EXTERNAL] How to reduce vnodes without downtime

Hi Maxim,

Basically what Sean suggested is the way to do this without downtime.

To clarify, the three steps following the "Decommission each node in the DC 
you are working on" step should be applied only to the decommissioned nodes. So 
where it says "all nodes" or "every node", it applies only to the decommissioned 
nodes.

In addition, for the step that says "Wipe data on all the nodes", I would delete 
all files in the following directories on the decommissioned nodes.

  *   data (usually located in /var/lib/cassandra/data)
  *   commitlogs (usually located in /var/lib/cassandra/commitlogs)
  *   hints (usually located in /var/lib/cassandra/hints)
  *   saved_caches (usually located in /var/lib/cassandra/saved_caches)

Cheers,
Anthony

On Fri, 31 Jan 2020 at 03:05, Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
Your procedure won’t work very well. On the first node, if you switched to 4, 
you would end up with only a tiny fraction of the data (because the other nodes 
would still be at 256). I updated a large cluster (over 150 nodes – 2 DCs) to 
smaller number of vnodes. The basic outline was this:


  *   Stop all repairs
  *   Make sure the app is running against one DC only
  *   Change the replication settings on keyspaces to use only 1 DC (basically 
cutting off the other DC)
  *   Decommission each node in the DC you are working on. Because the 
replication settings are changed, no streaming occurs. But it releases the token 
assignments
  *   Wipe data on all the nodes
  *   Update configuration on every node to your new settings, including 
auto_bootstrap = false
  *   Start all nodes. They will choose tokens, but not stream any data
  *   Update replication factor for all keyspaces to include the new DC
  *   I disabled binary on those nodes to prevent app connections
  *   Run nodetool rebuild with -dc (other DC) on as many nodes as your system 
can safely handle until they are all rebuilt (sketched below)
  *   Re-enable binary (and app connections to the rebuilt DC)
  *   Turn on repairs
  *   Rest for a bit, then reverse the process for the remaining DCs
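
A minimal sketch of the replication changes and the rebuild step, with
hypothetical keyspace and DC names (my_ks, dc_keep, dc_rebuilt); the ALTER
would be run once per non-system keyspace:

    # Cut the DC being reworked out of replication before decommissioning:
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
              {'class': 'NetworkTopologyStrategy', 'dc_keep': 3};"
    # ... decommission, wipe, set new num_tokens, restart the DC ...
    # Add the DC back, then stream its data from the surviving DC:
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
              {'class': 'NetworkTopologyStrategy', 'dc_keep': 3, 'dc_rebuilt': 3};"
    nodetool rebuild dc_keep   # run on each node in the rebuilt DC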



Sean Durity – Staff Systems Engineer, Cassandra

From: Maxim Parkachov mailto:lazy.gop...@gmail.com>>
Sent: Thursday, January 30, 2020 10:05 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] How to reduce vnodes without downtime

Hi everyone,

with discussion about reducing default vnodes in version 4.0 I would like to 
ask, what would be optimal procedure to perform reduction of vnodes in existing 
3.11.x cluster which was set up with default value 256. Cluster has 2 DC with 5 
nodes each and RF=3. There is one more restriction, I could not add more 
servers, nor to create additional DC, everything is physical. This should be 
done without downtime.

My idea for such procedure would be

for each node:
- decommission node
- set auto_bootstrap to true and vnodes to 4
- start and wait till node joins cluster
- run cleanup on rest of nodes in cluster
- run repair on whole cluster (not sure if needed after cleanup)
- set auto_bootstrap to false
repeat for each node

rolling restart of cluster
cluster repair

Does this sound right? My concern is that after decommission, the node will start 
on the same IP, which could create some confusion.

Regards,
Maxim.




RE: [EXTERNAL] How to reduce vnodes without downtime

2020-01-30 Thread Durity, Sean R
Your procedure won’t work very well. On the first node, if you switched to 4, 
you would end up with only a tiny fraction of the data (because the other nodes 
would still be at 256). I updated a large cluster (over 150 nodes – 2 DCs) to 
smaller number of vnodes. The basic outline was this:


  *   Stop all repairs
  *   Make sure the app is running against one DC only
  *   Change the replication settings on keyspaces to use only 1 DC (basically 
cutting off the other DC)
  *   Decommission each node in the DC you are working on. Because the 
replication settings are changed, no streaming occurs. But it releases the token 
assignments
  *   Wipe data on all the nodes
  *   Update configuration on every node to your new settings, including 
auto_bootstrap = false
  *   Start all nodes. They will choose tokens, but not stream any data
  *   Update replication factor for all keyspaces to include the new DC
  *   I disabled binary on those nodes to prevent app connections
  *   Run nodetool rebuild with -dc (other DC) on as many nodes as your system 
can safely handle until they are all rebuilt.
  *   Re-enable binary (and app connections to the rebuilt DC)
  *   Turn on repairs
  *   Rest for a bit, then reverse the process for the remaining DCs



Sean Durity – Staff Systems Engineer, Cassandra

From: Maxim Parkachov 
Sent: Thursday, January 30, 2020 10:05 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] How to reduce vnodes without downtime

Hi everyone,

with discussion about reducing default vnodes in version 4.0 I would like to 
ask, what would be optimal procedure to perform reduction of vnodes in existing 
3.11.x cluster which was set up with default value 256. Cluster has 2 DC with 5 
nodes each and RF=3. There is one more restriction, I could not add more 
servers, nor to create additional DC, everything is physical. This should be 
done without downtime.

My idea for such procedure would be

for each node:
- decommission node
- set auto_bootstrap to true and vnodes to 4
- start and wait till node joins cluster
- run cleanup on rest of nodes in cluster
- run repair on whole cluster (not sure if needed after cleanup)
- set auto_bootstrap to false
repeat for each node

rolling restart of cluster
cluster repair

Does this sound right? My concern is that after decommission, the node will start 
on the same IP, which could create some confusion.

Regards,
Maxim.





RE: [EXTERNAL] Re: sstableloader & num_tokens change

2020-01-27 Thread Durity, Sean R
I would suggest being aware of potential data size expansion. If you load (for 
example) three copies of the data into a new cluster (because the RF of the 
origin cluster is 3), it will also get written to the RF of the new cluster (3 
more times). So, you could see data expansion of 9x the original data size (or, 
origin RF * target RF), until compaction can run.


Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez 
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change


If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore 
snapshots taken from 256-token nodes into a cluster with 32-token (or your 
preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!







RE: [EXTERNAL] Re: COPY command with where condition

2020-01-17 Thread Durity, Sean R
sstablekeys (in the tools directory?) can extract the actual keys from your 
sstables. You have to run it on each node and then combine and de-dupe the 
final results, but I have used this technique with a query generator to extract 
data more efficiently.
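
A minimal sketch of that extraction, with placeholder keyspace/table names and
the default data path (assuming sstablekeys is available for your version; on
newer versions an equivalent from the tools directory may be needed):

    # On EACH node, dump the keys from every sstable of the table:
    for f in /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db; do
        sstablekeys "$f" >> /tmp/keys_$(hostname).txt
    done
    # Then gather the per-node files and de-duplicate:
    sort -u /tmp/keys_*.txt > /tmp/all_keys.txt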


Sean Durity

From: Chris Splinter 
Sent: Friday, January 17, 2020 1:47 PM
To: adrien ruffie 
Cc: user@cassandra.apache.org; Erick Ramirez 
Subject: [EXTERNAL] Re: COPY command with where condition

Do you know your partition keys?

One option could be to enumerate that list of partition keys in separate cmds 
to make the individual operations less expensive for the cluster.

For example:
Say your partition key column is called id and the ids in your database are 
[1,2,3]

You could do
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE id = 1 AND localisation_id = 208812" -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE id = 2 AND localisation_id = 208812" -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE id = 3 AND localisation_id = 208812" -url /home/dump


Does that option work for you?



On Fri, Jan 17, 2020 at 12:17 PM adrien ruffie 
mailto:adriennolar...@hotmail.fr>> wrote:
I don't really know for the production environment at the moment, but in the 
development environment the table contains more than 10,000,000 rows.
But we need just a subset of this table, not the entirety ...

From: Chris Splinter 
mailto:chris.splinter...@gmail.com>>
Sent: Friday, January 17, 2020 5:40 PM
To: adrien ruffie mailto:adriennolar...@hotmail.fr>>
Cc: user@cassandra.apache.org 
mailto:user@cassandra.apache.org>>; Erick Ramirez 
mailto:flightc...@gmail.com>>
Subject: Re: COPY command with where condition

What you are seeing there is a standard read timeout, how many rows do you 
expect back from that query?

On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie 
mailto:adriennolar...@hotmail.fr>> wrote:
Thank you very much,

 so I do this request with for example -->

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump


But I get the following error
com.datastax.dsbulk.executor.api.exception.BulkExecutionException: Statement 
execution failed: SELECT * FROM crt_sensors WHERE site_id = 208812 ALLOW 
FILTERING (Cassandra timeout during read query at consistency LOCAL_ONE (1 
responses were required but only 0 replica responded))

but I configured my driver with the following driver.conf, and nothing works 
correctly. Do you know what the problem is?

datastax-java-driver {
basic {


contact-points = ["data1.com:9042","data2.com:9042"]

request {
timeout = "200"
consistency = "LOCAL_ONE"

}
}
advanced {

auth-provider {
class = PlainTextAuthProvider
username = "superuser"
password = "mypass"

}
}
}

From: Chris Splinter 
mailto:chris.splinter...@gmail.com>>
Sent: Friday, January 17, 2020 4:17 PM
To: user@cassandra.apache.org 
mailto:user@cassandra.apache.org>>
Cc: Erick Ramirez mailto:flightc...@gmail.com>>
Subject: Re: COPY command with where condition

DSBulk has an option that lets you specify the query ( including a WHERE clause 
)

See Example 19 in this blog post for details: 
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading

On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay 
mailto:jean.tremb...@zen-innovations.com>> 
wrote:
Did you think about using a Materialised View to generate what you want to 
keep, and then use DSBulk to extract the data?


On 17 Jan 2020, at 14:30 , adrien ruffie 
mailto:adriennolar...@hotmail.fr>> wrote:

Sorry I come back to a quick question about the bulk loader ...

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

I read this : "Operations such as converting strings to lowercase, arithmetic 
on input columns, or filtering out rows based on some criteria, are not 
supported. "

Consequently, it's still not possible to use a WHERE clause with DSBulk, right ?

I don't really know how I can do it, in order to not keep the wholeness of 
busi

RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
Not sure what you mean by “online” migration. You can load data into the same 
name table in cluster B. If the primary keys match, data will be overwritten 
(effectively, not actually on disk). I think you can pipe the output of a 
dsbulk unload to a dsbulk load and make the data transfer very quick. Your 
clusters are very small, so this probably wouldn’t take long.
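
A minimal sketch of that pipe, with placeholder hosts and names (assuming that,
as in recent dsbulk versions, unload writes to stdout and load reads from stdin
when no -url is given, so the two can be chained):

    dsbulk unload -h cluster_a_host -k my_ks -t my_table | \
        dsbulk load -h cluster_b_host -k my_ks -t my_table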

How you get the client apps to connect to the correct cluster/stop running/etc. 
is beyond the scope of Cassandra.



Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya 
Sent: Friday, January 17, 2020 1:05 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra 
cluster few having same keyspace/table names

Hi Sean,

You got all valid points.

Please see my answers below -

1. Reason we want to move from 'A' to 'B' is to get rid of 'A' Azure region 
completely.

2. Cluster names in 'A' and 'B' are different.

3. DSbulk - Is there any way I can do an online migration? - I still need to get 
clarity on whether data for the same keyspace/table names can be merged between A 
and B. So 2 cases: 1. If the merge is not an issue - I guess DSBulk or 
sstableloader would be an option? 2. If the merge is an issue - I am guessing 
without app code changes this won't be possible, right?


Thanks & Regards,
Ankit Gadhiya


On Fri, Jan 17, 2020 at 9:40 AM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
A couple things to consider:

  *   A separation of apps into their own clusters is typically a better model 
to avoid later entanglements
  *   Dsbulk (1.4.1) is now available for open source clusters as well. It is a 
great tool for unloading/loading
  *   What data problem are you trying to solve with Cassandra and this move to 
another cluster? If it is high-availability, then trying to get to 2 DCs would 
be important. However, I think you will need at least a new keyspace if you 
can’t combine the data from the clusters. Whether this requires a code or 
config change depends on how configurable the developers made the connection 
and query details. (As a side rant: why is it that developers will write all 
kinds of new code, but don’t want to touch existing code?)
  *   Your migration requirements are quite stringent (“we don’t want to change 
anything, lose anything, or stop anything. Make it happen!”). There may be a 
solution, but you may end up with something even more fragile afterwards. I 
would push back to see what is negotiable.



Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya mailto:ankitgadh...@gmail.com>>
Sent: Friday, January 17, 2020 8:50 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster 
few having same keyspace/table names

Hi Upasana,

Thanks for your response. I’d love to do that as a first strategy but since 
they are both separate clusters , how would I do that? Keyspaces already have 
networktopologystrategy with RF=3.


— Ankit

On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma 
<028upasana...@gmail.com<mailto:028upasana...@gmail.com>> wrote:
Hi,

Did you consider adding Cassandra nodes from cluster B,  into cluster A as a 
different data center ?

Your keyspace would then be on NetworkTopologyStrategy.

In this case, all data can be synced between both data centers by Cassandra 
using rebalancing.


At client/application level you will have to ensure local quorum/ local 
consistency  so that there is no impact on latencies.

Once you have moved the data and applications to the new cluster, you can then 
remove the old data center (cluster A), and cluster B would have fresh data.




On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya 
mailto:ankitgadh...@gmail.com>> wrote:
Thanks but there’s no DSE License.
Wondering how sstableloader will help as some oh the Keyspace and tables names 
are same. Also how do i sync few system keyspaces.


Thanks & Regards,
Ankit

On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov 
mailto:vvs...@gmail.com>> wrote:
Loader*

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

On Fri, Jan 17, 2020, 09:09 Vova Shelgunov 
mailto:vvs...@gmail.com>> wrote:
DataStax bulk loaded can be an option if data is large.

On Fri, Jan 17, 2020, 07:33 Nitan Kainth 
mailto:nitankai...@gmail.com>> wrote:
If the keyspace already exists, use the COPY command or sstableloader to merge 
data. If the data volume is too big, consider Spark or a custom Java program.

Regards,
Nitan
Cell: 510 449 9629

On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya 
mailto:ankitgadh...@gmail.com>> wrote:

Any leads on this ?

— Ankit

On Thu, Jan 16, 20

RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
A couple things to consider:

  *   A separation of apps into their own clusters is typically a better model 
to avoid later entanglements
  *   Dsbulk (1.4.1) is now available for open source clusters as well. It is a 
great tool for unloading/loading
  *   What data problem are you trying to solve with Cassandra and this move to 
another cluster? If it is high-availability, then trying to get to 2 DCs would 
be important. However, I think you will need at least a new keyspace if you 
can’t combine the data from the clusters. Whether this requires a code or 
config change depends on how configurable the developers made the connection 
and query details. (As a side rant: why is it that developers will write all 
kinds of new code, but don’t want to touch existing code?)
  *   Your migration requirements are quite stringent (“we don’t want to change 
anything, lose anything, or stop anything. Make it happen!”). There may be a 
solution, but you may end up with something even more fragile afterwards. I 
would push back to see what is negotiable.



Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya 
Sent: Friday, January 17, 2020 8:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster 
few having same keyspace/table names

Hi Upasana,

Thanks for your response. I’d love to do that as a first strategy but since 
they are both separate clusters , how would I do that? Keyspaces already have 
networktopologystrategy with RF=3.


— Ankit

On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma 
<028upasana...@gmail.com> wrote:
Hi,

Did you consider adding Cassandra nodes from cluster B,  into cluster A as a 
different data center ?

Your keyspace would then be on NetworkTopologyStrategy.

In this case, all data can be synced between both data centers by Cassandra 
using rebalancing.


At client/application level you will have to ensure local quorum/ local 
consistency  so that there is no impact on latencies.

Once you have moved the data and applications to the new cluster, you can then 
remove the old data center (cluster A), and cluster B would have fresh data.




On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya 
mailto:ankitgadh...@gmail.com>> wrote:
Thanks but there’s no DSE License.
Wondering how sstableloader will help as some oh the Keyspace and tables names 
are same. Also how do i sync few system keyspaces.


Thanks & Regards,
Ankit

On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov 
mailto:vvs...@gmail.com>> wrote:
Loader*

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

On Fri, Jan 17, 2020, 09:09 Vova Shelgunov 
mailto:vvs...@gmail.com>> wrote:
DataStax bulk loaded can be an option if data is large.

On Fri, Jan 17, 2020, 07:33 Nitan Kainth 
mailto:nitankai...@gmail.com>> wrote:
If the keyspace already exists, use the COPY command or sstableloader to merge 
data. If the data volume is too big, consider Spark or a custom Java program.

Regards,
Nitan
Cell: 510 449 9629


On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya 
mailto:ankitgadh...@gmail.com>> wrote:

Any leads on this ?

— Ankit

On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya 
mailto:ankitgadh...@gmail.com>> wrote:
Hi Arvinder,

Thanks for your response.

Yes - Cluster B already has some data. Tables/KS names are identical; for data, 
I still haven't gotten clarity on whether it is identical data or not - I am 
assuming not, since it's for different customers, but I need confirmation.

Thanks & Regards,
Ankit Gadhiya


On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon 
mailto:dhillona...@gmail.com>> wrote:
So as I understand, Cluster B already has some data and not an empty cluster.

When you say, clusters share same keyspace and table names, do you mean both 
clusters have identical data on those ks/tables?

-Arvi

On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya 
mailto:ankitgadh...@gmail.com>> wrote:
Hello Group,

I have a requirement in one of the production systems where I need to be able 
to migrate entire dataset from Cluster A (Azure Region A) to Cluster B (Azure 
Region B).

Each cluster have 3 Cassandra nodes (RF=3) running used by different 
applications. Few of the applications are common is Cluster A and Cluster B 
thereby sharing same keyspace/table names.
Need suggestion for the best possible migration strategy here considering - 1. 
No Application code changes possible - Minor config/infra changes can be 
considered. 2. Zero data loss. 3. No/Minimal downtime.

It'd be great to hear ideas from all of you based on your experiences.

Cassandra Version - Cassandra 3.0.13 on both sides.
Total Data size - Cluster A: 70 GB, Cluster B: 15 GB

Thanks & Regards,
Ankit Gadhiya
--
Thanks & Regards,
Ankit Gadhiya
--
Thanks & Regards,
Ankit Gadhiya
--
Thanks

RE: [EXTERNAL] Re: Log output when Cassandra is "up"?

2020-01-08 Thread Durity, Sean R
I use a script that calls nodetool info. If nodetool info returns an error 
(instance isn’t up, on the way up, etc.) then I return that same error code 
(and I know the node is NOT OK). If nodetool info succeeds, I then parse the 
output for each protocol to be up. A node can be up, but have gossip or cql 
down/unavailable. That node is also NOT OK.

If all this passes, then the node is OK.

I would love it if there was something simpler, like the Informix onstat - | grep 
“On-line” …
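
A minimal sketch of that kind of check; the exact protocol labels grepped for
here are assumptions to verify against your version's nodetool info output:

    #!/bin/bash
    # NOT OK if nodetool info itself fails (node down or still starting):
    out=$(nodetool info) || { echo "NOT OK: nodetool info failed"; exit 1; }
    # NOT OK if gossip or the native (CQL) transport reports false:
    if echo "$out" | grep -E 'Gossip active|Native Transport active' | grep -q false; then
        echo "NOT OK: a protocol is down"; exit 1
    fi
    echo "OK"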

Sean Durity

From: Deepak Vohra 
Sent: Wednesday, January 8, 2020 4:12 PM
To: user@cassandra.apache.org; Voytek Jarnot 
Subject: [EXTERNAL] Re: Log output when Cassandra is "up"?


Use the nodetool status command.


nodetool status

Datacenter: us-east-1
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.1.115  205.6 KiB   256     66.9%             b64cb32a-b32a-46b4-9eeb-e123fa8fc287  us-east-1b
UN  10.0.3.206  182.67 KiB  256     63.5%             74863177-684b-45f4-99f7-d1006625dc9e  us-east-1d
UN  10.0.2.238  240.81 KiB  256     69.6%             4dcdadd2-41f9-4f34-9892-1f20868b27c7  us-east-1c

UN in the first column means the node's status is Up and its state is Normal.


On Wednesday, January 8, 2020, 08:37:42 p.m. UTC, Voytek Jarnot 
mailto:voytek.jar...@gmail.com>> wrote:


Needing to know when Cassandra is finished initializing and is up & running.

Had some scripts which were looking through system.log for "No gossip backlog; 
proceeding", but that turns out not to be 100% reliable.

Is looking for "Starting listening for CQL clients" considered definitive? 
I.E., always gets output on success, and not on failure?

Thanks





RE: [EXTERNAL] Re: How bottom of cassandra save data efficiently?

2020-01-02 Thread Durity, Sean R
100,000 rows is pretty small. Import your data to your cluster, do a nodetool 
flush on each node, then you can see how much disk space is actually used.

There are different compression tools available to you when you create the 
table. It also matters if the rows are in separate partitions or you have many 
rows per partition. In one exercise I have done, individual partitions can 
cause the data to expand from 0.3 MB (with many rows per partition) to 20 MB 
(one row per partition) – all from the same data set. Your compaction settings 
can also change the size of data on disk.

Bottom line – precise math requires more parameters than you have given. Actual 
experimentation is easier.
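
A minimal sketch of that experiment, with placeholder keyspace/table names and
the default data path:

    # After importing the sample data, flush memtables to disk on each node:
    nodetool flush my_ks my_table
    # Measure what actually landed on disk:
    du -sh /var/lib/cassandra/data/my_ks/my_table-*/
    # nodetool tablestats also reports the space used per table:
    nodetool tablestats my_ks.my_table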


Sean Durity

From: lampahome 
Sent: Wednesday, January 1, 2020 8:33 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: How bottom of cassandra save data efficiently?



Dipan Shah mailto:dipan@hotmail.com>> wrote on Tuesday, December 31, 2019 
at 5:34 PM:
Hello lampahome,

Data will be compressed but you will also have to account for the replication 
factor that you will be using.


Thanks,

Dipan Shah


The key factor about efficiency is replication factor. Are there other factors?






RE: [EXTERNAL] Re: Facing issues while starting Cassandra

2020-01-02 Thread Durity, Sean R
Any read-only file systems? Have you tried to start from the command line 
(instead of a service)? Sometimes that will give a more helpful error when 
start-up can’t complete.

If your error is literally what you included, it looks like the executable 
can’t find the cassandra.yaml file.

I will agree with Jeff, though. When I have seen a similar error it has usually 
been a yaml violation, such as having a tab (instead of spaces) in the yaml 
file. Check that specific node’s file with a yaml lint detector?
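
A minimal sketch of such a check, assuming Python with PyYAML is available on
the node:

    # Fails and prints the offending line/column on a YAML syntax error
    # (for example, a tab used where spaces are required):
    python -c "import yaml,sys; yaml.safe_load(open(sys.argv[1]))" \
        /etc/cassandra/cassandra.yaml && echo "cassandra.yaml parses OK"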

Sean Durity

From: Inquistive allen 
Sent: Tuesday, December 24, 2019 2:01 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Facing issues while starting Cassandra

Hello Osman,

Thanks for the suggestion.
I did try "export LC_ALL=C"
It didn't help.

Thanks

On Tue, 24 Dec, 2019, 12:05 PM Osman Yozgatlıoğlu, 
mailto:osman.yozgatlio...@gmail.com>> wrote:
I faced similar issues with different locale settings.
Could you try following command before running?
export LC_ALL=C;

Regards,
Osman

On Tue, 24 Dec 2019 at 09:01, Inquistive allen 
mailto:inquial...@gmail.com>> wrote:
>
> Hello Jeff,
>
> Thanks for responding.
> I have validated the cassandra.yaml file with other hosts in the cluster.
> There is no difference. I copied a yaml file from another node to this node and 
> changed the required configs. Still facing the same issue.
> The server went down for patching and, after coming back up, Cassandra doesn't 
> seem to start.
> Having looked for solutions on google, I found that it might be a problem 
> with the /tmp directory where the classes are stored.
> Each time I try starting Cassandra, in the /tmp directory a new directory is 
> created, but nothing is inside the directory. After some time, the node goes 
> down.
>
> I believe there is something to do with the /tmp directory.
> Request you to comment on the same.
>
> Thanks
>
> On Tue, 24 Dec, 2019, 3:42 AM Jeff Jirsa, 
> mailto:jji...@gmail.com>> wrote:
>>
>> Are you able to share the yaml? Almost certainly something in it that’s 
>> invalid.
>>
>> On Dec 23, 2019, at 12:51 PM, Inquistive allen 
>> mailto:inquial...@gmail.com>> wrote:
>>
>> 
>> Hello Team,
>>
>> I am facing issues while starting Cassandra.
>>
>> Caused by: org.apache.cassandra.exceptions.ConfigurationException : Invalid 
>> yaml: file: /path/to/yaml
>> Error: null ; can't construct a java object for tag: yaml.org 
>> [yaml.org],2002:org.apache.cassandra.config.Config;
>>  exception= java.lang.reflect.InvocationTargetException
>>
>> Request to comment on how to resolve the issue.
>>
>> Thanks & Regards
>> Allen






RE: [EXTERNAL] Migration a Keyspace from 3.0.X to 3.11.2 Cluster which already have keyspaces

2019-12-02 Thread Durity, Sean R
The size of the data matters here. Copy to/from is ok if the data is a few 
million rows per table, but not billions. It is also relatively slow (but with 
small data or a decent outage window, it could be fine). If the data is large 
and the outage time matters, you may need custom code to read from one cluster 
and write to another. If this is DataStax, the dsbulk utility would be ideal.
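
For the small-data case, a minimal cqlsh sketch of the COPY route (hypothetical 
names):

-- on the source cluster:
COPY mykeyspace.mytable TO '/tmp/mytable.csv' WITH HEADER = true;
-- on the target cluster, after creating the schema:
COPY mykeyspace.mytable FROM '/tmp/mytable.csv' WITH HEADER = true;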


Sean Durity
-Original Message-
From: slmnjobs - 
Sent: Sunday, December 1, 2019 4:41 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Migration a Keyspace from 3.0.X to 3.11.2 Cluster which 
already have keyspaces

Hi everyone,

I have a question about migrating a keyspace to another cluster. The main problem 
for us is that our new cluster already has 2 keyspaces, which we are using in 
production. Because we are not sure how the token ranges will change, we would 
like to share our migration plan here and get your comments.
We have two Cassandra clusters:

CLUSTER-A :
- Cassandra version 3.0.10
- describe keyspace:
CREATE KEYSPACE mykeyspace WITH replication = {'class': 
'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '1'}  AND 
durable_writes = true;
- DC1 : 6 nodes
- DC2 : 6 nodes
- DC3 : 1 node (backup node, have all data)

CLUSTER-B :
- Cassandra version 3.11.2
- DC1 : 5 nodes
- DC2 : 5 nodes
- DC3 : 1 node
- Already have 2 keyspaces and write/read traffic

We want to migrate a keyspace from CLUSTER-A to CLUSTER-B. There are some 
solutions for restoring or migrating a keyspace onto a new cluster, but I haven't 
seen any safe way to migrate a keyspace into an existing cluster that already has 
keyspaces.

Replication Factor won't change.
We think about two ways : one of them using sstableloader and other one using 
COPY TO/COPY FROM commands.

Our migration plan is:

- export the keyspace schema with DESC KEYSPACE on CLUSTER-A
- create the keyspace schema on CLUSTER-B
- disable write traffic at the application layer
- load data from the CLUSTER-A DC3 backup node (which has all data) to CLUSTER-B 
DC1 with sstableloader or the COPY command (each table will be copied one by one)
- update the cluster IP addresses in the application configuration
- enable write traffic at the application layer
So do you see any risk or any suggestion for this plan? Thanks a lot.
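
For the sstableloader option in the plan above, a rough sketch run from the DC3 
backup node (paths and contact points are illustrative):

nodetool flush mykeyspace
sstableloader -d <CLUSTER-B contact points> /var/lib/cassandra/data/mykeyspace/<table-dir>/

sstableloader streams one table directory at a time, so it would be run once per 
table, matching the table-by-table step in the plan.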







RE: [EXTERNAL] Re: Upgrade strategy for high number of nodes

2019-12-02 Thread Durity, Sean R
All my upgrades are without downtime for the application. Yes, do the binary 
upgrade one node at a time. Then run upgradesstables on as many nodes as your 
app load can handle (maybe you can point the app to a different DC, while 
another DC is doing upgradesstables). Upgradesstables doesn’t cause downtime – 
it just increases the IO load on the nodes executing the upgradesstables. I try 
to get it done as quickly as possible, because I suspend streaming operations 
(repairs, etc.) until the sstable rewrites are completed.
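
As a sketch of the per-node sequence (commands only; the package install steps 
vary by environment):

nodetool drain               # flush memtables and stop accepting writes
# stop Cassandra, install the new binaries, start Cassandra
nodetool upgradesstables     # later, when app load allows: rewrite sstables to the new format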

Sean Durity

From: Shishir Kumar 
Sent: Saturday, November 30, 2019 1:00 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Upgrade strategy for high number of nodes

Thanks for the pointer. We haven't changed the data model in a long time, so 
before applying workarounds (scrub) it is worth understanding the root cause of 
the problem.
This might be the reason why running upgradesstables in parallel was not 
recommended.
-Shishir
On Sat, 30 Nov 2019, 10:37 Jeff Jirsa, 
mailto:jji...@gmail.com>> wrote:
Scrub really shouldn’t be required here.

If there’s ever a step that reports corruption, it’s either a very very old 
table where you dropped columns previously or did something “wrong” in the past 
or a software bug. The old dropped column really should be obvious in the stack 
trace - anything else deserves a bug report.

It’s unfortunate that people jump to just scrubbing the unreadable data - would 
appreciate an anonymized JIRA if possible. Alternatively work with your vendor 
to make sure they don’t have bugs in their readers somehow.





On Nov 29, 2019, at 8:58 PM, Shishir Kumar 
mailto:shishirroy2...@gmail.com>> wrote:

Some more background. We are planning (tested) a binary upgrade across all nodes 
without downtime. The next step is running upgradesstables. The C* file format 
and version change (from format big, version mc to format bti, version aa; refer to 
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/ToolsSSTableupgrade.html
 
[docs.datastax.com]
 - upgrade from DSE 5.1 to 6.x), and these underlying changes explain why the 
upgrade takes so much time.
Running upgradesstables in parallel across racks - this is where I am not sure 
about the impact of running it in parallel (the documentation recommends running 
one node at a time). During upgradesstables there are scenarios where it reports 
file corruption, hence requiring a corrective step, i.e. scrub. Due to sstable 
corruption, nodes at times go down or end up with high CPU usage (~100%). 
Performing the above in parallel without downtime might result in more 
inconsistency across nodes. We have not tested this scenario, so we will need the 
group's help in case others have done a similar upgrade in the past (i.e. the 
scenarios/complexity which need to be considered, and why the guideline recommends 
running upgradesstables one node at a time).
-Shishir

On Fri, Nov 29, 2019 at 11:52 PM Josh Snyder 
mailto:j...@code406.com>> wrote:
Hello Shishir,

It shouldn't be necessary to take downtime to perform upgrades of a Cassandra 
cluster. It sounds like the biggest issue you're facing is the upgradesstables 
step. upgradesstables is not strictly necessary before a Cassandra node 
re-enters the cluster to serve traffic; in my experience it is purely for 
optimizing the performance of the database once the software upgrade is 
complete. I recommend trying out an upgrade in a test environment without using 
upgradesstables, which should bring the 5 hours per node down to just a few 
minutes.

If you're running NetworkTopologyStrategy and you want to optimize further, you 
could consider performing the upgrade on multiple nodes within the same rack in 
parallel. When correctly configured, NetworkTopologyStrategy can protect your 
database from an outage of an entire rack. So performing an upgrade on a few 
nodes at a time within a rack is the same as a partial rack outage, from the 
database's perspective.

Have a nice upgrade!

Josh

On Fri, Nov 29, 2019 at 7:22 AM Shishir Kumar 
mailto:shishirroy2...@gmail.com>> wrote:
Hi,

Need input on cassandra upgrade strategy for below:
1. We have datacenters across 4 geographies (multiple isolated deployments in each 
DC).
2. Number of Cassandra nodes in each deployment is between 6 and 24
3. Data volume on each node is between 150 and 400 GB
4. All production environments have DR set up
5. During the upgrade we do not want downtime

We are planning to go for a stack upgrade, but upgradesstables is taking approx. 5 
hours per node (if data volume is approx. 200 GB).
Options-
No downtime - Per the recommendation (DataStax documentation), if we plan to 
upgrade one node at a time, i.e. in sequence, the upgrade cycle for one 
environment will take weeks, so this is a DevOps concern.
Read Only (No downtime) - Route read-only load to the DR system. We have 

RE: [EXTERNAL] performance

2019-12-02 Thread Durity, Sean R
I’m not sure this is the fully correct question to ask. The size of the data 
will matter. The importance of high availability matters. Performance can be 
tuned by taking advantage of Cassandra’s design strengths. In general, you 
should not be doing queries with a where clause on non-key columns. Secondary 
indexes are not what you would expect from a relational background (and should 
normally be avoided).

In short, choose Cassandra if you need high-availability and low latency on 
KNOWN access patterns (on which you base your table design).

If you want an opinion – I would never put data over a few hundred GB that I 
care about into mysql. I don’t like the engine, the history, the company, or 
anything about it. But that’s just my opinion. I know many companies have 
successfully used mysql.


Sean Durity

From: hahaha sc 
Sent: Friday, November 29, 2019 3:27 AM
To: cassandra-user 
Subject: [EXTERNAL] performance

Query based on a non-primary-key field with a secondary index, and then 
update based on the primary key. Can it be more efficient than MySQL?





RE: [EXTERNAL] Re: Cassandra 3.11.4 Node the load starts to increase after few minutes to 40 on 4 CPU machine

2019-10-31 Thread Durity, Sean R
There is definitely a resource risk to having thousands of open connections to 
each node. Some of the drivers have (had?) less than optimal default settings, 
like acquiring 50 connections per Cassandra node. This is usually overkill. I 
think 5-10/node is much more reasonable. It depends on your app architecture 
and cluster node count. If there are lots of small micro-services, maybe they 
only need 2 connections per node.
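
As an illustration with the Java driver 3.x (the contact point and pool sizes are 
hypothetical, not a recommendation for any specific app):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

public class PoolSketch {
    public static void main(String[] args) {
        // 2-4 connections per local node instead of the larger defaults some drivers shipped with
        PoolingOptions pooling = new PoolingOptions()
                .setConnectionsPerHost(HostDistance.LOCAL, 2, 4)
                .setConnectionsPerHost(HostDistance.REMOTE, 1, 2);

        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")   // hypothetical contact point
                .withPoolingOptions(pooling)
                .build();
        System.out.println(cluster.getClusterName());
    }
}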


Sean Durity – Staff Systems Engineer, Cassandra

From: Sergio 
Sent: Wednesday, October 30, 2019 5:39 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra 3.11.4 Node the load starts to increase after 
few minutes to 40 on 4 CPU machine

Hi Reid,

I don't have this loading problem anymore.
I solved it by changing the Cassandra driver configuration.
Now my cluster is pretty stable and I don't have machines with crazy CPU load.
The only thing, not urgent but worth investigating, is the number of ESTABLISHED 
TCP connections. I see just one node having 7K TCP connections ESTABLISHED while 
the others have around 4-6K connections open. So the newest nodes added to the 
cluster have a higher number of ESTABLISHED TCP connections.

default['cassandra']['sysctl'] = {
  'net.ipv4.tcp_keepalive_time' => 60,
  'net.ipv4.tcp_keepalive_probes' => 3,
  'net.ipv4.tcp_keepalive_intvl' => 10,
  'net.core.rmem_max' => 16777216,
  'net.core.wmem_max' => 16777216,
  'net.core.rmem_default' => 16777216,
  'net.core.wmem_default' => 16777216,
  'net.core.optmem_max' => 40960,
  'net.ipv4.tcp_rmem' => '4096 87380 16777216',
  'net.ipv4.tcp_wmem' => '4096 65536 16777216',
  'net.ipv4.ip_local_port_range' => '1 65535',
  'net.ipv4.tcp_window_scaling' => 1,
  'net.core.netdev_max_backlog' => 2500,
  'net.core.somaxconn' => 65000,
  'vm.max_map_count' => 1048575,
  'vm.swappiness' => 0
}

These are my tweaked values; I used the values recommended by DataStax.

Do you have something different?

Best,
Sergio

Il giorno mer 30 ott 2019 alle ore 13:27 Reid Pinchback 
mailto:rpinchb...@tripadvisor.com>> ha scritto:
Oh nvm, didn't see the later msg about just posting what your fix was.

R


On 10/30/19, 4:24 PM, "Reid Pinchback" 
mailto:rpinchb...@tripadvisor.com>> wrote:

 Message from External Sender

Hi Sergio,

Assuming nobody is actually mounting a SYN flood attack, then this sounds 
like you're either being hammered with connection requests in very short 
periods of time, or your TCP backlog tuning is off.   At least, that's where 
I'd start looking.  If you take that log message and google it ("Possible SYN 
flooding... Sending cookies") you'll find explanations.  Or just google "TCP 
backlog tuning".

R


On 10/30/19, 3:29 PM, "Sergio Bilello" 
mailto:lapostadiser...@gmail.com>> wrote:

>
>Oct 17 00:23:03 prod-personalization-live-data-cassandra-08 kernel: 
TCP: request_sock_TCP: Possible SYN flooding on port 9042. Sending cookies. 
Check SNMP counters.










RE: [EXTERNAL] n00b q re UPDATE v. INSERT in CQL

2019-10-25 Thread Durity, Sean R
Everything in Cassandra is an insert. So, an update and an insert are 
functionally equivalent. An update doesn't modify the existing data on disk; 
it is a new write of the columns involved. So, the difference in your scenario 
is that with the "targeted" update, you are rewriting fewer of the columns.

So, instead of inserting 10 values (for example), you are inserting 3 (pk1, 
pk2, and col1). This would mean less disk space used for your data cleanup. 
Once Cassandra runs its compaction across your data (with a simplifying 
assumption that all data gets compacted), there would be no disk space 
difference for the final result.

I would do the updates, if the size/scope of the data involved is significant.
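
For instance, these two statements produce equivalent writes for the cleanup case 
(illustrative values against the schema from the question):

UPDATE ks.table SET col1 = 'lowered' WHERE pk1 = 'a' AND pk2 = 'b';
INSERT INTO ks.table (pk1, pk2, col1) VALUES ('a', 'b', 'lowered');

Both write a new cell for col1 only; neither rewrites col2 and the rest on disk.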


Sean Durity – Staff Systems Engineer, Cassandra

-Original Message-
From: James A. Robinson 
Sent: Friday, October 25, 2019 10:49 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] n00b q re UPDATE v. INSERT in CQL

Hi folks,

I'm working on a clean-up task for some bad data in a cassandra db.
The bad data in this case are values with mixed case that will need to
be lowercased.  In some tables the value that needs to be changed is a
primary key, in other cases it is not.

From the reading I've done, the situations where I need to change a
primary key column to lowercase will mean I need to perform an INSERT
of the entire row using the new primary key values merged with the old
non-primary-key values, followed by a DELETE of the old primary key
row.

My question is, on a table where I need to update a column that isn't
primary key, should I perform a limited UPDATE in that situation like
I would in SQL:

UPDATE ks.table SET col1 = ? WHERE pk1 = ? AND pk2 = ?

or will there be any downsides to that over an INSERT where I specify
all columns?

INSERT INTO ks.table (pk1, pk2, col1, col2, ...) VALUES (?,?,?,?, ...)

In SQL I'd never question just using the update but my impression
reading the blogosphere is that Cassandra has subtleties that I might
not be grasping when it comes to UPDATE v. INSERT behavior...

Jim








RE: Cassandra Rack - Datacenter Load Balancing relations

2019-10-25 Thread Durity, Sean R
+1 for removing complexity to be able to create (and maintain!) “reasoned” 
systems!


Sean Durity – Staff Systems Engineer, Cassandra

From: Reid Pinchback 
Sent: Thursday, October 24, 2019 10:28 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra Rack - Datacenter Load Balancing relations

Hey Sergio,

Forgive but I’m at work and had to skim the info quickly.

When in doubt, simplify.  So 1 rack per DC.  Distributed systems get rapidly 
harder to reason about the more complicated you make them.  There’s more than 
enough to learn about C* without jumping into the complexity too soon.

To deal with the unbalancing issue, pay attention to Jon Haddad’s advice on 
vnode count and how to fairly distribute tokens with a small vnode count.  I’d 
rather point you to his information, as I haven’t dug into vnode counts and 
token distribution in detail; he’s got a lot more time in C* than I do.  I come 
at this more as a traditional RDBMS and Java guy who has slowly gotten up to 
speed on C* over the last few years, and dealt with DynamoDB a lot so have 
lived with a lot of similarity in data modelling concerns.  Detailed internals 
I only know in cases where I had reason to dig into C* source.

There are so many knobs to turn in C* that it can be very easy to overthink 
things.  Simplify where you can.  Remove GC pressure wherever you can.  
Negotiate with your consumers to have data models that make sense for C*.  If 
you have those three criteria foremost in mind, you’ll likely be fine for quite 
some time.  And in the times where something isn’t going well, simpler is 
easier to investigate.

R

From: Sergio mailto:lapostadiser...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Wednesday, October 23, 2019 at 3:34 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Cassandra Rack - Datacenter Load Balancing relations

Message from External Sender
Hi Reid,

Thank you very much for clearing these concepts for me.
https://community.datastax.com/comments/1133/view.html
 I posted this question on the datastax forum regarding our cluster that it is 
unbalanced and the reply was related that the number of racks should be a 
multiplier of the replication factor in order to be balanced or 1. I thought 
then if I have 3 availability zones I should have 3 racks for each datacenter 
and not 2 (us-east-1b, us-east-1a) as I have right now or in the easiest way, I 
should have a rack for each datacenter.



Datacenter: live

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns  Host ID                               Rack
UN  10.1.20.49   289.75 GiB  256     ?     be5a0193-56e7-4d42-8cc8-5d2141ab4872  us-east-1a
UN  10.1.30.112  103.03 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
UN  10.1.19.163  129.61 GiB  256     ?     3c2efdda-8dd4-4f08-b991-9aff062a5388  us-east-1a
UN  10.1.26.181  145.28 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
UN  10.1.17.213  149.04 GiB  256     ?     71563e86-b2ae-4d2c-91c5-49aa08386f67  us-east-1a
DN  10.1.19.198  52.41 GiB   256     ?     613b43c0-0688-4b86-994c-dc772b6fb8d2  us-east-1b
UN  10.1.31.60   195.17 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
UN  10.1.25.206  100.67 GiB  256     ?     f43532ad-7d2e-4480-a9ce-2529b47f823d  us-east-1b
So each rack label right now matches the availability zone, and we have 3 
datacenters and 2 availability zones with 2 racks per DC, but the above is 
clearly unbalanced.
If I have a keyspace with a replication factor = 3, and I want to minimize the 
number of nodes needed to scale the cluster up and down while keeping it 
balanced, should I consider an approach like OPTION A)?

OPTION A)

Node  DC     RACK  AZ
1     read   ONE   us-east-1a
2     read   ONE   us-east-1a
3     read   ONE   us-east-1a
4     write  ONE   us-east-1b
5     write  ONE   us-east-1b
6     write  ONE   us-east-1b

OPTION B)

Node  DC     RACK  AZ
1     read   ONE   us-east-1a
2     read   ONE   us-east-1a
3     read   ONE   us-east-1a
4     write  TWO   us-east-1b
5     write  TWO   us-east-1b
6     write  TWO   us-east-1b
7     read   ONE   us-east-1c
8     write  TWO   us-east-1c
9     read   ONE   us-east-1c

Option B looks to be unbalanced and I would exclude it.

OPTION C)

Node  DC     RACK  AZ
1     read   ONE   us-east-1a
2     read   ONE   us-east-1b
3     read   ONE   us-east-1c
4     write  TWO   us-east-1a
5     write  TWO   us-east-1b
6     write  TWO   us-east-1c

so I am thinking of A if I have

RE: merge two cluster

2019-10-23 Thread Durity, Sean R
Beneficial to whom? The apps, the admins, the developers?

I suggest that app teams have separate clusters per application. This prevents 
the noisy neighbor problem, isolates any security issues, and helps when it is 
time for maintenance, upgrade, performance testing, etc. to not have to 
coordinate multiple app teams at the same time. Also, an individual cluster can 
be tuned for its specific workload. Sometimes, though, costs and data size push 
us towards combining smaller apps owned by the same team onto a single cluster. 
Those are the exceptions.

As a Cassandra admin, I am always trying to scale the ability to admin multiple 
clusters without just adding new admins. That is an on-going task, dependent on 
your operating environment.

Also, because every table has a portion of memory (memtable), there is a 
practical limit to the number of tables that any one cluster should have. I 
have heard it is in the low hundreds of tables. This puts a limit on the number 
of applications that a cluster can safely support.


Sean Durity – Staff Systems Engineer, Cassandra

From: Osman YOZGATLIOĞLU 
Sent: Wednesday, October 23, 2019 6:23 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] merge two cluster


Hello,

I have two clusters, and they contain different data sets, with different node 
counts.

Would it be beneficial to merge two cluster?



Regards,

Osman





RE: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

2019-10-21 Thread Durity, Sean R
I don’t disagree with Jon, who has all kinds of performance tuning experience. 
But for ease of operation, we only use G1GC (on Java 8), because the tuning of 
ParNew+CMS requires a high degree of knowledge and very repeatable testing 
harnesses. It isn’t worth our time. As a previous writer mentioned, there is 
usually better return on our time tuning the schema (aka helping developers 
understand Cassandra’s strengths).

We use 16 – 32 GB heaps, nothing smaller than that.

Sean Durity

From: Jon Haddad 
Sent: Monday, October 21, 2019 10:43 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: GC Tuning 
https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

I still use ParNew + CMS over G1GC with Java 8.  I haven't done a comparison 
with JDK 11 yet, so I'm not sure if it's any better.  I've heard it is, but I 
like to verify first.  The pause times with ParNew + CMS are generally lower 
than G1 when tuned right, but as Chris said it can be tricky.  If you aren't 
willing to spend the time understanding how it works and why each setting 
matters, G1 is a better option.

I wouldn't run Cassandra in production on less than 8GB of heap - I consider it 
the absolute minimum.  For G1 I'd use 16GB, and never 4GB with Cassandra unless 
you're rarely querying it.

I typically use the following as a starting point now:

ParNew + CMS
16GB heap
10GB new gen
2GB memtable cap, otherwise you'll spend a bunch of time copying around 
memtables (cassandra.yaml)
Max tenuring threshold: 2
survivor ratio 6
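
Translated into flags, that starting point looks roughly like this (a sketch, not 
a drop-in config; the file is cassandra-env.sh or jvm.options depending on version):

-Xms16G
-Xmx16G
-Xmn10G
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=2
-XX:SurvivorRatio=6

The memtable cap goes in cassandra.yaml, e.g. memtable_heap_space_in_mb: 2048.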

I've also done some tests with a 30GB heap, 24 GB of which was new gen.  This 
worked surprisingly well in my tests since it essentially keeps everything out 
of the old gen.  New gen allocations are just a pointer bump and are pretty 
fast, so in my (limited) tests of this I was seeing really good p99 times.  I 
was seeing a 200-400 ms pause roughly once a minute running a workload that 
deliberately wasn't hitting a resource limit (testing real world looking stress 
vs overwhelming the cluster).

We built tlp-cluster [1] and tlp-stress [2] to help figure these things out.

[1] https://thelastpickle.com/tlp-cluster/ 
[thelastpickle.com]
[2] http://thelastpickle.com/tlp-stress 
[thelastpickle.com]

Jon




On Mon, Oct 21, 2019 at 10:24 AM Reid Pinchback 
mailto:rpinchb...@tripadvisor.com>> wrote:
An i3.xlarge has 30.5 GB of RAM but you’re using less than 4 GB for C*.  So 
minus room for other uses of jvm memory and for kernel activity, that’s about 
25 gb for file cache.  You’ll have to see if you either want a bigger heap to 
allow for less frequent gc cycles, or you could save money on the instance 
size.  C* generates a lot of medium-length lifetime objects which can easily 
end up in old gen.  A larger heap will reduce the burn of more old-gen 
collections.  There are no magic numbers to just give because it’ll depend on 
your usage patterns.

From: Sergio mailto:lapostadiser...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Sunday, October 20, 2019 at 2:51 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html 
[thelastpickle.com]

Message from External Sender
Thanks for the answer.

This is the JVM version that I have right now.

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

These are the current flags. Would you change anything on an i3.xlarge AWS node?

java -Xloggc:/var/log/cassandra/gc.log 
-Dcassandra.max_queued_native_transport_requests=4096 -ea 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 
-XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB 
-XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+UseG1GC 
-XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=200 
-XX:InitiatingHeapOccupancyPercent=45 -XX:G1HeapRegionSize=0 
-XX:-ParallelRefProcEnabled -Xms3821M -Xmx3821M 
-XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler 
-Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.rmi.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.password.file=/etc/cassandra/conf/j

RE: [EXTERNAL] Cassandra Export error in COPY command

2019-09-22 Thread Durity, Sean R
Copy command tries to export all rows in the table, not just the ones on the 
node. It will eventually timeout if the table is large. It is really built for 
something under 5 million rows or so. Dsbulk (from DataStax) is great for this, 
if you are a customer. Otherwise, you will probably need to write an extract of 
some kind. You can get keys from the sstables, then dedupe, then export rows 
one by one using the keys (kind of painful). How large is the table you are 
trying to export?
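
If dsbulk is available to you, the export is a short sketch (hypothetical names; 
writes CSV files under ./export):

dsbulk unload -k my_ks -t my_table -url ./export -header true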

Sean Durity

From: Hossein Ghiyasi Mehr 
Sent: Saturday, September 21, 2019 8:02 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Cassandra Export error in COPY command

Hi all members,
I want to export (pk, another_int_column) from a single node using the COPY 
command. But after about 1h 45m, I get a lot of read errors:

(inline screenshot of the read errors omitted)

I tried this action many times, but after at most 2h it failed with the errors.

Any idea may help me!
Thanks.





RE: [EXTERNAL] Re: loading big amount of data to Cassandra

2019-08-05 Thread Durity, Sean R
DataStax has a very fast bulk load tool - dsbulk. Not sure if it is available 
for open source or not. In my experience so far, I am very impressed with it.
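
A minimal sketch of the CQLSSTableWriter approach discussed below (the keyspace, 
table, and paths are hypothetical; the class ships with Cassandra itself):

import java.util.UUID;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class SSTableSketch {
    public static void main(String[] args) throws Exception {
        String schema = "CREATE TABLE ks.events (id uuid PRIMARY KEY, payload text)";
        String insert = "INSERT INTO ks.events (id, payload) VALUES (?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/tmp/ks/events")   // local directory that receives the sstables
                .forTable(schema)
                .using(insert)
                .build();

        writer.addRow(UUID.randomUUID(), "hello");
        writer.close();
        // then stream the generated files into the cluster:
        //   sstableloader -d <contact point> /tmp/ks/events
    }
}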



Sean Durity – Staff Systems Engineer, Cassandra

-Original Message-
From: p...@xvalheru.org 
Sent: Saturday, August 3, 2019 6:06 AM
To: user@cassandra.apache.org
Cc: Dimo Velev 
Subject: [EXTERNAL] Re: loading big amount of data to Cassandra

Thanks to all,

I'll try the SSTables.

Thanks

Pat

On 2019-08-03 09:54, Dimo Velev wrote:
> Check out the CQLSSTableWriter java class -
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
> . You use it to generate sstables - you need to write a small program
> for that. You can then stream them over the network using the
> sstableloader (either use the utility or use the underlying classes to
> embed it in your program).
>
> On 3. Aug 2019, at 07:17, Ayub M  wrote:
>
>> Dimo, how do you generate sstables? Do you mean load data locally on
>> a cassandra node and use sstableloader?
>>
>> On Fri, Aug 2, 2019, 5:48 PM Dimo Velev 
>> wrote:
>>
>>> Hi,
>>>
>>> Batches will actually slow down the process because they mean a
>>> different thing in C* - as you read they are just grouping changes
>>> together that you want executed atomically.
>>>
>>> Cassandra does not really have indices so that is different than a
>>> relational DB. However, after writing stuff to Cassandra it
>>> generates many smallish partitions of the data. These are then
>>> joined in the background together to improve read performance.
>>>
>>> You have two options from my experience:
>>>
>>> Option 1: use normal CQL api in async mode. This will create a
>>> high CPU load on your cluster. Depending on whether that is fine
>>> for you that might be the easiest solution.
>>>
>>> Option 2: generate sstables locally and use the sstableloader to
>>> upload them into the cluster. The streaming does not generate high
>>> cpu load so it is a viable option for clusters with other
>>> operational load.
>>>
>>> Option 2 scales with the number of cores of the machine generating
>>> the sstables. If you can split your data you can generate sstables
>>> on multiple machines. In contrast, option 1 scales with your
>>> cluster. If you have a large cluster that is idling, it would be
>>> better to use option 1.
>>>
>>> With both options I was able to write at about 50-100K rows / sec
>>> on my laptop and local Cassandra. The speed heavily depends on the
>>> size of your rows.
>>>
>>> Back to your question — I guess option2 is similar to what you
>>> are used to from tools like sqlloader for relational DBMSes
>>>
>>> I had a requirement of loading a few 100 mio rows per day into an
>>> operational cluster so I went with option 2 to offload the cpu
>>> load to reduce impact on the reading side during the loads.
>>>
>>> Cheers,
>>> Dimo
>>>
>>> Sent from my iPad
>>>
 On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:

 Hi,

 I need to upload to Cassandra about 7 billions of records. What
>>> is the best setup of Cassandra for this task? Will usage of batch
>>> speeds up the upload (I've read somewhere that batch in Cassandra
>>> is dedicated to atomicity not to speeding up communication)? How
>>> Cassandra internally works related to indexing? In SQL databases
>>> when uploading such amount of data is suggested to turn off
>>> indexing and then turn on. Is something simmillar possible in
>>> Cassandra?

 Thanks for all suggestions.

 Pat

 




RE: [EXTERNAL] Apache Cassandra upgrade path

2019-07-26 Thread Durity, Sean R
This would handle client protocol, but not streaming protocol between nodes.


Sean Durity – Staff Systems Engineer, Cassandra

From: Alok Dwivedi 
Sent: Friday, July 26, 2019 3:21 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Apache Cassandra upgrade path

Hi Sean
The recommended practice for an upgrade is to explicitly control the protocol 
version in your application during the upgrade process. The protocol version is 
negotiated on the first connection, and by chance the driver can talk to an 
already-upgraded node first, which means it will negotiate a higher version that 
will not be compatible with the nodes still on the lower Cassandra version. So 
initially you set a lower version that is the lowest common denominator for the 
mixed-mode cluster, and then remove the explicit setting once the upgrade has 
completed.

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withProtocolVersion(ProtocolVersion.V2)  // pin to the lowest common version during the upgrade
    .build();

Refer here for more information if using Java driver
https://docs.datastax.com/en/developer/java-driver/3.7/manual/native_protocol/#protocol-version-with-mixed-clusters

Same thing applies to drivers in other languages.

Thanks
Alok Dwivedi
Senior Consultant
https://www.instaclustr.com/


On Fri, 26 Jul 2019 at 20:03, Jai Bheemsen Rao Dhanwada 
mailto:jaibheem...@gmail.com>> wrote:
Thanks Sean,

In my use case all my clusters are multi-DC, and I am making my best effort to 
upgrade ASAP; however, there is a chance of a node dying mid-upgrade, since all 
machines are VMs. Also my keyspaces are not uniform across DCs: some are 
replicated to all DCs and some to just one DC, so I am worried there.

Is there a way to override the protocol version until the upgrade is done and 
then change it back once the upgrade is completed?

On Fri, Jul 26, 2019 at 11:42 AM Durity, Sean R 
mailto:sean_r_dur...@homedepot.com>> wrote:
What you have seen is totally expected. You can’t stream between different 
major versions of Cassandra. Get the upgrade done, then worry about any down 
hardware. If you are using DCs, upgrade one DC at a time, so that there is an 
available environment in case of any disasters.

My advice, though, is to get through the rolling upgrade process as quickly as 
possible. Don’t stay in a mixed state very long. The cluster will function fine 
in a mixed state – except for those streaming operations. No repairs, no 
bootstraps.


Sean Durity – Staff Systems Engineer, Cassandra

From: Jai Bheemsen Rao Dhanwada 
mailto:jaibheem...@gmail.com>>
Sent: Friday, July 26, 2019 2:24 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: [EXTERNAL] Apache Cassandra upgrade path

Hello,

I am trying to upgrade Apache Cassandra from 2.1.16 to 3.11.3; the regular 
rolling upgrade process works fine without any issues.

However, I am running into an issue where, if a node with the older version 
dies (hardware failure) and a new node comes up and tries to bootstrap, the 
bootstrap fails.

I tried two combinations:

1. Joining replacement node with 2.1.16 version of cassandra
In this case nodes with 2.1.16 version are able to stream data to the new node, 
but the nodes with 3.11.3 version are failing with the below error.

ERROR [STREAM-INIT-/10.x.x.x:40296] 2019-07-26 17:45:17,775 
IncomingStreamingConnection.java:80 - Error while reading from socket from 
/10.y.y.y:40296.
java.io.IOException: Received stream using protocol version 2 (my version 4). 
Terminating connection
2. Joining replacement node with 3.11.3 version of cassandra
In this case the nodes with 3.11.3 version of cassandra are able to stream the 
data but it's not able to stream data from the 2.1.16 nodes and failing with 
the below error.

ERROR [STREAM-IN-/10.z.z.z:7000] 2019-07-26 18:08:10,380 StreamSession.java:593 
- [Stream #538c6900-afd0-11e9-a649-ab2e045ee53b] Streaming error occurred on 
session with peer 10.z.z.z
java.io.IOException: Connection reset by peer
   at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_151]
   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) 
~[na:1.8.0_151]
   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) 
~[na:1.8.0_151]
   at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_151]
  

RE: [EXTERNAL] Apache Cassandra upgrade path

2019-07-26 Thread Durity, Sean R
What you have seen is totally expected. You can’t stream between different 
major versions of Cassandra. Get the upgrade done, then worry about any down 
hardware. If you are using DCs, upgrade one DC at a time, so that there is an 
available environment in case of any disasters.

My advice, though, is to get through the rolling upgrade process as quickly as 
possible. Don’t stay in a mixed state very long. The cluster will function fine 
in a mixed state – except for those streaming operations. No repairs, no 
bootstraps.


Sean Durity – Staff Systems Engineer, Cassandra

From: Jai Bheemsen Rao Dhanwada 
Sent: Friday, July 26, 2019 2:24 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Apache Cassandra upgrade path

Hello,

I am trying to upgrade Apache Cassandra from 2.1.16 to 3.11.3; the regular 
rolling upgrade process works fine without any issues.

However, I am running into an issue where, if a node with the older version 
dies (hardware failure) and a new node comes up and tries to bootstrap, the 
bootstrap fails.

I tried two combinations:

1. Joining replacement node with 2.1.16 version of cassandra
In this case nodes with 2.1.16 version are able to stream data to the new node, 
but the nodes with 3.11.3 version are failing with the below error.

ERROR [STREAM-INIT-/10.x.x.x:40296] 2019-07-26 17:45:17,775 
IncomingStreamingConnection.java:80 - Error while reading from socket from 
/10.y.y.y:40296.
java.io.IOException: Received stream using protocol version 2 (my version 4). 
Terminating connection
2. Joining replacement node with 3.11.3 version of cassandra
In this case the nodes with 3.11.3 version of cassandra are able to stream the 
data but it's not able to stream data from the 2.1.16 nodes and failing with 
the below error.

ERROR [STREAM-IN-/10.z.z.z:7000] 2019-07-26 18:08:10,380 StreamSession.java:593 
- [Stream #538c6900-afd0-11e9-a649-ab2e045ee53b] Streaming error occurred on 
session with peer 10.z.z.z
java.io.IOException: Connection reset by peer
   at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_151]
   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) 
~[na:1.8.0_151]
   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) 
~[na:1.8.0_151]
   at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_151]
   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) 
~[na:1.8.0_151]
   at 
sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) 
~[na:1.8.0_151]
   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) 
~[na:1.8.0_151]
   at 
java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) 
~[na:1.8.0_151]
   at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
   at 
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]

Note: In both cases I am using replace_address to replace the dead node, as I am 
running into some issues with "nodetool removenode". I use ephemeral disks, so 
the replacement node always comes up with an empty data dir and bootstraps.

Any other workaround to mitigate this problem? I am worried about nodes going 
down while we are in the process of upgrading, as it could take several hours to 
upgrade depending on the cluster size.
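
For reference, the replace_address mechanism mentioned above is enabled by 
starting the empty replacement node with a JVM flag (the address is illustrative), 
e.g. in cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.x.x.x"

The replacement node then bootstraps the dead node's token ranges before serving 
traffic.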





RE: [EXTERNAL] Re: Bursts of Thrift threads make cluster unresponsive

2019-06-28 Thread Durity, Sean R
This sounds like a bad query or large partition. If a large partition is 
requested on multiple nodes (because of consistency level), it will pressure 
all those replica nodes. Then, as the cluster tries to adjust the rest of the 
load, the other nodes can get overwhelmed, too.

Look at cfstats to see if you have some large partitions. You may also see them 
as warnings in the system.log when they are getting compacted.

Also check for any ALLOW FILTERING queries in the code (or slow query stats, if 
you have them)
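
Two quick checks along those lines (the keyspace/table name is illustrative):

nodetool cfstats my_ks.my_table | grep "Compacted partition maximum bytes"
grep -i "large partition" /var/log/cassandra/system.log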

Sean


From: Dmitry Simonov 
Sent: Thursday, June 27, 2019 5:22 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Bursts of Thrift threads make cluster unresponsive

> Is there an order in which the events you described happened, or is the order 
> with which you presented them the order you notice things going wrong?

At first, the Thrift thread count starts increasing.
After 2 or 3 minutes those threads consume all CPU cores.
After that, simultaneously: message drops occur, read latency increases, and 
active read tasks are noticed.

On Fri, 28 Jun 2019 at 01:40, Avinash Mandava 
mailto:avin...@vorstella.com>>:
Yeah i skimmed too fast, don't add more work if CPU is pegged, and if using 
thrift protocol NTR would not have values.

Is there an order in which the events you described happened, or is the order 
with which you presented them the order you notice things going wrong?

On Thu, Jun 27, 2019 at 1:29 PM Dmitry Simonov 
mailto:dimmobor...@gmail.com>> wrote:
Thanks for your reply!

> Have you tried increasing concurrent reads until you see more activity in 
> disk?
When the problem occurs, the freshly created 1.2k - 2k Thrift threads consume all 
CPU on all cores.
Would increasing concurrent reads help in this situation?

> org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
This metric is 0 at all cluster nodes.

On Fri, 28 Jun 2019 at 00:34, Avinash Mandava 
mailto:avin...@vorstella.com>>:
Have you tried increasing concurrent reads until you see more activity in disk? 
If you've always got 32 active reads and high pending reads it could just be 
dropping the reads because the queues are saturated. Could be artificially 
bottlenecking at the C* process level.

Also what does this metric show over time:

org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
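
A low-tech way to watch those pool counters over time is nodetool (pool names 
vary a bit by version):

watch -n 5 'nodetool tpstats | egrep "Pool|Native-Transport|ReadStage"'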



On Thu, Jun 27, 2019 at 1:52 AM Dmitry Simonov 
mailto:dimmobor...@gmail.com>> wrote:
Hello!

We have run into the following problem several times.

The Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
- all CPUs have 100% load (normally we have an LA of 5 on 16-core machines)
- Cassandra's thread count rises from 300 to 1300 - 2000; most of them are Thrift 
threads in java.net.SocketInputStream.socketRead0(Native Method), and the count 
of other threads doesn't increase
- some Read messages are dropped
- read latency (p99.9) increases to 20-30 seconds
- there are up to 32 active Read Tasks and up to 3k - 6k pending Read Tasks

The problem starts synchronously on all nodes of the cluster.
I cannot tie this problem to increased load from clients (the "read rate" doesn't 
increase during the problem).
Also it looks like there is no problem with the disks (I/O latencies are OK).

Could anybody please give some advice on further troubleshooting?

--
Best Regards,
Dmitry Simonov


--
www.vorstella.com
408 691 8402


--
Best Regards,
Dmitry Simonov


--
www.vorstella.com
408 691 8402


--
Best Regards,
Dmitry Simonov




RE: [EXTERNAL] Re: Cassandra migration from 1.25 to 3.x

2019-06-17 Thread Durity, Sean R
The advice so far is exactly correct for an in-place kind of upgrade. The blog 
post you mentioned is different. They decided to jump versions in Cassandra by 
standing up a new cluster and using a dual-write/dual-read process for their 
app. They also wrote code to read and interpret sstables in order to migrate 
existing data. Getting that right with compaction running, data consistency, 
etc. is not easy. That is what Cassandra does, of course. They had to reverse 
engineer that process.

I would not personally take that path as it seems a more difficult way to go -- 
for the DBA/admin. It is a nice path for the development team, though. They 
only had to look at their reads and writes (already encapsulated in a DAO) for 
the dual clusters. In a multi-upgrade scenario, drivers and statements probably 
have to get upgraded at several steps along the way (including a move from 
Thrift to CQL). More app testing is required at each upgrade. So, the 
decision has to be based on which resources you have and trust (app dev and 
testing + Cassandra upgrades or data migration and testing). Once you have 
automated/semi-automated Cassandra upgrades in place, that is an easier path, 
but that company obviously hadn't invested there.

Sean Durity

-Original Message-
From: Michael Shuler  On Behalf Of Michael Shuler
Sent: Monday, June 17, 2019 8:26 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra migration from 1.25 to 3.x

First and foremost, read NEWS.txt from your current version to the version you 
wish to upgrade to. There are too many details that you may need to be aware 
of. For instance, in the 2.0.0 Upgrading notes:

https://github.com/apache/cassandra/blob/cassandra-3.11/NEWS.txt#L1169-L1178

I assume you meant 1.2.5, so your first step is to upgrade to at least
1.2.9 (I would suggest using the latest 1.2.x, which is 1.2.19). Then you can go 
to 2.0.x and up.

Practicing on a scratch cluster is valuable experience. Reading the upgrade 
notes in NEWS.txt is a must.

--
Kind regards,
Michael

On 6/17/19 3:34 AM, Anurag Sharma wrote:
> Thanks Alex,
>
> I came across some interesting and efficient ways of upgrading from
> 1.x to 3.x as described in the blog here (URL garbled in the archive; the
> slug is database-migration-at-scale-ae85c14c3621) and others. Was curious if someone has
> open-sourced their custom utility.  :D
>
> Regards
> Anurag
>
> On Mon, Jun 17, 2019 at 1:27 PM Oleksandr Shulgin
> mailto:oleksandr.shul...@zalando.de>> wrote:
>
> On Mon, Jun 17, 2019 at 9:30 AM Anurag Sharma
> mailto:anurag.rp.sha...@gmail.com>> wrote:
>
>
> We are upgrading Cassandra from 1.25 to 3.X. Just curious if
> there is any recommended open source utility for the same.
>
>
> Hi,
>
> The "recommended  open source utility" is the Apache Cassandra
> itself. ;-)
>
> Given the huge difference between the major versions, though, you
> will need a decent amount of planning and preparation to
> successfully complete such a migration.  Most likely you will want
> to do it in small steps, first upgrading to the latest minor version
> in the 1.x series, then making a jump to 2.x, then to 3.0, and only
> then to 3.x if you really mean to.  On each upgrade step, be sure to
> examine the release notes carefully to understand if there is any
> impact for your cluster and/or client applications.  Do have a test
> system with preferably identical setup and configuration and execute
> the upgrade steps there first to verify your expectations.
>
> Good luck!
> --
> Alex
>







RE: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Durity, Sean R
I’m not sure it is correct to say, “you cannot.” However, that is a more 
complicated restore and more likely to lead to inconsistent data and take 
longer to do. You are basically trying to start from a backup point, roll 
everything forward, and catch up to the current state.

Replacing/re-streaming is the well-trodden path. You are getting the net result 
of all that has happened since the node failure. And the node is not returning 
data to the clients while the bootstrap is running. If you have a 
restored/repairing node, it will accept client (and coordinator) connections 
even though it isn't guaranteed to be consistent yet.

As I understand it – a full cluster recovery from backup still requires repair 
across the cluster to ensure consistency. In my experience, most apps cannot 
wait for a full restore/repair. Availability matters more. They also don’t want 
to pay for even more disk to hold some level of backups.

There are some companies that provide finer-grained backup and recovery 
options, though.
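
For reference, a hedged sketch of a single-node, single-table restore followed 
by repair (paths, keyspace, and table names are invented for illustration):

sudo service cassandra stop
cp /backups/ks1/tbl1/snapshot_20190612/* \
   /var/lib/cassandra/data/ks1/tbl1-<table-id>/
sudo service cassandra start
nodetool repair ks1 tbl1   # reconcile the restored node with its replicas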

Sean Durity

From: Alan Gano 
Sent: Wednesday, June 12, 2019 1:43 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Recover lost node from backup or evict/re-add?


Is it correct to say that a lost node cannot be restored from backup?  You must 
either replace the node or evict/re-add (i.e., rebuild from other nodes).

Also, is it true that snapshot, incremental, and commitlog backups are 
relegated to application keyspace recovery only?


How about recovery of the entire cluster? (rolling it back).  Are snapshots 
exact enough, in time, to not have nodes that differ, in point-in-time, from 
the rest of the cluster?  Would those nodes be recoverable (nodetool repair?) … 
which brings me back to recovering a lost node from backup (restore last 
snapshot, and run nodetool repair?).


Thanks,

Alan Gano


From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Wednesday, June 12, 2019 10:14 AM
To: user@cassandra.apache.org
Subject: Re: Recover lost node from backup or evict/re-add?

A host can replace itself using the method I described

On Jun 12, 2019, at 7:10 AM, Alan Gano <ag...@tsys.com> wrote:
I guess I’m considering this scenario:

  *   host and configuration have survived
  *   /data is gone
  *   /backups have survived

I have tested recovering from this scenario with an evict/re-add, which worked 
fine.

If I restore from backup, the node will be behind the cluster – i.e., after a 
restore and start-up, does it get caught up?

Alan

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Wednesday, June 12, 2019 10:02 AM
To: user@cassandra.apache.org
Subject: Re: Recover lost node from backup or evict/re-add?

To avoid violating consistency guarantees, you have to repair the replicas 
while the lost node is down

Once you do that it’s typically easiest to bootstrap a replacement (there’s a 
property named “replace address first boot” you can google or someone can link) 
that tells a new joining host to take over for a failed machine.
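
For reference, the property Jeff mentions is cassandra.replace_address_first_boot. 
A minimal sketch of using it on the replacement node (the dead node's IP is an 
assumption):

# In cassandra-env.sh on the NEW node, set before its first start only:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.5"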


On Jun 12, 2019, at 6:54 AM, Alan Gano <ag...@tsys.com> wrote:

If I lose a node, does it make sense to even restore from 
snapshot/incrementals/commitlogs?

Or is the best way to do an evict/re-add?


Thanks,

Alan.


RE: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Durity, Sean R
This may sound a bit harsh, but I teach my developers that if they are trying 
to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for 
its high availability and scalability characteristics. We love no downtime. 
ALLOW FILTERING is breaking the rules of availability and scalability.

Look at the full text of the error (not just the ending):
Bad Request: Cannot execute this query as it might involve data filtering and 
thus may have unpredictable performance. If you want to execute this query 
despite the performance unpredictability, use ALLOW FILTERING.
It is being polite, but it does warn you that performance is unpredictable. I 
can predict this: allow filtering will not scale. It won’t scale to large 
numbers of nodes (with small tables) or to large numbers of rows (regardless of 
node count). If you ignore the admittedly too polite warning, Cassandra will 
try to answer your query. It does so with a brute-force, scan-everything 
approach on all nodes (because you didn't give it any partitions to target 
directly). That gets expensive and dangerous quickly. And, yes, it can endanger 
the whole cluster.

As an administrator, I do think that Cassandra should be able to protect itself 
better, perhaps by letting the administrator disallow such queries altogether. 
It does at least warn you.
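
As a hypothetical illustration, the usual fix is to turn the filtered column 
into the partition key of a denormalized table (names are made up):

-- Anti-pattern: scans every partition on every node.
SELECT * FROM users WHERE email = 'jane@example.com' ALLOW FILTERING;

-- Better: a denormalized lookup table keyed by the filtered column.
CREATE TABLE users_by_email (
    email   text PRIMARY KEY,
    user_id uuid,
    name    text
);

SELECT user_id, name FROM users_by_email WHERE email = 'jane@example.com';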


From: Attila Wind 
Sent: Tuesday, May 28, 2019 4:47 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to 
prevent such behavior?


Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine

Then maybe the best rules to follow are these:
a) never(!) run a query "ALLOW FILTERING" on a Production cluster
b) if you need these queries, build a test cluster (somehow) and mirror the data 
(somehow) OR add denormalized tables (write + code complexity overhead) to 
fulfill those queries

Can we agree on this one maybe as a "good to follow" policy?

In our case, luckily, users = developers always, so I can expect them to be 
aware of the consequences of a particular query.
We also have test data fully mirrored into a test cluster, so running those 
queries on the test system is possible.
Plus, if for whatever reason we really need to run such a query in Prod, I can 
simply instruct them to test a query like this on the test system first.

cheers
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 28. 8:59, shalom sagges wrote:
Hi Attila,

I'm definitely no guru, but I've experienced several cases where people at my 
company used allow filtering and caused major performance issues.
As data size increases, the impact will be stronger. If you have large 
partitions, performance will decrease.
GC can be affected. And if GC stops the world for too long, too many times, you 
will feel it.

I sincerely believe the best way would be to educate the users and remodel the 
data. Perhaps you need to denormalize your tables or at least use secondary 
indices (I prefer to keep it as simple as possible and denormalize).
If it's a cluster for analytics, perhaps you need to build a designated cluster 
only for that so if something does break or get too pressured, normal 
activities wouldn't be affected, but there are pros and cons for that idea too.

Hope this helps.

Regards,


On Tue, May 28, 2019 at 9:43 AM Attila Wind 
 wrote:

Hi Gurus,

Looks like we stopped this thread. However, I would be very curious about 
answers regarding b) ...

Anyone any comments on that?
I do see this as a potential production outage risk now... Especially as we are 
planning to run analysis queries by hand exactly like that over the cluster...

thanks!
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 11:42, shalom sagges wrote:
a) Interesting... But only in case you do not provide a partition key, right? 
(so IN() is for the partition key?)

I think you should ask yourself a different question. Why am I using ALLOW 
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data to multiple tables or at least create an index 
on the requested column (preferably queried together with a known partition 
key).
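
A small hedged CQL sketch of that advice (table and column names are made up; 
customer_id is assumed to be the partition key). With the partition key also 
restricted, the indexed query no longer needs ALLOW FILTERING:

CREATE INDEX orders_status_idx ON orders (status);

SELECT * FROM orders WHERE customer_id = 42 AND status = 'OPEN';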


b) It still does not explain or justify the "all 8 nodes halt and become 
unresponsive to external requests" behavior... Even if the servers are busy 
with the request, seriously becoming non-responsive...?

I think it can justify t
