Tombstone experience

2018-08-23 Thread Charulata Sharma (charshar)
Hi All,

   I have shared my experience of tombstone clearing in this blog post.
Sharing it in this forum for wider distribution.

https://medium.com/cassandra-tombstones-clearing-use-case/the-curios-case-of-tombstones-d897f681a378


Thanks,
Charu


Re: Upgrade from 2.1 to 3.11

2018-08-23 Thread Mun Dega
Looks like there are a couple of open issues regarding the 3.11.2 release:

https://issues.apache.org/jira/browse/CASSANDRA-14355
https://issues.apache.org/jira/browse/CASSANDRA-14495

Ma


On Thu, Aug 23, 2018 at 10:54 PM Mun Dega  wrote:

> Interesting.  Any other suggestions about what major things changed
> between 2.1 and 3.x?
>
> In the changelog, all I see is bug fixes and further improvements.
>
>
>
> On Thu, Aug 23, 2018, 21:37 Gosar M  wrote:
>
>> We also increased the vm.max_map_count value (sysctl.conf) to resolve the
>> OOM issue.
>>
>> On Thursday, 23 August 2018, 16:04:20 GMT-7, Mun Dega <
>> mundeg...@gmail.com> wrote:
>>
>>
>> 120G data
>> 28G heap out of 48 on system
>> 9 node cluster, RF3
>>
>>
>> On Thu, Aug 23, 2018, 17:19 Mohamadreza Rostami <
>> mohamadrezarosta...@gmail.com> wrote:
>>
>> Hi,
>> How much data do you have? How much RAM do your servers have? How large
>> is your heap?
>> On Thu, Aug 23, 2018 at 10:14 PM Mun Dega  wrote:
>>
>> Hello,
>>
>> We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.  The
>> process went OK, including upgradesstables, but afterwards we started to
>> experience high r/w latency, occasional OOMs, and long GC pauses.
>>
>> With 2.1, the same cluster had no issues like this.  We also kept the
>> server specs and heap the same after the upgrade.
>>
>> Has anyone else had similar issues going to 3.11, and what major changes
>> in the new version could cause such a setback?
>>
>> Ma Dega
>>
>>


Re: Upgrade from 2.1 to 3.11

2018-08-23 Thread Mun Dega
Interesting.  Any other suggestions about what major things changed between
2.1 and 3.x?

In the changelog, all I see is bug fixes and further improvements.



On Thu, Aug 23, 2018, 21:37 Gosar M  wrote:

> We also increased the vm.max_map_count value (sysctl.conf) to resolve the
> OOM issue.
>
> On Thursday, 23 August 2018, 16:04:20 GMT-7, Mun Dega 
> wrote:
>
>
> 120G data
> 28G heap out of 48 on system
> 9 node cluster, RF3
>
>
> On Thu, Aug 23, 2018, 17:19 Mohamadreza Rostami <
> mohamadrezarosta...@gmail.com> wrote:
>
> Hi,
> How much data do you have? How much RAM do your servers have? How large
> is your heap?
> On Thu, Aug 23, 2018 at 10:14 PM Mun Dega  wrote:
>
> Hello,
>
> We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.  The
> process went OK, including upgradesstables, but afterwards we started to
> experience high r/w latency, occasional OOMs, and long GC pauses.
>
> With 2.1, the same cluster had no issues like this.  We also kept the
> server specs and heap the same after the upgrade.
>
> Has anyone else had similar issues going to 3.11, and what major changes
> in the new version could cause such a setback?
>
> Ma Dega
>
>


Re: Cassandra 2.2.7 Compaction after Truncate issue

2018-08-23 Thread James Shaw
You can delete the files at the OS level; that's what I did before.  Truncate
frequently fails on some remote nodes in a heavy-transaction environment.

Thanks,

James

On Thu, Aug 23, 2018 at 8:54 PM, Rahul Singh 
wrote:

> David ,
>
> What CL do you set when running this command?
>
> Rahul Singh
> Chief Executive Officer
> m 202.905.2818
>
> Anant Corporation
> 1010 Wisconsin Ave NW, Suite 250
> 
> Washington, D.C. 20007
>
> We build and manage digital business technology platforms.
> On Aug 14, 2018, 11:49 AM -0500, David Payne , wrote:
>
> Scenario: Cassandra 2.2.7, 3 nodes, RF=3 keyspace.
>
>
>
> Truncate a table.
>
> More than 24 hours later… FileCacheService is still reporting cold readers
> for sstables of truncated data for node 2 and 3, but not node 1.
>
> The output of nodetool compactionstats shows a stuck compaction for the
> truncated table on nodes 2 and 3, but not node 1.
>
>
>
> This appears to be a defect that was fixed in 2.1.0.
> https://issues.apache.org/jira/browse/CASSANDRA-7803
>
>
>
> Any ideas?
>
>
>
> Thanks,
>
> David Payne
>
> | ̄ ̄|
> _☆☆☆_
> ( ´_⊃`)
>
> c. 303-717-0548
>
> dav...@cqg.com
>
>
>
>


Re: Upgrade from 2.1 to 3.11

2018-08-23 Thread Gosar M
We also increased the vm.max_map_count value (sysctl.conf) to resolve the OOM 
issue. 

   On Thursday, 23 August 2018, 16:04:20 GMT-7, Mun Dega  
wrote:  
 
120G data
28G heap out of 48 on system
9 node cluster, RF3

On Thu, Aug 23, 2018, 17:19 Mohamadreza Rostami  
wrote:

Hi,
How much data do you have? How much RAM do your servers have? How large is 
your heap?
On Thu, Aug 23, 2018 at 10:14 PM Mun Dega  wrote:

Hello,

We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.  The process 
went OK, including upgradesstables, but afterwards we started to experience 
high r/w latency, occasional OOMs, and long GC pauses.

With 2.1, the same cluster had no issues like this.  We also kept the server 
specs and heap the same after the upgrade.

Has anyone else had similar issues going to 3.11, and what major changes in the 
new version could cause such a setback?

Ma Dega

  

Re: How to rename the column name in Cassandra tables

2018-08-23 Thread Rahul Singh
Which documentation are you referring to? Which version of Cassandra?

https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlAlterTable.html

Renaming a column
The main purpose of RENAME is to change the names of CQL-generated primary key 
and column names that are missing from a legacy table. The following 
restrictions apply to the RENAME operation:
• You can only rename clustering columns, which are part of the primary key.
• You cannot rename the partition key.
• You can index a renamed column.
• You cannot rename a column if an index has been created on it.
• You cannot rename a static column, since you cannot use a static column in 
the table's primary key.
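
As a rough illustration only, the restrictions quoted above can be written down as a small check. This is a sketch in Python, not a Cassandra or driver API: the `Column` class, the `can_rename` helper, and the column names are all hypothetical, and the rules are taken only from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical description of a table column (not a driver type)."""
    name: str
    kind: str            # "partition_key", "clustering", "static", or "regular"
    indexed: bool = False


def can_rename(column: Column) -> bool:
    """Apply the restrictions listed above: only clustering columns
    without an index on them qualify for RENAME."""
    if column.kind != "clustering":
        return False
    return not column.indexed


# Hypothetical schema used only to exercise the rules.
columns = [
    Column("user_id", "partition_key"),
    Column("event_time", "clustering"),
    Column("region", "clustering", indexed=True),
    Column("tenant_name", "static"),
    Column("note", "regular"),
]

renamable = [c.name for c in columns if can_rename(c)]
print(renamable)
```

Under these rules a regular (non-key) column such as `note` is rejected, which matches the limitation described in the question below.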

Rahul Singh
On Aug 13, 2018, 7:42 AM -0500, Irtiza Ali , wrote:
> Hello everyone,
>
> Issue
> Currently, we are facing an issue of renaming the Cassandra table's column 
> name. According to the documentation, one can change the name of only those 
> columns that are part of primary or clustering columns(keys).
>
>
> Question
> Is there any way to rename the name of non-primary or clustering 
> columns(keys)?
>
> Thank you
>
> IA


Re: Fwd: Removing Extra Spaces and Row counts while using Capture Command

2018-08-23 Thread Rahul Singh
What’s your goal? Just output the results and save as JSON?


There may be a better way to do what you want.

https://github.com/tenmax/cqlkit/blob/master/README.md


Rahul Singh
On Aug 13, 2018, 9:14 PM -0500, kumar bharath , 
wrote:
> >
> > Hi All,
> >
> > I am using Cassandra Capture Command to perform a select query operation to 
> > write data from a column family into JSON format file for further 
> > processing. I am able to do that successfully, but  I am seeing extra 
> > spaces and row count values after every few records.
> >
> > Please suggest a way to get rid of these unusual extra spaces and row count 
> > values.
> >
> > Regards,
> > Bharath Kumar B
>


Re: Cassandra 2.2.7 Compaction after Truncate issue

2018-08-23 Thread Rahul Singh
David ,

What CL do you set when running this command?

Rahul Singh
On Aug 14, 2018, 11:49 AM -0500, David Payne , wrote:
> Scenario: Cassandra 2.2.7, 3 nodes, RF=3 keyspace.
>
> Truncate a table.
> More than 24 hours later… FileCacheService is still reporting cold readers 
> for sstables of truncated data for node 2 and 3, but not node 1.
> The output of nodetool compactionstats shows a stuck compaction for the 
> truncated table on nodes 2 and 3, but not node 1.
>
> This appears to be a defect that was fixed in 2.1.0. 
> https://issues.apache.org/jira/browse/CASSANDRA-7803
>
> Any ideas?
>
> Thanks,
> David Payne
>     | ̄ ̄|
> _☆☆☆_
> ( ´_⊃`)
> c. 303-717-0548
> dav...@cqg.com
>


Re: 90million reads

2018-08-23 Thread Rahul Singh
Agreed. If your data model is good and there are no major read latencies 
(little or no data skew, no wide partitions, few tombstones), you can scale 
linearly.

You could also consider a plan in which you ramp up as the traffic increases.

Rahul Singh
On Aug 14, 2018, 6:31 PM -0500, kurt greaves , wrote:
> Not a great idea to make config changes without testing. For a lot of changes, 
> however, you can make the change on one node and measure whether there is an 
> improvement.
>
> You'd probably be best to add nodes (double should be sufficient), do tuning 
> and testing afterwards, and then decommission a few nodes if you can.
>
> > On Wed., 15 Aug. 2018, 05:00 Abdul Patel,  wrote:
> > > Currently our Cassandra prod is an 18-node, 3-DC cluster, and the 
> > > application does 55 million reads per day. They want to add load and 
> > > take it to 90 million reads per day, and they need a guesstimate of the 
> > > resources we need to bump without testing. Off the top of my head, we 
> > > can increase the heap and the native transport value. Are there any 
> > > other parameters I should be concerned about?


Re: Upgrade from 2.1 to 3.11

2018-08-23 Thread Mun Dega
120G data
28G heap out of 48 on system
9 node cluster, RF3


On Thu, Aug 23, 2018, 17:19 Mohamadreza Rostami <
mohamadrezarosta...@gmail.com> wrote:

> Hi,
> How much data do you have? How much RAM do your servers have? How large
> is your heap?
> On Thu, Aug 23, 2018 at 10:14 PM Mun Dega  wrote:
>
>> Hello,
>>
>> We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.  The
>> process went OK, including upgradesstables, but afterwards we started to
>> experience high r/w latency, occasional OOMs, and long GC pauses.
>>
>> With 2.1, the same cluster had no issues like this.  We also kept the
>> server specs and heap the same after the upgrade.
>>
>> Has anyone else had similar issues going to 3.11, and what major changes
>> in the new version could cause such a setback?
>>
>> Ma Dega
>>
>


Re: A blog about Cassandra in the IoT arena

2018-08-23 Thread Rahul Singh
Agreed. One of the ideas I had on partition size is to shard synthetically 
and automatically, based on some basic patterns seen in the data.

It could be implemented as a tool that would create a new table with an 
additional key component that is an automatically created shard, or it would 
use an existing key and then migrate the data.

The internal automatic shard would adjust as needed and keep “subpartitions” or 
“rowsets”, but return the full partition given some special CQL.

This is done today at the data access layer and in the data model design, but 
it’s pretty much a step-by-step process that could be done algorithmically.
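
To make the idea concrete, here is a minimal sketch in Python of how synthetic sharding is typically done by hand at the data access layer today. Everything in it is an assumption for illustration: the shard count, the helper names, and the choice of hashing a per-row value to pick the shard. It is not an existing Cassandra feature or tool.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for illustration


def shard_for(clustering_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Derive a stable shard number from a value that varies within the partition."""
    digest = hashlib.md5(clustering_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def sharded_key(device_id: str, clustering_value: str) -> tuple:
    """Partition key for writes: the original key plus the synthetic shard."""
    return (device_id, shard_for(clustering_value))


def all_sharded_keys(device_id: str, num_shards: int = NUM_SHARDS) -> list:
    """Every partition key a reader must query to reassemble the full logical partition."""
    return [(device_id, shard) for shard in range(num_shards)]


# Hypothetical device and timestamp, used only to exercise the helpers.
write_key = sharded_key("sensor-42", "2018-08-23T11:50:00Z")
read_keys = all_sharded_keys("sensor-42")
print(write_key in read_keys)
```

A write lands in exactly one of the sub-partitions, while a read of the "full partition" fans out over all of them. An automatic version would have to pick and adjust the shard count itself, which is the hard part.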

Regarding tombstones: maybe we could have another thread dedicated to cleaning 
tombstones, separate from compaction. Depending on the number of tombstones 
and a threshold, it would be dedicated to deletion. It may be an edge case, 
but people face issues with tombstones all the time because they don’t know 
better.

Rahul
On Aug 23, 2018, 11:50 AM -0500, DuyHai Doan , wrote:
> As I used to tell some people, the day we make :
>
> 1. partition size unlimited, or at least huge partition easily manageable 
> (compaction, repair, streaming, partition index file)
> 2. tombstone a non-issue
>
> that day, Cassandra will dominate any other IoT technology out there
>
> Until then ...
>
> > On Thu, Aug 23, 2018 at 4:54 PM, Rahul Singh  
> > wrote:
> > > Good analysis of how the different key structures affect use cases and 
> > > performance. I think you could extend this article with potential 
> > > evaluation of FiloDB which specifically tries to solve the OLAP issue 
> > > with arbitrary queries.
> > >
> > > Another option is leveraging Elassandra (index in Elasticsearch 
> > > collocates with C*) or DataStax (index in Solr collocated with C*)
> > >
> > > I personally haven’t used SnappyData but that’s another Spark based DB 
> > > that could be leveraged for performance real-time queries on the OLTP 
> > > side.
> > >
> > > Rahul
> > > On Aug 23, 2018, 2:48 AM -0500, Affan Syed , wrote:
> > > > Hi,
> > > >
> > > > we wrote a blog about some of the results that engineers from AN10 
> > > > shared earlier.
> > > >
> > > > I am sharing it here for greater comments and discussions.
> > > >
> > > > http://www.an10.io/technology/cassandra-and-iot-queries-are-they-a-good-match/
> > > >
> > > >
> > > > Thank you.
> > > >
> > > >
> > > >
> > > > - Affan
>


Re: Upgrade from 2.1 to 3.11

2018-08-23 Thread Mohamadreza Rostami
Hi,
How much data do you have? How much RAM do your servers have? How large is
your heap?
On Thu, Aug 23, 2018 at 10:14 PM Mun Dega  wrote:

> Hello,
>
> We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.  The
> process went OK, including upgradesstables, but afterwards we started to
> experience high r/w latency, occasional OOMs, and long GC pauses.
>
> With 2.1, the same cluster had no issues like this.  We also kept the
> server specs and heap the same after the upgrade.
>
> Has anyone else had similar issues going to 3.11, and what major changes
> in the new version could cause such a setback?
>
> Ma Dega
>


Upgrade from 2.1 to 3.11

2018-08-23 Thread Mun Dega
Hello,

We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.  The
process went OK, including upgradesstables, but afterwards we started to
experience high r/w latency, occasional OOMs, and long GC pauses.

With 2.1, the same cluster had no issues like this.  We also kept the
server specs and heap the same after the upgrade.

Has anyone else had similar issues going to 3.11, and what major changes
in the new version could cause such a setback?

Ma Dega


Switching Snitch

2018-08-23 Thread Pradeep Chhetri
Hello,

I am currently running a 3.11.2 cluster on AWS with SimpleSnitch, so the
datacenter is datacenter1 and the rack is rack1 for all nodes. I want to
switch to GossipingPropertyFileSnitch (GPFS), changing the rack name to the
availability-zone name and the datacenter name to the region name.

When I restart an individual node with those values changed, it fails to
start, throwing an error about a dc and rack name mismatch, but it gives me
the option to set ignore_dc and ignore_rack to true to bypass the check.

I am not sure whether it is safe to set those two flags to true, or whether
there is any drawback now or in the future when I add a new datacenter to
the cluster. I went through the documentation on switching snitches but did
not find much explanation.

Regards,
Pradeep


Re: A blog about Cassandra in the IoT arena

2018-08-23 Thread DuyHai Doan
As I used to tell some people, the day we make :

1. partition size unlimited, or at least huge partition easily manageable
(compaction, repair, streaming, partition index file)
2. tombstone a non-issue

that day, Cassandra will dominate any other IoT technology out there

Until then ...

On Thu, Aug 23, 2018 at 4:54 PM, Rahul Singh 
wrote:

> Good analysis of how the different key structures affect use cases and
> performance. I think you could extend this article with potential
> evaluation of FiloDB which specifically tries to solve the OLAP issue with
> arbitrary queries.
>
> Another option is leveraging Elassandra (index in Elasticsearch collocates
> with C*) or DataStax (index in Solr collocated with C*)
>
> I personally haven’t used SnappyData but that’s another Spark based DB
> that could be leveraged for performance real-time queries on the OLTP side.
>
> Rahul
> On Aug 23, 2018, 2:48 AM -0500, Affan Syed , wrote:
>
> Hi,
>
> we wrote a blog about some of the results that engineers from AN10 shared
> earlier.
>
> I am sharing it here for greater comments and discussions.
>
> http://www.an10.io/technology/cassandra-and-iot-queries-are-they-a-good-match/
>
>
> Thank you.
>
>
>
> - Affan
>
>


Re: A blog about Cassandra in the IoT arena

2018-08-23 Thread Rahul Singh
Good analysis of how the different key structures affect use cases and 
performance. I think you could extend this article with a potential evaluation 
of FiloDB, which specifically tries to solve the OLAP issue with arbitrary 
queries.

Another option is leveraging Elassandra (index in Elasticsearch collocated with 
C*) or DataStax (index in Solr collocated with C*).

I personally haven’t used SnappyData, but that’s another Spark-based DB that 
could be leveraged for performant real-time queries on the OLTP side.

Rahul
On Aug 23, 2018, 2:48 AM -0500, Affan Syed , wrote:
> Hi,
>
> we wrote a blog about some of the results that engineers from AN10 shared 
> earlier.
>
> I am sharing it here for greater comments and discussions.
>
> http://www.an10.io/technology/cassandra-and-iot-queries-are-they-a-good-match/
>
>
> Thank you.
>
>
>
> - Affan


Re: duplicate rows for partition

2018-08-23 Thread Marcus Olsson
Hi,

I believe "tDate" is stored with millisecond precision, which could explain 
why these apparently duplicate dates are shown.

Could you try running the following query:
"SELECT userid, secondaryid, blobAsBigint(timestampAsBlob("tDate")), tid3, 
sid4, pid5, associate_degree FROM user_data;"

The "tDate" column should be shown with different values at that point.
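
To illustrate the effect outside of Cassandra, here is a small Python sketch, assuming two hypothetical timestamps that fall in the same second but differ in their milliseconds. The helper names and the values are made up for the example; only the idea (a display at second precision hiding distinct millisecond values) comes from the explanation above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(dt: datetime) -> int:
    """Milliseconds since the Unix epoch, using exact integer arithmetic."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def seconds_display(dt: datetime) -> str:
    """Format a timestamp at second precision, dropping the milliseconds."""
    return dt.strftime("%Y-%m-%d %H:%M:%S%z")


# Hypothetical values: same second, different milliseconds.
first = datetime(2018, 8, 4, 7, 59, 59, 123000, tzinfo=timezone.utc)
second = datetime(2018, 8, 4, 7, 59, 59, 456000, tzinfo=timezone.utc)

same_display = seconds_display(first) == seconds_display(second)
distinct_keys = epoch_millis(first) != epoch_millis(second)

print(seconds_display(first), epoch_millis(first))
print(seconds_display(second), epoch_millis(second))
```

Both values print the same date string but different millisecond counts, so as clustering key values they would identify two separate rows.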

Best Regards
Marcus Olsson

On Wed, 2018-08-22 at 19:20 -0400, James Shaw wrote:
can you run this:
select associate_degree, writetime( associate_degree ) from user_data where 

Thanks,

James

On Wed, Aug 22, 2018 at 7:13 PM, James Shaw <jxys...@gmail.com> wrote:
can you run this:
select writetime( associate_degree ) from user_data where 
and see what the writetime values are

On Wed, Aug 22, 2018 at 7:03 PM, James Shaw <jxys...@gmail.com> wrote:
Interesting. What are the insert and select statements?

Thanks,

James

On Wed, Aug 22, 2018 at 6:55 PM, Gosar M <koolja...@yahoo.com.invalid> wrote:
CREATE TABLE user_data (
"userid" text,
"secondaryid" text,
"tDate" timestamp,
"tid3" text,
"sid4" text,
"pid5" text,
associate_degree text,
  PRIMARY KEY (("userid", "secondaryid"), "tDate", "tid3", "sid4", "pid5")
) WITH CLUSTERING ORDER BY ("tDate" ASC, "tid3" ASC, "sid4" ASC, "pid5" ASC);



On Wednesday, 22 August 2018, 15:08:03 GMT-7, dinesh.jo...@yahoo.com.INVALID wrote:


What is the schema of the table? Could you include the output of DESCRIBE?

Dinesh


On Wednesday, August 22, 2018, 2:22:31 PM PDT, Gosar M 
 wrote:


Hello,

We have a table with the following partition and clustering keys:

partition key - ("userid", "secondaryid"),
clustering key - "tDate", "tid3", "sid4", "pid5"

Data is inserted based on the above partition and clustering keys. For one 
record, we are seeing two rows returned when querying by both partition and 
clustering keys.


 userid       | secondaryid | tdate                | tid3        | sid4 | pid5        | associate_degree
--------------+-------------+----------------------+-------------+------+-------------+--------------------
 090sdfdsf898 | ab984564    | 2018-08-04 07:59:59+ | 0a5995672e3 | l34  | l34_listing | 123145979615694
 090sdfdsf898 | ab984564    | 2018-08-04 07:59:59+ | 0a5995672e3 | l34  | l34_listing | 123145979615694989


We did not have any node that was down for longer than gc_grace_seconds.


Thank you.








A blog about Cassandra in the IoT arena

2018-08-23 Thread Affan Syed
Hi,

we wrote a blog about some of the results that engineers from AN10 shared
earlier.

I am sharing it here for greater comments and discussions.

http://www.an10.io/technology/cassandra-and-iot-queries-are-they-a-good-match/


Thank you.



- Affan