RE: Unable to track compaction completion

2019-02-19 Thread Kenneth Brotman
Hi Rajsekhar,

 

I think monitoring the CompactionManagerMBean is what you need.
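
If JMX is not handy, a rough cqlsh-side sketch (not the MBean route; column 
names as in the 3.x system keyspace, and ALLOW FILTERING is needed because the 
history table is keyed only by id) is to match finished compactions on 
keyspace, table and compacted_at instead of the in-progress id:

-- replace 'ks' and 'tbl' with your keyspace and table
SELECT id, keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out
FROM system.compaction_history
WHERE keyspace_name = 'ks' AND columnfamily_name = 'tbl'
ALLOW FILTERING;

A new row for that table with a compacted_at later than the time the compaction 
started is a reasonable signal that it has finished.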

 

Kenneth Brotman

 

From: Rajsekhar Mallick [mailto:raj.mallic...@gmail.com] 
Sent: Friday, February 15, 2019 8:59 AM
To: user@cassandra.apache.org
Subject: Unable to track compaction completion

 

Hello team,

 

I have been trying to figure out how to track the completion of a compaction 
on a node.

Nodetool compactionstats only shows instantaneous results.

 

I found that system.compaction_in_progress gives me the same details as 
compactionstats, and it also gives me an id for the running compaction.

I was of the view that checking for the same id in system.compaction_history 
would fetch the compaction details after the running compaction ends.

But I see that no such relationship exists.

Please confirm the above.

 

Thanks,

Rajsekhar Mallick



RE: tombstones threshold warning

2019-02-19 Thread Kenneth Brotman
There is another good article called Common Problems with Cassandra Tombstones 
by Alla Babkina at 
https://opencredo.com/blogs/cassandra-tombstones-common-issues/ .   It says 
interesting stuff like:

 

1.   You can get tombstones without deleting anything

2.   Inserting null values causes tombstones

3.   Inserting values into collection columns results in tombstones even if 
you never delete a value

4.   Expiring Data with TTL results in tombstones (of course)

5.   The Invisible Column Range Tombstones – resolved in CASSANDRA-11166, 
though; that is the ticket I should have cited in the last email, not 
CASSANDRA-8527.  But it shouldn’t be this one since you are on 3.11.3.

 

I think number three above answers your question, based on your original post.  
See the article for the details; it’s really good.
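
To make number three concrete, here is a small sketch against your test table 
(the row keys below are just placeholders): a full write of a non-frozen map 
first deletes whatever was there, which is the collection tombstone you see in 
sstabledump, while appending to the map avoids it; an explicit null (number two 
above) is stored as a cell tombstone.

-- full overwrite of the map: the old collection is deleted first, so a
-- collection (range) tombstone is written even on the first insert
INSERT INTO tbl (col1, col2, c1, col3) VALUES ('4', '4', 4, {'key': 'value'});

-- append to the map instead: no delete-before-write, no tombstone
UPDATE tbl SET col3 = col3 + {'key': 'value'} WHERE col1 = '4' AND col2 = '4';

-- number two above: an explicit null is stored as a cell tombstone
INSERT INTO tbl (col1, col2, c1) VALUES ('5', '5', null);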

 

Kenneth Brotman 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Tuesday, February 19, 2019 10:12 PM
To: 'user@cassandra.apache.org'
Subject: RE: tombstones threshold warning

 

Hi Ayub,

 

Is everything flushing to SSTables?  It has to be somewhere, right?  So is it 
in the memtables?

 

Or is it that there are tombstones that are sometimes detected and sometimes 
not detected as described in the very detailed article on The Last Pickle by 
Alex Dejanovski called Undetectable tombstones in Apache Cassandra: 
http://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html
 . 

 

I thought that was resolved in 3.11.2 by CASSANDRA-8527; and you are running 
3.11.3!  Is there still an outstanding issue?

 

Kenneth Brotman

 

 

From: Ayub M [mailto:hia...@gmail.com] 
Sent: Saturday, February 16, 2019 9:58 PM
To: user@cassandra.apache.org
Subject: tombstones threshold warning

 

In the logs I see a tombstone warning threshold message:

Read 411 live rows and 1644 tombstone cells for query SELECT * FROM ks.tbl 
WHERE key = XYZ LIMIT 5000 (see tombstone_warn_threshold)

This is Cassandra 3.11.3. I see there are 2 sstables for this table, and the 
partition XYZ exists in only one of them. I dumped this sstable to JSON using 
sstabledump and extracted the data for just this partition; it contains only 
411 rows, all of them active/live records, so I do not understand where these 
tombstones are coming from.

This table has collection columns, and there are collection tombstones for 
those columns from when they were inserted. Do collection tombstones get 
counted as tombstone cells in the warning displayed?

I did a small test to see whether collection tombstones are counted as 
tombstones, and it does not seem so. So I am wondering where the tombstones in 
the query above are coming from.

CREATE TABLE tbl (
    col1 text,
    col2 text,
    c1 int,
    col3 map<text, text>,
    PRIMARY KEY (col1, col2)
) WITH CLUSTERING ORDER BY (col2 ASC);
 
cassandra@cqlsh:dev_test> insert into tbl (col1, col2, c1, col3) values ('3', '3', 3, {'key': 'value'});
cassandra@cqlsh:dev_test> select * from tbl where col1 = '3';

 col1 | col2 | c1 | col3
------+------+----+------------------
    3 |    3 |  3 | {'key': 'value'}

(1 rows)
 
Tracing session: 4c2a1894-3151-11e9-838d-29ed5fcf59ee
 activity | timestamp | source | source_elapsed | client
------------------------------------------------------------------------------
 Execute CQL3 query | 2019-02-15 18:41:25.145000 | 10.216.1.1 | 0 | 127.0.0.1
 Parsing select * from tbl where col1 = '3'; [CoreThread-3] | 2019-02-15 18:41:25.145000 | 10.216.1.1 | 177 | 127.0.0.1
 Preparing statement [CoreThread-3] | 2019-02-15 18:41:25.145001 | 10.216.1.1 | 295 | 127.0.0.1
 Reading data from [/10.216.1.1] [CoreThread-3] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 491 | 127.0.0.1
 Executing single-partition query on tbl [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 770 | 127.0.0.1
 Acquiring sstable references [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 897 | 127.0.0.1
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 1096 | 127.0.0.1
 Merged data from memtables and 1 sstables [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 1235 | 127.0.0.1
 Read 1 live rows and 0 tombstone cells [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 1317 | 127.0.0.1

RE: tombstones threshold warning

2019-02-19 Thread Kenneth Brotman
Hi Ayub,

 

Is everything flushing to SSTables?  It has to be somewhere, right?  So is it 
in the memtables?

 

Or is it that there are tombstones that are sometimes detected and sometimes 
not detected as described in the very detailed article on The Last Pickle by 
Alex Dejanovski called Undetectable tombstones in Apache Cassandra: 
http://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html
 . 

 

I thought that was resolved in 3.11.2 by CASSANDRA-8527; and you are running 
3.11.3!  Is there still an outstanding issue?
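
If it helps, my understanding of the article is that it is about row-level 
deletes, which were not reflected in the tombstone counts, as opposed to 
cell-level deletes. Roughly, using the test table from your mail:

-- cell-level delete: counted as a tombstone cell in the warning
DELETE c1 FROM tbl WHERE col1 = '3' AND col2 = '3';

-- row-level delete: the kind the article calls undetectable
DELETE FROM tbl WHERE col1 = '3' AND col2 = '3';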

 

Kenneth Brotman

 

 

From: Ayub M [mailto:hia...@gmail.com] 
Sent: Saturday, February 16, 2019 9:58 PM
To: user@cassandra.apache.org
Subject: tombstones threshold warning

 

In the logs I see a tombstone warning threshold message:

Read 411 live rows and 1644 tombstone cells for query SELECT * FROM ks.tbl 
WHERE key = XYZ LIMIT 5000 (see tombstone_warn_threshold)

This is Cassandra 3.11.3. I see there are 2 sstables for this table, and the 
partition XYZ exists in only one of them. I dumped this sstable to JSON using 
sstabledump and extracted the data for just this partition; it contains only 
411 rows, all of them active/live records, so I do not understand where these 
tombstones are coming from.

This table has collection columns, and there are collection tombstones for 
those columns from when they were inserted. Do collection tombstones get 
counted as tombstone cells in the warning displayed?

I did a small test to see whether collection tombstones are counted as 
tombstones, and it does not seem so. So I am wondering where the tombstones in 
the query above are coming from.

CREATE TABLE tbl (
    col1 text,
    col2 text,
    c1 int,
    col3 map<text, text>,
    PRIMARY KEY (col1, col2)
) WITH CLUSTERING ORDER BY (col2 ASC);
 
cassandra@cqlsh:dev_test> insert into tbl (col1, col2, c1, col3) values ('3', '3', 3, {'key': 'value'});
cassandra@cqlsh:dev_test> select * from tbl where col1 = '3';

 col1 | col2 | c1 | col3
------+------+----+------------------
    3 |    3 |  3 | {'key': 'value'}

(1 rows)
 
Tracing session: 4c2a1894-3151-11e9-838d-29ed5fcf59ee
 activity | timestamp | source | source_elapsed | client
------------------------------------------------------------------------------
 Execute CQL3 query | 2019-02-15 18:41:25.145000 | 10.216.1.1 | 0 | 127.0.0.1
 Parsing select * from tbl where col1 = '3'; [CoreThread-3] | 2019-02-15 18:41:25.145000 | 10.216.1.1 | 177 | 127.0.0.1
 Preparing statement [CoreThread-3] | 2019-02-15 18:41:25.145001 | 10.216.1.1 | 295 | 127.0.0.1
 Reading data from [/10.216.1.1] [CoreThread-3] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 491 | 127.0.0.1
 Executing single-partition query on tbl [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 770 | 127.0.0.1
 Acquiring sstable references [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 897 | 127.0.0.1
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 1096 | 127.0.0.1
 Merged data from memtables and 1 sstables [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 1235 | 127.0.0.1
 Read 1 live rows and 0 tombstone cells [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 | 1317 | 127.0.0.1
 Request complete | 2019-02-15 18:41:25.146529 | 10.216.1.1 | 1529 | 127.0.0.1
[root@localhost tbl-8aaa6bc1315011e991e523330936276b]# sstabledump aa-1-bti-Data.db
[
  {
    "partition" : {
      "key" : [ "3" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 41,
        "clustering" : [ "3" ],
        "liveness_info" : { "tstamp" : "2019-02-15T18:36:16.838103Z" },
        "cells" : [
          { "name" : "c1", "value" : 3 },
          { "name" : "col3", "deletion_info" : { "marked_deleted" : "2019-02-15T18:36:16.838102Z", "local_delete_time" : "2019-02-15T18:36:17Z" } },
          { "name" : "col3", "path" : [ "key" ], "value" : "value" }
        ]
      }
    ]
  }
]


RE: Looking for feedback on automated root-cause system

2019-02-19 Thread Kenneth Brotman
Any information you can share on the inputs it needs/uses would be helpful.

 

Kenneth Brotman

 

From: daemeon reiydelle [mailto:daeme...@gmail.com] 
Sent: Tuesday, February 19, 2019 4:27 PM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

Welcome to the world of testing predictive analytics. I will pass this on to my 
folks at Accenture; I know of a couple of C* clients we run and am wondering 
what you had in mind.

 

 

Daemeon C.M. Reiydelle

email: daeme...@gmail.com

San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype daemeon.c.mreiydelle

 

 

On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8 
years, and have worked on hundreds of Cassandra deployments. One of the things 
I’ve noticed in myself and a lot of my peers that have done consulting, support 
or worked on really big deployments is that we get burnt out. We fight a lot of 
the same fires over and over again, and don’t get to work on new or interesting 
stuff. Also, what we do is really hard to transfer to other people because it’s 
based on experience. 

Over the past year my team and I have been working to overcome that gap, 
creating an assistant that’s able to scale some of this knowledge. We’ve got it 
to the point where it’s able to classify known root causes for an outage or an 
SLA breach in Cassandra with an accuracy greater than 90%. It can accurately 
diagnose bugs, data-modeling issues, or misuse of certain features, and when it 
does, it gives you specific remediation steps with links to knowledge base articles. 

 

We think we’ve seeded our database with enough root causes that it’ll catch the 
vast majority of issues but there is always the possibility that we’ll run into 
something previously unknown like CASSANDRA-11170 (one of the issues our system 
found in the wild).

We’re looking for feedback and would like to know if anyone is interested in 
giving the product a trial. The process would be a collaboration, where we both 
get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump



Re: Looking for feedback on automated root-cause system

2019-02-19 Thread daemeon reiydelle
Welcome to the world of testing predictive analytics. I will pass this on
to my folks at Accenture; I know of a couple of C* clients we run and am
wondering what you had in mind.


*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype
daemeon.c.m.reiydelle*



On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

> Howdy,
>
> I’ve been engaged in the Cassandra user community for a long time, almost
> 8 years, and have worked on hundreds of Cassandra deployments. One of the
> things I’ve noticed in myself and a lot of my peers that have done
> consulting, support or worked on really big deployments is that we get
> burnt out. We fight a lot of the same fires over and over again, and don’t
> get to work on new or interesting stuff. Also, what we do is really hard to
> transfer to other people because it’s based on experience.
>
> Over the past year my team and I have been working to overcome that gap,
> creating an assistant that’s able to scale some of this knowledge. We’ve
> got it to the point where it’s able to classify known root causes for an
> outage or an SLA breach in Cassandra with an accuracy greater than 90%. It
> can accurately diagnose bugs, data-modeling issues, or misuse of certain
> features, and when it does, it gives you specific remediation steps with links
> to knowledge base articles.
>
> We think we’ve seeded our database with enough root causes that it’ll
> catch the vast majority of issues but there is always the possibility that
> we’ll run into something previously unknown like CASSANDRA-11170 (one of
> the issues our system found in the wild).
>
> We’re looking for feedback and would like to know if anyone is interested
> in giving the product a trial. The process would be a collaboration, where
> we both get to learn from each other and improve how we’re doing things.
>
> Thanks,
> Matt Stump
>
>


Looking for feedback on automated root-cause system

2019-02-19 Thread Matthew Stump
Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8
years, and have worked on hundreds of Cassandra deployments. One of the
things I’ve noticed in myself and a lot of my peers that have done
consulting, support or worked on really big deployments is that we get
burnt out. We fight a lot of the same fires over and over again, and don’t
get to work on new or interesting stuff. Also, what we do is really hard to
transfer to other people because it’s based on experience.

Over the past year my team and I have been working to overcome that gap,
creating an assistant that’s able to scale some of this knowledge. We’ve
got it to the point where it’s able to classify known root causes for an
outage or an SLA breach in Cassandra with an accuracy greater than 90%. It
can accurately diagnose bugs, data-modeling issues, or misuse of certain
features, and when it does, it gives you specific remediation steps with links
to knowledge base articles.

We think we’ve seeded our database with enough root causes that it’ll catch
the vast majority of issues but there is always the possibility that we’ll
run into something previously unknown like CASSANDRA-11170 (one of the
issues our system found in the wild).

We’re looking for feedback and would like to know if anyone is interested
in giving the product a trial. The process would be a collaboration, where
we both get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump


Re: Restore a table with dropped columns to a new cluster fails

2019-02-19 Thread Jeff Jirsa
You can also manually add the dropped column to the appropriate table to
eliminate the issue. It has to be done by a human; a new cluster would have no
way of learning about a dropped column, and the missing metadata cannot be
inferred.
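
For example, a minimal sketch of doing that by hand, assuming the dropped 
column was an int named xxx on ks.tbl (use the name and type from the original 
schema): re-adding and then re-dropping the column records the dropped-column 
metadata that deserialization needs on the new cluster.

-- hypothetical names; match them to the source cluster's schema
ALTER TABLE ks.tbl ADD xxx int;
ALTER TABLE ks.tbl DROP xxx;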


On Tue, Feb 19, 2019 at 10:58 AM Elliott Sims  wrote:

> When a snapshot is taken, it includes a "schema.cql" file.  That should be
> sufficient to restore whatever you need to restore.  I'd argue that neither
> automatically resurrecting a dropped table nor silently failing to restore
> it is a good behavior, so it's not unreasonable to have the user re-create
> the table then choose if they want to re-drop it.
>
>
> On Tue, Feb 19, 2019 at 7:28 AM Hannu Kröger  wrote:
>
>> Hi,
>>
>> I would like to bring this issue to your attention.
>>
>> Link to the ticket:
>> https://issues.apache.org/jira/browse/CASSANDRA-14336
>>
>> Basically if a table contains dropped columns and you try to restore a
>> snapshot to a new cluster, that will fail because of an error like
>> "java.lang.RuntimeException: Unknown column XXX during deserialization”.
>>
>> I feel this is quite a serious problem for the backup and restore functionality
>> of Cassandra. You cannot restore a backup to a new cluster if columns have
>> been dropped.
>>
>> There have been other similar tickets that have been apparently closed
>> but based on my test with 3.11.4, the issue still persists.
>>
>> Best Regards,
>> Hannu Kröger
>>
>


Re: Restore a table with dropped columns to a new cluster fails

2019-02-19 Thread Elliott Sims
When a snapshot is taken, it includes a "schema.cql" file.  That should be
sufficient to restore whatever you need to restore.  I'd argue that neither
automatically resurrecting a dropped table nor silently failing to restore
it is a good behavior, so it's not unreasonable to have the user re-create
the table then choose if they want to re-drop it.


On Tue, Feb 19, 2019 at 7:28 AM Hannu Kröger  wrote:

> Hi,
>
> I would like to bring this issue to your attention.
>
> Link to the ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-14336
>
> Basically if a table contains dropped columns and you try to restore a
> snapshot to a new cluster, that will fail because of an error like
> "java.lang.RuntimeException: Unknown column XXX during deserialization”.
>
> I feel this is quite a serious problem for the backup and restore functionality
> of Cassandra. You cannot restore a backup to a new cluster if columns have
> been dropped.
>
> There have been other similar tickets that have been apparently closed but
> based on my test with 3.11.4, the issue still persists.
>
> Best Regards,
> Hannu Kröger
>


Restore a table with dropped columns to a new cluster fails

2019-02-19 Thread Hannu Kröger
Hi,

I would like to bring this issue to your attention.

Link to the ticket:
https://issues.apache.org/jira/browse/CASSANDRA-14336 


Basically if a table contains dropped columns and you try to restore a snapshot 
to a new cluster, that will fail because of an error like 
"java.lang.RuntimeException: Unknown column XXX during deserialization”.

I feel this is quite a serious problem for the backup and restore functionality of 
Cassandra. You cannot restore a backup to a new cluster if columns have been 
dropped.

There have been other similar tickets that have been apparently closed but 
based on my test with 3.11.4, the issue still persists.

Best Regards,
Hannu Kröger