Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-16 Thread Ahmed Eljami
The issue was fixed with nodetool scrub; both rows are now under the same
clustering.

I'll open a JIRA to analyze the source of this issue with Cassandra 3.11.3.

Thanks.



Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
I don’t have a good answer for you - I don’t know if scrub will fix this (you 
could copy an sstable offline and try it locally in ccm) - you may need to 
delete and reinsert, though I’m really interested in knowing how this happened 
if you weren’t ever exposed to #14008. 

Can you open a JIRA? If your sstables aren’t especially sensitive, uploading 
them would be swell. Otherwise, an anonymized JSON dump may be good enough for 
whichever developer looks at fixing this.

-- 
Jeff Jirsa




Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Jeff, in this case is there any solution to resolve that directly on the
sstable (compact, scrub, ...), or do we have to run a batch at the client level
(delete the partition and rewrite it)?

Thank you for your reply.



Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Indeed, this was written in 2.1.14 and we upgraded to 3.11.3, so we should not
be impacted by this issue?!
Thanks


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-14008

If this was written in 2.1/2.2 and you upgraded to 3.0.x (x < 16) or 
3.1-3.11.1, it could be this issue.

-- 
Jeff Jirsa
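The affected-version window described above can be sketched as a quick check. This is a hypothetical Python sketch; the function name and the version parsing are illustrative assumptions, not anything from the Cassandra codebase:

```python
# CASSANDRA-14008 window per the message above: data written on 2.1/2.2,
# then upgraded to 3.0.x with x < 16, or to 3.1 through 3.11.1.
def affected_by_14008(version: str) -> bool:
    parts = [int(x) for x in version.split(".")]
    while len(parts) < 3:          # pad "3.1" -> (3, 1, 0)
        parts.append(0)
    major, minor, patch = parts[:3]
    if (major, minor) == (3, 0):
        return patch < 16          # 3.0.0 .. 3.0.15 affected
    if major == 3 and 1 <= minor < 11:
        return True                # 3.1 .. 3.10 affected
    if major == 3 and minor == 11:
        return patch <= 1          # 3.11.0 and 3.11.1 affected
    return False

print(affected_by_14008("3.11.3"))  # -> False: the thread's version is outside the window
```

By this reading, 3.11.3 falls outside the window, which is why the duplication reported here is surprising and worth a JIRA with the sstables attached.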



Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
What about this part of the dump:

"type" : "row",
"position" : 4123,
"clustering" : [ "", "Token", "abcd", "" ],
"cells" : [
  { "name" : "dvalue", "value" : "", "tstamp" :
"2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" :
"2020-04-27T17:20:31Z", "expired" : false }

Why don't we have a *liveness_info* for this row?

Thanks
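The symptom in this thread (two "row" entries under one clustering) can also be spotted mechanically in sstabledump output. A minimal sketch, using a trimmed, hypothetical stand-in for the real dump:

```python
import json

# Trimmed stand-in for sstabledump JSON; real dumps carry many more fields
# (cells, liveness_info, range tombstone bounds) that this check ignores.
dump = json.loads("""
[
  {
    "partition": {"key": ["", "bbb", "rrr"], "position": 3760},
    "rows": [
      {"type": "row", "position": 3974, "clustering": ["", "Token", "abcd", ""]},
      {"type": "row", "position": 4123, "clustering": ["", "Token", "abcd", ""]}
    ]
  }
]
""")

def duplicate_clusterings(partitions):
    """Return (partition_key, clustering) pairs that appear on more than one row."""
    dupes = []
    for p in partitions:
        seen = set()
        for r in p.get("rows", []):
            if r.get("type") != "row":
                continue  # skip range_tombstone_bound entries
            ckey = tuple(r["clustering"])
            if ckey in seen:
                dupes.append((tuple(p["partition"]["key"]), ckey))
            seen.add(ckey)
    return dupes

print(duplicate_clusterings(dump))  # one duplicate clustering in this partition
```

Running this over `sstabledump <Data.db file>` output for each sstable would show whether the two rows live in one sstable or only appear duplicated after merging across sstables.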


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Hi Sean,
Thanks for your reply.
I agree with you about uniqueness, but the output of sstabledump shows that we
have the same value for the column g => "clustering" : [ "",
"Token", "abcd", "" ],
and when we select with the whole primary key using the values which I see in
the sstable, cqlsh returns 2 rows...


RE: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Durity, Sean R
Uniqueness is determined by the partition key PLUS the clustering columns. Hard 
to tell from your data below, but is it possible that one of the clustering 
columns (perhaps g) has different values? That would easily explain the 2 rows 
returned – because they ARE different rows in the same partition. In your data 
model, make sure you need all the clustering columns to determine uniqueness or 
you will indeed have more rows than you might expect.

Sean Durity
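The rule above can be illustrated with a small sketch (hypothetical Python; the key shape follows the PRIMARY KEY ((a, b, c), d, e, f, g) from the quoted message below):

```python
# Uniqueness in Cassandra is the partition key PLUS all clustering columns.
# Model a table as a dict keyed by the full primary key: writes to the same
# key upsert cells into one logical row; any differing clustering column
# yields a distinct row in the same partition.
rows = {}

def upsert(a, b, c, d, e, f, g, **cells):
    key = ((a, b, c), (d, e, f, g))  # ((partition key), (clustering columns))
    rows.setdefault(key, {}).update(cells)

# Two writes with the identical primary key merge into one row...
upsert("", "bbb", "rrr", "", "Token", "abcd", "", connected=False)
upsert("", "bbb", "rrr", "", "Token", "abcd", "", dvalue="")
assert len(rows) == 1

# ...while changing any clustering column (here g) creates a second row.
upsert("", "bbb", "rrr", "", "Token", "abcd", "x", dvalue="")
assert len(rows) == 2
```

In the reported case both rows show the identical clustering in sstabledump, so by this rule they should have merged into one, which is what makes the behavior a bug rather than a data-modeling issue.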


From: Ahmed Eljami 
Sent: Wednesday, May 15, 2019 10:56 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Two separate rows for the same partition !!

Hi guys,

We have a strange problem with the data in Cassandra: after inserting the same 
partition twice with different columns, we see that Cassandra returns 2 rows in 
cqlsh rather than one:

a| b| c| d| f| g| h| i| j| k| l

--++---+--+---+-++---+--++

|bbb|  rrr| | Token | abcd|| False | {'expiration': '1557943260838', 'fname': 'WS', 'freshness': '1556299239910'} |   null |   null

|bbb|  rrr| | Token | abcd||  null | null | |   null

With the primary key = PRIMARY KEY ((a, b, c), d, e, f, g)

On the sstable we have the following data:

[
  {
"partition" : {
  "key" : [ "", "bbb", "rrr" ],
  "position" : 3760
},
"rows" : [
  {
"type" : "range_tombstone_bound",
"start" : {
  "type" : "inclusive",
  "clustering" : [ "", "Token", "abcd", "*" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "range_tombstone_bound",
"end" : {
  "type" : "exclusive",
  "clustering" : [ "", "Token", "abcd", "" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "row",
"position" : 3974,
"clustering" : [ "", "Token", "abcd", "" ],
"liveness_info" : { "tstamp" : "2019-04-26T17:20:39.910Z", "ttl" : 
31708792, "expires_at" : "2020-04-27T17:20:31Z", "expired" : false },
"cells" : [
  { "name" : "connected", "value" : false },
  { "name" : "dattrib", "deletion_info" : { "marked_deleted" : 
"2019-04-26T17:20:39.90Z", "local_delete_time" : "2019-04-26T17:20:39Z" } },
  { "name" : "dattrib", "path" : [ "expiration" ], "value" : 
"1557943260838" },
  { "name" : "dattrib", "path" : [ "fname" ], "value" : "WS" },
  { "name" : "dattrib", "path" : [ "freshness" ], "value" : 
"1556299239910" }
]
  },
  {
"type" : "row",
"position" : 4123,
"clustering" : [ "", "Token", "abcd", "" ],
"cells" : [
  { "name" : "dvalue", "value" : "", "tstamp" : 
"2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" : 
"2020-04-27T17:20:31Z", "expired" : false }
]
  },
  {
"type" : "range_tombstone_bound",
"start" : {
  "type" : "exclusive",
  "clustering" : [ "", "Token", "abcd", "" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "range_tombstone_bound",
"end" : {
  "type" : "inclusive",
  "clustering" : [ "&