Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-04 Thread olek.stas...@gmail.com
I don't know the real cause of my problem; we are still guessing.
All operations I have done on the cluster are described on this timeline:
1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations ->
2.0.3 -> normal operations -> now
where normal operations means reads/writes/repairs.
Could you please describe briefly how to recover the data? I have a
problem with the scenario described under this link:
http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html,
I can't apply that solution to my case.
regards
Olek

2014-02-03 Robert Coli rc...@eventbrite.com:
 On Mon, Feb 3, 2014 at 2:17 PM, olek.stas...@gmail.com
 olek.stas...@gmail.com wrote:

 No, I've done the repair after upgradesstables. In fact it was about 4
 weeks after, because of a bug:


 If you only did a repair after you upgraded SSTables, when did you have an
 opportunity to hit :

 https://issues.apache.org/jira/browse/CASSANDRA-6527

 ... which relies on you having multiple versions of SSTables while
 streaming?

 Did you do any operation which involves streaming? (Add/Remove/Replace a
 node?)

 =Rob



Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-04 Thread Robert Coli
On Tue, Feb 4, 2014 at 12:21 AM, olek.stas...@gmail.com 
olek.stas...@gmail.com wrote:

 I don't know the real cause of my problem; we are still guessing.
 All operations I have done on the cluster are described on this timeline:
 1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations ->
 2.0.3 -> normal operations -> now
 where normal operations means reads/writes/repairs.
 Could you please describe briefly how to recover the data? I have a
 problem with the scenario described under this link:

 http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html,
 I can't apply that solution to my case.


I think your only option is the following :

1) determine which SSTables contain rows that have doomstones (tombstones from
the far future)
2) determine whether these tombstones mask a live or dead version of the
row, by looking at other row fragments
3) dump/filter/re-write all your data via some method, probably
sstable2json/json2sstable (see the sketch after this list)
4) load the corrected sstables by starting a node with the sstables in the
data directory
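
For step 3, a minimal sketch of the filter (assuming the dump layout
shown upthread, where the broken rows are exactly the ones carrying
"localDeletionTime": 0; the file, keyspace and CF names here are
placeholders):

import json

# Dump first with: sstable2json mykeyspace-mycf-jb-1-Data.db > dump.json
with open("dump.json") as f:
    rows = json.load(f)

for row in rows:
    info = row.get("metadata", {}).get("deletionInfo")
    # Both broken dumps in this thread share the signature of a huge
    # markedForDeleteAt paired with localDeletionTime == 0; drop exactly
    # that marker and keep the columns (the live data) intact.
    if info and info.get("localDeletionTime") == 0:
        del row["metadata"]["deletionInfo"]

with open("fixed.json", "w") as f:
    json.dump(rows, f)

# Rebuild with: json2sstable -K mykeyspace -c mycf fixed.json
# mykeyspace-mycf-jb-1-Data.db, then start the node as in step 4.

A legitimate tombstone has a non-zero localDeletionTime, so a filter
like this should leave real deletions alone.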

I understand you have a lot of data, but I am pretty sure there is no way
for you to fix it within Cassandra. Perhaps ask for advice on the JIRA
ticket mentioned upthread if this answer is not sufficient?

=Rob


Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-04 Thread olek.stas...@gmail.com
Seems good. I'll discuss it with the data owners and we'll choose the best method.
Best regards,
Aleksander
4 Feb 2014 19:40, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Feb 4, 2014 at 12:21 AM, olek.stas...@gmail.com 
 olek.stas...@gmail.com wrote:

 I don't know the real cause of my problem; we are still guessing.
 All operations I have done on the cluster are described on this timeline:
 1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations ->
 2.0.3 -> normal operations -> now
 where normal operations means reads/writes/repairs.
 Could you please describe briefly how to recover the data? I have a
 problem with the scenario described under this link:

 http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html,
 I can't apply that solution to my case.


 I think your only option is the following :

 1) determine which SSTables contain rows that have doomstones (tombstones from
 the far future)
 2) determine whether these tombstones mask a live or dead version of the
 row, by looking at other row fragments
 3) dump/filter/re-write all your data via some method, probably
 sstable2json/json2sstable
 4) load the corrected sstables by starting a node with the sstables in the
 data directory

 I understand you have a lot of data, but I am pretty sure there is no way
 for you to fix it within Cassandra. Perhaps ask for advice on the JIRA
 ticket mentioned upthread if this answer is not sufficient?

 =Rob




Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-03 Thread olek.stas...@gmail.com
Hi All,
We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
1.2.10). Probably after upgradesstables (but it's only a guess,
because we noticed the problem a few weeks later), some rows became
tombstoned. They just disappeared from query results. After
investigation I noticed that they are still reachable via sstable2json.
Example output for a non-existent row:

{"key": "6e6e37716c6d665f6f61695f6463","metadata": {"deletionInfo":
{"markedForDeleteAt":2201170739199,"localDeletionTime":0}},"columns":
[["DATA","3c6f61695f64633a64(...)",1357677928108]]}
]

If I understand correctly, the row is marked as deleted with a
timestamp in the far future, but it's still on disk. Also,
localDeletionTime is set to 0, which may mean that it's some kind of
internal bug rather than the effect of a client error. So my question
is: is it true that upgradesstables may do something like that? How
can we find the reason for such strange Cassandra behaviour? Is there
any way to recover such strangely marked rows?
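
For the record, decoding those two numbers side by side shows how far
apart they are (a quick check, assuming millisecond-precision
timestamps; the column timestamp only decodes to a sane date under
that assumption):

from datetime import datetime, timezone

# Assuming ms-precision timestamps, as the column timestamp suggests:
for label, ms in [("markedForDeleteAt", 2201170739199),
                  ("column timestamp ", 1357677928108)]:
    print(label, datetime.fromtimestamp(ms / 1000, tz=timezone.utc))
# markedForDeleteAt -> 2039-10-02 (far future, so the row reads as deleted)
# column timestamp  -> 2013-01-08 (the live column the marker masks)
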
This problem affects about 500K of the 14M rows in our database, so
the percentage is quite big.
best regards
Aleksander

2013-12-12 Robert Coli rc...@eventbrite.com:
 On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang math...@apptornado.com
 wrote:

 When I use sstable2json on the sstable on the destination cluster, it has
 "metadata": {"deletionInfo":
 {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
 it doesn't have that in the source sstable.
 (Yes, this is a timestamp far into the future. All our hosts are
 properly synced through ntp).


 This seems like a bug in sstableloader, I would report it on JIRA.


 Naturally, copying the data again doesn't work to fix it, as the
 tombstone is far in the future. Apart from not having this happen at
 all, how can it be fixed?


 Briefly, you'll want to purge that tombstone and then reload the data with a
 reasonable timestamp.

 Dealing with rows with data (and tombstones) in the far future is described
 in detail here :

 http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html

 =Rob



Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-03 Thread Yuki Morishita
if you are using < 2.0.4, then you are hitting
https://issues.apache.org/jira/browse/CASSANDRA-6527


On Mon, Feb 3, 2014 at 2:51 AM, olek.stas...@gmail.com
olek.stas...@gmail.com wrote:
 Hi All,
 We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
 1.2.10). Probably after upgradesstables (but it's only a guess,
 because we noticed the problem a few weeks later), some rows became
 tombstoned. They just disappeared from query results. After
 investigation I noticed that they are still reachable via sstable2json.
 Example output for a non-existent row:

 {"key": "6e6e37716c6d665f6f61695f6463","metadata": {"deletionInfo":
 {"markedForDeleteAt":2201170739199,"localDeletionTime":0}},"columns":
 [["DATA","3c6f61695f64633a64(...)",1357677928108]]}
 ]

 If I understand correctly, the row is marked as deleted with a
 timestamp in the far future, but it's still on disk. Also,
 localDeletionTime is set to 0, which may mean that it's some kind of
 internal bug rather than the effect of a client error. So my question
 is: is it true that upgradesstables may do something like that? How
 can we find the reason for such strange Cassandra behaviour? Is there
 any way to recover such strangely marked rows?
 This problem affects about 500K of the 14M rows in our database, so
 the percentage is quite big.
 best regards
 Aleksander

 2013-12-12 Robert Coli rc...@eventbrite.com:
 On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang math...@apptornado.com
 wrote:

 When I use sstable2json on the sstable on the destination cluster, it has
 "metadata": {"deletionInfo":
 {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
 it doesn't have that in the source sstable.
 (Yes, this is a timestamp far into the future. All our hosts are
 properly synced through ntp).


 This seems like a bug in sstableloader, I would report it on JIRA.


 Naturally, copying the data again doesn't work to fix it, as the
 tombstone is far in the future. Apart from not having this happen at
 all, how can it be fixed?


 Briefly, you'll want to purge that tombstone and then reload the data with a
 reasonable timestamp.

 Dealing with rows with data (and tombstones) in the far future is described
 in detail here :

 http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html

 =Rob




-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-03 Thread olek.stas...@gmail.com
That's right, I haven't run sstableloader. The data loss appeared somewhere along this line:
1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations -> 2.0.3 ->
normal operations -> now
Today I noticed that the oldest files with broken values appeared during
repair (we do a repair once a week on each node). Maybe it's the repair
operation that caused the data loss? I've no idea. Currently our cluster
is running version 2.0.3.
We can do some tests on the data to give you all the info needed to track the bug.
But our most crucial question is: can we recover the lost data, or should we
start thinking about how to re-gather it?
best regards
Aleksander
ps. I like your link Rob, I'll pin it over my desk ;) In Oracle there
was a rule: never deploy an RDBMS before release 2 ;)

2014-02-03 Robert Coli rc...@eventbrite.com:
 On Mon, Feb 3, 2014 at 12:51 AM, olek.stas...@gmail.com
 olek.stas...@gmail.com wrote:

 We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
 1.2.10). Probably after upgradesstables (but it's only a guess,
 because we noticed the problem a few weeks later), some rows became
 tombstoned.


 To be clear, you didn't run SSTableloader at all? If so, this is the
 hypothetical case where normal streaming operations (replacing a node? what
 streaming did you do?) result in data loss...

 Also, CASSANDRA-6527 is a good reminder regarding the following :

 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

 =Rob


Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-03 Thread olek.stas...@gmail.com
2014-02-03 Robert Coli rc...@eventbrite.com:
 On Mon, Feb 3, 2014 at 1:02 PM, olek.stas...@gmail.com
 olek.stas...@gmail.com wrote:

 Today I noticed that the oldest files with broken values appeared during
 repair (we do a repair once a week on each node). Maybe it's the repair
 operation that caused the data loss?


 Yes, unless you added or removed or replaced nodes, it would have to be the
 repair operation, which streams SSTables. Did you run the repair during the
 upgradesstables?

No, I've done the repair after upgradesstables. In fact it was about 4
weeks after, because of a bug:
https://issues.apache.org/jira/browse/CASSANDRA-6277. We upgraded
Cassandra to 2.0.2 and then, after about a month, to 2.0.3 because of
6277. Then we were able to do repairs, so I set up cron to run one
weekly on each node (that was around 10 Dec 2013). The loss was
discovered around New Year's Eve.



 I've no idea. Currently our cluster
 is running version 2.0.3.


 2.0.3 has serious bugs, upgrade to 2.0.4 ASAP.
OK


 But our most crucial question is: can we recover the lost data, or should we
 start thinking about how to re-gather it?


 If I were you, I would do the latter. You can, to some extent, recover them
 via manual processes, dumping with sstable2json and so forth, but it will be
 quite painful.

 http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html

 Contains an explanation of how one could deal with it.
Sorry, but I have to admit that I can't transfer this solution to my
problem. Could you briefly describe the steps I should perform to recover?
best regards
Aleksander


 =Rob




 best regards
 Aleksander
 ps. I like your link Rob, I'll pin it over my desk ;) In Oracle there
 was a rule: never deploy an RDBMS before release 2 ;)

 2014-02-03 Robert Coli rc...@eventbrite.com:
  On Mon, Feb 3, 2014 at 12:51 AM, olek.stas...@gmail.com
  olek.stas...@gmail.com wrote:
 
  We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
  1.2.10). Probably after upgradesstables (but it's only a guess,
  because we noticed the problem a few weeks later), some rows became
  tombstoned.
 
 
  To be clear, you didn't run SSTableloader at all? If so, this is the
  hypothetical case where normal streaming operations (replacing a node?
  what
  streaming did you do?) result in data loss...
 
  Also, CASSANDRA-6527 is a good reminder regarding the following :
 
 
  https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
 
  =Rob




Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2014-02-03 Thread Robert Coli
On Mon, Feb 3, 2014 at 2:17 PM, olek.stas...@gmail.com 
olek.stas...@gmail.com wrote:

 No, I've done the repair after upgradesstables. In fact it was about 4
 weeks after, because of a bug:


If you only did a repair after you upgraded SSTables, when did you have an
opportunity to hit :

https://issues.apache.org/jira/browse/CASSANDRA-6527

... which relies on you having multiple versions of SSTables while
streaming?

Did you do any operation which involves streaming? (Add/Remove/Replace a
node?)

=Rob


Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2013-12-11 Thread Mathijs Vogelzang
Hi all,

We're running into a weird problem trying to migrate our data from a
1.2.10 cluster to a 2.0.3 one.

I've taken a snapshot on the old cluster, and for each host there, I'm running
sstableloader -d <host of new cluster> KEYSPACE/COLUMNFAMILY
(the sstableloader process from the 2.0.3 distribution; the one from
1.2.10 only gets java.lang.RuntimeException: java.io.IOException:
Connection reset by peer)

It then copies the data successfully, but when checking the data I
noticed some rows seemed to be missing. It turned out the data is not
missing, but has been tombstoned.
When I use sstable2json on the sstable on the destination cluster, it has
"metadata": {"deletionInfo":
{"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
it doesn't have that in the source sstable.
(Yes, this is a timestamp far into the future. All our hosts are
properly synced through ntp).
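
A quick way to scan a dump for these markers, in case anyone wants to
check their own data (a rough sketch: it assumes one sstable2json dump
per sstable, shaped like the snippet above, and it avoids assuming a
timestamp unit by comparing the marker against the row's own columns):

import json, sys

with open(sys.argv[1]) as f:   # e.g. sstable2json foo-Data.db > foo.json
    rows = json.load(f)

for row in rows:
    info = row.get("metadata", {}).get("deletionInfo")
    if not info:
        continue
    # A row is masked when its deletion marker is at least as new as
    # every column it covers, whatever the timestamp precision.
    newest = max((col[2] for col in row.get("columns", [])), default=0)
    if info["markedForDeleteAt"] >= newest:
        print(row["key"], info["markedForDeleteAt"], info["localDeletionTime"])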

This has happened for a bunch of random rows. How is this possible?
Naturally, copying the data again doesn't work to fix it, as the
tombstone is far in the future. Apart from not having this happen at
all, how can it be fixed?

Best regards,

Mathijs


Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2013-12-11 Thread Robert Coli
On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang
math...@apptornado.com wrote:

 When I use sstable2json on the sstable on the destination cluster, it has
 "metadata": {"deletionInfo":
 {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
 it doesn't have that in the source sstable.
 (Yes, this is a timestamp far into the future. All our hosts are
 properly synced through ntp).


This seems like a bug in sstableloader, I would report it on JIRA.


 Naturally, copying the data again doesn't work to fix it, as the
 tombstone is far in the future. Apart from not having this happen at
 all, how can it be fixed?


Briefly, you'll want to purge that tombstone and then reload the data with
a reasonable timestamp.

Dealing with rows with data (and tombstones) in the far future is described
in detail here :

http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html

=Rob