[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-03-17 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938256#comment-13938256
 ] 

Jeremiah Jordan commented on CASSANDRA-6756:


Whatever happens here, I think the default should stay as it is now.  If you 
want this, you would add the flag to your cassandra-env.sh or cassandra.yaml 
(or wherever it gets put).

> Provide option to avoid loading orphan SSTables on startup
> --
>
> Key: CASSANDRA-6756
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6756
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>    Reporter: Vincent Mallet
> Fix For: 1.2.16
>
>
> When Cassandra starts up, it enumerates all SSTables on disk for a known 
> column family and proceeds to load all of them, even those that were left 
> behind before the restart because of a problem of some sort. This can lead to 
> "data gain" (resurrected data), which is just as bad as data loss.
> The ask is to provide a yaml config option which would allow one to turn that 
> behavior off by default so a cassandra cluster would be immune to data gain 
> when nodes get restarted (at least with Leveled where Cassandra keeps track 
> of SSTables).
> This is sort of a follow-up to CASSANDRA-6503 (fixed in 1.2.14). We're just 
> extremely nervous that orphan SSTables could appear because of some other 
> potential problem somewhere else and cause zombie data on a random reboot. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-03-03 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918338#comment-13918338
 ] 

sankalp kohli commented on CASSANDRA-6756:
--

With this, we will also need an option to load all SSTables on startup. This 
will be useful in cases where you intentionally drop SSTables into the data 
directory. It will also be useful during a restore if the system keyspace is 
not restored. 



[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-03-03 Thread Vincent Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918313#comment-13918313
 ] 

Vincent Mallet commented on CASSANDRA-6756:
---

+1 on the lost+found idea.
Btw we're trying to analyze the source of some of these SSTables we're finding 
in some clusters, and there seem to be causes other than failed repairs (in 
1.1): OOM, problems with compaction, etc. (still investigating). Having that 
option would make us sleep better at night.





[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-02-28 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916409#comment-13916409
 ] 

sankalp kohli commented on CASSANDRA-6756:
--

[~rcoli] I think 1) should be part of this JIRA. 




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-02-27 Thread Robert Coli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914895#comment-13914895
 ] 

Robert Coli commented on CASSANDRA-6756:


The behavior of Cassandra loading data files which happen to be in the data dir 
is useful in various cases, but most of those cases would be addressed fine 
with a safe version of "refresh." As a configurable option (with a log about 
the unexpected files?) this ticket seems reasonable as a protection against 
unintentional data gain... except that leaving these files in the data dir in a 
non-live state makes them susceptible to being silently overwritten by 
Cassandra.

CASSANDRA-6719 (CASSANDRA-6245 / CASSANDRA-6514) is about a certain case where 
non-live files end up in the data directory, but this ticket suggests that 
there is a more general issue. I would probably be fine if, given the proposed 
option, the non-live SSTables were moved to a "lost+found" directory so that 
they are protected from being silently overwritten by flush. 

The simplest solution to preventing silent overwriting of accidentally dead 
SSTables in the data directory would seem to be to check for the existence of a 
file with a given name at flush time, and to increment the sequence until such 
a file does not exist.
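
To illustrate the flush-time check, here is a minimal Python sketch. It is not 
Cassandra code: the function name `next_free_generation` and the simplified 
`<prefix>-<generation>-Data.db` file naming are assumptions made only for this 
example.

```python
import os


def next_free_generation(data_dir, prefix, start):
    """Return the first generation >= start whose data file is absent.

    Assumes a simplified, hypothetical naming scheme:
    <prefix>-<generation>-Data.db
    """
    generation = start
    # Keep incrementing until no file with that name exists, so a flush
    # never lands on top of a non-live file left in the directory.
    while os.path.exists(
        os.path.join(data_dir, f"{prefix}-{generation}-Data.db")
    ):
        generation += 1
    return generation
```

A flush would call this with the generation it intended to use and write to 
the returned generation instead. The check and the write are not atomic here, 
so a real implementation would need to guard against concurrent writers.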

Should I file a JIRA for either or both of:

1) move orphan SSTables to lost+found directory on startup? (or might this be 
that ticket?)
2) check for existence of SSTables before flushing?
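
Option 1) could look roughly like the following Python sketch. It is only an 
illustration under stated assumptions: the function `quarantine_orphans`, the 
simplified file-name pattern, and the idea that the caller already knows the 
set of live generations are all hypothetical and not taken from Cassandra.

```python
import os
import re
import shutil

# Hypothetical, simplified naming: <prefix>-<generation>-<Component>.db
SSTABLE_FILE = re.compile(
    r"^(?P<prefix>.+)-(?P<generation>\d+)-(?P<component>\w+)\.db$"
)


def quarantine_orphans(data_dir, live_generations):
    """Move SSTable files whose generation is not live into lost+found.

    Returns the sorted list of file names that were moved.
    """
    lost_found = os.path.join(data_dir, "lost+found")
    moved = []
    for name in sorted(os.listdir(data_dir)):
        match = SSTABLE_FILE.match(name)
        if match is None:
            continue  # not an SSTable component; leave it alone
        if int(match.group("generation")) in live_generations:
            continue  # known-live SSTable; it stays where it is
        os.makedirs(lost_found, exist_ok=True)
        # Note: a real implementation would also have to avoid clobbering
        # a same-named file that is already sitting in lost+found.
        shutil.move(
            os.path.join(data_dir, name), os.path.join(lost_found, name)
        )
        moved.append(name)
    return moved
```

Moving the files out of the data directory, rather than leaving them in place 
unloaded, is what keeps them from being overwritten by a later flush.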



[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-02-24 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911146#comment-13911146
 ] 

sankalp kohli commented on CASSANDRA-6756:
--

Since the JSON manifest is out, we can store the SSTable number in a system 
table before making it live. That way we can avoid picking up any SSTable that 
is not supposed to be live. 
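
A rough Python sketch of that idea follows. The class `LiveSSTableRegistry` 
and its methods are hypothetical, and an in-memory set stands in for the 
system table, so this shows only the bookkeeping, not how Cassandra would 
persist it.

```python
class LiveSSTableRegistry:
    """Stand-in for a system table of SSTable generations allowed to be live."""

    def __init__(self):
        self._live = set()

    def mark_live(self, generation):
        # Recorded BEFORE the SSTable is made live, so a crash in between
        # can leave a stale record but never an unrecorded live SSTable.
        self._live.add(generation)

    def mark_removed(self, generation):
        # Called when an SSTable is retired, e.g. after compaction.
        self._live.discard(generation)

    def select_loadable(self, generations_on_disk):
        """Split on-disk generations into (loadable, orphans) at startup."""
        loadable = sorted(g for g in generations_on_disk if g in self._live)
        orphans = sorted(g for g in generations_on_disk if g not in self._live)
        return loadable, orphans
```

At startup, only the first list would be loaded; the second list is what this 
ticket calls orphan SSTables.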



[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-02-23 Thread Vincent Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910074#comment-13910074
 ] 

Vincent Mallet commented on CASSANDRA-6756:
---

Any kind, really. The stalled repair problem hit us pretty massively on a 
recent cluster bounce, and I'm thinking "who knows what other problem or other 
bug is going to leave orphan SSTables behind". Fair enough, there shouldn't be 
any, but the day there are, it's not worth paying the price of zombie data. 
We're also thinking of grabbing that patch and porting it to 1.1 while we're at 
it, until we migrate to 1.2. The default behavior of sucking in any SSTables 
that are lying around is just making us very nervous.

Hope that makes sense, thanks.




[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-02-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909904#comment-13909904
 ] 

Jonathan Ellis commented on CASSANDRA-6756:
---

What kind of orphan sstables are you thinking of?  We already record 
compactions-in-progress so we can clean out source files that we didn't get to 
remove before restart.
