[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-7145:
---
    Component/s: Compaction

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5), Java 1.7.0_55
> Reporter: PJ
> Assignee: Marcus Eriksson
> Fix For: 1.2.19, 2.0.11, 2.1.0
>
> Attachments: 0001-avoid-marking-compacted-sstables-as-compacting.patch, compaction - FileNotFoundException.txt, repair - RuntimeException.txt, startup - AssertionError.txt
>
>
> I can't finish any compaction because my nodes always throw a "FileNotFoundException". I've already tried the following but nothing helped:
> 1. nodetool flush
> 2. nodetool repair (ends with RuntimeException; see attachment)
> 3. node restart (via dse cassandra-stop)
>
> Whenever I restart the nodes, another type of exception is logged (see attachment) somewhere near the end of startup process. This particular exception doesn't seem to be critical because the nodes still manage to finish the startup and become online.
>
> I don't have specific steps to reproduce the problem that I'm experiencing with compaction and repair. I'm in the middle of migrating 4.8 billion rows from MySQL via SSTableLoader.
>
> Some things that may or may not be relevant:
> 1. I didn't drop and recreate the keyspace (so probably not related to CASSANDRA-4857)
> 2. I do the bulk-loading in batches of 1 to 20 millions rows. When a batch reaches 100% total progress (i.e. starts to build secondary index), I kill the sstableloader process and cancel the index build
> 3. I restart the nodes occasionally. It's possible that there is an on-going compaction during one of those restarts.
>
> Related StackOverflow question (mine):
> http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-7145:
---
    Attachment: 0001-avoid-marking-compacted-sstables-as-compacting.patch

If we have a situation where this happens (in sequence):
# We ask LeveledManifest for a new CompactionCandidate
# LCS returns a CompactionCandidate containing sstables marked as compacting (a bug)
# The compaction that held one of the sstables we marked in #2 finishes and removes the files that were included in the compaction
# We successfully mark the compacted sstable as compacting (it is no longer marked as compacting in the View)
# FileNotFoundException once we start trying to compact

Attached patch
* removes a case in LCS where we could return compacting sstables in a CompactionCandidate
* makes sure we can't mark compacted sstables as compacting

It would be much appreciated if anyone that can reproduce this could try with the attached patch to see if the problem goes away.

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5), Java 1.7.0_55
> Reporter: PJ
> Assignee: Marcus Eriksson
> Fix For: 2.0.10
>
> Attachments: 0001-avoid-marking-compacted-sstables-as-compacting.patch, compaction - FileNotFoundException.txt, repair - RuntimeException.txt, startup - AssertionError.txt
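The five-step sequence in Marcus Eriksson's comment above can be replayed with a small toy model. This is only an illustrative sketch in Python, not Cassandra code and not the contents of the attached patch: the class `SSTableTracker` and the functions `candidates`, `mark_compacting`, `finish_compaction` and `replay_race` are hypothetical names invented here, and sstables are represented as plain strings. It assumes the two guards the comment describes: candidates never include sstables that are already compacting, and an sstable that has already been compacted away cannot be marked as compacting.

```python
class SSTableTracker:
    """Toy stand-in (hypothetical) for the view of live and compacting sstables."""

    def __init__(self, live):
        self.live = set(live)      # sstables whose files still exist on disk
        self.compacting = set()    # sstables currently held by a compaction

    def candidates(self, wanted):
        # Guard 1: never offer sstables that another compaction already holds.
        return {s for s in wanted if s in self.live and s not in self.compacting}

    def mark_compacting(self, sstables):
        sstables = set(sstables)
        # Guard 2: refuse sstables that are no longer live (already compacted),
        # as well as ones that are still marked as compacting.
        if not sstables or not sstables <= self.live or sstables & self.compacting:
            return False
        self.compacting |= sstables
        return True

    def finish_compaction(self, inputs, output):
        inputs = set(inputs)
        self.compacting -= inputs  # inputs are no longer marked as compacting
        self.live -= inputs        # their files are removed
        self.live.add(output)      # the merged result becomes live


def replay_race():
    tracker = SSTableTracker({"a", "b", "c"})
    assert tracker.mark_compacting({"a", "b"})   # an earlier compaction holds a and b
    stale = {"b", "c"}                           # step 2: buggy candidate includes b
    tracker.finish_compaction({"a", "b"}, "d")   # step 3: b's files are removed
    accepted = tracker.mark_compacting(stale)    # step 4: rejected by guard 2
    fresh = tracker.candidates({"b", "c", "d"})  # only live, idle sstables remain
    return accepted, fresh
```

Without the liveness check in `mark_compacting`, step 4 would succeed in this model, because "b" is no longer in the compacting set once the earlier compaction finishes. That corresponds to the state in which the real compaction would then try to open files that no longer exist.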
[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-7145:
--
    Priority: Major (was: Blocker)

I'm really going to need more to troubleshoot this effectively.
# How did your cluster get into this state?
# Can you reproduce starting from a non-broken state?
# Does it still happen on 2.0.8?

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5), Java 1.7.0_55
> Reporter: PJ
> Attachments: compaction - FileNotFoundException.txt, repair - RuntimeException.txt, startup - AssertionError.txt
[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ updated CASSANDRA-7145:
--
    Description:
I can't finish any compaction because my nodes always throw a "FileNotFoundException". I've already tried the following but nothing helped:
1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Whenever I restart the nodes, another type of exception is logged (see attachment) somewhere near the end of startup process. This particular exception doesn't seem to be critical because they nodes still manage to finish the startup and become online.

I don't have specific steps to reproduce the problem that I'm experiencing with compaction and repair. I'm in the middle of migrating 4.8 billion rows from MySQL via SSTableLoader.

Some things that may or may not be relevant:
1. I didn't drop and recreate the keyspace (so probably not related to CASSANDRA-4857)
2. I do the bulk-loading in batches of 1 to 20 millions rows. When a batch reaches 100% total progress (i.e. starts to build secondary index), I kill the sstableloader process and cancel the index build
3. I restart the nodes occasionally. It's possible that there is an on-going compaction during one of those restarts.

Related StackOverflow question (mine):
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

  was:
I can't finish any compaction because my nodes always throw a "FileNotFoundException". I've already tried the following but nothing helped:
1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Somewhere near the end of startup process, another type of exception is logged (see attachment) but the nodes are still able to finish the startup and eventually become online.

My questions now are:
1. Have I already lost data? I'm in the middle of migrating 4.8 billion rows from MySQL and I'd like to know whether I should already abort and start over
2. What caused the sstable files to go missing?
3. How can I proceed with compaction and repair? Obviously, not being able to do so would eventually lead to serious performance and data issues

Related StackOverflow question (mine):
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

Notes:
1. I didn't drop and recreate the keyspace (so probably not related to CASSANDRA-4857)
2. I use sstableloader for the migration. However, since it is designed to wait for the secondary index build to complete before exiting, the overall throughput becomes unacceptable. Due to this, I devised a mechanism that would kill the sstableloader process and cancel the secondary index build when the bulk-loading total progress reaches 100%. So far, I've done this more than 100 times already
3. There are times when I had to restart the nodes because the OS load reached high levels. It's possible that there are compactions in-progress when I restarted the nodes

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5), Java 1.7.0_55
> Reporter: PJ
> Priority: Blocker
> Attachments: compaction - FileNotFoundException.txt, repair - RuntimeException.txt, startup - AssertionError.txt
[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ updated CASSANDRA-7145:
--
    Description:
I can't finish any compaction because my nodes always throw a "FileNotFoundException". I've already tried the following but nothing helped:
1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Whenever I restart the nodes, another type of exception is logged (see attachment) somewhere near the end of startup process. This particular exception doesn't seem to be critical because the nodes still manage to finish the startup and become online.

I don't have specific steps to reproduce the problem that I'm experiencing with compaction and repair. I'm in the middle of migrating 4.8 billion rows from MySQL via SSTableLoader.

Some things that may or may not be relevant:
1. I didn't drop and recreate the keyspace (so probably not related to CASSANDRA-4857)
2. I do the bulk-loading in batches of 1 to 20 millions rows. When a batch reaches 100% total progress (i.e. starts to build secondary index), I kill the sstableloader process and cancel the index build
3. I restart the nodes occasionally. It's possible that there is an on-going compaction during one of those restarts.

Related StackOverflow question (mine):
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

  was:
I can't finish any compaction because my nodes always throw a "FileNotFoundException". I've already tried the following but nothing helped:
1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Whenever I restart the nodes, another type of exception is logged (see attachment) somewhere near the end of startup process. This particular exception doesn't seem to be critical because they nodes still manage to finish the startup and become online.

I don't have specific steps to reproduce the problem that I'm experiencing with compaction and repair. I'm in the middle of migrating 4.8 billion rows from MySQL via SSTableLoader.

Some things that may or may not be relevant:
1. I didn't drop and recreate the keyspace (so probably not related to CASSANDRA-4857)
2. I do the bulk-loading in batches of 1 to 20 millions rows. When a batch reaches 100% total progress (i.e. starts to build secondary index), I kill the sstableloader process and cancel the index build
3. I restart the nodes occasionally. It's possible that there is an on-going compaction during one of those restarts.

Related StackOverflow question (mine):
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5), Java 1.7.0_55
> Reporter: PJ
> Priority: Blocker
> Attachments: compaction - FileNotFoundException.txt, repair - RuntimeException.txt, startup - AssertionError.txt