[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction

2015-12-02 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-7145:
---
Component/s: Compaction

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5),
> Java 1.7.0_55
> Reporter: PJ
> Assignee: Marcus Eriksson
> Fix For: 1.2.19, 2.0.11, 2.1.0
>
> Attachments:
> 0001-avoid-marking-compacted-sstables-as-compacting.patch,
> compaction - FileNotFoundException.txt,
> repair - RuntimeException.txt,
> startup - AssertionError.txt
>
>
> I can't finish any compaction because my nodes always throw a
> "FileNotFoundException". I've already tried the following, but nothing helped:
> 1. nodetool flush
> 2. nodetool repair (ends with RuntimeException; see attachment)
> 3. node restart (via dse cassandra-stop)
>
> Whenever I restart the nodes, another type of exception is logged (see
> attachment) somewhere near the end of the startup process. This particular
> exception doesn't seem to be critical because the nodes still manage to
> finish the startup and come online.
>
> I don't have specific steps to reproduce the problem that I'm experiencing
> with compaction and repair. I'm in the middle of migrating 4.8 billion rows
> from MySQL via SSTableLoader.
>
> Some things that may or may not be relevant:
> 1. I didn't drop and recreate the keyspace (so probably not related to
> CASSANDRA-4857)
> 2. I do the bulk-loading in batches of 1 to 20 million rows. When a batch
> reaches 100% total progress (i.e. starts to build the secondary index), I kill
> the sstableloader process and cancel the index build
> 3. I restart the nodes occasionally. It's possible that there was an ongoing
> compaction during one of those restarts.
>
> Related StackOverflow question (mine): 
> http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction

2014-08-22 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-7145:
---

Attachment: 0001-avoid-marking-compacted-sstables-as-compacting.patch

This can happen when the following occurs, in sequence:

# We ask LeveledManifest for a new CompactionCandidate
# LCS returns a CompactionCandidate containing sstables that are already
marked as compacting (a bug)
# The compaction that held one of the sstables returned in #2 finishes and
removes the files that were included in that compaction
# We successfully mark the now-compacted sstable as compacting, because it is
no longer marked as compacting in the View
# We get a FileNotFoundException once we start trying to compact

The attached patch:
* removes a case in LCS where we could return compacting sstables in a 
CompactionCandidate
* makes sure we can't mark compacted sstables as compacting
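
To make the second bullet concrete, here is a minimal Java sketch of the
invariant. It is not the patch itself: CompactingTracker, its string sstable
names, and every method on it are hypothetical stand-ins for illustration,
not Cassandra's classes. The assumption is that the check and the mark happen
under one lock, so an sstable that has already been compacted away (or is
already claimed) can never be marked as compacting.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical bookkeeping sketch; not Cassandra's View or DataTracker. */
class CompactingTracker {
    // sstables that still exist on disk
    private final Set<String> live = new HashSet<>();
    // sstables currently claimed by a running compaction
    private final Set<String> compacting = new HashSet<>();

    /** Registers a newly written sstable. */
    public synchronized void add(String sstable) {
        live.add(sstable);
    }

    /**
     * Claims every candidate, or none of them. Refuses if any candidate is
     * already claimed or has already been compacted away.
     */
    public synchronized boolean markCompacting(Collection<String> candidates) {
        for (String sstable : candidates) {
            if (!live.contains(sstable) || compacting.contains(sstable)) {
                return false;
            }
        }
        compacting.addAll(candidates);
        return true;
    }

    /** Finishes a compaction: the inputs are deleted, the output becomes live. */
    public synchronized void finishCompaction(Collection<String> inputs, String output) {
        compacting.removeAll(inputs);
        live.removeAll(inputs);
        live.add(output);
    }
}
```

With a check like this, step 4 in the sequence above is refused instead of
handing an already-deleted file to the next compaction.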

It would be much appreciated if anyone who can reproduce this could try the
attached patch to see if the problem goes away.

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5),
> Java 1.7.0_55
> Reporter: PJ
> Assignee: Marcus Eriksson
> Fix For: 2.0.10
>
> Attachments:
> 0001-avoid-marking-compacted-sstables-as-compacting.patch,
> compaction - FileNotFoundException.txt,
> repair - RuntimeException.txt,
> startup - AssertionError.txt





[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction

2014-06-24 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-7145:
--

Priority: Major  (was: Blocker)

I'm really going to need more information to troubleshoot this effectively.

# How did your cluster get into this state?
# Can you reproduce starting from a non-broken state?
# Does it still happen on 2.0.8?

> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5),
> Java 1.7.0_55
> Reporter: PJ
> Attachments:
> compaction - FileNotFoundException.txt,
> repair - RuntimeException.txt,
> startup - AssertionError.txt





[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction

2014-05-04 Thread PJ (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ updated CASSANDRA-7145:
--

Description: 
I can't finish any compaction because my nodes always throw a 
"FileNotFoundException". I've already tried the following but nothing helped:

1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Whenever I restart the nodes, another type of exception is logged (see
attachment) somewhere near the end of the startup process. This particular
exception doesn't seem to be critical because the nodes still manage to finish
the startup and come online.

I don't have specific steps to reproduce the problem that I'm experiencing with 
compaction and repair. I'm in the middle of migrating 4.8 billion rows from 
MySQL via SSTableLoader. 

Some things that may or may not be relevant:
1. I didn't drop and recreate the keyspace (so probably not related to 
CASSANDRA-4857)
2. I do the bulk-loading in batches of 1 to 20 million rows. When a batch
reaches 100% total progress (i.e. starts to build the secondary index), I kill
the sstableloader process and cancel the index build
3. I restart the nodes occasionally. It's possible that there was an ongoing
compaction during one of those restarts.

Related StackOverflow question (mine): 
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

  was:
I can't finish any compaction because my nodes always throw a 
"FileNotFoundException". I've already tried the following but nothing helped:

1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Somewhere near the end of startup process, another type of exception is logged 
(see attachment) but the nodes are still able to finish the startup and 
eventually become online.

My questions now are:
1. Have I already lost data? I'm in the middle of migrating 4.8 billion rows 
from MySQL and I'd like to know whether I should already abort and start over
2. What caused the sstable files to go missing?
3. How can I proceed with compaction and repair? Obviously, not being able to 
do so would eventually lead to serious performance and data issues

Related StackOverflow question (mine): 
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

Notes:
1. I didn't drop and recreate the keyspace (so probably not related to 
CASSANDRA-4857)
2. I use sstableloader for the migration. However, since it is designed to wait 
for the secondary index build to complete before exiting, the overall 
throughput becomes unacceptable. Due to this, I devised a mechanism that would 
kill the sstableloader process and cancel the secondary index build when the 
bulk-loading total progress reaches 100%. So far, I've done this more than 100 
times already
3. There are times when I had to restart the nodes because the OS load reached 
high levels. It's possible that there are compactions in-progress when I 
restarted the nodes


> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5),
> Java 1.7.0_55
> Reporter: PJ
> Priority: Blocker
> Attachments:
> compaction - FileNotFoundException.txt,
> repair - RuntimeException.txt,
> startup - AssertionError.txt




[jira] [Updated] (CASSANDRA-7145) FileNotFoundException during compaction

2014-05-04 Thread PJ (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ updated CASSANDRA-7145:
--

Description: 
I can't finish any compaction because my nodes always throw a 
"FileNotFoundException". I've already tried the following but nothing helped:

1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Whenever I restart the nodes, another type of exception is logged (see
attachment) somewhere near the end of the startup process. This particular
exception doesn't seem to be critical because the nodes still manage to finish
the startup and come online.

I don't have specific steps to reproduce the problem that I'm experiencing with 
compaction and repair. I'm in the middle of migrating 4.8 billion rows from 
MySQL via SSTableLoader. 

Some things that may or may not be relevant:
1. I didn't drop and recreate the keyspace (so probably not related to 
CASSANDRA-4857)
2. I do the bulk-loading in batches of 1 to 20 million rows. When a batch
reaches 100% total progress (i.e. starts to build the secondary index), I kill
the sstableloader process and cancel the index build
3. I restart the nodes occasionally. It's possible that there was an ongoing
compaction during one of those restarts.

Related StackOverflow question (mine): 
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

  was:
I can't finish any compaction because my nodes always throw a 
"FileNotFoundException". I've already tried the following but nothing helped:

1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Whenever I restart the nodes, another type of exception is logged (see 
attachment) somewhere near the end of startup process. This particular 
exception doesn't seem to be critical because they nodes still manage to finish 
the startup and become online.

I don't have specific steps to reproduce the problem that I'm experiencing with 
compaction and repair. I'm in the middle of migrating 4.8 billion rows from 
MySQL via SSTableLoader. 

Some things that may or may not be relevant:
1. I didn't drop and recreate the keyspace (so probably not related to 
CASSANDRA-4857)
2. I do the bulk-loading in batches of 1 to 20 millions rows. When a batch 
reaches 100% total progress (i.e. starts to build secondary index), I kill the 
sstableloader process and cancel the index build
3. I restart the nodes occasionally. It's possible that there is an on-going 
compaction during one of those restarts.

Related StackOverflow question (mine): 
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction


> FileNotFoundException during compaction
> ---
>
> Key: CASSANDRA-7145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5),
> Java 1.7.0_55
> Reporter: PJ
> Priority: Blocker
> Attachments:
> compaction - FileNotFoundException.txt,
> repair - RuntimeException.txt,
> startup - AssertionError.txt


