[ 
https://issues.apache.org/jira/browse/CASSANDRA-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-11889:
---------------------------------
    Description: 
The following exception was reported in CASSANDRA-11470. It occurred whilst 
listing files with compaction in progress:

{code}
WARN  [CompactionExecutor:2006] 2016-05-23 18:23:31,694 BigTableWriter.java:171 
- Writing large partition 
test_keyspace/test_columnfamily:eda6b9c36f8df6fe596492c3438d7a38e9b109a6 
(123663388 bytes)
INFO  [IndexSummaryManager:1] 2016-05-23 18:24:23,731 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
WARN  [CompactionExecutor:2006] 2016-05-23 18:24:56,669 BigTableWriter.java:171 
- Writing large partition 
test_keyspace/test_columnfamily:05b6b424194dd19ab7cfbcd53c4979768cd859e9 
(256286063 bytes)
WARN  [CompactionExecutor:2006] 2016-05-23 18:26:23,575 BigTableWriter.java:171 
- Writing large partition 
test_keyspace/test_columnfamily:04e9fac15552b9ae77c27a6cb8d0fd11fdcc24d7 
(212445557 bytes)
INFO  [CompactionExecutor:2005] 2016-05-23 18:29:26,839 
LeveledManifest.java:437 - Adding high-level (L3) 
BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61041-big-Data.db')
 to candidates
WARN  [CompactionExecutor:2006] 2016-05-23 18:30:34,154 BigTableWriter.java:171 
- Writing large partition 
test_keyspace/test_columnfamily:edbe6f178503be90911dbf29a55b97a4b095a9ec 
(183852539 bytes)
INFO  [CompactionExecutor:2006] 2016-05-23 18:31:21,080 
LeveledManifest.java:437 - Adding high-level (L3) 
BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61042-big-Data.db')
 to candidates
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 
LogFile.java:173 - Unexpected files detected for sstable [ma-91034-big], record 
[REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
 last update time [00:00:00] should have been [08:29:36]
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,208 
ScheduledReporter.java:119 - RuntimeException thrown from 
GraphiteReporter#report. Exception was suppressed.
java.lang.RuntimeException: Failed to list files in 
/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396
        at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) 
~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) 
~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) 
~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) 
~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:692) 
~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) 
~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281)
 ~[metrics-graphite-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158)
 ~[metrics-graphite-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 
~[metrics-core-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) 
~[metrics-core-3.1.0.jar:3.1.0]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_91]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_91]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_91]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_91]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_91]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: 
org.apache.cassandra.db.lifecycle.LogTransaction$CorruptTransactionLogException:
 Some records failed verification. See earlier in log for details.
        at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:114)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) 
~[na:1.8.0_91]
        at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) 
~[na:1.8.0_91]
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) 
~[na:1.8.0_91]
        at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) 
~[na:1.8.0_91]
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) 
~[na:1.8.0_91]
        at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) 
~[na:1.8.0_91]
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
 ~[na:1.8.0_91]
        at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) 
~[na:1.8.0_91]
        at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) 
~[na:1.8.0_91]
        at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
 ~[apache-cassandra-3.0.6.jar:3.0.6]
        ... 22 common frames omitted
INFO  [CompactionExecutor:2005] 2016-05-23 18:32:27,284 
LeveledManifest.java:437 - Adding high-level (L3) 
BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98189-big-Data.db')
 to candidates
INFO  [CompactionExecutor:2005] 2016-05-23 18:34:42,562 
LeveledManifest.java:437 - Adding high-level (L3) 
BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98210-big-Data.db')
 to candidates
{code}

The error of interest is:

{code}
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 
LogFile.java:173 - Unexpected files detected for sstable [ma-91034-big], record 
[REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
 last update time [00:00:00] should have been [08:29:36]
{code}

The problem is this code in LogRecord:

{code}
    public LogRecord withExistingFiles()
    {
        return make(type, getExistingFiles(), 0, absolutePath.get());
    }

    public static LogRecord make(Type type, List<File> files, int minFiles, String absolutePath)
    {
        long lastModified = files.stream().map(File::lastModified).reduce(0L, Long::max);
        return new LogRecord(type, absolutePath, lastModified, Math.max(minFiles, files.size()));
    }
{code}

If the sstable files are completely deleted after getExistingFiles() has 
returned them, but before lastModified is calculated, then we report a last 
update time of zero (the files no longer exist) while still indicating that 
there are files on disk. This check in LogFile.verifyRecord then fails:

{code}
        record.status.onDiskRecord = record.withExistingFiles();
        if (record.updateTime != record.status.onDiskRecord.updateTime && record.status.onDiskRecord.numFiles > 0)
        {
            record.setError(String.format("Unexpected files detected for sstable [%s], " +
                                          "record [%s]: last update time [%tT] should have been [%tT]",
                                          record.fileName(),
                                          record,
                                          record.status.onDiskRecord.updateTime,
                                          record.updateTime));
        }
{code}
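The zero timestamp is easy to reproduce in isolation. Here is a minimal, self-contained sketch (the file path is hypothetical, not from the report) showing that File.lastModified() returns 0L for a missing file, so the max-reduction over a list of already-deleted files yields 0 while the file count stays positive:

```java
import java.io.File;
import java.util.List;

public class LastModifiedRace
{
    public static void main(String[] args)
    {
        // Stand-in for the list returned by getExistingFiles(): by the time
        // we stat it, the file has been deleted (path is made up).
        List<File> files = List.of(new File("/tmp/no-such-file-ma-1-big-Data.db"));

        // File.lastModified() returns 0L for a file that does not exist,
        // so the reduction below produces 0 even though files is non-empty.
        long lastModified = files.stream().map(File::lastModified).reduce(0L, Long::max);

        // Mirrors the failing state in verifyRecord: updateTime == 0
        // (formatted as [00:00:00]) while numFiles > 0.
        System.out.println("lastModified=" + lastModified + " numFiles=" + files.size());
        // prints: lastModified=0 numFiles=1
    }
}
```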

As a consequence, we throw an exception when listing files; in this case the 
files were being listed to calculate the true snapshot size for the metrics.

Note that we are careful to delete files from oldest to newest, so the last 
update time should always match what is stored in the txn log; however, if all 
files are deleted, then we report the last update time as zero. So, if my 
analysis is correct, this problem should not occur very frequently.
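One possible mitigation (a sketch of the idea only, not a committed patch, and using hypothetical names): since lastModified() already returns 0L for a vanished file, that single stat can serve as both the existence check and the timestamp, so a file deleted mid-listing contributes neither a zero timestamp nor a file count:

```java
import java.io.File;
import java.util.List;

public class RaceTolerantStat
{
    /**
     * Returns {maxLastModified, numFiles}, counting only files whose single
     * lastModified() stat succeeded. A file deleted between listing and
     * stat'ing reports 0L and is skipped, keeping the pair consistent:
     * numFiles > 0 implies a real, non-zero update time.
     */
    static long[] stat(List<File> files)
    {
        long lastModified = 0L;
        long numFiles = 0;
        for (File f : files)
        {
            long ts = f.lastModified(); // 0L if the file no longer exists
            if (ts > 0L)
            {
                lastModified = Math.max(lastModified, ts);
                numFiles++;
            }
        }
        return new long[] { lastModified, numFiles };
    }

    public static void main(String[] args)
    {
        long[] r = stat(List.of(new File("/tmp/no-such-file-ma-1-big-Data.db")));
        System.out.println("lastModified=" + r[0] + " numFiles=" + r[1]);
        // prints: lastModified=0 numFiles=0
    }
}
```

With this shape, a record whose files were all deleted would report numFiles == 0, and the existing numFiles > 0 guard in LogFile.verifyRecord would skip the update-time comparison.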



> LogRecord: file system race condition may cause verify() to fail
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11889
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11889
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.0.x, 3.x
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)