[ https://issues.apache.org/jira/browse/CASSANDRA-19452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Cai updated CASSANDRA-19452:
----------------------------------
     Bug Category: Parent values: Correctness(12982)Level 1 values: Unrecoverable Corruption / Loss(13161)
       Complexity: Normal
    Discovered By: User Report
         Severity: Normal
           Status: Open  (was: Triage Needed)

> [Analytics] Use constant reference time during bulk read process
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-19452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19452
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The bulk reader uses a time provider that returns the current time during
> the read to guide compaction and validation.
> Because the current time differs across Spark executors, rows/cells can
> expire inconsistently. In addition, the validation that rows/cells remaining
> after compaction are not expired might fail, since they could expire during
> the read. The read can take minutes or even hours.
> This can lead to incorrect data omission and job failure.
> The fix is to use a constant reference time that is chosen by the Spark
> driver and distributed to all executors. The reference time is then used for
> both compaction and validation.
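
For illustration only, here is a minimal Python sketch of the idea described in the ticket. It is not the Analytics library's code: Cell, compact, validate, constant_time_provider and ticking_clock are hypothetical names, and the rule "a cell is expired when its expiration time is at or before the reference time" is an assumption made for the sketch. It contrasts a clock that advances during a long read with a single reference time chosen once (as the driver would) and reused for both compaction and validation.

```python
from dataclasses import dataclass
from typing import Callable, List

# A time provider is any zero-argument callable returning epoch seconds.
TimeProvider = Callable[[], int]


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell with an absolute expiration time in epoch seconds."""
    name: str
    expires_at_sec: int


def constant_time_provider(reference_sec: int) -> TimeProvider:
    """Always return the same reference time, chosen once up front."""
    return lambda: reference_sec


def ticking_clock(start_sec: int, step_sec: int) -> TimeProvider:
    """Simulated wall clock that advances on every call (stands in for a long read)."""
    state = {"now": start_sec - step_sec}

    def now() -> int:
        state["now"] += step_sec
        return state["now"]

    return now


def compact(cells: List[Cell], now_sec: TimeProvider) -> List[Cell]:
    """Keep only cells that are not expired according to the time provider."""
    return [c for c in cells if c.expires_at_sec > now_sec()]


def validate(cells: List[Cell], now_sec: TimeProvider) -> None:
    """Raise if a cell that survived compaction is considered expired."""
    for c in cells:
        if c.expires_at_sec <= now_sec():
            raise ValueError(f"expired cell after compaction: {c.name}")


if __name__ == "__main__":
    cells = [Cell("a", 100), Cell("b", 1000)]

    # Constant reference time: compaction and validation agree.
    fixed = constant_time_provider(90)
    validate(compact(cells, fixed), fixed)

    # Advancing clock: cell "a" survives compaction, then expires before validation.
    clock = ticking_clock(90, 20)
    kept = compact(cells, clock)
    try:
        validate(kept, clock)
    except ValueError as err:
        print("validation failed:", err)
```

In a Spark job the reference value would be picked on the driver and shipped to executors with the job configuration; how the actual patch distributes it is not stated in this message.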