[ https://issues.apache.org/jira/browse/CASSANDRA-19452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Cai updated CASSANDRA-19452:
----------------------------------
          Fix Version/s: NA
          Since Version: NA
    Source Control Link: https://github.com/apache/cassandra-analytics/commit/a13532272051d4e4608f92d53bdd997103e8ea19
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> [Analytics] Use constant reference time during bulk read process
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-19452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19452
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The bulk reader uses a time provider that returns the current time during
> the read to guide compaction and validation.
> Because the current time differs across Spark executors, rows/cells may be
> expired inconsistently. Another issue is that the validation of non-expired
> rows/cells after compaction might fail, since they could expire during the
> read, which can take minutes or even hours.
> This can lead to data being wrongly omitted and to job failures.
> The fix is to use a constant reference time that is decided by the Spark
> driver and distributed to all executors. That reference time is then used
> for compaction and validation.
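
The failure mode and the fix described above can be illustrated with a small, self-contained sketch. This is not the cassandra-analytics code: the names Cell, compact, validate, AdvancingClock and constant_time_provider are hypothetical, and expiry is simplified to a single absolute timestamp in seconds. It assumes only what the ticket states, namely that compaction and validation both consult a time provider.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: a time provider is any zero-argument callable
# returning "now" in epoch seconds.
TimeProvider = Callable[[], int]


@dataclass(frozen=True)
class Cell:
    name: str
    expires_at_sec: int  # simplified absolute expiry time (assumption)


def compact(cells: List[Cell], now_sec: TimeProvider) -> List[Cell]:
    """Drop cells that are expired according to the time provider."""
    return [c for c in cells if c.expires_at_sec > now_sec()]


def validate(cells: List[Cell], now_sec: TimeProvider) -> bool:
    """Check that no surviving cell is expired according to the time provider."""
    return all(c.expires_at_sec > now_sec() for c in cells)


class AdvancingClock:
    """Stand-in for the wall clock: each call returns a later time."""

    def __init__(self, start_sec: int, step_sec: int) -> None:
        self._now = start_sec
        self._step = step_sec

    def __call__(self) -> int:
        now = self._now
        self._now += self._step
        return now


def constant_time_provider(reference_sec: int) -> TimeProvider:
    """Always return the reference time chosen once (e.g. by the driver)."""
    return lambda: reference_sec


cells = [Cell("a", 100), Cell("b", 105)]

# Wall-clock style provider: time moves on between compaction and validation,
# so a cell that survived compaction can look expired during validation.
drifting = AdvancingClock(start_sec=99, step_sec=3)
drifting_ok = validate(compact(cells, drifting), drifting)

# Constant reference time: both phases agree on what "expired" means.
fixed = constant_time_provider(99)
fixed_ok = validate(compact(cells, fixed), fixed)
```

In a Spark job, the reference time would be chosen once on the driver and shipped to the executors, for example as part of the serialized job configuration; how the actual commit does this is not shown in this message.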