[ 
https://issues.apache.org/jira/browse/CASSANDRA-19452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19452:
----------------------------------
          Fix Version/s: NA
          Since Version: NA
    Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/a13532272051d4e4608f92d53bdd997103e8ea19
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> [Analytics] Use constant reference time during bulk read process
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-19452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19452
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The bulk reader uses a time provider that returns the current time during 
> the read to guide compaction and validation.
> Because the current time differs across Spark executors, rows/cells may be 
> expired inconsistently. In addition, the validation of non-expired 
> rows/cells after compaction might fail, since they could expire during the 
> read, which can take minutes or even hours.
> This could lead to incorrect data omission and job failure.
> The fix is to use a constant reference time that is decided by the Spark 
> driver and distributed to all executors. The reference time is then used 
> for both compaction and validation.
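
The idea described above can be illustrated with a minimal, self-contained
sketch. It is not the code from the linked commit and does not use Spark or the
Analytics library: the names Cell, expires_at_micros, is_live and
read_partition are hypothetical, and the expiry rule (a cell is live only while
its expiry timestamp is strictly greater than the reference time) is an
assumption made for illustration. The "driver" picks one reference time, and
every simulated "executor" read uses that value instead of its own clock.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a cell with a TTL."""
    value: str
    expires_at_micros: int  # absolute expiry timestamp (assumed representation)


def is_live(cell: Cell, reference_micros: int) -> bool:
    # Assumed rule: live only while expiry is strictly after the reference time.
    return cell.expires_at_micros > reference_micros


def read_partition(cells, reference_micros):
    # Simulated executor task: it never consults its own clock, only the
    # reference time handed to it by the driver.
    return [c.value for c in cells if is_live(c, reference_micros)]


# "Driver": decide the reference time exactly once for the whole job.
reference_micros = time.time_ns() // 1_000

cells = [
    Cell("already expired", reference_micros - 1),
    Cell("expires mid-job", reference_micros + 1_000_000),
    Cell("long-lived", reference_micros + 3_600_000_000),
]

# "Executors": two reads that may run at different wall-clock times still
# evaluate expiry against the same reference, so they agree with each other.
first_read = read_partition(cells, reference_micros)
later_read = read_partition(cells, reference_micros)
```

Because both reads receive the same reference_micros, a cell that expires while
the job is running is still treated the same way by every task, and a later
validation step that uses the same value cannot disagree with the read.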



--
This message was sent by Atlassian Jira
(v8.20.10#820010)