Re: [PR] CASSANDRA-19452 Use constant reference time during bulk read process [cassandra-analytics]

via GitHub Fri, 01 Mar 2024 19:13:47 -0800


yifan-c commented on code in PR #44:
URL: https://github.com/apache/cassandra-analytics/pull/44#discussion_r1509874100



##########
cassandra-bridge/src/main/java/org/apache/cassandra/spark/utils/TimeProvider.java:
##########
@@ -19,16 +19,37 @@
 
 package org.apache.cassandra.spark.utils;
 
+import java.util.concurrent.TimeUnit;
+
 /**
  * Provides current time
   */
-@FunctionalInterface
 public interface TimeProvider
 {
-    TimeProvider INSTANCE = () -> (int) Math.floorDiv(System.currentTimeMillis(), 1000L);
+    TimeProvider DEFAULT = new TimeProvider()
+    {
+        private final int referenceEpoch = nowInSeconds();

Review Comment:
   The reasoning for using a single time point applies to SBR in general. A bulk read is expected to behave no differently from a single-point read: it would be surprising if data visibility changed during the read because time kept advancing.
   
   A bulk read is essentially a transformation from a collection of sstables to a collection of mutations. The reference time has to stay the same throughout in order to produce the same collection of mutations deterministically. The existing behavior is a bug, especially when TTLs are involved. It has little to do with the test setup or the production use case.
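   A minimal Java sketch of the argument above, assuming a simplified cell model where each cell has a write time and a TTL in seconds. All names here (ConstantReferenceTimeDemo, FixedReferenceTimeProvider, Cell, readLiveKeys) are hypothetical and are not the cassandra-analytics API. It captures the reference time once, similar to how the DEFAULT provider in the diff stores referenceEpoch, and contrasts it with a clock that advances while sstables are being read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not the cassandra-analytics API.
public class ConstantReferenceTimeDemo
{
    /** Supplies "now" in seconds since the epoch. */
    interface TimeProvider
    {
        int nowInSeconds();
    }

    /** Captures the reference time once and returns it on every call. */
    static final class FixedReferenceTimeProvider implements TimeProvider
    {
        private final int referenceEpoch;

        FixedReferenceTimeProvider(int referenceEpoch)
        {
            this.referenceEpoch = referenceEpoch;
        }

        /** Captures the current wall-clock time, truncated to seconds. */
        static FixedReferenceTimeProvider capturingNow()
        {
            return new FixedReferenceTimeProvider((int) Math.floorDiv(System.currentTimeMillis(), 1000L));
        }

        @Override
        public int nowInSeconds()
        {
            return referenceEpoch;
        }
    }

    /** Simplified cell: written at writeTimeSeconds, expires ttlSeconds later (0 means no TTL). */
    record Cell(String key, int writeTimeSeconds, int ttlSeconds)
    {
        boolean isLive(int nowInSeconds)
        {
            return ttlSeconds == 0 || writeTimeSeconds + ttlSeconds > nowInSeconds;
        }
    }

    /** Returns the keys of live cells across all "sstables". */
    static List<String> readLiveKeys(List<List<Cell>> sstables, TimeProvider time)
    {
        List<String> live = new ArrayList<>();
        for (List<Cell> sstable : sstables)
        {
            // Queried once per sstable; only a fixed provider keeps this value
            // identical for every sstable in the read.
            int now = time.nowInSeconds();
            for (Cell cell : sstable)
            {
                if (cell.isLive(now))
                {
                    live.add(cell.key());
                }
            }
        }
        return live;
    }

    public static void main(String[] args)
    {
        List<List<Cell>> sstables = List.of(
            List.of(new Cell("a", 100, 50)),   // expires at 150
            List.of(new Cell("b", 100, 55)));  // expires at 155

        // Constant reference time: both cells are evaluated at t=145.
        System.out.println(readLiveKeys(sstables, new FixedReferenceTimeProvider(145))); // [a, b]

        // Advancing clock: the second sstable is evaluated at t=155, so "b" disappears mid-read.
        int[] clock = {145};
        TimeProvider advancing = () -> { int t = clock[0]; clock[0] += 10; return t; };
        System.out.println(readLiveKeys(sstables, advancing)); // [a]
    }
}
```

   With the fixed provider the result depends only on the sstables and the captured time, so re-running the read yields the same set of rows; with the advancing clock, whether "b" is visible depends on how long the first sstable took to scan.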



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

