clintropolis commented on code in PR #18403:
URL: https://github.com/apache/druid/pull/18403#discussion_r2332129233


##########
processing/src/main/java/org/apache/druid/data/input/impl/AggregateProjectionSpec.java:
##########
@@ -218,10 +223,9 @@ private static ProjectionOrdering 
computeOrdering(VirtualColumns virtualColumns,
 
     String timeColumnName = null;
     Granularity granularity = null;
-    // try to find the __time column equivalent, which might be a time_floor 
expression to model granularity
-    // bucketing. The time column is decided as the finest granularity on 
__time detected. If the projection does
-    // not have a time-like column, the granularity will be handled as ALL for 
the projection and all projection
-    // rows will use a synthetic timestamp of the minimum timestamp of the 
incremental index
+
+    // determine the granularity and time column name for the projection, 
based on the first time-like grouping column.
+    // if there are multiple time-like grouping columns, they must be 
"coarser" than the first time-like grouping column.

Review Comment:
   this is too restrictive, the __time column does not need to be first, we 
just need to replace any candidate __time column with the finer granularity 
column, since what we are looking for is the finest granularity column we can 
use as a substitute for __time in any queries against the base table



##########
processing/src/main/java/org/apache/druid/data/input/impl/AggregateProjectionSpec.java:
##########
@@ -230,12 +234,40 @@ private static ProjectionOrdering 
computeOrdering(VirtualColumns virtualColumns,
       } else {
         final VirtualColumn vc = 
virtualColumns.getVirtualColumn(dimension.getName());
         final Granularity maybeGranularity = 
Granularities.fromVirtualColumn(vc);
-        if (granularity == null && maybeGranularity != null) {
-          granularity = maybeGranularity;
-          timeColumnName = dimension.getName();
-        } else if (granularity != null && maybeGranularity != null && 
maybeGranularity.isFinerThan(granularity)) {
-          granularity = maybeGranularity;
-          timeColumnName = dimension.getName();
+        if (maybeGranularity == null || 
maybeGranularity.equals(Granularities.ALL)) {
+          // no __time in inputs or not supported, skip
+          continue;
+        }
+        if (granularity == null) {
+          if (maybeGranularity.getClass().equals(PeriodGranularity.class)
+              && maybeGranularity.getTimeZone().equals(DateTimeZone.UTC)
+              && ((PeriodGranularity) maybeGranularity).getOrigin() == null) {
+            granularity = maybeGranularity;
+            timeColumnName = dimension.getName();
+            continue;
+          } else {
+            // we don't allow:
+            // 1. virtual column on __time if there's no grouping on __time
+            // 2. non-UTC or non-epoch origin period granularity
+            throw InvalidInput.exception(
+                "cannot use granularity[%s] on column[%s] as projection time 
column",
+                maybeGranularity,
+                dimension.getName()
+            );
+          }
+        }
+        if (granularity.equals(Granularities.NONE) || 
(granularity.getClass().equals(PeriodGranularity.class)
+                                                       && 
maybeGranularity.getClass().equals(PeriodGranularity.class)
+                                                       && ((PeriodGranularity) 
granularity).canBeMappedTo((PeriodGranularity) maybeGranularity))) {
+          // keep existing granularity
+        } else {
+          throw InvalidInput.exception(
+              "cannot map time granularity[%s] on column[%s] to time 
granularity[%s] on column[%s], failed to determine projection time column",
+              granularity,
+              timeColumnName,
+              maybeGranularity,
+              dimension.getName()
+          );

Review Comment:
   again, not an error condition imo. a projection doesn't need to necessarily 
include __time as a grouping column, if there is no __time column then the 
granularity is effectively the segment granularity, and we use a constant 
__time column where all values are the start time of the segment.



##########
processing/src/main/java/org/apache/druid/data/input/impl/AggregateProjectionSpec.java:
##########
@@ -230,12 +234,40 @@ private static ProjectionOrdering 
computeOrdering(VirtualColumns virtualColumns,
       } else {
         final VirtualColumn vc = 
virtualColumns.getVirtualColumn(dimension.getName());
         final Granularity maybeGranularity = 
Granularities.fromVirtualColumn(vc);
-        if (granularity == null && maybeGranularity != null) {
-          granularity = maybeGranularity;
-          timeColumnName = dimension.getName();
-        } else if (granularity != null && maybeGranularity != null && 
maybeGranularity.isFinerThan(granularity)) {
-          granularity = maybeGranularity;
-          timeColumnName = dimension.getName();
+        if (maybeGranularity == null || 
maybeGranularity.equals(Granularities.ALL)) {
+          // no __time in inputs or not supported, skip
+          continue;
+        }
+        if (granularity == null) {
+          if (maybeGranularity.getClass().equals(PeriodGranularity.class)
+              && maybeGranularity.getTimeZone().equals(DateTimeZone.UTC)
+              && ((PeriodGranularity) maybeGranularity).getOrigin() == null) {
+            granularity = maybeGranularity;
+            timeColumnName = dimension.getName();
+            continue;
+          } else {
+            // we don't allow:
+            // 1. virtual column on __time if there's no grouping on __time
+            // 2. non-UTC or non-epoch origin period granularity
+            throw InvalidInput.exception(
+                "cannot use granularity[%s] on column[%s] as projection time 
column",
+                maybeGranularity,
+                dimension.getName()
+            );

Review Comment:
   I don't think this needs to be an exception, we just need to not consider it 
as a time column, the same way as a coarser granularity would not be __time. 
These columns can still be substituted for query virtual columns, so there is 
value in being able to pre-compute them, they just cannot serve as a 
replacement for __time (or time floor on __time) in queries.



##########
processing/src/main/java/org/apache/druid/segment/projections/Projections.java:
##########
@@ -44,6 +46,8 @@
 
 public class Projections
 {
+  private static final Map<byte[], Boolean> PERIOD_GRAN_CACHE = new 
HashMap<>();

Review Comment:
   this needs to be a concurrent map or use a lock or something



##########
processing/src/main/java/org/apache/druid/segment/projections/Projections.java:
##########
@@ -375,13 +379,21 @@ public static ProjectionMatchBuilder 
matchQueryVirtualColumn(
       // virtual column and underlying expression itself, but this will do for 
now
       final Granularity virtualGranularity = 
Granularities.fromVirtualColumn(queryVirtualColumn);
       if (virtualGranularity != null) {
-        if 
(virtualGranularity.isFinerThan(projection.getEffectiveGranularity())) {
-          return null;
-        }
         // same granularity, replace virtual column directly by remapping it 
to the physical column
         if (projection.getEffectiveGranularity().equals(virtualGranularity)) {
           return matchBuilder.remapColumn(queryVirtualColumn.getOutputName(), 
ColumnHolder.TIME_COLUMN_NAME)
                              
.addReferencedPhysicalColumn(ColumnHolder.TIME_COLUMN_NAME);
+        } else if (virtualGranularity.equals(Granularities.NONE)
+                   || 
projection.getEffectiveGranularity().equals(Granularities.ALL)) {
+          return null;

Review Comment:
   what is this check for none and all separately for? Are there other query 
virtual column granularities that are ok with an all granularity projection?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to