Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/906#discussion_r134627805
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
---
@@ -768,4 +765,73 @@ else if (exprHasPrefix && refHasPrefix) {
}
}
}
+
+ /**
+ * handle FAST NONE specially when Project for query output. This happens when input returns a
+ * FAST NONE directly ( input does not return any batch with schema/data).
+ *
+ * Project operator has to return a batch with schema derived using the following 3 rules:
+ * Case 1: * ==> expand into an empty list of columns.
+ * Case 2: regular column reference ==> treat as nullable-int column
+ * Case 3: expressions => Call ExpressionTreeMaterialization over an empty vector contain.
--- End diff --
Is this description confusing two different scenarios?
1. Empty result set, but a schema is provided. (The Scan Batch changes go
out of their way to provide a schema when possible.)
2. Null result set: no rows and no schema.
The rules in the Javadoc seem to relate to the second case: there are no
columns to project.
But what do we do in the first case, when we have a schema but no rows?
We should do exactly what we'd do if we had data: match up columns,
insert nullable ints for missing columns, etc.
Now, visualize the null result set as an empty result set whose schema has
no columns. *Exactly the same* rules apply. We match up columns (for a wildcard
or a project list), but will find none. So, we'll replace every column
reference with a nullable int.
The point is, there should be only one code path, not two, and that one code
path should gracefully handle the case in which the schema is empty.
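
To make the single-path idea concrete, here is a minimal sketch. It is not
Drill's actual code: `ProjectSketch`, its `project` method, and the use of
plain strings for column types are all hypothetical stand-ins, and expression
materialization (Case 3 in the Javadoc) is left out. It only shows that the
wildcard and missing-column rules need no special branch when the input
schema is empty.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not Drill's ProjectRecordBatch API.
final class ProjectSketch {
    static final String WILDCARD = "*";
    // Assumption: column types are modeled as plain strings for brevity.
    static final String NULLABLE_INT = "NULLABLE INT";

    /**
     * Derives the output schema (column name -> type) for a project list.
     * The input schema may be empty; no separate branch handles that case.
     */
    static Map<String, String> project(Map<String, String> inputSchema,
                                       List<String> projectList) {
        Map<String, String> output = new LinkedHashMap<>();
        for (String ref : projectList) {
            if (WILDCARD.equals(ref)) {
                // Wildcard expands to every input column,
                // which is zero columns when the schema is empty.
                output.putAll(inputSchema);
            } else {
                // Matched columns keep their type; missing ones become nullable int.
                output.put(ref, inputSchema.getOrDefault(ref, NULLABLE_INT));
            }
        }
        return output;
    }
}
```

With an input schema of `{a: VARCHAR}`, the project list `[a, b]` yields
`a` as VARCHAR and `b` as nullable int. With an empty input schema, the same
call yields both columns as nullable int, and `*` yields no columns at all.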
That said, debugging the existing code path is likely to be tedious, and it
may be faster to create a new code path. I wonder what that does to ongoing
maintenance costs, however, since future developers must not only understand
the original path but also maintain the parallel "fast none" path.