paul-rogers commented on a change in pull request #1870: DRILL-7359: Add
support for DICT type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361551286
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java
##########
@@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, ColumnReadProjection c
vectorState);
}
+ /**
+ * Check if this is a special case when vector, writer and column state should be
+ * created for a primitive field though the field itself is not projected. This is
+ * needed because {@code DICT}'s {@code keys} field is not projected but is needed
+ * to be initialized to ensure the dict vector is constructed properly.
Review comment:
Interesting. I'm a bit confused, however. Help me understand what's
happening here.
If a column X is unprojected, this means that, in the batch coming out of
the Scan, say, we don't want column X to appear at all. This is why, for all other
unprojected columns, we do not create an actual value vector.
From the perspective of the reader using the Result Set Loader, the column
does exist: I can write to it. So, we create a "dummy" writer: one which simply
ignores the values given it.
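To pin down what I mean by a "dummy" writer, here is a toy sketch. It is only an illustration under assumptions: `SketchWriter`, `VectorBackedWriter`, `DummyWriter` and `WriterFactory` are made-up names, not the actual Result Set Loader classes, and a plain `List` stands in for a value vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a column writer; not Drill's actual interface.
interface SketchWriter {
  void setInt(int value);
  boolean isProjected();
}

// Projected column: values land in backing storage.
// A List stands in for the value vector here.
final class VectorBackedWriter implements SketchWriter {
  final List<Integer> vector = new ArrayList<>();

  @Override
  public void setInt(int value) { vector.add(value); }

  @Override
  public boolean isProjected() { return true; }
}

// Unprojected column: the reader can still call setInt(),
// but no storage exists and the value is dropped.
final class DummyWriter implements SketchWriter {
  @Override
  public void setInt(int value) { /* deliberately ignored */ }

  @Override
  public boolean isProjected() { return false; }
}

// Picks the writer from the projection decision, so the reader code
// never has to know whether the column is projected.
final class WriterFactory {
  private WriterFactory() { }

  static SketchWriter build(boolean projected) {
    return projected ? new VectorBackedWriter() : new DummyWriter();
  }
}
```

The point of the sketch is that the reader's code path is identical in both cases; only the factory knows whether storage was allocated.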
Let's imagine how that would work for a DICT. My input file has a DICT
field. But, that DICT is not projected. Either way, I may find it easier to
read the DICT whether it is projected or not.
Though a DICT represents a map conceptually (and is implemented as a
different kind of map internally), it is really just a correlated array at the
data level. So, as the reader, I just want to write key/value pairs. A previous
comment sketched out how I might want to do that. (I've not yet gotten to the
new code where that part is implemented.)
Unless I'm missing something obvious, I think the above reasoning means
that, if a DICT is not projected, we want a dummy DICT writer (using dummy
writers for the keys and values). But, I can't see why we would create the
actual DICT vector.
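Roughly, this is the shape I have in mind. It is a sketch only, under assumptions: `EntryWriter`, `DictSketchWriter`, `DummyDictWriter` and `StoredDictWriter` are hypothetical names that do not exist in Drill, and two correlated lists stand in for the key and value vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these are Drill classes.
interface EntryWriter {
  void set(Object value);
}

interface DictSketchWriter {
  EntryWriter key();
  EntryWriter value();
  void saveEntry();      // Marks the end of one key/value pair
  int storedEntries();   // How many pairs actually reached storage
}

// Unprojected DICT: dummy key and value writers, no backing storage at all.
final class DummyDictWriter implements DictSketchWriter {
  private static final EntryWriter IGNORE = value -> { };

  @Override public EntryWriter key() { return IGNORE; }
  @Override public EntryWriter value() { return IGNORE; }
  @Override public void saveEntry() { }
  @Override public int storedEntries() { return 0; }
}

// Projected DICT: two correlated lists stand in for the key and value vectors.
final class StoredDictWriter implements DictSketchWriter {
  private final List<Object> keys = new ArrayList<>();
  private final List<Object> values = new ArrayList<>();
  private Object pendingKey;
  private Object pendingValue;

  @Override public EntryWriter key() { return v -> pendingKey = v; }
  @Override public EntryWriter value() { return v -> pendingValue = v; }

  @Override
  public void saveEntry() {
    // Keys and values are appended together so the two lists stay aligned.
    keys.add(pendingKey);
    values.add(pendingValue);
    pendingKey = null;
    pendingValue = null;
  }

  @Override public int storedEntries() { return keys.size(); }
}
```

In this toy version the unprojected case never allocates anything, which is the behavior I would have expected for the real DICT vector as well.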
Can you help me see what I'm missing?