paul-rogers commented on a change in pull request #1870: DRILL-7359: Add 
support for DICT type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361551286
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java
 ##########
 @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, ColumnReadProjection c
         vectorState);
   }
 
+  /**
 +   * Check if this is a special case when vector, writer and column state should be
 +   * created for a primitive field though the field itself is not projected. This is
 +   * needed because {@code DICT}'s {@code keys} field is not projected but is needed
 +   * to be initialized to ensure the dict vector is constructed properly.
 
 Review comment:
   Interesting. I'm a bit confused, however. Help me understand what's 
happening here.
   
   If a column X is unprojected, this means that, in the batch coming out of 
the Scan, say, we don't want column X to appear at all. This is why, for all 
other unprojected columns, we do not create an actual value vector.
   
   From the perspective of the reader using the Result Set Loader, the column 
does exist: I can write to it. So, we create a "dummy" writer: one which simply 
ignores the values given it.
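   
   To make the "dummy" writer idea concrete, here is a minimal sketch. All 
names are hypothetical and are not Drill's actual classes; a plain list stands 
in for a value vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a column writer; NOT Drill's actual interface.
interface IntColumnWriter {
  void setInt(int value);
  boolean isProjected();
}

// Projected column: backed by real storage (a list stands in for a value vector).
final class VectorBackedIntWriter implements IntColumnWriter {
  final List<Integer> vector = new ArrayList<>();

  @Override public void setInt(int value) { vector.add(value); }
  @Override public boolean isProjected() { return true; }
}

// Unprojected column: accepts writes and discards them; no storage is allocated.
final class DummyIntWriter implements IntColumnWriter {
  @Override public void setInt(int value) { /* intentionally ignored */ }
  @Override public boolean isProjected() { return false; }
}

// Hypothetical factory: the projection decision picks the writer implementation.
final class WriterFactory {
  static IntColumnWriter build(boolean projected) {
    return projected ? new VectorBackedIntWriter() : new DummyIntWriter();
  }
}
```

   The reader calls `setInt()` the same way in both cases; only the projected 
writer keeps the value.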
   
   Let's imagine how that would work for a DICT. My input file has a DICT 
field, but that DICT is not projected. As the reader author, I may still find 
it easier to read the DICT whether it is projected or not.
   
   Though a DICT represents a map conceptually (and is implemented as a 
different kind of map internally), it is really just a correlated array at the 
data level. So, I just want to write key/value pairs to the writer. A previous 
comment sketched out how I might want to do that. (I've not yet gotten to the 
new code where that part is implemented.)
   
   Unless I'm missing something obvious, I think the above reasoning means 
that, if a DICT is not projected, we want a dummy DICT writer (using dummy 
writers for the keys and values). But, I can't see why we would create the 
actual DICT vector.
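   
   Roughly, what I have in mind is something like the sketch below. The 
interfaces and class names are made up for illustration and are not the 
framework's real API; the point is only that the dummy DICT writer is composed 
of dummy key and value writers and never touches a vector.

```java
// Hypothetical interfaces for illustration only; NOT Drill's actual API.
interface SketchScalarWriter {
  void setObject(Object value);
}

interface SketchDictWriter {
  SketchScalarWriter keyWriter();
  SketchScalarWriter valueWriter();
  void save();        // commit the current key/value pair
  int entryCount();   // number of pairs actually stored
}

// Discards every value it is given.
final class DummyScalarWriter implements SketchScalarWriter {
  @Override public void setObject(Object value) { /* intentionally ignored */ }
}

// Unprojected DICT: dummy writers for keys and values, and no backing vector.
final class DummyDictWriter implements SketchDictWriter {
  private final SketchScalarWriter keys = new DummyScalarWriter();
  private final SketchScalarWriter values = new DummyScalarWriter();

  @Override public SketchScalarWriter keyWriter() { return keys; }
  @Override public SketchScalarWriter valueWriter() { return values; }
  @Override public void save() { /* nothing to commit */ }
  @Override public int entryCount() { return 0; }
}

// Reader-side code: identical whether the DICT is projected or not.
final class DictFiller {
  static void writePair(SketchDictWriter dict, Object key, Object value) {
    dict.keyWriter().setObject(key);
    dict.valueWriter().setObject(value);
    dict.save();
  }
}
```

   With this shape, the reader writes pairs without checking projection, and 
an unprojected DICT costs nothing in vector memory.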
   
   Can you help me see what I'm missing?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
