[GitHub] [drill] paul-rogers commented on a change in pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework

GitBox Tue, 03 Dec 2019 18:30:31 -0800

paul-rogers commented on a change in pull request #1870: DRILL-7359: Add 
support for DICT type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r353522274


 ##########
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##########
 @@ -268,4 +274,51 @@ public String getAsString() {
     }
     return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+    AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+    for (int i = 0; i < variants.length; i++) {
+      nullVariants[i] = variants[i].createNullReader();
+    }
+    return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+    private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @KazydubB, thanks for the explanation. It clarifies something I wondered 
about: the role of the null state readers. Let's talk a bit about the intended 
semantics for the readers.
   
   I should be able to write code that caches the readers:
   
   ```
   DictReader dr = rowReader.dict("myDict");
   ScalarWriter kw = dr.key().scalar();
   MapWriter vr = dr.value.tuple();
   ```
   
   That is, I should be able to use the same key reader (`kr`) and value reader 
(`vr`) for all members. Then, I should either be able to iterate over the 
values:
   
   ```
   while (dr.next()) {
     System.out.println(kr.getString());
     System.out.println(vr.getAsString());
   }
   ```
   
   If things work that way, then we don't need a special reader for the null 
state: the value reader (here, `vr`) would simply return `true` from `isNull()`:
   
   ```
   while (dr.next()) {
     System.out.println(kr.getString());
     if (vr.isNull()) {
       System.out.println("NULL");
     } else {
       System.out.println(vr.getAsString());
     }
   }
   ```
   
   (Ignoring, for the moment, that `getAsString()` already does a null check.)
   
   Note that, if we read past the end, both the key and value writers will 
point to nothing. I think that, for the array reader, accessing the value one 
past the end is an error. For `DICT`, we could simply declare that the value is 
NULL.
   
   Then, I was going to sketch out the same behavior if we search for a key and 
found...nothing. Seems that the `DictReader` has no way to search for a key. If 
it did, it would not be clear what type to use for the key. So, that is a bit 
of a mess.
   
   Let's assume we did have something to find a specific value, like we do for 
the arrays where we have `setPosn(int index)`. Maybe we have `findStringKey()` 
etc. For testing, we could just have `findKey(Object key)` and do a switch on 
type as we do for the other readers. So:
   
   ```
     dr.findStringKey("foo");
     if (vr.isNull()) {
       System.out.println("NULL");
     } else {
       System.out.println(vr.getAsString());
     }
   ```
   
   Or, even:
   
   ```
     if (dr.findStringKey("foo")) {
       System.out.println(vr.getAsString());
     }
   ```
   
   The only way to find the value is to do a linear search. If we do a linear 
search, and no value exists, the index will point one past the end of the dict. 
And, above, we suggested that the key and value readers could return `true` for 
`isNull()` in those cases.
   
   Given this, unless I'm missing something, I don't see that we need any kind 
of null value reader at all. In fact, I'd even say that we don't want such a 
reader if we'd have to do:
   
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
     // kr is a special null reader
   } else {
     // kr is a Map reader
   }
   ```
   
   The above clutters the code and makes it impossible to cache the value 
reader, which will slow inner-most loops that want to use the reader 
abstractions.
   
   What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on a change in pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework

Reply via email to