paul-rogers commented on a change in pull request #1870: DRILL-7359: Add
support for DICT type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r353522274
##########
File path:
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
##########
@@ -268,4 +274,51 @@ public String getAsString() {
}
return requireReader(type).getAsString();
}
+
+ private UnionReaderImpl getNullReader() {
+ AbstractObjectReader[] nullVariants = new
AbstractObjectReader[variants.length];
+ for (int i = 0; i < variants.length; i++) {
+ nullVariants[i] = variants[i].createNullReader();
+ }
+ return new NullUnionReader(schema(), unionAccessor, nullVariants);
+ }
+
+ private static class NullUnionReader extends UnionReaderImpl {
+
+ private NullUnionReader(ColumnMetadata metadata, VectorAccessor va,
AbstractObjectReader[] variants) {
Review comment:
@KazydubB, thanks for the explanation. It clarifies something I wondered
about: the role of the null state readers. Let's talk a bit about the intended
semantics for the readers.
I should be able to write code that caches the readers:
```
DictReader dr = rowReader.dict("myDict");
ScalarWriter kw = dr.key().scalar();
MapWriter vr = dr.value.tuple();
```
That is, I should be able to use the same key reader (`kr`) and value reader
(`vr`) for all members. Then, I should either be able to iterate over the
values:
```
while (dr.next()) {
System.out.println(kr.getString());
System.out.println(vr.getAsString());
}
```
If things work that way, then we don't need a special reader for the null
state: the value reader (here, `vr`) would simply return `true` from `isNull()`:
```
while (dr.next()) {
System.out.println(kr.getString());
if (vr.isNull()) {
System.out.println("NULL");
} else {
System.out.println(vr.getAsString());
}
}
```
(Ignoring, for the moment, that `getAsString()` already does a null check.)
Note that, if we read past the end, both the key and value writers will
point to nothing. I think that, for the array reader, accessing the value one
past the end is an error. For `DICT`, we could simply declare that the value is
NULL.
Then, I was going to sketch out the same behavior if we search for a key and
found...nothing. Seems that the `DictReader` has no way to search for a key. If
it did, it would not be clear what type to use for the key. So, that is a bit
of a mess.
Let's assume we did have something to find a specific value, like we do for
the arrays where we have `setPosn(int index)`. Maybe we have `findStringKey()`
etc. For testing, we could just have `findKey(Object key)` and do a switch on
type as we do for the other readers. So:
```
dr.findStringKey("foo");
if (vr.isNull()) {
System.out.println("NULL");
} else {
System.out.println(vr.getAsString());
}
```
Or, even:
```
if (dr.findStringKey("foo")) {
System.out.println(vr.getAsString());
}
```
The only way to find the value is to do a linear search. If we do a linear
search, and no value exists, the index will point one past the end of the dict.
And, above, we suggested that the key and value readers could return `true` for
`isNull()` in those cases.
Given this, unless I'm missing something, I don't see that we need any kind
of null value reader at all. In fact, I'd even say that we don't want such a
reader if we'd have to do:
```
ObjectReader kr = dr.findStringKey("foo");
if (kr.isNull) {
// kr is a special null reader
} else {
// kr is a Map reader
}
```
The above clutters the code and makes it impossible to cache the value
reader, which will slow inner-most loops that want to use the reader
abstractions.
What do you think?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services