[Also adding forgotten cc to dev. list]
Jacques,
Jacques Nadeau wrote:
It's purpose is to make sure that a projection that finds no columns still
produces output. Think record {a:4,b:6}
If I say 'select c from t' , the reader (without this functionality) return
zero columns (not supported). As such, ensure at least one adds a column. I
believe it is still needed
In what sense is that not supported? For example, where would returning zero
columns be expected to break?
When I tried disabling ensureAtLeastOneField(), the only thing I've seen break
so far is the empty-schema check in IteratorValidatorBatchIterator. What that
validator check is also disabled, all drill-java-exec unit tests run. (I'm
currently checking later-run tests (drill-jdbc, etc.) and have not yet tried
the regression tests.)
but should probably check if a column is already setup in the output mutation
and, if so, don't re-add the ensure at least one column. Thus probably requires
a new method in output mutator (isEmpty or size)
Yeah; I was wondering how to get that information given that only writing
methods seemed to be available on the interface types used in
ensureAtLeastOneField().
On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarc...@maprtech.com
<mailto:dbarc...@maprtech.com>> wrote:
What exactly is the purpose of ensureAtLeastOneField()
org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at
fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 ยท apache/drill
<https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)?
Is it still needed?
Currently, it makes the first field of a map be a nullable-integer field if
the JSON reader reads no nows. However, it does that /regardless/ of whether
the first field already exists from earlier readers, causing a later reader and
ScanBatch to signal a schema change when there wasn't really a schema change.
This is currently causing breaking in the attempt to fix the
ScanBatch/IterOutcome problems underlying DRILL-2288.
Example case:
- There are three JSON files. The first and last to be read have the same
schema. The middle file is empty. They are all read by the same ScanBatch.
- The first JSON reader sets map fields.
- The second JSON reader sees no rows, so its atLeastOneWrite flag isn't
set, so its ensureAtLeastOneField() thinks it needs to add a field, but
forcibly sets the first field to be a nullable-int field--/regardless /of
whether a first field exists, so it changes what the first reader set it to.
- Then, somewhere in and/or downstream of the third reader (with
in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors
(mentioning the second reader's NullableIntVector vs. the original reader's
type for the first field) and/or schema-change-not-supported errors because
ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really
change (between the first and third JSON files).
Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or
unsupported-schema-change errors, and did not cause any new errors in the
java-exec unit tests. (I haven't checked other tests yet.)
Also, ensureAtLeastOneField() (or next()) has a comment about making sure
there's a field in order to return a record count, but next() returns the
record count.
Those two things make me wonder if ensureAtLeastOneField()is now obsolete.
Can it be deleted now?
Or is it the case that it is still needed, but it needs to check whether
there are already any fields before (currently blindly) creating one?
Thanks,
Daniel
--
Daniel Barclay
MapR Technologies
--
Daniel Barclay
MapR Technologies