[Also adding forgotten cc to dev. list]

Jacques,

Jacques Nadeau wrote:

It's purpose is to make sure that a projection that finds no columns still 
produces output.  Think record {a:4,b:6}

If I say 'select c from t' ,  the reader (without this functionality) return 
zero columns (not supported).   As such, ensure at least one adds a column.  I 
believe it is still needed

In what sense is that not supported?  For example, where would returning zero 
columns be expected to break?

When I tried disabling ensureAtLeastOneField(), the only thing I've seen break 
so far is the empty-schema check in IteratorValidatorBatchIterator. What that 
validator check is also disabled, all drill-java-exec unit tests run.  (I'm 
currently checking later-run tests (drill-jdbc, etc.) and have not yet tried 
the regression tests.)

but should probably check if a column is already setup in the output mutation 
and, if so, don't re-add the ensure at least one column. Thus probably requires 
a new method in output mutator (isEmpty or size)

Yeah; I was wondering how to get that information given that only writing 
methods seemed to be available on the interface types used in 
ensureAtLeastOneField().
On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarc...@maprtech.com 
<mailto:dbarc...@maprtech.com>> wrote:

    What exactly is the purpose of ensureAtLeastOneField() 
org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at 
fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 ยท apache/drill 
<https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)?

    Is it still needed?


    Currently, it makes the first field of a map be a nullable-integer field if 
the JSON reader reads no nows. However, it does that /regardless/ of whether 
the first field already exists from earlier readers, causing a later reader and 
ScanBatch to signal a schema change when there wasn't really a schema change.  
This is currently causing breaking in the attempt to fix the 
ScanBatch/IterOutcome problems underlying DRILL-2288.

    Example case:
    - There are three JSON files.  The first and last to be read have the same 
schema.  The middle file is empty.  They are all read by the same ScanBatch.
    - The first JSON reader sets map fields.
    - The second JSON reader sees no rows, so its atLeastOneWrite flag isn't 
set, so its ensureAtLeastOneField() thinks it needs to add a field, but 
forcibly sets the first field to be a nullable-int field--/regardless /of 
whether a first field exists, so it changes what the first reader set it to.
    - Then, somewhere in and/or downstream of the third reader (with 
in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors 
(mentioning the second reader's NullableIntVector vs. the original reader's 
type for the first field) and/or schema-change-not-supported errors because 
ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really 
change (between the first and third JSON files).


    Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or 
unsupported-schema-change errors, and did not cause any new errors in the 
java-exec unit tests. (I haven't checked other tests yet.)

    Also, ensureAtLeastOneField() (or next()) has a comment about making sure 
there's a field in order to return a record count, but next() returns the 
record count.

    Those two things make me wonder if ensureAtLeastOneField()is now obsolete.

    Can it be deleted now?

    Or is it the case that it is still needed, but it needs to check whether 
there are already any fields before (currently blindly) creating one?


    Thanks,
    Daniel

-- Daniel Barclay
    MapR Technologies



--
Daniel Barclay
MapR Technologies

Reply via email to