Jacques,

> ... it ... should probably check if a column is already setup in the output 
mutation and, if so, don't re-add the ensure at least one column. Thus probably 
requires a new method in output mutator (isEmpty or size)

Some questions on the various moving pieces:

 Does ensureAtLeastOneField() need to check only for the 
fieldWriter.integer(...) call, or does it also need to check for the earlier 
fieldWriter.map(...) call? (I'm not clear on what cases that loop can 
encounter.)

Would the method go (be declared) on MapWriter (re ensureAtLeastOneField()'s 
fieldWriter variable) or on ComplexWriter (re its writer parameter) (or 
something else)?

Which data should the method implementation be consulting?  For example, I see 
that SingleMapWriter has both container and fields members, and traversing down 
container (at least in the current case I'm looking at) leads to fields 
primary, secondary, and delegate (in MapWithOrdinal). Which is the primary/best 
source for whether there's any column (or subcolumn?) yet?)

Thanks,
Daniel



Jacques Nadeau wrote:

It's purpose is to make sure that a projection that finds no columns still 
produces output.  Think record {a:4,b:6}

If I say 'select c from t' ,  the reader (without this functionality) return 
zero columns (not supported).   As such, ensure at least one adds a column.  I 
believe it is still needed but should probably check if a column is already 
setup in the output mutation and, if so, don't re-add the ensure at least one 
column. Thus probably requires a new method in output mutator (isEmpty or size)

On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarc...@maprtech.com 
<mailto:dbarc...@maprtech.com>> wrote:

    What exactly is the purpose of ensureAtLeastOneField() 
org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at 
fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 ยท apache/drill 
<https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)?

    Is it still needed?


    Currently, it makes the first field of a map be a nullable-integer field if 
the JSON reader reads no nows. However, it does that /regardless/ of whether 
the first field already exists from earlier readers, causing a later reader and 
ScanBatch to signal a schema change when there wasn't really a schema change.  
This is currently causing breaking in the attempt to fix the 
ScanBatch/IterOutcome problems underlying DRILL-2288.

    Example case:
    - There are three JSON files.  The first and last to be read have the same 
schema.  The middle file is empty.  They are all read by the same ScanBatch.
    - The first JSON reader sets map fields.
    - The second JSON reader sees no rows, so its atLeastOneWrite flag isn't 
set, so its ensureAtLeastOneField() thinks it needs to add a field, but 
forcibly sets the first field to be a nullable-int field--/regardless /of 
whether a first field exists, so it changes what the first reader set it to.
    - Then, somewhere in and/or downstream of the third reader (with 
in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors 
(mentioning the second reader's NullableIntVector vs. the original reader's 
type for the first field) and/or schema-change-not-supported errors because 
ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really 
change (between the first and third JSON files).


    Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or 
unsupported-schema-change errors, and did not cause any new errors in the 
java-exec unit tests. (I haven't checked other tests yet.)

    Also, ensureAtLeastOneField() (or next()) has a comment about making sure 
there's a field in order to return a record count, but next() returns the 
record count.

    Those two things make me wonder if ensureAtLeastOneField()is now obsolete.

    Can it be deleted now?

    Or is it the case that it is still needed, but it needs to check whether 
there are already any fields before (currently blindly) creating one?


    Thanks,
    Daniel

-- Daniel Barclay
    MapR Technologies



--
Daniel Barclay
MapR Technologies

Reply via email to