[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804637#comment-14804637 ]

Steven Phillips commented on DRILL-3229:
----------------------------------------

Basic design outline:

A Union type represents a field where the type can vary between records. The 
data for a field of type Union will be stored in a UnionVector.

h4. UnionVector
UnionVector internally uses a MapVector to hold the vectors for the various
types. The types include all of the MinorTypes, including List and Map. For
example, the internal MapVector will have a subfield named "bigInt", which will
refer to a NullableBigIntVector.

In addition to the vectors corresponding to the minor types, there will be two
additional fields, both represented by UInt1Vectors: "bits" and "types", which
represent the nullability and the type of the underlying data, respectively.
The "bits" vector will work the same way it works in other nullable vectors.
The "types" vector will store the number corresponding to the value of the
MinorType as defined in the protobuf definition. There will be mutator methods
for setting null and type.
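
The layout above can be illustrated with a small in-memory model. This is a hedged sketch, not Drill's UnionVector: the names SketchMinorType, UnionVectorSketch and setValue are hypothetical, Java lists stand in for the value vectors, and the enum ordinal stands in for the protobuf MinorType number.

{code:java}
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical subset of MinorType; the real enum is generated from protobuf.
enum SketchMinorType { INT, BIGINT, VARCHAR }

// Simplified in-memory model of the UnionVector layout, not Drill's actual class.
class UnionVectorSketch {
  // Stand-ins for the two UInt1Vectors: "bits" (1 = set) and "types" (type code).
  final List<Byte> bits = new ArrayList<>();
  final List<Byte> types = new ArrayList<>();
  // Stand-in for the internal MapVector: one nullable child vector per type.
  final Map<SketchMinorType, List<Object>> children = new EnumMap<>(SketchMinorType.class);

  UnionVectorSketch() {
    for (SketchMinorType t : SketchMinorType.values()) {
      children.put(t, new ArrayList<>());
    }
  }

  // Grow all vectors so that index is addressable; new slots start out null.
  private void ensureCapacity(int index) {
    while (bits.size() <= index) {
      bits.add((byte) 0);
      types.add((byte) 0);
      for (List<Object> child : children.values()) {
        child.add(null);
      }
    }
  }

  // Mutator-style setters for type and nullability.
  void setType(int index, SketchMinorType type) {
    ensureCapacity(index);
    // The sketch stores the enum ordinal where Drill would store the protobuf number.
    types.set(index, (byte) type.ordinal());
  }

  void setNotNull(int index) {
    ensureCapacity(index);
    bits.set(index, (byte) 1);
  }

  // Write a value: record its type, mark it set, and store it in that type's child.
  void setValue(int index, SketchMinorType type, Object value) {
    setType(index, type);
    setNotNull(index);
    children.get(type).set(index, value);
  }

  boolean isNull(int index) {
    return index >= bits.size() || bits.get(index) == 0;
  }

  SketchMinorType getType(int index) {
    return SketchMinorType.values()[types.get(index).intValue()];
  }

  Object getObject(int index) {
    return isNull(index) ? null : children.get(getType(index)).get(index);
  }
}
{code}

Only the child vector that matches a row's type holds a value for that row; the other children stay null at that index.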

h4. UnionWriter
The UnionWriter implements and overrides all of the methods of FieldWriter. It
holds field writers corresponding to each of the types included in the
underlying UnionVector, and delegates the method calls for each type to the
corresponding writer. For example, the BigIntWriter interface:

{code}
public interface BigIntWriter extends BaseWriter {
  public void write(BigIntHolder h);

  public void writeBigInt(long value);
}
{code}
UnionWriter overrides these methods:

{code}
@Override
  public void writeBigInt(long value) {
    data.getMutator().setType(idx(), MinorType.BIGINT);
    data.getMutator().setNotNull(idx());
    getBigIntWriter().setPosition(idx());
    getBigIntWriter().writeBigInt(value);
  }

@Override
  public void writeBigInt(BigIntHolder h) {
    data.getMutator().setType(idx(), MinorType.BIGINT);
    data.getMutator().setNotNull(idx());
    getBigIntWriter().setPosition(idx());
    getBigIntWriter().writeBigInt(holder.value);
  }
{code}

This requires users of the interface to go through the UnionWriter rather
than using the underlying BigIntWriter directly; otherwise, the "types" and
"bits" vectors would not be set correctly.
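
To illustrate why writes must go through the UnionWriter, here is a simplified sketch under stated assumptions: SimpleBigIntWriter, UnionWriterSketch and BIGINT_CODE are hypothetical names, the type code value is illustrative only, and plain lists stand in for the "types" and "bits" vectors.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a typed child writer (not Drill's BigIntWriter).
class SimpleBigIntWriter {
  final List<Long> values = new ArrayList<>();
  private int position;

  void setPosition(int index) {
    position = index;
  }

  // Writes only to its own child vector; it knows nothing about "types" or "bits".
  void writeBigInt(long value) {
    while (values.size() <= position) {
      values.add(null);
    }
    values.set(position, value);
  }
}

// Hypothetical union writer: records type and nullability, then delegates.
class UnionWriterSketch {
  static final byte BIGINT_CODE = 6; // illustrative type code only
  final List<Byte> types = new ArrayList<>();
  final List<Byte> bits = new ArrayList<>();
  final SimpleBigIntWriter bigIntWriter = new SimpleBigIntWriter();
  private int index;

  void setPosition(int index) {
    this.index = index;
  }

  private void ensureCapacity() {
    while (types.size() <= index) {
      types.add((byte) 0);
      bits.add((byte) 0);
    }
  }

  void writeBigInt(long value) {
    ensureCapacity();
    types.set(index, BIGINT_CODE); // record the type of this row
    bits.set(index, (byte) 1);     // mark this row as not null
    bigIntWriter.setPosition(index);
    bigIntWriter.writeBigInt(value);
  }
}
{code}

Calling bigIntWriter.writeBigInt(...) directly in this model stores the value but leaves the "types" and "bits" entries for that row unset, which is the failure mode described above.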

h4. UnionReader
Much like the UnionWriter, the UnionReader overrides the methods of
FieldReader and delegates to the specific FieldReader implementation that
corresponds to the type of the current value.
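
A similar sketch of the reader-side dispatch, assuming hypothetical ReaderMinorType, TypedReader and UnionReaderSketch types (Drill's FieldReader interface is far larger): the reader consults the "bits" and "types" entries for the current position and forwards to the matching per-type reader.

{code:java}
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical subset of the minor types handled by this sketch.
enum ReaderMinorType { BIGINT, VARCHAR }

// Hypothetical per-type reader; reads the value stored at an index.
interface TypedReader {
  Object readObject(int index);
}

class UnionReaderSketch {
  private final List<ReaderMinorType> types; // stand-in for the "types" vector
  private final List<Boolean> bits;          // stand-in for the "bits" vector
  private final Map<ReaderMinorType, TypedReader> readers = new EnumMap<>(ReaderMinorType.class);
  private int position;

  UnionReaderSketch(List<ReaderMinorType> types, List<Boolean> bits,
                    List<Long> bigInts, List<String> varChars) {
    this.types = types;
    this.bits = bits;
    // One reader per child vector, keyed by type.
    readers.put(ReaderMinorType.BIGINT, bigInts::get);
    readers.put(ReaderMinorType.VARCHAR, varChars::get);
  }

  void setPosition(int index) {
    position = index;
  }

  boolean isSet() {
    return bits.get(position);
  }

  ReaderMinorType getType() {
    return types.get(position);
  }

  // Delegate to the reader that matches the type stored for the current row.
  Object readObject() {
    if (!isSet()) {
      return null;
    }
    return readers.get(getType()).readObject(position);
  }
}
{code}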

h4. UnionListVector
UnionListVector extends BaseRepeatedVector. It works much the same as other
repeated vectors; there is a data vector and an offset vector. The data vector
in this case is a UnionVector.
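
The offset-vector convention can be sketched as follows. This is a simplified model with hypothetical names (UnionListVectorSketch, append, getList); a List<Object> stands in for the UnionVector so that one list can mix types, and the sketch assumes lists are written in order. Elements of list i occupy the data range [offsets[i], offsets[i + 1]).

{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified model of a repeated vector: an offsets vector plus a data vector.
class UnionListVectorSketch {
  final List<Integer> offsets = new ArrayList<>();
  final List<Object> data = new ArrayList<>(); // stand-in for the UnionVector

  UnionListVectorSketch() {
    offsets.add(0); // offsets[0] is always 0
  }

  // Begin a new list; it starts (and, until written to, ends) where the previous one ended.
  void startNewValue() {
    offsets.add(data.size());
  }

  // Append one element to the current (last) list and extend its end offset.
  void append(Object value) {
    data.add(value);
    offsets.set(offsets.size() - 1, data.size());
  }

  int valueCount() {
    return offsets.size() - 1;
  }

  // Elements of list i live in data[offsets[i], offsets[i + 1]).
  List<Object> getList(int i) {
    return data.subList(offsets.get(i), offsets.get(i + 1));
  }
}
{code}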

h4. UnionListWriter
The UnionListWriter overrides all FieldWriter methods. When starting a new
list, the startList() method is called. This calls the startNewValue(int index)
method of the underlying UnionListVector.Mutator. Subsequent calls to the
ListWriter methods (such as bigInt()) return the UnionListWriter itself, and
calls to write are handled by calling the appropriate method on the underlying
UnionListVector.Mutator, which handles updating the offset vector.
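
The following hedged sketch models that call sequence. ListMutatorSketch, UnionListWriterSketch and the add method are hypothetical; the sketch only mirrors the behaviour outlined above, where startList() calls startNewValue(index), bigInt() returns the writer itself, and each write advances the end offset of the current list. It assumes lists are written in order.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutator over an offsets vector and a simplified union data vector.
class ListMutatorSketch {
  final List<Integer> offsets = new ArrayList<>();
  final List<Object> data = new ArrayList<>();

  ListMutatorSketch() {
    offsets.add(0);
  }

  // Start list number index; assumes index == offsets.size() - 1 (in-order writes).
  void startNewValue(int index) {
    offsets.add(offsets.get(index));
  }

  // Append to list number index and advance its end offset.
  void add(int index, Object value) {
    data.add(value);
    offsets.set(index + 1, data.size());
  }
}

// Hypothetical list writer that delegates offset bookkeeping to the mutator.
class UnionListWriterSketch {
  private final ListMutatorSketch mutator;
  private int index = -1;

  UnionListWriterSketch(ListMutatorSketch mutator) {
    this.mutator = mutator;
  }

  void startList() {
    index++;
    mutator.startNewValue(index);
  }

  // ListWriter-style accessor: returns the list writer itself.
  UnionListWriterSketch bigInt() {
    return this;
  }

  void writeBigInt(long value) {
    mutator.add(index, value);
  }
}
{code}

For example, writing [1, 2] and then [3] leaves offsets = [0, 2, 3] and data = [1, 2, 3].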

If the map() method is called (i.e. a repeated map), the UnionListWriter
returns itself, but a state variable is updated to indicate that it should
operate as a MapWriter. While in MapWriter mode, calls to the MapWriter methods
also return the UnionListWriter itself, and additionally record the name of the
current field. Subsequent calls to the ScalarWriter methods write to the
underlying UnionVector through the UnionWriter interface.

For example,

{code}
UnionListWriter list;
...

list.startList();
list.map().bigInt("a").writeBigInt(1);
{code}

This code first indicates that a new list is starting, which ensures that the
offset vector is set correctly. Calling map() sets the internal state of the
writer to "MAP", bigInt("a") sets the current field of the writer to "a", and
writeBigInt(1) writes the value 1 to the underlying UnionVector.

Another example:

{code}
MapWriter mapWriter = list.map().map("a");
{code}

In this case, the final call to map("a") delegates to the underlying
UnionWriter and returns a new MapWriter, with its position set according to the
current offset.
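
The LIST/MAP state switch can be modelled as a small state machine. This is an illustrative sketch, not the actual UnionListWriter: MapModeListWriterSketch and its Mode enum are hypothetical, a List<Object> stands in for the UnionVector, and it assumes each map() call begins a new map element in the current list, a detail the outline does not specify.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the UnionListWriter's LIST/MAP state machine.
class MapModeListWriterSketch {
  enum Mode { LIST, MAP }

  final List<Integer> offsets = new ArrayList<>();
  final List<Object> data = new ArrayList<>(); // stand-in for the UnionVector
  private Mode mode = Mode.LIST;
  private String currentField;

  MapModeListWriterSketch() {
    offsets.add(0);
  }

  void startList() {
    mode = Mode.LIST;
    offsets.add(data.size()); // the new list begins where the previous one ended
  }

  // Enter MAP mode and return the writer itself (assumption: starts a new map element).
  MapModeListWriterSketch map() {
    mode = Mode.MAP;
    data.add(new LinkedHashMap<String, Object>());
    offsets.set(offsets.size() - 1, data.size());
    return this;
  }

  // In MAP mode, remember the current field name and return the writer itself.
  MapModeListWriterSketch bigInt(String name) {
    if (mode != Mode.MAP) {
      throw new IllegalStateException("bigInt(name) is only valid in MAP mode");
    }
    currentField = name;
    return this;
  }

  @SuppressWarnings("unchecked")
  void writeBigInt(long value) {
    if (mode == Mode.MAP) {
      // Write into the current map element under the remembered field name.
      Map<String, Object> current = (Map<String, Object>) data.get(data.size() - 1);
      current.put(currentField, value);
    } else {
      // LIST mode: append a scalar element and advance the end offset.
      data.add(value);
      offsets.set(offsets.size() - 1, data.size());
    }
  }
}
{code}

Running the first example above against this model, list.startList(); list.map().bigInt("a").writeBigInt(1);, leaves one list containing the single map {a=1} and offsets = [0, 1].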

> Create a new EmbeddedVector
> ---------------------------
>
>                 Key: DRILL-3229
>                 URL: https://issues.apache.org/jira/browse/DRILL-3229
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Execution - Codegen, Execution - Data Types, Execution - Relational Operators, Functions - Drill
>            Reporter: Jacques Nadeau
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> Embedded Vector will leverage a binary encoding for holding information about 
> type for each individual field.



