[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804637#comment-14804637 ]
Steven Phillips commented on DRILL-3229:
----------------------------------------

Basic design outline:

A Union type represents a field whose type can vary between records. The data for a field of type Union will be stored in a UnionVector.

h4. UnionVector

A UnionVector internally uses a MapVector to hold the vectors for the various types. The types include all of the MinorTypes, including List and Map. For example, the internal MapVector will have a subfield named "bigInt", which will refer to a NullableBigIntVector.

In addition to the vectors corresponding to the minor types, there will be two additional fields, both represented by UInt1Vectors. These are "bits" and "types", which will represent the nullability and the type of the underlying data, respectively. The "bits" vector will work the same way it works in other nullable vectors. The "types" vector will store, for each value, the number corresponding to its MinorType as defined in the protobuf definition. There will be mutator methods for setting null and type.

h4. UnionWriter

The UnionWriter implements and overrides all of the methods of FieldWriter. It holds field writers corresponding to each of the types included in the underlying UnionVector, and delegates the method calls for each type to the corresponding writer.
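
The vector layout and writer delegation described above can be illustrated with a small, self-contained sketch. This is not Drill code: the class names (UnionVectorSketch, UnionWriterSketch, UnionSketchDemo), the two type codes, and the use of plain arrays in place of a MapVector and UInt1Vectors are all assumptions made for illustration. It only shows the idea that every write records the type and the not-null bit alongside the value.

```java
/** Illustrative stand-in for a union vector; not Drill's UnionVector. */
class UnionVectorSketch {
    // Hypothetical type codes. The design stores the MinorType number
    // from the protobuf definition instead.
    static final byte TYPE_BIGINT = 1;
    static final byte TYPE_VARCHAR = 2;

    private final byte[] bits;       // 1 = value set, 0 = null (the "bits" vector)
    private final byte[] types;      // type code per value (the "types" vector)
    private final long[] bigInt;     // child storage, like the "bigInt" subfield
    private final String[] varChar;  // child storage for a second type

    UnionVectorSketch(int capacity) {
        bits = new byte[capacity];
        types = new byte[capacity];
        bigInt = new long[capacity];
        varChar = new String[capacity];
    }

    void setType(int index, byte type) { types[index] = type; }
    void setNotNull(int index) { bits[index] = 1; }
    void setNull(int index) { bits[index] = 0; }
    boolean isNull(int index) { return bits[index] == 0; }
    byte getType(int index) { return types[index]; }
    void setBigInt(int index, long value) { bigInt[index] = value; }
    void setVarChar(int index, String value) { varChar[index] = value; }

    /** Reads the value at index by consulting "bits" first, then "types". */
    Object getObject(int index) {
        if (isNull(index)) {
            return null;
        }
        switch (types[index]) {
            case TYPE_BIGINT: return bigInt[index];
            case TYPE_VARCHAR: return varChar[index];
            default: throw new IllegalStateException("unknown type code " + types[index]);
        }
    }
}

/** Illustrative writer: each write sets type, not-null bit, and value together. */
class UnionWriterSketch {
    private final UnionVectorSketch data;
    private int position;

    UnionWriterSketch(UnionVectorSketch data) { this.data = data; }

    void setPosition(int index) { position = index; }

    void writeBigInt(long value) {
        data.setType(position, UnionVectorSketch.TYPE_BIGINT);
        data.setNotNull(position);
        data.setBigInt(position, value);
    }

    void writeVarChar(String value) {
        data.setType(position, UnionVectorSketch.TYPE_VARCHAR);
        data.setNotNull(position);
        data.setVarChar(position, value);
    }
}

class UnionSketchDemo {
    public static void main(String[] args) {
        UnionVectorSketch vector = new UnionVectorSketch(3);
        UnionWriterSketch writer = new UnionWriterSketch(vector);
        writer.setPosition(0);
        writer.writeBigInt(42L);
        writer.setPosition(1);
        writer.writeVarChar("abc");
        // Index 2 is never written, so its "bits" entry stays 0 (null).
        for (int i = 0; i < 3; i++) {
            System.out.println(i + ": " + vector.getObject(i));
        }
    }
}
```

Because the sketch keeps one slot per type at every index, it trades memory for simplicity, much as a map of per-type child vectors would.
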
As an example of this delegation, consider the BigIntWriter interface:

{code}
public interface BigIntWriter extends BaseWriter {
  public void write(BigIntHolder h);
  public void writeBigInt(long value);
}
{code}

UnionWriter overrides these methods:

{code}
@Override
public void writeBigInt(long value) {
  // Record the type and nullability before delegating the actual write.
  data.getMutator().setType(idx(), MinorType.BIGINT);
  data.getMutator().setNotNull(idx());
  getBigIntWriter().setPosition(idx());
  getBigIntWriter().writeBigInt(value);
}

@Override
public void write(BigIntHolder h) {
  data.getMutator().setType(idx(), MinorType.BIGINT);
  data.getMutator().setNotNull(idx());
  getBigIntWriter().setPosition(idx());
  getBigIntWriter().writeBigInt(h.value);
}
{code}

This requires users of the interface to go through the UnionWriter, rather than using the underlying BigIntWriter directly. Otherwise, the "types" and "bits" vectors would not be set correctly.

h4. UnionReader

Much like the UnionWriter, the UnionReader overrides the methods of FieldReader, and delegates to the specific FieldReader implementation that corresponds to the type of the current value.

h4. UnionListVector

UnionListVector extends BaseRepeatedVector. It works much the same as other Repeated vectors: there is a data vector and an offset vector. The data vector in this case is a UnionVector.

h4. UnionListWriter

The UnionListWriter overrides all FieldWriter methods. When starting a new list, the startList() method is called. This calls the startNewValue(int index) method of the underlying UnionListVector.Mutator. Subsequent calls to the ListWriter methods (such as bigInt()) return the UnionListWriter itself, and calls to write are handled by calling the appropriate method on the underlying UnionListVector.Mutator, which handles updating the offset vector.

In the case that the map() method is called (i.e. repeated map), the UnionListWriter itself is returned, but a state variable is updated to indicate that it should operate as a MapWriter.
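
The offset bookkeeping behind startList() and the subsequent writes can be sketched as follows. This is a simplified illustration under stated assumptions, not Drill's UnionListVector: the class names (ListVectorSketch, ListWriterSketch, ListSketchDemo) and the append() and getList() methods are hypothetical, and a java.util.List stands in for the UnionVector that holds the data.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for a repeated (list) vector; not Drill's UnionListVector. */
class ListVectorSketch {
    // Elements of list i live in data[offsets[i] .. offsets[i + 1]).
    private final List<Integer> offsets = new ArrayList<>();
    private final List<Object> data = new ArrayList<>(); // stands in for the UnionVector

    ListVectorSketch() {
        offsets.add(0);
    }

    /** Starts the list at index; any skipped indexes become empty lists. */
    void startNewValue(int index) {
        while (offsets.size() < index + 2) {
            offsets.add(data.size());
        }
    }

    /** Appends to the most recently started list and advances its end offset. */
    void append(Object value) {
        if (offsets.size() < 2) {
            throw new IllegalStateException("startNewValue must be called first");
        }
        data.add(value);
        offsets.set(offsets.size() - 1, data.size());
    }

    List<Object> getList(int index) {
        return new ArrayList<>(data.subList(offsets.get(index), offsets.get(index + 1)));
    }
}

/** Illustrative writer whose type methods return the writer itself. */
class ListWriterSketch {
    private final ListVectorSketch vector;
    private int position;

    ListWriterSketch(ListVectorSketch vector) { this.vector = vector; }

    void setPosition(int index) { position = index; }

    void startList() { vector.startNewValue(position); }

    ListWriterSketch bigInt() { return this; }

    void writeBigInt(long value) { vector.append(value); }
}

class ListSketchDemo {
    public static void main(String[] args) {
        ListVectorSketch vector = new ListVectorSketch();
        ListWriterSketch list = new ListWriterSketch(vector);
        list.setPosition(0);
        list.startList();
        list.bigInt().writeBigInt(1L);
        list.bigInt().writeBigInt(2L);
        list.setPosition(1);
        list.startList();
        list.bigInt().writeBigInt(3L);
        System.out.println(vector.getList(0));
        System.out.println(vector.getList(1));
    }
}
```

The point of the sketch is only that the writer never touches offsets directly; the vector updates them on every append.
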
While in MapWriter mode, calls to the MapWriter methods will also return the UnionListWriter itself, and will additionally update the field that records the name of the current field. Subsequent calls to the ScalarWriter methods will write to the underlying UnionVector using the UnionWriter interface.

For example,

{code}
UnionListWriter list;
...
list.startList();
list.map().bigInt("a").writeBigInt(1);
{code}

This code first indicates that a new list is starting, which ensures that the offset vector is set correctly. Calling map() sets the internal state of the writer to "MAP". bigInt("a") sets the current field of the writer to "a", and writeBigInt(1) writes the value 1 to the underlying UnionVector.

Another example:

{code}
MapWriter mapWriter = list.map().map("a");
{code}

In this case, the final call to map("a") delegates to the underlying UnionWriter, and returns a new MapWriter, with the position set according to the current offset.

> Create a new EmbeddedVector
> ---------------------------
>
>                 Key: DRILL-3229
>                 URL: https://issues.apache.org/jira/browse/DRILL-3229
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Execution - Codegen, Execution - Data Types, Execution - Relational Operators, Functions - Drill
>            Reporter: Jacques Nadeau
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> Embedded Vector will leverage a binary encoding for holding information about type for each individual field.