[ 
https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907299#comment-14907299
 ] 

Parth Chandra commented on DRILL-3229:
--------------------------------------

The union type looks good (I haven't delved into the UnionListVector, though it 
doesn't look too far removed from the UnionVector). I'm missing some details:
i) When do we create a Union type?
ii) The Union Vector will have a map vector which will have a field for each 
minor type. The fields will be nullable vectors of the corresponding minor 
type. For a given value, only one of the value vectors will have the bits field 
set. Is my understanding correct? A picture would be a big help.
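
To make the understanding in (ii) concrete, here is a toy sketch of that layout: 
one nullable child vector per minor type, with at most one child having its 
bit set for any given row. All names here (UnionVectorSketch, NullableVector, 
the three-value MinorType enum) are hypothetical and for illustration only. 
This is not Drill's UnionVector API, and the actual implementation may differ.

```java
import java.util.EnumMap;
import java.util.Map;

public class UnionVectorSketch {
    // Hypothetical subset of minor types, chosen only for this illustration.
    enum MinorType { BIGINT, FLOAT8, VARCHAR }

    // Toy nullable vector: one value slot and one "is set" bit per row.
    static final class NullableVector {
        final Object[] values;
        final boolean[] bits;

        NullableVector(int capacity) {
            values = new Object[capacity];
            bits = new boolean[capacity];
        }
    }

    // Stands in for the internal map vector: one child field per minor type.
    private final Map<MinorType, NullableVector> children =
            new EnumMap<>(MinorType.class);

    UnionVectorSketch(int capacity) {
        for (MinorType t : MinorType.values()) {
            children.put(t, new NullableVector(capacity));
        }
    }

    // Write a value of the given type; every other child is cleared for this
    // row, so at most one child has its bit set.
    void set(int index, MinorType type, Object value) {
        for (NullableVector v : children.values()) {
            v.bits[index] = false;
            v.values[index] = null;
        }
        NullableVector child = children.get(type);
        child.values[index] = value;
        child.bits[index] = true;
    }

    // The type of a row is whichever child has its bit set; null if none.
    MinorType typeAt(int index) {
        for (Map.Entry<MinorType, NullableVector> e : children.entrySet()) {
            if (e.getValue().bits[index]) {
                return e.getKey();
            }
        }
        return null;
    }

    Object get(int index) {
        MinorType t = typeAt(index);
        return t == null ? null : children.get(t).values[index];
    }

    public static void main(String[] args) {
        UnionVectorSketch v = new UnionVectorSketch(3);
        v.set(0, MinorType.BIGINT, 42L);   // numeric value
        v.set(1, MinorType.VARCHAR, "42"); // same field, quoted in the source
        // Row 2 is never written, so no child has its bit set.
        for (int i = 0; i < 3; i++) {
            System.out.println(i + ": " + v.typeAt(i) + " = " + v.get(i));
        }
    }
}
```

In this model a row with no bit set in any child reads as null, which is one 
possible way to cover the "many nulls, then a schema materializes" case.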

More importantly, can we write up a couple of notes on the big picture so I can 
see where this fits in?  For instance, it is not clear in what cases we plan to 
use this. There are different use cases where changing schema is encountered.  
For instance, a large number of nulls followed by a schema that materializes is 
one frequently encountered case. The other common case is that of a primitive 
type that appears within quotes in a particular record and gets interpreted as 
a varchar. More complex cases can occur that have the same information 
represented differently, e.g. a timestamp that is written either as a string or 
as a long. (I'm not yet considering the rather extreme example in the Yelp data 
set where a null field shows up as an empty map.) Which of these types of cases 
are we addressing with UnionVectors? 

Also, one question I've never resolved in my own mind is that of FieldMetadata. 
Does a ValueVector require FieldMetadata to describe its structure? Or is it 
the other way around: FieldMetadata can be derived from the ValueVector. Either 
way, how do we define FieldMetadata for Union types? What is the impact on 
ODBC/JDBC, if any? 

Would a shared doc be a better way to discuss this? Then we can consolidate and 
add the result to https://drill.apache.org/docs/value-vectors/.



> Create a new EmbeddedVector
> ---------------------------
>
>                 Key: DRILL-3229
>                 URL: https://issues.apache.org/jira/browse/DRILL-3229
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Execution - Codegen, Execution - Data Types, Execution - 
> Relational Operators, Functions - Drill
>            Reporter: Jacques Nadeau
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> Embedded Vector will leverage a binary encoding for holding information about 
> type for each individual field.



