[ 
https://issues.apache.org/jira/browse/DRILL-7502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007933#comment-17007933
 ] 

ASF GitHub Bot commented on DRILL-7502:
---------------------------------------

paul-rogers commented on issue #1945: DRILL-7502: Invalid codegen for typeof() 
with UNION
URL: https://github.com/apache/drill/pull/1945#issuecomment-570762571
 
 
   Another thing to note is that we may want to go through one more round of 
sorting out the type functions. In normal SQL, there is just one level of type: 
a column is an `INT` or a `DOUBLE`.
   
   In SQLite (IIRC) there is the `Variant` type, so a column can be `Variant` 
but the values can be `INT` or `DOUBLE`. The "variant" name comes from Visual 
Basic. In Drill, the variant is called `UNION`.
   
   In Drill, we also have structured types: `ARRAY<INTEGER>` or 
`DICT<STRING,DOUBLE>`.
   
   We've been trying to handle all these cases first with one function 
(`typeof()`), then later with three (adding `sqlTypeOf()` and `drillTypeOf()`).
   
   We probably want four variations. Two forms of the type:
   
   * The full type description: `ARRAY<ARRAY<MAP<a: INTEGER, b: VARCHAR>>>` 
say, or `UNION<DOUBLE, INTEGER>`.
   * The short type of the vector itself: `LIST`, `UNION`.
   
   For Variant types (`UNION`, `LIST`) we then want two forms:
   
   * The vector itself: `DICT`, `MAP`, `UNION`, `LIST`.
   * The value: `INTEGER`, `DOUBLE`.
   
   Of course, the value can itself be structured: a `UNION` which has a `LIST` 
of `UNION` types. So, for both the vector and the value, we'd want both the 
simple and verbose forms.
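
The four variations described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `TypeNode` model that is not part of Drill's metadata classes; it ignores named `MAP` members and nullability, and the rendering format simply mirrors the examples in this comment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical type model for illustration; not a Drill class.
final class TypeNode {
  final String kind;             // e.g. "LIST", "UNION", "INTEGER"
  final List<TypeNode> children; // element or member types; empty for scalars

  TypeNode(String kind, TypeNode... children) {
    this.kind = kind;
    this.children = Arrays.asList(children);
  }

  // Short form: only the kind of this level, e.g. "UNION".
  String shortName() {
    return kind;
  }

  // Full form: the kind plus all nested types, e.g. "UNION<DOUBLE, INTEGER>".
  String fullName() {
    if (children.isEmpty()) {
      return kind;
    }
    StringBuilder sb = new StringBuilder(kind).append('<');
    for (int i = 0; i < children.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(children.get(i).fullName());
    }
    return sb.append('>').toString();
  }
}

final class TypeDescriptionSketch {
  public static void main(String[] args) {
    // Type of the vector (the column as a whole).
    TypeNode vector = new TypeNode("UNION",
        new TypeNode("DOUBLE"), new TypeNode("INTEGER"));
    // Type of the value held in one particular row of that column.
    TypeNode value = new TypeNode("INTEGER");

    System.out.println(vector.shortName()); // UNION
    System.out.println(vector.fullName());  // UNION<DOUBLE, INTEGER>
    System.out.println(value.shortName());  // INTEGER
    System.out.println(value.fullName());   // INTEGER
  }
}
```

For a scalar value the short and full forms coincide; they differ only when the value is itself structured, which is the case that motivates having both.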
   
   And, now that @arina-ielchiieva added the nice type description system, we 
have no function which returns the full description, including the `NOT NULL` 
attribute, as it appears in the schema file: `INTEGER NOT NULL`.
   
   The present fix gets us closer, but it is hard to solve four or five cases 
with three functions. I suspect we could do better. Suggestions welcomed!
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incorrect/invalid codegen for typeof() with UNION
> -------------------------------------------------
>
>                 Key: DRILL-7502
>                 URL: https://issues.apache.org/jira/browse/DRILL-7502
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.17.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.18.0
>
>
> The {{typeof()}} function is defined as follows:
> {code:java}
>   @FunctionTemplate(names = {"typeOf"},
>           scope = FunctionTemplate.FunctionScope.SIMPLE,
>           nulls = NullHandling.INTERNAL)
>   public static class GetType implements DrillSimpleFunc {
>     @Param
>     FieldReader input;
>     @Output
>     VarCharHolder out;
>     @Inject
>     DrillBuf buf;
>     @Override
>     public void setup() {}
>     @Override
>     public void eval() {
>       String typeName = input.getTypeString();
>       byte[] type = typeName.getBytes();
>       buf = buf.reallocIfNeeded(type.length);
>       buf.setBytes(0, type);
>       out.buffer = buf;
>       out.start = 0;
>       out.end = type.length;
>     }
>   }
> {code}
> Note that the {{input}} field is defined as {{FieldReader}} which has a 
> method called {{getTypeString()}}. As a result, the code works fine in all 
> existing tests in {{TestTypeFns}}.
> I tried to add a function to use {{typeof()}} on a column of type {{UNION}}. 
> When I did, the query failed with a compile error in generated code:
> {noformat}
> SYSTEM ERROR: CompileException: Line 42, Column 43: 
>   A method named "getTypeString" is not declared in any enclosing class nor 
> any supertype, nor through a static import
> {noformat}
> The stack trace shows the generated code. Note that the type of {{input}} 
> changes from a reader to a holder, which makes the code invalid:
> {code:java}
> public class ProjectorGen0 {
>     DrillBuf work0;
>     UnionVector vv1;
>     VarCharVector vv6;
>     DrillBuf work9;
>     VarCharVector vv11;
>     DrillBuf work14;
>     VarCharVector vv16;
>     public void doEval(int inIndex, int outIndex)
>         throws SchemaChangeException
>     {
>         {
>             UnionHolder out4 = new UnionHolder();
>             {
>                 out4 .isSet = vv1 .getAccessor().isSet((inIndex));
>                 if (out4 .isSet == 1) {
>                     vv1 .getAccessor().get((inIndex), out4);
>                 }
>             }
>             //---- start of eval portion of typeOf function. ----//
>             VarCharHolder out5 = new VarCharHolder();
>             {
>                 final VarCharHolder out = new VarCharHolder();
>                 UnionHolder input = out4;
>                 DrillBuf buf = work0;
>                 UnionFunctions$GetType_eval:
> {
>     String typeName = input.getTypeString();
>     byte[] type = typeName.getBytes();
>     buf = buf.reallocIfNeeded(type.length);
>     buf.setBytes(0, type);
>     out.buffer = buf;
>     out.start = 0;
>     out.end = type.length;
> }
> {code}
> By contrast, here is the generated code for one of the existing 
> {{TestTypeFns}} tests where things work:
> {code:java}
> public class ProjectorGen0
>     extends ProjectorTemplate
> {
>     DrillBuf work0;
>     NullableBigIntVector vv1;
>     VarCharVector vv7;
>     public ProjectorGen0() {
>         try {
>             __DRILL_INIT__();
>         } catch (SchemaChangeException e) {
>             throw new UnsupportedOperationException(e);
>         }
>     }
>     public void doEval(int inIndex, int outIndex)
>         throws SchemaChangeException
>     {
>         {
>            ..
>             //---- start of eval portion of typeOf function. ----//
>             VarCharHolder out6 = new VarCharHolder();
>             {
>                 final VarCharHolder out = new VarCharHolder();
>                 FieldReader input = new NullableIntHolderReaderImpl(out5);
>                 DrillBuf buf = work0;
>                 UnionFunctions$GetType_eval:
> {
>     String typeName = input.getTypeString();
>     byte[] type = typeName.getBytes();
>     buf = buf.reallocIfNeeded(type.length);
>     buf.setBytes(0, type);
>     out.buffer = buf;
>     out.start = 0;
>     out.end = type.length;
> }
>                 work0 = buf;
>                 out6 .start = out.start;
>                 out6 .end = out.end;
>                 out6 .buffer = out.buffer;
>             }
>             //---- end of eval portion of typeOf function. ----//
> {code}
> Notice that the {{input}} variable is of type {{FieldReader}} as expected.
> Queries that work:
> {code:java}
>     String sql = "SELECT typeof(CAST(a AS " + castType + ")) FROM (VALUES (1)) AS T(a)";
>     sql = "SELECT typeof(CAST(a AS " + castType + ")) FROM cp.`functions/null.json`";
>     String sql = "SELECT typeof(" + expr + ") FROM (VALUES (" + value + ")) AS T(a)";
> {code}
> Query that fails:
> {code:java}
>     String sql = "SELECT typeof(a) AS t, modeof(a) as m, drilltypeof(a) AS dt\n" +
>                  "FROM cp.`jsoninput/union/c.json`";
> {code}
> The queries that work all include either a CAST or constant values, whereas 
> the query that fails operates on data read from a file. Also, the queries 
> that work use scalar types, while the query that fails uses the UNION type.
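
The mismatch described in the issue can be illustrated outside Drill with a small, self-contained sketch. The types below (`ReaderLike`, `IntReaderLike`, `UnionHolderLike`) are hypothetical stand-ins, not Drill classes; they only assume what the issue states, namely that the reader type declares `getTypeString()` while the holder substituted for `input` in the generated code does not.

```java
import java.lang.reflect.Method;

// Stand-in for a reader type: declares getTypeString().
interface ReaderLike {
  String getTypeString();
}

// Stand-in for a concrete reader wrapping a scalar value.
final class IntReaderLike implements ReaderLike {
  @Override
  public String getTypeString() {
    return "INT";
  }
}

// Stand-in for a holder: plain fields only, no getTypeString().
final class UnionHolderLike {
  int isSet;
}

final class HolderVsReaderSketch {
  // True if the given type exposes a public no-argument getTypeString().
  static boolean hasGetTypeString(Class<?> type) {
    for (Method m : type.getMethods()) {
      if (m.getName().equals("getTypeString") && m.getParameterCount() == 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hasGetTypeString(IntReaderLike.class));   // true
    System.out.println(hasGetTypeString(UnionHolderLike.class)); // false
  }
}
```

A function body written against the reader type compiles only while `input` keeps that type; once a holder-like type is substituted, the same `input.getTypeString()` call has no method to bind to, which matches the reported compile error.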



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
