Steven Phillips created DRILL-1781:
--------------------------------------
Summary: For complex functions, don't return until schema is known
Key: DRILL-1781
URL: https://issues.apache.org/jira/browse/DRILL-1781
Project: Apache Drill
Issue Type: Bug
Reporter: Steven Phillips
In the case of complex output functions, it is impossible to determine the
output schema until the actual data is consumed. For example, with
convert_from(VARCHAR, 'json'), unlike most other functions, it is not
sufficient to know that the incoming data type is VARCHAR; we actually need to
decode the contents of the record before we can determine whether the output
type is a map, a list, or a primitive type.
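To illustrate why the input type alone is not enough, here is a minimal
sketch. The helper complex_output_kind is hypothetical (it is not Drill's
API); it simply decodes a string as JSON and reports which kind of output
that decoding yields. Three values with the same input type (VARCHAR)
produce three different output kinds.

```python
import json


def complex_output_kind(value: str) -> str:
    """Hypothetical helper: classify the output produced by decoding a
    VARCHAR value as JSON. Only the contents decide the result."""
    decoded = json.loads(value)
    if isinstance(decoded, dict):
        return "MAP"
    if isinstance(decoded, list):
        return "LIST"
    return "PRIMITIVE"


# Same input type (a string), three different output types.
for v in ['{"a": 1}', '[1, 2, 3]', '42']:
    print(v, "->", complex_output_kind(v))
```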
For fast schema return, we worked around this problem by simply assuming that
the type was Map; if it turned out to be different, a schema change would
occur. This solution is not satisfactory, as it ends up breaking other
functions, such as flatten.
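The old workaround can be modeled roughly as follows. This is a simplified
sketch, not Drill code: the function name schema_changes_with_assumed_map is
hypothetical, and it assumes the first record of each batch decides that
batch's output kind. It reports Map up front and counts how many schema
changes downstream operators would then observe.

```python
import json
from typing import List


def kind_of(value: str) -> str:
    """Classify the JSON-decoded value as MAP, LIST or PRIMITIVE."""
    decoded = json.loads(value)
    if isinstance(decoded, dict):
        return "MAP"
    if isinstance(decoded, list):
        return "LIST"
    return "PRIMITIVE"


def schema_changes_with_assumed_map(batches: List[List[str]]) -> int:
    """Hypothetical model of the workaround: report MAP as the initial
    schema, then count a schema change whenever a batch decodes to a
    different kind than the one currently in effect."""
    current = "MAP"
    changes = 0
    for batch in batches:
        if not batch:  # empty batches carry no data to decode
            continue
        actual = kind_of(batch[0])  # simplification: first record decides
        if actual != current:
            changes += 1
            current = actual
    return changes
```

When the data really is a map, no change is reported; when it is a list,
every consumer sees at least one schema change before any data arrives in
the expected shape.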
The solution is to continue returning a schema whenever possible, but when it
is not possible, Drill will wait until the schema is known.
For non-blocking operators, Drill will immediately consume the incoming batch,
and thus will not return empty schema batches if there is data to consume.
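The non-blocking behavior described above can be sketched like this. The
function first_output is a hypothetical stand-in for a non-blocking operator
and again assumes the first record decides the output kind: rather than
emitting an empty schema batch, it keeps pulling incoming batches until one
contains records, then returns the schema together with that data.

```python
import json
from typing import Iterator, List, Optional, Tuple


def kind_of(value: str) -> str:
    """Classify the JSON-decoded value as MAP, LIST or PRIMITIVE."""
    decoded = json.loads(value)
    if isinstance(decoded, dict):
        return "MAP"
    if isinstance(decoded, list):
        return "LIST"
    return "PRIMITIVE"


def first_output(
    batches: Iterator[List[str]],
) -> Optional[Tuple[str, List[str]]]:
    """Hypothetical non-blocking operator: skip empty upstream batches and
    return (schema, data) from the first batch that has records.
    Returns None if upstream is exhausted before any data appears."""
    for batch in batches:
        if batch:
            return kind_of(batch[0]), batch
    return None
```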
Blocking operators will return an empty schema batch. If a flatten function
occurs downstream from a blocking operator, it will not be able to return a
schema, and thus fast schema return will not happen in this case.
In the cases where the complex function is not downstream from a blocking
operator, fast schema return should continue to work.
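The rule above can be stated as a check over a linear plan. This is a
simplified model, not Drill's planner: the Op class and
fast_schema_possible are hypothetical, and the plan is assumed to be a
single chain ordered from the scan (upstream) to the root (downstream).
Fast schema return is lost only when a complex-output function has a
blocking operator upstream of it, because that operator hands it an empty
schema batch with nothing to decode.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    blocking: bool = False        # e.g. a sort
    complex_output: bool = False  # e.g. convert_from(..., 'json') or flatten


def fast_schema_possible(pipeline: List[Op]) -> bool:
    """pipeline is ordered from upstream (scan) to downstream (root).
    Returns False if any complex-output operator sits downstream of a
    blocking operator, True otherwise."""
    seen_blocking = False
    for op in pipeline:
        if op.complex_output and seen_blocking:
            return False
        if op.blocking:
            seen_blocking = True
    return True
```

A blocking operator placed after the complex function is harmless here: the
non-blocking complex function has already consumed data and knows its
schema by the time the blocking operator emits its empty schema batch.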
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)