Boaz Ben-Zvi created DRILL-4961:
-----------------------------------

             Summary: Schema change error due to a missing column in a Json file
                 Key: DRILL-4961
                 URL: https://issues.apache.org/jira/browse/DRILL-4961
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.8.0
            Reporter: Boaz Ben-Zvi


A missing column in a batch defaults to a (hard-coded) nullable INT (e.g., see 
line 128 in ExpressionTreeMaterializer.java). This can cause a schema conflict 
when the same column has a different type (e.g., VARCHAR) in another batch.
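The mechanism can be illustrated with a small, hypothetical model. The class, 
record, and method names below (MissingColumnDemo, ColumnType, project, 
schemasMatch) are illustrative assumptions, not Drill's actual API. Each batch 
fills any requested column it lacks with a default nullable INT, and a receiver 
that requires identical schemas then sees two different types for "last_name":

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the problem; not Drill's real classes.
public class MissingColumnDemo {

    // Column type written as "TYPE:MODE", loosely echoing the
    // VARCHAR:OPTIONAL notation seen in Drill's error messages.
    record ColumnType(String minorType, String mode) {
        @Override
        public String toString() {
            return minorType + ":" + mode;
        }
    }

    // Type assumed for a column that a batch does not contain; this mirrors
    // the hard-coded nullable INT described in the report.
    static final ColumnType MISSING_DEFAULT = new ColumnType("INT", "OPTIONAL");

    // Project the requested columns out of one batch's schema, substituting
    // the default type for any column the batch lacks.
    static Map<String, ColumnType> project(Map<String, ColumnType> batchSchema,
                                           List<String> columns) {
        Map<String, ColumnType> out = new LinkedHashMap<>();
        for (String c : columns) {
            out.put(c, batchSchema.getOrDefault(c, MISSING_DEFAULT));
        }
        return out;
    }

    // In this model, a merging receiver accepts its inputs only if every
    // incoming batch has exactly the same schema.
    static boolean schemasMatch(List<Map<String, ColumnType>> incoming) {
        for (Map<String, ColumnType> s : incoming) {
            if (!s.equals(incoming.get(0))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ColumnType varchar = new ColumnType("VARCHAR", "OPTIONAL");
        List<String> cols = List.of("first_name", "last_name");

        // File 1 has both columns; file 2 had "last_name" removed.
        Map<String, ColumnType> file1 = Map.of("first_name", varchar, "last_name", varchar);
        Map<String, ColumnType> file2 = Map.of("first_name", varchar);

        List<Map<String, ColumnType>> incoming =
            List.of(project(file1, cols), project(file2, cols));

        // The second batch reports last_name as INT:OPTIONAL, the first as VARCHAR:OPTIONAL.
        System.out.println(incoming.get(0));
        System.out.println(incoming.get(1));
        System.out.println("schemas match: " + schemasMatch(incoming));
    }
}
```

In this sketch, the conflict comes from the default type being chosen per batch, 
without knowledge of the type the column has in other batches.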

To reproduce (the following test also led to filing DRILL-4960, which may be 
related): run a parallel aggregation over two small JSON files (e.g., two copies 
of contrib/storage-mongo/src/test/resources/emp.json), where one of the files 
has a whole column removed (e.g., "last_name").

0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.091 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` 
group by first_name, last_name;
Error: SYSTEM ERROR: SchemaChangeException: Incoming batches for merging 
receiver have different schemas!

Fragment 1:0

[Error Id: 1315ddc5-5c31-404f-917b-c7a082d016cf on 10.250.57.63:31010] 
(state=,code=0)

The above used a streaming aggregation. When switching to hash aggregation, the 
same problem manifests differently:

0: jdbc:drill:zk=local> alter session set `planner.enable_streamagg` = false;
+-------+------------------------------------+
|  ok   |              summary               |
+-------+------------------------------------+
| true  | planner.enable_streamagg updated.  |
+-------+------------------------------------+
1 row selected (0.083 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` 
group by first_name, last_name;
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was 
holding vector class org.apache.drill.exec.vector.NullableVarCharVector, field= 
last_name(VARCHAR:OPTIONAL)[$bits$(UINT1:REQUIRED), 
last_name(VARCHAR:OPTIONAL)[$offsets$(UINT4:REQUIRED)]] 

Fragment 2:0

[Error Id: 58daaaa0-3bfe-4197-b4bd-44f9d7604d77 on 10.250.57.63:31010] 
(state=,code=0)
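The hash-aggregation message appears to indicate that the operator was set up 
for the schema of the batch in which "last_name" was missing (a nullable INT 
vector) and then received a batch in which "last_name" holds VARCHAR data. A 
minimal sketch of such a typed lookup follows. The vector classes here are empty 
stand-ins, and getVector is a hypothetical helper that only imitates the wording 
of the error; it is not Drill's real accessor:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for value vectors; not Drill's real vector classes.
public class TypedVectorLookupDemo {

    interface ValueVector {}
    static final class NullableIntVector implements ValueVector {}
    static final class NullableVarCharVector implements ValueVector {}

    // Hypothetical accessor: code set up for one schema fetches a vector by
    // field name and the class it expects, and fails if the class differs.
    static <T extends ValueVector> T getVector(Map<String, ValueVector> batch,
                                               String field, Class<T> expected) {
        ValueVector v = batch.get(field);
        if (!expected.isInstance(v)) {
            throw new IllegalStateException(String.format(
                "Failure while reading vector. Expected vector class of %s"
                    + " but was holding vector class %s, field= %s",
                expected.getSimpleName(),
                v == null ? "null" : v.getClass().getSimpleName(),
                field));
        }
        return expected.cast(v);
    }

    public static void main(String[] args) {
        // Set up from the first batch (file without last_name), where the
        // column was filled in as nullable INT.
        Class<NullableIntVector> expectedForLastName = NullableIntVector.class;

        // A later batch (file with last_name) really holds VARCHAR data.
        Map<String, ValueVector> laterBatch = new HashMap<>();
        laterBatch.put("last_name", new NullableVarCharVector());

        try {
            getVector(laterBatch, "last_name", expectedForLastName);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, both failures share one root cause: the nullable INT chosen 
for the missing column disagrees with the VARCHAR type the column has elsewhere.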

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)