Boaz Ben-Zvi created DRILL-4961:
-----------------------------------
Summary: Schema change error due to a missing column in a Json file
Key: DRILL-4961
URL: https://issues.apache.org/jira/browse/DRILL-4961
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.8.0
Reporter: Boaz Ben-Zvi
A column that is missing from a batch defaults to a hard-coded nullable INT
(see, e.g., line 128 in ExpressionTreeMaterializer.java). This causes a schema
conflict when the same column has a different type (e.g. VARCHAR) in another
batch.
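The fallback described above can be illustrated with a toy model. This is only
a sketch and not Drill's code: the function names, the tuple representation of
types, and the sample rows are all hypothetical, and the type inference is
deliberately simplified to two types.

```python
# Toy model of per-batch schema inference; NOT Drill's implementation.
# Types are modeled as (minor_type, data_mode) tuples for illustration only.
NULLABLE_INT = ("INT", "OPTIONAL")
NULLABLE_VARCHAR = ("VARCHAR", "OPTIONAL")


def infer_batch_schema(rows, projected_columns):
    """Infer a type for each projected column from one batch of rows."""
    schema = {}
    for column in projected_columns:
        values = [row[column] for row in rows if column in row]
        if not values:
            # Column never seen in this batch: fall back to nullable INT,
            # mirroring the hard-coded default described in this report.
            schema[column] = NULLABLE_INT
        elif all(isinstance(value, str) for value in values):
            schema[column] = NULLABLE_VARCHAR
        else:
            # Simplification: anything that is not all-strings is INT here.
            schema[column] = NULLABLE_INT
    return schema


def merge_schemas(schemas):
    """Return the common schema, or fail if the batches disagree."""
    first = schemas[0]
    for other in schemas[1:]:
        if other != first:
            raise ValueError(f"schema mismatch: {first} vs {other}")
    return first


# Hypothetical batches: the second one has no last_name column at all.
batch_with_column = [{"first_name": "Ada", "last_name": "Lovelace"}]
batch_without_column = [{"first_name": "Alan"}]

columns = ["first_name", "last_name"]
schema_a = infer_batch_schema(batch_with_column, columns)
schema_b = infer_batch_schema(batch_without_column, columns)
```

In this model, calling merge_schemas([schema_a, schema_b]) raises an error
because last_name is VARCHAR in one batch and nullable INT in the other, which
is the kind of disagreement the errors in this report point to.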
To reproduce (the same test also led to DRILL-4960, which may be related): run
a parallel aggregation over two small JSON files (e.g., two copies of
contrib/storage-mongo/src/test/resources/emp.json) where an entire column
(e.g. "last_name") has been removed from one of the files.
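One possible way to prepare such a pair of files is sketched below. The sample
records, the file names emp1.json and emp2.json, and the helper names are
hypothetical stand-ins; the actual contents of emp.json are not reproduced
here. The sketch writes one JSON object per line.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample records standing in for the real emp.json contents.
SAMPLE_RECORDS = [
    {"employee_id": 1, "first_name": "Ada", "last_name": "Lovelace"},
    {"employee_id": 2, "first_name": "Alan", "last_name": "Turing"},
]


def write_records(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def make_test_dir(target_dir, dropped_column="last_name"):
    """Create two JSON files; the second lacks dropped_column entirely."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    full_path = target / "emp1.json"
    partial_path = target / "emp2.json"
    write_records(full_path, SAMPLE_RECORDS)
    stripped = [
        {key: value for key, value in record.items() if key != dropped_column}
        for record in SAMPLE_RECORDS
    ]
    write_records(partial_path, stripped)
    return full_path, partial_path


if __name__ == "__main__":
    # Demo in a throwaway directory; point make_test_dir at the directory
    # that the query reads (e.g. drill/data/emp) to use the files for real.
    with tempfile.TemporaryDirectory() as tmp:
        full_path, partial_path = make_test_dir(tmp)
        print(partial_path.read_text(encoding="utf-8"))
```

The directory passed to make_test_dir would then be the one queried in the
session below.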
0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
+-------+--------------------------------+
| ok | summary |
+-------+--------------------------------+
| true | planner.slice_target updated. |
+-------+--------------------------------+
1 row selected (0.091 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp`
group by first_name, last_name;
Error: SYSTEM ERROR: SchemaChangeException: Incoming batches for merging
receiver have different schemas!
Fragment 1:0
[Error Id: 1315ddc5-5c31-404f-917b-c7a082d016cf on 10.250.57.63:31010]
(state=,code=0)
The query above used streaming aggregation; with hash aggregation the same
problem surfaces as a different error:
0: jdbc:drill:zk=local> alter session set `planner.enable_streamagg` = false;
+-------+------------------------------------+
| ok | summary |
+-------+------------------------------------+
| true | planner.enable_streamagg updated. |
+-------+------------------------------------+
1 row selected (0.083 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp`
group by first_name, last_name;
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was
holding vector class org.apache.drill.exec.vector.NullableVarCharVector, field=
last_name(VARCHAR:OPTIONAL)[$bits$(UINT1:REQUIRED),
last_name(VARCHAR:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]
Fragment 2:0
[Error Id: 58daaaa0-3bfe-4197-b4bd-44f9d7604d77 on 10.250.57.63:31010]
(state=,code=0)