Boaz Ben-Zvi created DRILL-4961:
-----------------------------------

             Summary: Schema change error due to a missing column in a Json file
                 Key: DRILL-4961
                 URL: https://issues.apache.org/jira/browse/DRILL-4961
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.8.0
            Reporter: Boaz Ben-Zvi

A missing column in a batch defaults to a hard-coded nullable INT (e.g., see line 128 in ExpressionTreeMaterializer.java). This can cause a schema conflict when the same column has a different type (e.g., VARCHAR) in another batch.

To reproduce (the following test also led to DRILL-4960, which may be related): run a parallel aggregation over two small JSON files (e.g., two copies of contrib/storage-mongo/src/test/resources/emp.json) where one of the files has a whole column removed (e.g., "last_name").

0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.091 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` group by first_name, last_name;
Error: SYSTEM ERROR: SchemaChangeException: Incoming batches for merging receiver have different schemas!
Fragment 1:0
[Error Id: 1315ddc5-5c31-404f-917b-c7a082d016cf on 10.250.57.63:31010] (state=,code=0)

The above used a streaming aggregation; with hash aggregation the same error manifests differently:

0: jdbc:drill:zk=local> alter session set `planner.enable_streamagg` = false;
+-------+------------------------------------+
|  ok   |              summary               |
+-------+------------------------------------+
| true  | planner.enable_streamagg updated.  |
+-------+------------------------------------+
1 row selected (0.083 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` group by first_name, last_name;
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.NullableVarCharVector, field= last_name(VARCHAR:OPTIONAL)[$bits$(UINT1:REQUIRED), last_name(VARCHAR:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]
Fragment 2:0
[Error Id: 58daaaa0-3bfe-4197-b4bd-44f9d7604d77 on 10.250.57.63:31010] (state=,code=0)
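
To illustrate the mechanism described in this report, here is a rough Python sketch. It is a toy model, not Drill code: the function names, the type labels, and the sample records are all made up for illustration. It only assumes what the report states, namely that a column absent from a batch is typed as nullable INT, while the same column in another batch keeps the type of its data.

```python
# Toy model of the reported behavior; every name here is hypothetical.
MISSING_COLUMN_TYPE = "INT:OPTIONAL"  # stand-in for the hard-coded nullable INT


def infer_type(value):
    """Map a JSON value to a type label (toy mapping for this sketch only)."""
    if isinstance(value, str):
        return "VARCHAR:OPTIONAL"
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "BIT:OPTIONAL"
    if isinstance(value, int):
        return "BIGINT:OPTIONAL"
    if isinstance(value, float):
        return "FLOAT8:OPTIONAL"
    raise TypeError(f"unsupported value: {value!r}")


def batch_schema(records, projected_columns):
    """Build a per-batch schema; a column with no values gets the default type."""
    schema = {}
    for column in projected_columns:
        present = [r[column] for r in records if column in r]
        schema[column] = infer_type(present[0]) if present else MISSING_COLUMN_TYPE
    return schema


def schema_conflicts(left, right):
    """Return the columns whose types differ between two batch schemas."""
    return {c: (left[c], right[c]) for c in left if left[c] != right[c]}


# Made-up sample data: the second "file" has the last_name column removed.
full_file = [
    {"first_name": "Ann", "last_name": "Lee"},
    {"first_name": "Bob", "last_name": "Ray"},
]
trimmed_file = [
    {"first_name": "Ann"},
    {"first_name": "Bob"},
]

columns = ["first_name", "last_name"]
schema_a = batch_schema(full_file, columns)
schema_b = batch_schema(trimmed_file, columns)
conflicts = schema_conflicts(schema_a, schema_b)
print(conflicts)
```

In this model the two batches disagree only on last_name (VARCHAR in one, the default INT in the other), which corresponds to the mismatch that the merging receiver and the hash aggregation each report in their own way.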