Paul Rogers created DRILL-7428:
----------------------------------
Summary: Drill incorrectly allows a repeated map field to be
projected to top level
Key: DRILL-7428
URL: https://issues.apache.org/jira/browse/DRILL-7428
Project: Apache Drill
Issue Type: Bug
Reporter: Paul Rogers
Consider the following query from the [MongoDB
tests|https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/MongoTestConstants.java#L80]:
{noformat}
select t.name as name, t.topping.type as type
from mongo.%s.`%s` t where t.sales >= 150
{noformat}
The query is used in
[{{TestMongoQueries.testUnShardedDBInShardedClusterWithProjectionAndFilter()}}|https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/TestMongoQueries.java#L89].
Here, {{topping}} turns out to be a repeated map (an array of maps), and the
query projects a member of that map to the top level. The query returns five
rows, but the repeated map holds 24 values in total across those rows. The
Project operator allows the projection, producing an output batch in which
most vectors have 5 values, but the column projected from {{topping}}, now at
the top level and no longer inside the map, has 24 values.
As a result, the first five values, which all belonged to the first record,
are now spread one per record across the five top-level records, while the
values that belonged to the remaining records (records 1-4, counting from
zero) are lost.
Thus, this is a data corruption bug.
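
The misalignment described above can be illustrated with a small, self-contained Java sketch. It is not Drill code: the class name, the method names, and the per-row offsets are hypothetical. The only details taken from this report are the five rows, the 24 values, and the first record owning the first five values. The sketch models a repeated map member as a flat value array plus an offsets array, then compares the faulty top-level projection with one structurally valid alternative that keeps the values grouped by row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepeatedMapProjection {

    // Hypothetical offsets for a repeated map column: row i owns the values
    // in [OFFSETS[i], OFFSETS[i + 1]). Five rows, 24 values in total.
    // The per-row split is an assumption made for illustration only.
    static final int[] OFFSETS = {0, 5, 10, 15, 20, 24};

    // Models the faulty projection: the member's flat values are treated as a
    // top-level column, so value index i is read as if it were row index i.
    static List<String> naiveProject(String[] memberValues, int rowCount) {
        return new ArrayList<>(Arrays.asList(memberValues).subList(0, rowCount));
    }

    // One possible valid result: keep the offsets, so each row still sees
    // exactly the values that belonged to it.
    static List<List<String>> projectAsArray(String[] memberValues, int[] offsets) {
        List<List<String>> rows = new ArrayList<>();
        for (int row = 0; row + 1 < offsets.length; row++) {
            rows.add(new ArrayList<>(
                Arrays.asList(memberValues).subList(offsets[row], offsets[row + 1])));
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] types = new String[24];
        for (int i = 0; i < types.length; i++) {
            types[i] = "type-" + i;
        }
        // All five values printed here originally belonged to row 0.
        System.out.println(naiveProject(types, 5));
        // Each inner list holds only the values of its own row.
        System.out.println(projectAsArray(types, OFFSETS));
    }
}
```

In the naive case, rows 1-4 receive values that belonged to row 0, and values 5 through 23 are never reachable from any row. Whether the right fix is to reject such a projection or to produce an array per row is not stated in this report; the second method only shows what a row-aligned result would look like.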
--
This message was sent by Atlassian Jira
(v8.3.4#803005)