[ https://issues.apache.org/jira/browse/DRILL-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590294#comment-16590294 ]
Karthikeyan Manivannan commented on DRILL-6688:
-----------------------------------------------

[~ben-zvi] please review the PR.

> Data batches for Project operator exceed the maximum specified
> --------------------------------------------------------------
>
> Key: DRILL-6688
> URL: https://issues.apache.org/jira/browse/DRILL-6688
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.14.0
> Reporter: Robert Hou
> Assignee: Karthikeyan Manivannan
> Priority: Major
> Fix For: 1.15.0
>
>
> I ran this query:
> alter session set `drill.exec.memory.operator.project.output_batch_size` = 131072;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.width.max_per_query` = 1;
> select
> chr(101) CharacterValuea,
> chr(102) CharacterValueb,
> chr(103) CharacterValuec,
> chr(104) CharacterValued,
> chr(105) CharacterValuee
> from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet`;
> The output has 1024 identical lines:
> e f g h i
> There is one incoming batch:
> 2018-08-09 15:50:14,794 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size:
> { Records: 60000, Total size: 0, Data size: 300000, Gross row width: 0, Net row width: 5, Density: 0% }
> Batch schema & sizes:
> { `_DEFAULT_COL_TO_READ_`(type: OPTIONAL INT, count: 60000, Per entry: std data size: 4, std net size: 5, actual data size: 4, actual net size: 5 Totals: data size: 240000, net size: 300000) }
> }
> There are four outgoing batches. All are too large.
> The first three look like this:
> 2018-08-09 15:50:14,799 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size:
> { Records: 16383, Total size: 0, Data size: 409575, Gross row width: 0, Net row width: 25, Density: 0% }
> Batch schema & sizes:
> { CharacterValuea(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValueb(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValuec(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValued(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValuee(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> }
> The last batch is smaller because it has the remaining records.
> The data size (409575) exceeds the maximum batch size (131072).
> character415.q

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
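The overshoot in the report can be checked by hand from the BATCH_STATS figures. The following is a minimal sketch, assuming that data size is simply record count times net row width (which matches the logged numbers) and that the limit is compared against that data size, as the report does. The function names are illustrative and not part of Drill.

```python
# Figures copied from the BATCH_STATS log lines quoted in the report.
OUTPUT_BATCH_SIZE_LIMIT = 131072  # drill.exec.memory.operator.project.output_batch_size
NET_ROW_WIDTH = 25                # outgoing net row width: 5 VARCHAR columns x 5 bytes
RECORDS_PER_BATCH = 16383         # records in each of the first three outgoing batches


def batch_data_size(records: int, row_width: int) -> int:
    """Data size of a batch, taken here as records x net row width (an assumption)."""
    return records * row_width


def max_records_within_limit(limit: int, row_width: int) -> int:
    """Largest record count whose data size stays within the limit (hypothetical helper)."""
    return limit // row_width


observed = batch_data_size(RECORDS_PER_BATCH, NET_ROW_WIDTH)
print(observed)                             # 409575, the logged outgoing data size
print(observed > OUTPUT_BATCH_SIZE_LIMIT)   # True: the batch exceeds the configured limit
print(max_records_within_limit(OUTPUT_BATCH_SIZE_LIMIT, NET_ROW_WIDTH))  # 5242
```

Under these assumptions, a batch of 25-byte rows would have to hold at most 5242 records to stay within the 131072-byte setting, roughly a third of the 16383 records observed per batch.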