[ https://issues.apache.org/jira/browse/ORC-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750632#comment-16750632 ]
Yulei Yang edited comment on ORC-299 at 1/24/19 2:26 AM:
---------------------------------------------------------
We met this issue in Hive 1.2.1 and resolved it by reducing the value of
orc.row.index.stride.
The root cause is that the row group is too large to build a dictionary for.
You can see another exception with the same root cause:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: xxxxx
    at org.apache.hadoop.hive.ql.io.orc.DynamicByteArray.add(DynamicByteArray.java:115)
{noformat}
BTW, setting hive.exec.orc.dictionary.key.size.threshold=0 does not work.
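For anyone hitting the same thing, here is a minimal sketch (our own code,
not from the ORC project) of applying a smaller stride through the Hive
1.2.x writer API; the path, schema, and the value 5000 are illustrative
only, and at the table level the same knob is the orc.row.index.stride
table property:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class SmallStrideWriter {
    static class Row {
        String key;
        Row(String key) { this.key = key; }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
            Row.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
        // Shrink the stride from the default 10000; this is the workaround
        // that resolved the failure for us.
        Writer writer = OrcFile.createWriter(new Path("/tmp/orc299-demo.orc"),
            OrcFile.writerOptions(conf)
                .inspector(inspector)
                .rowIndexStride(5000));
        writer.addRow(new Row("example"));
        writer.close();
    }
}
{noformat}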
> Improve heuristics for bailing on dictionary encoding
> -----------------------------------------------------
>
> Key: ORC-299
> URL: https://issues.apache.org/jira/browse/ORC-299
> Project: ORC
> Issue Type: Improvement
> Reporter: Chris Drome
> Priority: Major
>
> Recently a user ran into the following failure:
> {noformat}
> Caused by: java.lang.NullPointerException
>     at java.lang.System.arraycopy(Native Method)
>     at org.apache.hadoop.hive.ql.io.orc.DynamicByteArray.add(DynamicByteArray.java:115)
>     at org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.addNewKey(StringRedBlackTree.java:48)
>     at org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.add(StringRedBlackTree.java:55)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.write(WriterImpl.java:1250)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1797)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2469)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>     at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:110)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:165)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:536)
>     ... 18 more
> {noformat}
>
> I tracked this down to the following in DynamicByteArray.java, which is being
> used to create the dictionary for a particular column:
> {noformat}
> private int length;
> {noformat}
>
> This has the side-effect of capping the memory available for the dictionary
> at 2GB.
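>
> To make the cap concrete, here is a small self-contained sketch (plain
> Java, not ORC code) of what a signed 32-bit length implies:
> {noformat}
> public class IntCapDemo {
>     public static void main(String[] args) {
>         // A Java int is signed 32-bit, so a byte count tops out at 2^31 - 1.
>         System.out.printf("max length: %d bytes (%.2f GiB)%n",
>             Integer.MAX_VALUE, Integer.MAX_VALUE / (double) (1L << 30));
>         // Past the cap the running length wraps negative; offsets derived
>         // from it then point at buffers that were never allocated, which is
>         // consistent with the NullPointerException in System.arraycopy above.
>         int length = Integer.MAX_VALUE;
>         length += 100;
>         System.out.println("after overflow: " + length); // a negative value
>     }
> }
> {noformat}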
>
> Given the size of column values in this use case, and the fact that the user
> is exceeding this 2GB limit, there should probably be some heuristics that
> bail early on dictionary creation, so this limitation is never reached. Given
> the size of data that would be required to hit this limit, it is unlikely
> that a dictionary would be useful.
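>
> One possible shape for such a heuristic, as a minimal self-contained
> sketch rather than actual writer code (all names here are hypothetical):
> {noformat}
> public class DictionaryBailHeuristic {
>     // Leave headroom below Integer.MAX_VALUE so the dictionary's byte
>     // array is never asked to grow near the signed 32-bit cap.
>     private static final long MAX_DICTIONARY_BYTES = Integer.MAX_VALUE - (1 << 20);
>
>     private long dictionaryBytes = 0;  // long, so this check cannot overflow
>     private boolean useDictionary = true;
>
>     /** Returns true while dictionary encoding is still worth attempting. */
>     public boolean offerKey(byte[] key) {
>         if (useDictionary && dictionaryBytes + key.length > MAX_DICTIONARY_BYTES) {
>             useDictionary = false;  // bail to direct encoding early
>         }
>         if (useDictionary) {
>             dictionaryBytes += key.length;
>         }
>         return useDictionary;
>     }
> }
> {noformat}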