[ https://issues.apache.org/jira/browse/SPARK-16123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343139#comment-15343139 ]
Apache Spark commented on SPARK-16123:
--------------------------------------

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/13832

> Avoid NegativeArraySizeException while reserving additional capacity in VectorizedColumnReader
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16123
>                 URL: https://issues.apache.org/jira/browse/SPARK-16123
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Sameer Agarwal
>
> Both off-heap and on-heap variants of ColumnVector.reserve() can unfortunately overflow while reserving additional capacity during reads.
> {code}
> Caused by: java.lang.NegativeArraySizeException
>       at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.reserveInternal(OnHeapColumnVector.java:461)
>       at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.reserve(OnHeapColumnVector.java:397)
>       at org.apache.spark.sql.execution.vectorized.ColumnVector.appendBytes(ColumnVector.java:675)
>       at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.putByteArray(OnHeapColumnVector.java:389)
>       at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readBinary(VectorizedPlainValuesReader.java:167)
>       at org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readBinarys(VectorizedRleValuesReader.java:402)
>       at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBinaryBatch(VectorizedColumnReader.java:372)
>       at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:194)
>       at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230)
>       at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
>       at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
>       at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anonfun$prepareNextFile$1.apply(FileScanRDD.scala:173)
>       at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anonfun$prepareNextFile$1.apply(FileScanRDD.scala:169)
> {code}
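
The issue text does not show how reserve() computes the new capacity, so the following is only a minimal sketch of how this kind of overflow can arise and one way to guard against it. It assumes a capacity-doubling growth policy. The class ReserveSketch, the methods naiveNewCapacity and safeNewCapacity, and the MAX_CAPACITY margin are hypothetical names chosen for illustration. They are not Spark's ColumnVector API and not necessarily what the pull request does.

```java
// Hypothetical sketch: these names are illustrative, not Spark's ColumnVector API.
final class ReserveSketch {
    // Largest array length this sketch will request. Leaving a small margin
    // below Integer.MAX_VALUE is an assumption, since some JVMs reject array
    // lengths very close to that limit.
    static final int MAX_CAPACITY = Integer.MAX_VALUE - 8;

    // Naive growth: doubling in int arithmetic wraps to a negative value once
    // requiredCapacity exceeds Integer.MAX_VALUE / 2. Passing that value to
    // "new byte[...]" would throw NegativeArraySizeException.
    static int naiveNewCapacity(int requiredCapacity) {
        return requiredCapacity * 2;
    }

    // Guarded growth: reject requests that are already negative (wrapped) or
    // above the limit, double in long arithmetic, then clamp to MAX_CAPACITY.
    static int safeNewCapacity(int requiredCapacity) {
        if (requiredCapacity < 0 || requiredCapacity > MAX_CAPACITY) {
            throw new IllegalStateException(
                "Cannot reserve capacity: " + requiredCapacity);
        }
        long doubled = 2L * requiredCapacity;
        return (int) Math.min(doubled, (long) MAX_CAPACITY);
    }

    public static void main(String[] args) {
        int required = (1 << 30) + 1; // just over half of Integer.MAX_VALUE
        System.out.println("naive: " + naiveNewCapacity(required)); // negative
        System.out.println("safe:  " + safeNewCapacity(required));  // clamped
    }
}
```

Doing the doubling in long arithmetic keeps the intermediate value representable, and the explicit check turns a silent wrap-around into an error that names the requested size.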