Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20316#discussion_r162519101
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java
---
@@ -49,18 +49,8 @@
  * After creating, `initialize` and `initBatch` should be called sequentially.
  */
 public class OrcColumnarBatchReader extends RecordReader&lt;Void, ColumnarBatch&gt; {
-
-  /**
-   * The default size of batch. We use this value for ORC reader to make it consistent with Spark's
-   * columnar batch, because their default batch sizes are different like the following:
-   *
-   * - ORC's VectorizedRowBatch.DEFAULT_SIZE = 1024
-   * - Spark's ColumnarBatch.DEFAULT_BATCH_SIZE = 4 * 1024
-   */
-  private static final int DEFAULT_SIZE = 4 * 1024;
-
-  // ORC File Reader
-  private Reader reader;
+  // TODO: make this configurable.
--- End diff ---
The comment is not valid anymore: Spark's `ColumnarBatch` no longer has a default size, so the reader just needs to decide the capacity itself.
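As a minimal sketch of the idea (not the actual Spark code; the class and method names below are hypothetical), a reader can take its batch capacity from the caller instead of a hard-coded constant, keeping the removed `4 * 1024` value only as a fallback:

```java
// Hypothetical sketch: the reader decides its own batch capacity,
// falling back to 4 * 1024 (the value of the removed DEFAULT_SIZE
// constant) only when the caller does not supply a positive capacity.
public class BatchReaderSketch {
    static final int FALLBACK_CAPACITY = 4 * 1024;

    private final int capacity;

    public BatchReaderSketch(int requestedCapacity) {
        // Caller-provided capacity wins; otherwise use the fallback.
        this.capacity = requestedCapacity > 0 ? requestedCapacity : FALLBACK_CAPACITY;
    }

    public int capacity() {
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(new BatchReaderSketch(0).capacity());   // fallback: 4096
        System.out.println(new BatchReaderSketch(512).capacity()); // caller's choice: 512
    }
}
```

In Spark itself this kind of value would typically come through a SQL configuration rather than a constructor argument, which is what the `// TODO: make this configurable.` comment in the diff is pointing at.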
---