viirya commented on code in PR #213:
URL: https://github.com/apache/arrow-datafusion-comet/pull/213#discussion_r1550100500
########## common/src/main/scala/org/apache/spark/sql/comet/util/Utils.scala: ##########
@@ -161,4 +173,84 @@ object Utils {
         toArrowField(field.name, field.dataType, field.nullable, timeZoneId)
       }.asJava)
   }
+
+  /**
+   * Serializes a list of `ColumnarBatch` into an output stream. This method must be in `spark`
+   * package because `ChunkedByteBufferOutputStream` is spark private class. As it uses Arrow
+   * classes, it must be in `common` module.
+   *
+   * @param batches
+   *   the output batches, each batch is a list of Arrow vectors wrapped in `CometVector`
+   * @param out
+   *   the output stream
+   */
+  def serializeBatches(batches: Iterator[ColumnarBatch]): Iterator[(Long, ChunkedByteBuffer)] = {
+    batches.map { batch =>
+      val dictionaryProvider: CDataDictionaryProvider = new CDataDictionaryProvider
+
+      val codec = CompressionCodec.createCodec(SparkEnv.get.conf)
+      val cbbos = new ChunkedByteBufferOutputStream(1024 * 1024, ByteBuffer.allocate)

Review Comment:
I need to put `serializeBatches` in a `spark` package (here `org.apache.spark.sql.comet.util`) because `ChunkedByteBufferOutputStream` is a Spark-private class. I cannot move `serializeBatches` to the `spark` module because it uses Arrow classes (we shade Arrow in the `common` module).
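The package-placement constraint can be illustrated with a small, self-contained Scala sketch. It assumes that Spark marks `ChunkedByteBufferOutputStream` with Scala's package-qualified modifier `private[spark]`; `InternalBuffer`, `SerializerDemo` and the `demo` packages are hypothetical stand-ins, not Spark or Comet code. The point is that any code whose package is nested under `org.apache.spark` may use the restricted class, regardless of which module it is compiled in.

```scala
// Hypothetical stand-in for a Spark-internal class such as ChunkedByteBufferOutputStream.
package org.apache.spark.util.io.demo {
  // Visible only to code whose package is org.apache.spark or one of its sub-packages.
  private[spark] final class InternalBuffer {
    private val sb = new StringBuilder
    def write(s: String): Unit = sb.append(s)
    def contents: String = sb.toString
  }
}

// Hypothetical stand-in for Utils: because its package sits under org.apache.spark,
// it may use InternalBuffer. The same reference from, e.g., package org.apache.comet
// would be rejected by the Scala compiler.
package org.apache.spark.sql.comet.demo {
  import org.apache.spark.util.io.demo.InternalBuffer

  object SerializerDemo {
    // Lazily "serializes" each input element with a fresh buffer, mirroring the
    // one-buffer-per-batch shape of serializeBatches.
    def serialize(parts: Iterator[String]): Iterator[String] =
      parts.map { p =>
        val buf = new InternalBuffer
        buf.write(p)
        buf.contents
      }
  }
}
```

The check is performed by the Scala compiler on package names (in bytecode such members are emitted as public), so it does not depend on which Maven module or jar the calling code lives in. That is why the method can stay in the `common` module, next to the shaded Arrow classes, as long as its package is under `org.apache.spark`.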