LuciferYang commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1136674386
> Can you mention in the comment or PR description

Updated PR description:

2. **Performance improvement**: `ConstantColumnVector` has better read and write performance than `OnHeapColumnVector` and `OffHeapColumnVector`. In the microbenchmark results, the improvement is especially clear for `StringType`: read throughput roughly doubles, and write throughput increases by more than 100x.
3. **Memory saving**: `ConstantColumnVector` uses less memory than `OnHeapColumnVector` and `OffHeapColumnVector`. For a `UTF8String` vector of length 4096 (the default `batchSize`), `ConstantColumnVector` can save more than 90% of the memory compared with `OnHeapColumnVector`:
   - `ConstantColumnVector` stores only a single `UTF8String`.
   - `OnHeapColumnVector` needs `arrayOffsets` (`int[4096]`) + `arrayLengths` (`int[4096]`) + storage for 4096 `UTF8String` values.
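To make the memory and write-cost argument concrete, here is a minimal Java sketch under stated assumptions. `ConstantStringVector` and `PerRowStringVector` are hypothetical, simplified classes written only for illustration; they are not Spark's `ConstantColumnVector` or `OnHeapColumnVector` and use no Spark API. The per-row class approximates the layout listed above (an offsets array, a lengths array, and one copy of the value bytes per row), and both classes count only array payload bytes, ignoring JVM object headers and `UTF8String` wrapper objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConstantVectorSketch {
  public static void main(String[] args) {
    int batchSize = 4096; // default batchSize cited in the PR description
    byte[] v = "spark".getBytes(StandardCharsets.UTF_8);
    ConstantStringVector constant = new ConstantStringVector(batchSize, v);
    PerRowStringVector perRow = new PerRowStringVector(batchSize, v);
    System.out.println("constant payload bytes: " + constant.payloadBytes());
    System.out.println("per-row payload bytes:  " + perRow.payloadBytes());
  }
}

// Hypothetical stand-in (NOT Spark's ConstantColumnVector):
// keeps one shared value and returns it for every row.
final class ConstantStringVector {
  private final int capacity;
  private final byte[] value; // the single UTF-8 value shared by all rows

  ConstantStringVector(int capacity, byte[] value) {
    this.capacity = capacity;
    this.value = value; // one write, independent of capacity
  }

  byte[] getBytes(int rowId) {
    if (rowId < 0 || rowId >= capacity) {
      throw new IndexOutOfBoundsException(Integer.toString(rowId));
    }
    return value; // same value regardless of rowId
  }

  long payloadBytes() {
    return value.length;
  }
}

// Hypothetical stand-in (NOT Spark's OnHeapColumnVector):
// offsets + lengths + one copy of the value bytes per row.
final class PerRowStringVector {
  private final int[] offsets;
  private final int[] lengths;
  private final byte[] data;

  PerRowStringVector(int capacity, byte[] value) {
    offsets = new int[capacity];
    lengths = new int[capacity];
    data = new byte[capacity * value.length];
    // Filling a constant column this way still costs one write per row.
    for (int i = 0; i < capacity; i++) {
      offsets[i] = i * value.length;
      lengths[i] = value.length;
      System.arraycopy(value, 0, data, offsets[i], value.length);
    }
  }

  byte[] getBytes(int rowId) {
    return Arrays.copyOfRange(data, offsets[rowId], offsets[rowId] + lengths[rowId]);
  }

  long payloadBytes() {
    return 4L * offsets.length + 4L * lengths.length + data.length;
  }
}
```

For a 5-byte value and 4096 rows, this sketch reports 5 payload bytes for the constant layout versus 53,248 (4096 × (4 + 4 + 5)) for the per-row layout. The per-row constructor also shows why writing a constant column is costly in that layout: every row is written individually, while the constant vector stores its value once.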