LuciferYang commented on PR #36616:
URL: https://github.com/apache/spark/pull/36616#issuecomment-1136674386

   > Can you mention in the comment or PR description
   
   Updated PR description:
   
   2. **Performance improvement**: `ConstantColumnVector` reads and writes 
faster than `OnHeapColumnVector` and `OffHeapColumnVector`. In the 
microbenchmark results, the improvement is most visible for `StringType`: 
read throughput increases by about 2x, and write throughput increases by 
more than 100x.
   
   3. **Memory saving**: `ConstantColumnVector` uses less memory than 
`OnHeapColumnVector` and `OffHeapColumnVector`. For a `UTF8String` vector of 
length 4096 (the default `batchSize`), `ConstantColumnVector` saves more than 
90% of the memory used by `OnHeapColumnVector`:
   
   - `ConstantColumnVector` stores only a single `UTF8String`
   - `OnHeapColumnVector` needs `arrayOffsets(int[4096])` + 
`arrayLengths(int[4096])` + `UTF8String * 4096`
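
   The memory comparison can be checked with a rough back-of-the-envelope 
sketch. This is not Spark code: the class `ColumnVectorMemorySketch`, its 
method names, and the 16-byte string payload are assumptions made for 
illustration, and object headers, array headers, and JVM alignment are ignored.

```java
// Rough estimate only. Not Spark code: names and the per-string payload size
// are hypothetical; JVM object/array headers and alignment are ignored.
final class ColumnVectorMemorySketch {
    static final int INT_BYTES = Integer.BYTES; // 4 bytes per int

    // Layout described for the on-heap vector: one int offset and one int
    // length per row, plus one string payload per row.
    static long onHeapBytes(int rows, int bytesPerString) {
        long offsets = (long) rows * INT_BYTES;
        long lengths = (long) rows * INT_BYTES;
        long payload = (long) rows * bytesPerString;
        return offsets + lengths + payload;
    }

    // The constant vector keeps a single string regardless of the row count.
    static long constantBytes(int bytesPerString) {
        return bytesPerString;
    }

    // Fraction of the on-heap footprint that the constant vector avoids.
    static double savedFraction(int rows, int bytesPerString) {
        return 1.0 - (double) constantBytes(bytesPerString)
                / onHeapBytes(rows, bytesPerString);
    }

    public static void main(String[] args) {
        int rows = 4096;          // default batch size mentioned above
        int bytesPerString = 16;  // assumed payload size, for illustration
        System.out.println("on-heap bytes:  " + onHeapBytes(rows, bytesPerString));
        System.out.println("constant bytes: " + constantBytes(bytesPerString));
        System.out.println("saved fraction: " + savedFraction(rows, bytesPerString));
    }
}
```

   Under these assumptions the two `int[4096]` arrays alone cost far more than 
the one stored string, so the saving stays above 90% for any realistic 
payload size.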
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

