Re: Performance of VectorizedRleValuesReader

2020-09-13 Thread Chang Chen
I think we can copy all encoded data into a ByteBuffer once, and unpack values in the loop:

    while (valueIndex < this.currentCount) {
      // values are bit packed 8 at a time, so reading bitWidth will always work
      this.packer.unpack8Values(buffer, buffer.position() + valueIndex,
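A minimal sketch of the idea above, assuming the standard Parquet bit-packed layout (values packed LSB-first, so each group of 8 values occupies exactly bitWidth bytes). The class PackedGroupSketch and its methods unpack8 and unpackGroups are hypothetical names used for illustration; this is neither Spark's reader nor Parquet's BytePacker, and the bit-at-a-time unpacking is written for clarity, not speed. It only shows the loop shape: wrap the encoded bytes once, then advance a byte offset per group.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of Spark or Parquet.
final class PackedGroupSketch {

    // Unpacks 8 values of `bitWidth` bits each (LSB-first), starting at
    // absolute byte offset `pos` in `in`, into out[outPos .. outPos + 7].
    static void unpack8(ByteBuffer in, int pos, int bitWidth, int[] out, int outPos) {
        for (int i = 0; i < 8; i++) {
            int value = 0;
            for (int b = 0; b < bitWidth; b++) {
                int bit = i * bitWidth + b;                    // bit index within the group
                int byteVal = in.get(pos + (bit >>> 3)) & 0xFF; // absolute get, position unchanged
                value |= ((byteVal >>> (bit & 7)) & 1) << b;
            }
            out[outPos + i] = value;
        }
    }

    // Unpacks `numGroups` groups from a buffer that already holds all the
    // encoded bytes, so no per-group slice or copy is needed.
    static int[] unpackGroups(ByteBuffer buffer, int numGroups, int bitWidth) {
        int[] out = new int[numGroups * 8];
        int pos = buffer.position();
        for (int valueIndex = 0; valueIndex < out.length; valueIndex += 8) {
            unpack8(buffer, pos, bitWidth, out, valueIndex);
            pos += bitWidth; // 8 values * bitWidth bits = bitWidth bytes per group
        }
        return out;
    }
}
```

Note that in this sketch the byte offset advances by bitWidth per group while the value index advances by 8, so the two are tracked separately.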

[VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-13 Thread Mridul Muralidharan
Hi,

I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based shuffle to improve shuffle efficiency.

Please take a look at:
- SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602
- SPIP doc:

Re: Performance of VectorizedRleValuesReader

2020-09-13 Thread Sean Owen
It certainly can't be called once - it's reading different data each time. There might be a faster way to do it, I don't know. Do you have ideas?

On Sun, Sep 13, 2020 at 9:25 PM Chang Chen wrote:
>
> Hi experts,
>
> it looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup()
>
>

Performance of VectorizedRleValuesReader

2020-09-13 Thread Chang Chen
Hi experts,

it looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup():

    case PACKED:
      int numGroups = header >>> 1;
      this.currentCount = numGroups * 8;
      if (this.currentBuffer.length < this.currentCount) {
        this.currentBuffer = new int[this.currentCount];
      }
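For readers unfamiliar with the header arithmetic in the snippet above, here is a small sketch of how a run header is interpreted in Parquet's RLE/bit-packed hybrid encoding: the lowest bit selects the mode (1 for bit-packed), and header >>> 1 is the number of 8-value groups in a bit-packed run. The class RleHeaderSketch and its method names are hypothetical and only mirror the arithmetic quoted above; they are not Spark's API.

```java
// Hypothetical illustration class; not part of Spark or Parquet.
final class RleHeaderSketch {

    // Lowest header bit set means the run is bit-packed (otherwise RLE).
    static boolean isPacked(int header) {
        return (header & 1) == 1;
    }

    // Number of values in a bit-packed run: groups of 8 values each.
    static int packedValueCount(int header) {
        int numGroups = header >>> 1;
        return numGroups * 8;
    }

    // Number of encoded bytes in a bit-packed run: bitWidth bytes per group.
    static int packedByteCount(int header, int bitWidth) {
        return (header >>> 1) * bitWidth;
    }

    // Reuses the buffer when it is large enough and allocates a new one only
    // when the run needs more room, as the quoted snippet does.
    static int[] ensureCapacity(int[] buffer, int needed) {
        return buffer.length < needed ? new int[needed] : buffer;
    }
}
```

For example, a header of 5 (binary 101) describes a bit-packed run of 2 groups, that is 16 values, which take 6 bytes at a bit width of 3.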