Hi Xin Liu,

Could you provide a concrete use case if possible (code to reproduce the protobuf object, and a comparison between the protobuf object and a normal object)?
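A repro does not need to be large. As a rough illustration of the kind of measurement that would help, the Java sketch below walks an object graph reflectively and counts the objects it reaches, which is broadly the sort of traversal a reflection-based size estimate has to perform. It is a simplified stand-in, not Spark's SizeEstimator code: `GraphWalkSketch`, `Node`, `buildTree` and `countReachable` are hypothetical names made up for this example, and `Node` only imitates a nested message rather than a real protobuf class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class GraphWalkSketch {

    // Hypothetical stand-in for a nested message type (not a real protobuf class).
    static final class Node {
        long id;
        Node[] children;

        Node(long id, Node[] children) {
            this.id = id;
            this.children = children;
        }
    }

    // Builds a tree in which every non-leaf node has `fanout` children.
    static Node buildTree(int depth, int fanout) {
        Node[] kids = new Node[depth == 0 ? 0 : fanout];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = buildTree(depth - 1, fanout);
        }
        return new Node(depth, kids);
    }

    // Counts every distinct object reachable from a non-null root.
    // Each object is visited once, so the work grows linearly with the graph size.
    static long countReachable(Object root) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> pending = new ArrayDeque<>();
        seen.add(root);
        pending.push(root);
        long visited = 0;
        while (!pending.isEmpty()) {
            Object current = pending.pop();
            visited++;
            Class<?> cls = current.getClass();
            if (cls.isArray()) {
                // Primitive arrays hold no references; object arrays are walked element by element.
                if (!cls.getComponentType().isPrimitive()) {
                    for (Object element : (Object[]) current) {
                        if (element != null && seen.add(element)) {
                            pending.push(element);
                        }
                    }
                }
                continue;
            }
            // Assumption of this sketch: JDK classes are counted but not opened up,
            // which avoids reflective access errors on newer Java versions.
            for (Class<?> c = cls; c != null && !c.getName().startsWith("java."); c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                        continue;
                    }
                    try {
                        f.setAccessible(true);
                        Object value = f.get(current);
                        if (value != null && seen.add(value)) {
                            pending.push(value);
                        }
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Time the walk on graphs of increasing size.
        for (int depth = 4; depth <= 8; depth++) {
            Node root = buildTree(depth, 4);
            long start = System.nanoTime();
            long count = countReachable(root);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("depth=" + depth + " objects=" + count + " micros=" + micros);
        }
    }
}
```

Running the same kind of timing loop against the real protobuf messages and against an equivalent plain class, with the actual estimator in place of `countReachable`, would show whether the cost really scales differently for protobuf.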
I contributed a bit to SizeEstimator years ago, and to my understanding its time complexity should be O(N), where N is the number of fields referenced recursively from the object. We should definitely investigate this case if it really takes a lot of time on protobuf objects.

> On 27 Feb 2018, at 8:47 AM, Xin Liu <xin.e....@gmail.com> wrote:
>
> Hi folks,
>
> We have a situation where, shuffled data is protobuf based, and SizeEstimator
> is taking a lot of time.
>
> We have tried to override SizeEstimator to return a constant value, which
> speeds up things a lot.
>
> My questions, what is the side effect of disabling SizeEstimator? Is it just
> spark do memory reallocation, or there is more severe consequences?
>
> Thanks!