Hi Xin Liu,

Could you provide a concrete use case, if possible (code to reproduce the
protobuf object, and a comparison between the protobuf object and a normal
object)?

I contributed a bit to SizeEstimator years ago, and to my understanding, its
time complexity should be O(N), where N is the number of fields referenced
recursively from the object.
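For illustration, here is a minimal Java sketch of the kind of reflection-based
graph walk a size estimator performs. It is not Spark's SizeEstimator code: the
class ReachableCounter, its countReachable method, and the Node type are
hypothetical, and the sketch counts reachable objects instead of estimating
bytes. An identity set ensures each object is expanded only once, so the work
is linear in the number of reachable objects and references; fields in JDK
modules that are not opened to reflection are simply skipped.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class ReachableCounter {

    // Hypothetical node type, used only for the usage example in main().
    static final class Node {
        Node next;
        int value;
    }

    // Counts distinct objects reachable from `root` by following non-static
    // reference fields and object-array elements. The identity set ensures
    // each object is expanded once, so total work is linear in the number of
    // reachable objects plus the references they hold.
    public static int countReachable(Object root) {
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        ArrayDeque<Object> pending = new ArrayDeque<>(); // explicit stack, no recursion
        if (root != null) {
            pending.push(root);
        }
        while (!pending.isEmpty()) {
            Object obj = pending.pop();
            if (!visited.add(obj)) {
                continue; // already counted (shared reference or cycle)
            }
            Class<?> cls = obj.getClass();
            if (cls.isArray()) {
                // Primitive arrays hold no references; object arrays hold one per slot.
                if (!cls.getComponentType().isPrimitive()) {
                    for (int i = 0; i < Array.getLength(obj); i++) {
                        Object elem = Array.get(obj, i);
                        if (elem != null) {
                            pending.push(elem);
                        }
                    }
                }
                continue;
            }
            // Walk the declared fields of the class and all of its superclasses.
            for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                        continue;
                    }
                    if (!f.trySetAccessible()) {
                        continue; // module not opened to reflection; skipped in this sketch
                    }
                    try {
                        Object child = f.get(obj);
                        if (child != null) {
                            pending.push(child);
                        }
                    } catch (IllegalAccessException e) {
                        // Not expected after trySetAccessible() succeeds; skip the field.
                    }
                }
            }
        }
        return visited.size();
    }

    public static void main(String[] args) {
        // Build a 1,000-node linked list; every node is reachable from the head.
        Node head = null;
        for (int i = 0; i < 1000; i++) {
            Node n = new Node();
            n.value = i;
            n.next = head;
            head = n;
        }
        System.out.println(countReachable(head)); // prints 1000
    }
}
```

Running a count like this on a protobuf message and on an equivalent plain
object is one rough way to check whether the protobuf case is slow simply
because its N is much larger.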

We should definitely investigate this case if it indeed takes a lot of time on 
protobuf objects.

> On 27 Feb 2018, at 8:47 AM, Xin Liu <xin.e....@gmail.com> wrote:
> 
> Hi folks,
> 
> We have a situation where the shuffled data is protobuf-based, and
> SizeEstimator is taking a lot of time.
> 
> We tried overriding SizeEstimator to return a constant value, which speeds
> things up a lot.
> 
> My question: what are the side effects of disabling SizeEstimator? Is it just
> that Spark does memory reallocation, or are there more severe consequences?
> 
> Thanks!
