How to estimate the rdd size before the rdd result is written to disk

2019-12-19 Thread zhangliyun
Hi all, I want to ask how to estimate the size of an RDD (in bytes) before it is saved to disk, because the job takes a long time if the output is very large and the number of output partitions is small. The following steps are what I have come up with for this problem: 1. sample 0.01 's
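
The message is cut off after its first step, so the following is only a minimal sketch of the general sample-and-extrapolate idea it appears to describe, not the poster's actual method. It uses plain Python rather than Spark so that it runs on its own; in Spark the sampling step would typically be `rdd.sample(False, 0.01)`. The function names, the use of `pickle` as the serializer, and the 128 MiB target partition size are all assumptions made for illustration.

```python
import math
import pickle
import random


def estimate_size_bytes(records, fraction=0.01, seed=42):
    """Estimate the total serialized size of `records` from a random sample.

    Hypothetical helper: measures the pickled size of roughly `fraction`
    of the records and scales the average up to the full record count.
    """
    rng = random.Random(seed)
    sampled_bytes = 0
    sampled_count = 0
    total_count = 0
    for rec in records:
        total_count += 1
        if rng.random() < fraction:
            sampled_bytes += len(pickle.dumps(rec))
            sampled_count += 1
    if sampled_count == 0:
        # Nothing was sampled (empty or very small input), so no estimate.
        return 0
    return int(sampled_bytes / sampled_count * total_count)


def suggest_partitions(estimated_bytes, target_partition_bytes=128 * 1024 * 1024):
    """Suggest an output partition count for an assumed target partition size."""
    return max(1, math.ceil(estimated_bytes / target_partition_bytes))


# Stand-in data for the records of an RDD.
records = [("key-%d" % i, i * 1.5) for i in range(100_000)]
estimate = estimate_size_bytes(records)
exact = sum(len(pickle.dumps(r)) for r in records)
partitions = suggest_partitions(estimate)
```

The estimate is only as good as the sample: heavily skewed record sizes would need a larger fraction, and the on-disk size also depends on the output format and compression, which this sketch ignores.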

DSv2 sync notes - 11 December 2019

2019-12-19 Thread Ryan Blue
Hi everyone, here are my notes for the DSv2 sync last week. Sorry they’re late! Feel free to add more details or corrections. Thanks! rb
*Attendees*: Ryan Blue, John Zhuge, Dongjoon Hyun, Joseph Torres, Kevin Yu, Russel Spitzer, Terry Kim, Wenchen Fan, Hyukjin Kwan, Jacky Lee
*Topics*: - Relation