Thanks Ted, I thought the average block size was already low, less than the
usual 128 MB. If I reduce it further via parquet.block.size, that would mean
more blocks, which should in turn increase the number of tasks/executors. Is
that the correct way to interpret this?
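
For reference, here is a minimal sketch of what I mean by rewriting the data
with a smaller block size, setting parquet.block.size through the Hadoop
configuration. It uses the Spark 2.x SparkSession API, and the paths and the
32 MB value are illustrative, not what I actually run:

    // Minimal sketch: rewrite a Parquet dataset with a smaller row-group
    // (block) size. Input/output paths and the 32 MB value are illustrative.
    import org.apache.spark.sql.SparkSession

    object RewriteWithSmallerBlocks {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("rewrite-with-smaller-blocks")
          .getOrCreate()

        // parquet.block.size is in bytes; smaller row groups mean more
        // input splits, and therefore more read tasks, when the data is
        // scanned again.
        spark.sparkContext.hadoopConfiguration
          .setInt("parquet.block.size", 32 * 1024 * 1024)

        val df = spark.read.parquet("/data/t1")      // hypothetical input path
        df.write.parquet("/data/t1_smaller_blocks")  // hypothetical output path

        spark.stop()
      }
    }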

On Mon, May 2, 2016 at 6:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please consider decreasing the block size.
>
> Thanks
>
> > On May 1, 2016, at 9:19 PM, Buntu Dev <buntu...@gmail.com> wrote:
> >
> > I have a 10 GB memory limit on the executors and am operating on a
> > Parquet dataset with a 70 MB block size and 200 blocks. I keep hitting
> > the memory limits when doing a 'select * from t1 order by c1 limit
> > 1000000' (i.e., 1M rows). It works if I limit to, say, 100k. What are
> > the options for saving a large dataset without running into memory
> > issues?
> >
> > Thanks!
>
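
Regarding the question quoted above about saving a large result: one
workaround is to avoid pulling the top-N rows through a single task, either
by writing the fully sorted result straight to disk or, if roughly the first
1M rows are really needed, by filtering on an estimated key cutoff instead of
LIMIT. Only the names t1 and c1 come from this thread; the paths, the cutoff
logic, and the assumption that c1 is numeric are mine:

    // Hedged sketch: save a large ordered result without ORDER BY ... LIMIT,
    // which can funnel all matching rows through a single task.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object SaveLargeSortedResult {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("save-large-sorted-result")
          .getOrCreate()

        val df = spark.table("t1")

        // Option A: skip LIMIT entirely; the sort is distributed
        // (range-partitioned), so no single executor holds the whole result.
        df.sort(col("c1")).write.parquet("/data/t1_sorted")  // hypothetical path

        // Option B (approximate; assumes c1 is numeric): estimate the c1
        // value at rank ~1M and filter on it, keeping the work distributed.
        val n = df.count()
        val p = math.min(1.0, 1000000.0 / n)
        val Array(cutoff) = df.stat.approxQuantile("c1", Array(p), 0.001)
        df.filter(col("c1") <= cutoff)
          .sort(col("c1"))
          .write.parquet("/data/t1_top1m_approx")  // hypothetical path

        spark.stop()
      }
    }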
