Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-06 Thread amarouni
You can get some more insight by using the Spark history server (http://spark.apache.org/docs/latest/monitoring.html); it can show you which task is failing, along with other information that might help you debug the issue. On 05/10/2016 19:00, Babak Alipour wrote: > The issue seems to lie in ...
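For the history server to have anything to show, event logging must be enabled on the application side. A minimal sketch; the log directory is an assumption (any path the history server can read works):

    // Minimal sketch: turn on event logging so the history server can replay the job UI.
    // The directory is an assumption -- point it at any shared location.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("sort-debug")
      .config("spark.eventLog.enabled", "true")
      .config("spark.eventLog.dir", "hdfs:///tmp/spark-events")
      .getOrCreate()

The server itself is started with sbin/start-history-server.sh, with spark.history.fs.logDirectory pointing at the same path.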

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-02 Thread Babak Alipour
Thanks Vadim for sharing your experience, but I have tried a multi-JVM setup (2 workers) and various sizes for spark.executor.memory (8g, 16g, 20g, 32g, 64g) and spark.executor.cores (2-4), with the same error all along. As for the files, these are all .snappy.parquet files, resulting from inserting some data ...

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Vadim Semenov
Oh, and try to run even smaller executors, i.e. with `spark.executor.memory` <= 16GiB. I wonder what result you're going to get. On Sun, Oct 2, 2016 at 1:24 AM, Vadim Semenov wrote: > > Do you mean running a multi-JVM 'cluster' on the single machine? > Yes, that's ...

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Vadim Semenov
> Do you mean running a multi-JVM 'cluster' on the single machine? Yes, that's what I suggested. You can get some information here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ > How would that affect performance/memory consumption? If a multi-JVM setup can ...
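In Spark standalone mode, the multi-JVM setup suggested here amounts to running several worker instances on one box. A minimal conf/spark-env.sh sketch; the sizes are assumptions for splitting one large machine in two:

    # conf/spark-env.sh -- run two workers on the same machine (sizes are assumptions)
    export SPARK_WORKER_INSTANCES=2
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=32g

Each worker then launches its own executor JVMs, so no single JVM has to manage the whole heap.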

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Babak Alipour
To add one more note: I tried running more, smaller executors, each with 32-64g memory and executor.cores 2-4 (with 2 workers as well), and I'm still getting the same exception: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes at ...

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Babak Alipour
Do you mean running a multi-JVM 'cluster' on the single machine? How would that affect performance/memory consumption? If a multi-JVM setup can handle such a large input, then why can't a single JVM break the job down into smaller tasks? I also found that SPARK-9411 mentions making the page_size ...
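The constant in the error message is itself informative: it appears to be TaskMemoryManager's maximum page size, (2^31 - 1) * 8 bytes, since a page is addressed as an array of at most Integer.MAX_VALUE 8-byte words; that is just under 16 GiB. A quick check in any Scala REPL:

    // The cap in the exception appears to be (2^31 - 1) * 8 bytes:
    val maxPageBytes = Integer.MAX_VALUE.toLong * 8L
    println(maxPageBytes)        // 17179869176 -- the number in the exception
    println(maxPageBytes >> 30)  // 15, i.e. just under 16 GiB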

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Vadim Semenov
Run more, smaller executors: change `spark.executor.memory` to 32g and `spark.executor.cores` to 2-4, for example. Changing the driver's memory won't help, because the driver doesn't participate in execution. On Fri, Sep 30, 2016 at 2:58 PM, Babak Alipour wrote: > Thank you for your ...
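As a concrete sketch of that suggestion (the master URL, class name and jar are placeholders; the values are the ones proposed above, not tuned recommendations):

    # Sketch only: the suggested settings as spark-submit flags.
    # Master URL, class name and jar are placeholders.
    spark-submit \
      --master spark://localhost:7077 \
      --executor-memory 32g \
      --executor-cores 2 \
      --class com.example.SortJob \
      sort-job.jar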

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Babak Alipour
Thank you for your replies. @Mich, using LIMIT 100 in the query prevents the exception, but given that there's enough memory, I don't think this should happen even without LIMIT. @Vadim, here's the full stack trace: Caused by: java.lang.IllegalArgumentException: Cannot allocate a page ...

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Vadim Semenov
Can you post the whole exception stack trace? What are your executor memory settings? Right now I assume that it happens in UnsafeExternalRowSorter -> UnsafeExternalSorter.insertRecord. Running more executors with lower `spark.executor.memory` should help. On Fri, Sep 30, 2016 at 12:57 PM, ...

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Mich Talebzadeh
What will happen if you LIMIT the result set to 100 rows only, i.e. SELECT <field> FROM <table> ORDER BY <field> LIMIT 100? Will that work? And how about running the whole query WITHOUT the ORDER BY? HTH, Mich
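In spark-shell terms, the two experiments look like this (`field` holds the column name, as in the original post below; `sql` is the spark-shell shorthand for spark.sql):

    // Experiment 1: does a bounded result still blow up?
    sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC LIMIT 100").show()
    // Experiment 2: does the full read succeed once the sort is removed?
    sql("SELECT " + field + " FROM MY_TABLE").show()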

DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Babak Alipour
Greetings everyone, I'm trying to read a single field of a Hive table stored as Parquet in Spark (~140GB for the entire table; this single field should be just a few GB) and look at the sorted output using the following: sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC") But ...
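For clarity, the failing snippet as it would run in spark-shell; the column name and the final action are assumptions (the exception surfaces once an action forces the sort):

    // The query from the original message; `field` is a placeholder column name.
    val field = "my_field"
    val sorted = sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
    sorted.show(100)  // any action that materializes the sort triggers the allocation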