Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-17 Thread Yin Huai
So, the second attemp of those tasks failed with NPE can complete and the job eventually finished? On Mon, Jun 15, 2015 at 10:37 PM, Night Wolf nightwolf...@gmail.com wrote: Hey Yin, Thanks for the link to the JIRA. I'll add details to it. But I'm able to reproduce it, at least in the same

Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-17 Thread Cheng Lian
What's the size of this table? Is the data skewed (so that speculation is probably triggered)? Cheng On 6/15/15 10:37 PM, Night Wolf wrote: Hey Yin, Thanks for the link to the JIRA. I'll add details to it. But I'm able to reproduce it, at least in the same shell session, every time I do a

Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Night Wolf
Looking at the logs of the executor, looks like it fails to find the file; e.g. for task 10323.0 15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException trying to rename

Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Night Wolf
Hi guys, Using Spark 1.4, trying to save a dataframe as a table, a really simple test, but I'm getting a bunch of NPEs; The code Im running is very simple; qc.read.parquet(/user/sparkuser/data/staged/item_sales_basket_id.parquet).write.format(parquet).saveAsTable(is_20150617_test2) Logs of

Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Yin Huai
I saw it once but I was not clear how to reproduce it. The jira I created is https://issues.apache.org/jira/browse/SPARK-7837. More information will be very helpful. Were those errors from speculative tasks or regular tasks (the first attempt of the task)? Is this error deterministic (can you

Re: Spark DataFrame 1.4 write to parquet/saveAsTable tasks fail

2015-06-15 Thread Night Wolf
Hey Yin, Thanks for the link to the JIRA. I'll add details to it. But I'm able to reproduce it, at least in the same shell session, every time I do a write I get a random number of tasks failing on the first run with the NPE. Using dynamic allocation of executors in YARN mode. No speculative