160 GB of Parquet files (ca. 30 files, Snappy-compressed, written by Cloudera Impala)

ca. 30 full table scans: take 3-5 columns out, then some normal Scala
operations like substring, groupBy, and filter, and at the end save the
result as a file in HDFS, roughly like the sketch below
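
A rough sketch of that kind of job against the Spark 1.0 API (the table
path, column names, and exact transformations are made-up placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // pair-RDD implicits for groupByKey
    import org.apache.spark.sql.SQLContext

    object ParquetScanJob {
      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-scan"))
        val sqlContext = new SQLContext(sc)

        // load the Impala-written Parquet files and expose them to SQL
        sqlContext.parquetFile("hdfs:///warehouse/big_table").registerAsTable("big_table")

        // full scan pulling 3 columns out, then plain Scala/RDD operations
        val rows = sqlContext.sql("SELECT col_a, col_b, col_c FROM big_table")
        val result = rows
          .map(r => (r.getString(0).substring(0, 4), r.getLong(1))) // substring of a key
          .filter { case (_, v) => v > 0 }                          // drop unwanted rows
          .groupByKey()
          .mapValues(_.sum)

        // save the result back into HDFS
        result.saveAsTextFile("hdfs:///tmp/scan_result")
      }
    }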

yarn-client mode, 23 cores and 60 GB mem per node

but it always fails!

startup script (3 NodeManagers, one executor each):
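
The script itself isn't shown; a yarn-client launch matching those numbers
would look roughly like this (class and jar names are placeholders):

    $SPARK_HOME/bin/spark-submit \
      --master yarn-client \
      --class com.example.ParquetScanJob \
      --num-executors 3 \
      --executor-cores 23 \
      --executor-memory 60g \
      my-job.jar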




some screenshots:


<http://apache-spark-user-list.1001560.n3.nabble.com/file/n10254/spark1.png> 


<http://apache-spark-user-list.1001560.n3.nabble.com/file/n10254/spark2.png> 



I got some logs like:




The same job works in standalone mode (3 slaves)...

startup script (24 cores, 64 GB mem each):
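
Again only a sketch (master URL and jar are placeholders; 72 total cores
= 24 cores x 3 slaves):

    $SPARK_HOME/bin/spark-submit \
      --master spark://master:7077 \
      --class com.example.ParquetScanJob \
      --total-executor-cores 72 \
      --executor-memory 64g \
      my-job.jar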




Any ideas?

Thanks a lot!




