Hi All, I am trying Spark SQL on a ~16 TB dataset with a large number of files (~50K). Each file is roughly 400-500 MB.
I am issuing a fairly simple Hive query on the dataset with just filters (no GROUP BYs or joins), and the job is very slow. It runs for 7-8 hours and processes only about 80-100 GB on a 12-node cluster. I have experimented with different values of spark.sql.shuffle.partitions, from 20 to 4000, but haven't seen much difference.

From the logs, I have the YARN error attached at the end [1]. I am using the Spark configs below [2] for the job.

Is there any other tuning I need to look into? Any tips would be appreciated.

Thanks

2. Spark config:

    spark-submit --master yarn-client \
      --driver-memory 1G \
      --executor-memory 10G \
      --executor-cores 5 \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.initialExecutors=2 \
      --conf spark.dynamicAllocation.minExecutors=2

1. Yarn error:

    16/04/07 13:05:37 INFO yarn.YarnAllocator: Container marked as failed: container_1459747472046_1618_02_000003. Exit status: 1. Diagnostics: Exception from container-launch.
    Container id: container_1459747472046_1618_02_000003
    Exit code: 1
    Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

    Container exited with a non-zero exit code 1
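
To put the slowness in numbers, here is a rough back-of-the-envelope calculation from the figures quoted in this post. It is only a sketch: it assumes decimal units (1 GB = 1000 MB) and takes the midpoints of the ranges (90 GB processed, 7.5 hours), and the variable and function names are just for illustration.

```python
# Figures from the post; the midpoints of the quoted ranges are an assumption.
DATASET_TB = 16        # total dataset size
PROCESSED_GB = 90      # midpoint of the 80-100 GB processed
RUNTIME_HOURS = 7.5    # midpoint of the 7-8 hour runtime
NODES = 12             # cluster size


def throughput_mb_per_s(processed_gb, hours):
    """Average cluster-wide throughput in MB/s (decimal units)."""
    return processed_gb * 1000 / (hours * 3600)


def hours_for_full_dataset(dataset_tb, processed_gb, hours):
    """Extrapolated hours to scan the whole dataset at the observed rate."""
    return dataset_tb * 1000 / processed_gb * hours


cluster_rate = throughput_mb_per_s(PROCESSED_GB, RUNTIME_HOURS)
per_node_rate = cluster_rate / NODES
full_scan_hours = hours_for_full_dataset(DATASET_TB, PROCESSED_GB, RUNTIME_HOURS)

print(f"cluster throughput:  {cluster_rate:.2f} MB/s")
print(f"per-node throughput: {per_node_rate:.2f} MB/s")
print(f"full scan at this rate: {full_scan_hours:.0f} hours "
      f"(~{full_scan_hours / 24:.0f} days)")
```

Under these assumptions the whole cluster is averaging only a few MB/s, so a full pass over the data would take weeks at the current rate.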