spark history server + yarn log aggregation issue

2015-09-09 Thread michael.england
Hi, I am running Spark-on-YARN on a secure cluster with YARN log aggregation set up. Once a job completes and I view the stdout/stderr executor logs in the Spark history server UI, it redirects me to the local NodeManager, where a page appears for a second saying ‘Redirecting to log server…’ and
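The redirect target a NodeManager uses after aggregation is controlled by yarn.log.server.url in yarn-site.xml; a minimal sketch, assuming the MapReduce JobHistory server serves the aggregated logs (hostname and port are placeholders):

  <!-- yarn-site.xml: where NodeManagers redirect once logs are aggregated -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://jobhistory.example.com:19888/jobhistory/logs</value>
  </property>

If yarn.log.server.url is unset, the NodeManager has nowhere to send the browser once the local logs have been aggregated away, which matches the behaviour described above.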

Spark-on-YARN LOCAL_DIRS location

2015-08-26 Thread michael.england
Hi, I am having issues with /tmp space filling up during Spark jobs because Spark-on-YARN uses the yarn.nodemanager.local-dirs for shuffle space. I noticed this message appears when submitting Spark-on-YARN jobs: WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the
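As the warning implies, on YARN spark.local.dir is ignored and shuffle data lands in the NodeManager's local directories, so the fix belongs in yarn-site.xml rather than in the Spark config; a sketch, assuming dedicated data disks exist (paths are placeholders):

  <!-- yarn-site.xml: put YARN (and therefore Spark shuffle) scratch space
       on larger disks instead of /tmp -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>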

RE: Cleaning up spark.local.dir automatically

2015-01-13 Thread michael.england
That’s really useful, thanks. From: Andrew Ash [mailto:and...@andrewash.com] Sent: 09 January 2015 22:42 To: England, Michael (IT/UK) Cc: raghavendra.pan...@gmail.com; user Subject: Re: Cleaning up spark.local.dir automatically That's a worker setting which cleans up the files left behind by
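For the archive: the worker setting referred to here is presumably the standalone-mode cleanup group spark.worker.cleanup.* (it has no effect on YARN); a sketch of enabling it in spark-env.sh on each worker, with illustrative values:

  # spark-env.sh (standalone workers only; values are examples)
  export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
    -Dspark.worker.cleanup.interval=1800 \
    -Dspark.worker.cleanup.appDataTtl=604800"

interval is how often (in seconds) the worker sweeps, and appDataTtl is how long finished applications' directories are kept before deletion.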

RE: Cleaning up spark.local.dir automatically

2015-01-09 Thread michael.england
Thanks, I imagine this will kill any cached RDDs if their files are beyond the TTL? From: Raghavendra Pandey [mailto:raghavendra.pan...@gmail.com] Sent: 09 January 2015 15:29 To: England, Michael (IT/UK); user@spark.apache.org Subject: Re: Cleaning up spark.local.dir automatically You
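The TTL mechanism under discussion is presumably spark.cleaner.ttl (available in Spark 1.x), and the concern above is valid: persisted RDD blocks older than the TTL are dropped as well. A hedged example of setting it per job:

  spark-submit --conf spark.cleaner.ttl=3600 ...  # forget metadata and blocks older than 1 hour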

RE: Spark History Server can't read event logs

2015-01-09 Thread michael.england
Hi Marcelo, On MapR, the mapr user can read the files using the NFS mount; however, using the normal hadoop fs -cat /... command, I get permission denied. As the history server points to a location on MapR-FS, not the NFS mount, I'd imagine the Spark history server is trying to read the
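A quick way to demonstrate the discrepancy is to compare the NFS view with the Hadoop client's view of the same path (cluster name and paths are placeholders):

  # Through the MapR NFS mount, as the mapr user: readable
  ls -l /mapr/mycluster/apps/spark/historyserver/logs
  # Through the Hadoop client against MapR-FS: permission denied
  hadoop fs -ls /apps/spark/historyserver/logs

Since the mapr user can read everything over NFS, the difference points at MapR-FS permissions on the log files rather than at the files themselves.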

Cleaning up spark.local.dir automatically

2015-01-09 Thread michael.england
Hi, Is there a way of automatically cleaning up the spark.local.dir after a job has been run? I have noticed a large number of temporary files have been stored here and are not cleaned up. The only solution I can think of is to run some sort of cron job to delete files older than a few days. I
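A sketch of the cron approach mentioned above, assuming the scratch directory is /tmp/spark and that anything untouched for three days is safe to delete (both are assumptions; files still in use by a running job must not be removed):

  # crontab: daily at 03:00, purge Spark scratch files older than 3 days
  0 3 * * * find /tmp/spark -mindepth 1 -type f -mtime +3 -delete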

RE: Spark History Server can't read event logs

2015-01-08 Thread michael.england
Hi Vanzin, I am using the MapR distribution of Hadoop. The history server logs are created by a job with the permissions: drwxrwx--- - myusername mygroup 2 2015-01-08 09:14 /apps/spark/historyserver/logs/spark-1420708455212 However, the permissions of the higher directories
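Given that layout, one workaround is to make the log tree group-readable by a group that the history server's service account belongs to (group name is a placeholder):

  hadoop fs -chgrp -R sparkhistory /apps/spark/historyserver/logs
  hadoop fs -chmod -R g+rX /apps/spark/historyserver/logs

The capital X grants group execute (traversal) on directories only, so the higher directories become listable without marking plain files executable.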

FW: No APPLICATION_COMPLETE file created in history server log location upon pyspark job success

2015-01-07 Thread michael.england
Hi, I am currently running pyspark jobs against Spark 1.1.0 on YARN. When I run example Java jobs such as spark-pi, the following files get created: bash-4.1$ tree spark-pi-1420624364958 spark-pi-1420624364958 ├── APPLICATION_COMPLETE ├── EVENT_LOG_1 └── SPARK_VERSION_1.1.0 0 directories, 3

RE: FW: No APPLICATION_COMPLETE file created in history server log location upon pyspark job success

2015-01-07 Thread michael.england
Thanks Andrew, simple fix ☺. From: Andrew Ash [mailto:and...@andrewash.com] Sent: 07 January 2015 15:26 To: England, Michael (IT/UK) Cc: user Subject: Re: FW: No APPLICATION_COMPLETE file created in history server log location upon pyspark job success Hi Michael, I think you need to
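For the archive, the fix (assuming this matches Andrew's reply) is to stop the SparkContext explicitly; in Spark 1.1 the APPLICATION_COMPLETE marker is only written when the context shuts down cleanly, which the pyspark script was skipping:

  # end of the pyspark script
  from pyspark import SparkContext
  sc = SparkContext(appName="example")
  # ... job logic ...
  sc.stop()  # flushes the event log and writes APPLICATION_COMPLETE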

Spark History Server can't read event logs

2015-01-07 Thread michael.england
Hi, When I run jobs and save the event logs, they are saved with the permissions of the Unix user and group that ran the Spark job. The history server is run as a service account and therefore can’t read the files. Extract from the history server logs: 2015-01-07 15:37:24,3021 ERROR Client
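A common mitigation is to have all jobs log to one shared directory whose permissions allow the history server's service account to read; a hedged spark-defaults.conf sketch (the URI is a placeholder):

  # spark-defaults.conf for jobs
  spark.eventLog.enabled          true
  spark.eventLog.dir              maprfs:///apps/spark/historyserver/logs
  # history server side (Spark 1.1+)
  spark.history.fs.logDirectory   maprfs:///apps/spark/historyserver/logs

The directory itself still needs permissions (for example via group membership, as above) that let the service account read event logs written by other users.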