spark history server + yarn log aggregation issue

2015-09-09 Thread michael.england
Hi,

I am running Spark-on-YARN on a secure cluster with YARN log aggregation set 
up. Once a job completes, viewing the stdout/stderr executor logs in the Spark 
history server UI redirects me to the local NodeManager, where a page appears 
for a second saying ‘Redirecting to log server…’ before redirecting me on to 
the aggregated log page on the job history server. However, that page sends me 
to http://:/jobhistory.. instead of https://..., causing odd characters to 
appear. If I manually specify https:// in the URL, the page works as expected, 
but it then automatically refreshes and reverts to http.

I have set the yarn.log.server.url property in yarn-site.xml to use https:


<property>
  <name>yarn.log.server.url</name>
  <value>https://.domain.com:port/jobhistory/logs/</value>
</property>

I know this isn’t specifically a Spark issue, but I wondered if anyone else has 
experienced it and knows how to get around it?

Thanks,
Mike


This e-mail (including any attachments) is private and confidential, may 
contain proprietary or privileged information and is intended for the named 
recipient(s) only. Unintended recipients are strictly prohibited from taking 
action on the basis of information in this e-mail and must contact the sender 
immediately, delete this e-mail (and all attachments) and destroy any hard 
copies. Nomura will not accept responsibility or liability for the accuracy or 
completeness of, or the presence of any virus or disabling code in, this 
e-mail. If verification is sought please request a hard copy. Any reference to 
the terms of executed transactions should be treated as preliminary only and 
subject to formal written confirmation by Nomura. Nomura reserves the right to 
retain, monitor and intercept e-mail communications through its networks 
(subject to and in accordance with applicable laws). No confidentiality or 
privilege is waived or lost by Nomura by any mistransmission of this e-mail. 
Any reference to "Nomura" is a reference to any entity in the Nomura Holdings, 
Inc. group. Please read our Electronic Communications Legal Notice which forms 
part of this e-mail: http://www.Nomura.com/email_disclaimer.htm



Spark-on-YARN LOCAL_DIRS location

2015-08-26 Thread michael.england
Hi,

I am having issues with /tmp space filling up during Spark jobs because 
Spark-on-YARN uses the yarn.nodemanager.local-dirs for shuffle space. I noticed 
this message appears when submitting Spark-on-YARN jobs:

WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by 
the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone 
and LOCAL_DIRS in YARN).

I can’t find much documentation on where to set the LOCAL_DIRS property. Can 
someone please advise whether this is a yarn-env.sh or a spark-env.sh property, 
and whether Spark would then use the directory specified by this environment 
variable as a shuffle area instead of the default yarn.nodemanager.local-dirs 
location?
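For what it's worth: as far as I can tell, LOCAL_DIRS is not set by hand in 
yarn-env.sh or spark-env.sh. The NodeManager exports it into each container 
from yarn.nodemanager.local-dirs, and Spark then uses those directories for 
shuffle space. So moving shuffle off /tmp means repointing that property in 
yarn-site.xml, along these lines (the disk paths below are placeholders):

```xml
<!-- yarn-site.xml sketch: hypothetical disk locations. YARN exposes these
     to containers as LOCAL_DIRS, which overrides spark.local.dir on YARN. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
```

A NodeManager restart is needed for the change to take effect.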

Thanks,
Mike





RE: Cleaning up spark.local.dir automatically

2015-01-13 Thread michael.england
That’s really useful, thanks.


From: Andrew Ash [mailto:and...@andrewash.com]
Sent: 09 January 2015 22:42
To: England, Michael (IT/UK)
Cc: raghavendra.pan...@gmail.com; user
Subject: Re: Cleaning up spark.local.dir automatically

That's a worker setting which cleans up the files left behind by executors, so 
spark.cleaner.ttl isn't at the RDD level.  After 
https://issues.apache.org/jira/browse/SPARK-1860 the cleaner won't clean up 
directories left by running executors.


On Fri, Jan 9, 2015 at 7:38 AM, michael.engl...@nomura.com wrote:
Thanks, I imagine this will kill any cached RDDs if their files are beyond the 
ttl?

Thanks


From: Raghavendra Pandey 
[mailto:raghavendra.pan...@gmail.com]
Sent: 09 January 2015 15:29
To: England, Michael (IT/UK); 
user@spark.apache.org
Subject: Re: Cleaning up spark.local.dir automatically

You may want to look at the spark.cleaner.ttl configuration, which is infinite 
by default. Spark uses that setting to delete temp files from time to time.
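As a concrete illustration of the setting mentioned above (the value is an 
example; spark.cleaner.ttl takes a duration in seconds):

```
# spark-defaults.conf (or --conf spark.cleaner.ttl=86400 on spark-submit):
# periodically clean metadata and temp files older than one day
spark.cleaner.ttl  86400
```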
On Fri Jan 09 2015 at 8:34:10 PM michael.engl...@nomura.com wrote:
Hi,

Is there a way of automatically cleaning up the spark.local.dir after a job has 
been run? I have noticed a large number of temporary files have been stored 
here and are not cleaned up. The only solution I can think of is to run some 
sort of cron job to delete files older than a few days. I am currently using a 
mixture of standalone and YARN spark builds.
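A minimal sketch of the cron-job idea described above (directory path and 
retention window are placeholders; point it at whatever spark.local.dir 
resolves to on your nodes):

```shell
#!/bin/sh
# clean_spark_local DIR DAYS: remove per-job scratch directories directly
# under DIR that have not been modified for more than DAYS days.
clean_spark_local() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -exec rm -rf {} +
}

# Example: run daily from cron against a hypothetical local dir,
# keeping the last 3 days of scratch space:
#   clean_spark_local /tmp/spark-local 3
```

Note that an mtime-based sweep can race with long-running jobs, so the 
retention window should comfortably exceed your longest job.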

Thanks,
Michael






RE: Cleaning up spark.local.dir automatically

2015-01-09 Thread michael.england
Thanks, I imagine this will kill any cached RDDs if their files are beyond the 
ttl?

Thanks


From: Raghavendra Pandey [mailto:raghavendra.pan...@gmail.com]
Sent: 09 January 2015 15:29
To: England, Michael (IT/UK); user@spark.apache.org
Subject: Re: Cleaning up spark.local.dir automatically

You may want to look at the spark.cleaner.ttl configuration, which is infinite 
by default. Spark uses that setting to delete temp files from time to time.
On Fri Jan 09 2015 at 8:34:10 PM michael.engl...@nomura.com wrote:
Hi,

Is there a way of automatically cleaning up the spark.local.dir after a job has 
been run? I have noticed a large number of temporary files have been stored 
here and are not cleaned up. The only solution I can think of is to run some 
sort of cron job to delete files older than a few days. I am currently using a 
mixture of standalone and YARN spark builds.

Thanks,
Michael





Cleaning up spark.local.dir automatically

2015-01-09 Thread michael.england
Hi,

Is there a way of automatically cleaning up the spark.local.dir after a job has 
been run? I have noticed a large number of temporary files have been stored 
here and are not cleaned up. The only solution I can think of is to run some 
sort of cron job to delete files older than a few days. I am currently using a 
mixture of standalone and YARN spark builds.

Thanks,
Michael






RE: Spark History Server can't read event logs

2015-01-09 Thread michael.england
Hi Marcelo,

On MapR, the mapr user can read the files via the NFS mount; however, with a 
normal hadoop fs -cat /... command I get permission denied. As the history 
server points at a location on MapR-FS rather than the NFS mount, I'd imagine 
it is reading the files through the Hadoop API, which is where the permissions 
cause issues.

Thanks,
Michael


-Original Message-
From: Marcelo Vanzin [mailto:van...@cloudera.com] 
Sent: 08 January 2015 19:23
To: England, Michael (IT/UK)
Cc: user@spark.apache.org
Subject: Re: Spark History Server can't read event logs

Sorry for the noise; but I just remembered you're actually using MapR (and not 
HDFS), so maybe the "3777" trick could work...

On Thu, Jan 8, 2015 at 10:32 AM, Marcelo Vanzin  wrote:
> Nevermind my last e-mail. HDFS complains about not understanding "3777"...
>
> On Thu, Jan 8, 2015 at 9:46 AM, Marcelo Vanzin  wrote:
>> Hmm. Can you set the permissions of "/apps/spark/historyserver/logs"
>> to 3777? I'm not sure HDFS respects the group id bit, but it's worth 
>> a try. (BTW that would only affect newly created log directories.)
>>
>> On Thu, Jan 8, 2015 at 1:22 AM,   wrote:
>>> Hi Vanzin,
>>>
>>> I am using the MapR distribution of Hadoop. The history server logs are 
>>> created by a job with the permissions:
>>>
>>> drwxrwx---   - 2 2015-01-08 09:14 
>>> /apps/spark/historyserver/logs/spark-1420708455212
>>>
>>> However, the permissions of the higher directories are mapr:mapr and the 
>>> user that runs Spark in our case is a unix ID called mapr (in the mapr 
>>> group). Therefore, this can't read my job event logs as shown above.
>>>
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> -Original Message-
>>> From: Marcelo Vanzin [mailto:van...@cloudera.com]
>>> Sent: 07 January 2015 18:10
>>> To: England, Michael (IT/UK)
>>> Cc: user@spark.apache.org
>>> Subject: Re: Spark History Server can't read event logs
>>>
>>> The Spark code generates the log directory with "770" permissions. On top 
>>> of that you need to make sure of two things:
>>>
>>> - all directories up to /apps/spark/historyserver/logs/ are readable 
>>> by the user running the history server
>>> - the user running the history server belongs to the group that owns 
>>> /apps/spark/historyserver/logs/
>>>
>>> I think the code could be more explicit about setting the group of the 
>>> generated log directories and files, but if you follow the two rules above 
>>> things should work. Also, I recommend setting 
>>> /apps/spark/historyserver/logs/ itself to "1777" so that any user can 
>>> generate logs, but only the owner (or a superuser) can delete them.
>>>
>>>
>>>
>>> On Wed, Jan 7, 2015 at 7:45 AM,   wrote:
 Hi,



 When I run jobs and save the event logs, they are saved with the 
 permissions of the unix user and group that ran the spark job. The 
 history server is run as a service account and therefore can’t read the 
 files:



 Extract from the History server logs:



 2015-01-07 15:37:24,3021 ERROR Client
 fs/client/fileclient/cc/client.cc:1009
 Thread: 1183 User does not have access to open file
 /apps/spark/historyserver/logs/spark-1420644521194

 15/01/07 15:37:24 ERROR ReplayListenerBus: Exception in parsing 
 Spark event log
 /apps/spark/historyserver/logs/spark-1420644521194/EVENT_LOG_1

 org.apache.hadoop.security.AccessControlException: Open failed for file:
 /apps/spark/historyserver/logs/spark-1420644521194/EVENT_LOG_1, error:
 Permission denied (13)



 Is there a setting which I can change that allows the files to be 
 world readable or at least by the account running the history server?
 Currently, the job appears in the History Server UI but only states ‘<Not Started>’.



 Thanks,

 Michael



RE: Spark History Server can't read event logs

2015-01-08 Thread michael.england
Hi Vanzin,

I am using the MapR distribution of Hadoop. The history server logs are created 
by a job with the permissions:

drwxrwx---   - 2 2015-01-08 09:14 
/apps/spark/historyserver/logs/spark-1420708455212

However, the permissions of the higher directories are mapr:mapr and the user 
that runs Spark in our case is a unix ID called mapr (in the mapr group). 
Therefore, the history server can't read my job event logs as shown above.


Thanks,
Michael


-Original Message-
From: Marcelo Vanzin [mailto:van...@cloudera.com] 
Sent: 07 January 2015 18:10
To: England, Michael (IT/UK)
Cc: user@spark.apache.org
Subject: Re: Spark History Server can't read event logs

The Spark code generates the log directory with "770" permissions. On top of 
that you need to make sure of two things:

- all directories up to /apps/spark/historyserver/logs/ are readable by the 
user running the history server
- the user running the history server belongs to the group that owns 
/apps/spark/historyserver/logs/

I think the code could be more explicit about setting the group of the 
generated log directories and files, but if you follow the two rules above 
things should work. Also, I recommend setting /apps/spark/historyserver/logs/ 
itself to "1777" so that any user can generate logs, but only the owner (or a 
superuser) can delete them.
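Marcelo's "1777" suggestion above can be sketched as a small shell helper (the 
path is the one from this thread; on MapR the same commands could be run 
against the NFS mount, or with hadoop fs -chmod on the cluster filesystem):

```shell
#!/bin/sh
# init_history_log_root DIR: create the history server log root with mode
# 1777. The sticky bit means any user can create a log directory beneath
# the root, but only its owner (or a superuser) can delete it.
init_history_log_root() {
  mkdir -p "$1" && chmod 1777 "$1"
}

# Example, using the path from this thread:
#   init_history_log_root /apps/spark/historyserver/logs
```

The history server user still needs read access (execute bit) on every parent 
directory above the log root, per the two rules above.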



On Wed, Jan 7, 2015 at 7:45 AM,   wrote:
> Hi,
>
>
>
> When I run jobs and save the event logs, they are saved with the 
> permissions of the unix user and group that ran the spark job. The 
> history server is run as a service account and therefore can’t read the files:
>
>
>
> Extract from the History server logs:
>
>
>
> 2015-01-07 15:37:24,3021 ERROR Client 
> fs/client/fileclient/cc/client.cc:1009
> Thread: 1183 User does not have access to open file
> /apps/spark/historyserver/logs/spark-1420644521194
>
> 15/01/07 15:37:24 ERROR ReplayListenerBus: Exception in parsing Spark 
> event log 
> /apps/spark/historyserver/logs/spark-1420644521194/EVENT_LOG_1
>
> org.apache.hadoop.security.AccessControlException: Open failed for file:
> /apps/spark/historyserver/logs/spark-1420644521194/EVENT_LOG_1, error:
> Permission denied (13)
>
>
>
> Is there a setting which I can change that allows the files to be 
> world readable or at least by the account running the history server? 
> Currently, the job appears in the History Server UI but only states ‘<Not Started>’.
>
>
>
> Thanks,
>
> Michael
>
>



--
Marcelo





Spark History Server can't read event logs

2015-01-07 Thread michael.england
Hi,

When I run jobs and save the event logs, they are saved with the permissions of 
the unix user and group that ran the spark job. The history server is run as a 
service account and therefore can’t read the files:

Extract from the History server logs:

2015-01-07 15:37:24,3021 ERROR Client fs/client/fileclient/cc/client.cc:1009 
Thread: 1183 User does not have access to open file 
/apps/spark/historyserver/logs/spark-1420644521194
15/01/07 15:37:24 ERROR ReplayListenerBus: Exception in parsing Spark event log 
/apps/spark/historyserver/logs/spark-1420644521194/EVENT_LOG_1
org.apache.hadoop.security.AccessControlException: Open failed for file: 
/apps/spark/historyserver/logs/spark-1420644521194/EVENT_LOG_1, error: 
Permission denied (13)

Is there a setting which I can change that allows the files to be world 
readable or at least by the account running the history server? Currently, the 
job appears in the History Server UI but only states ‘<Not Started>’.

Thanks,
Michael





RE: FW: No APPLICATION_COMPLETE file created in history server log location upon pyspark job success

2015-01-07 Thread michael.england
Thanks Andrew, simple fix ☺.


From: Andrew Ash [mailto:and...@andrewash.com]
Sent: 07 January 2015 15:26
To: England, Michael (IT/UK)
Cc: user
Subject: Re: FW: No APPLICATION_COMPLETE file created in history server log 
location upon pyspark job success

Hi Michael,

I think you need to explicitly call sc.stop() on the spark context for it to 
close down properly (this doesn't happen automatically).  See 
https://issues.apache.org/jira/browse/SPARK-2972 for more details

Andrew

On Wed, Jan 7, 2015 at 3:38 AM, michael.engl...@nomura.com wrote:
Hi,

I am currently running pyspark jobs against Spark 1.1.0 on YARN. When I run 
example Java jobs such as spark-pi, the following files get created:

bash-4.1$ tree spark-pi-1420624364958
spark-pi-1420624364958
├── APPLICATION_COMPLETE
├── EVENT_LOG_1
└── SPARK_VERSION_1.1.0

0 directories, 3 files

However, when I run my pyspark job, no APPLICATION_COMPLETE file gets created.

bash-4.1$ tree pyspark-1420628130353
pyspark-1420628130353
├── EVENT_LOG_1
└── SPARK_VERSION_1.1.0

0 directories, 2 files

If I touch the file in this directory myself, the job just appears as 
‘<Not Started>’ in the history server UI.

I am submitting jobs using spark-submit for now:

bin/spark-submit --master yarn-client --executor-memory 4G --executor-cores 12 
--num-executors 10 --queue highpriority 


Is there a setting I am missing for this APPLICATION_COMPLETE file to be 
created when a pyspark job completes?

Thanks,
Michael







FW: No APPLICATION_COMPLETE file created in history server log location upon pyspark job success

2015-01-07 Thread michael.england
Hi,

I am currently running pyspark jobs against Spark 1.1.0 on YARN. When I run 
example Java jobs such as spark-pi, the following files get created:

bash-4.1$ tree spark-pi-1420624364958
spark-pi-1420624364958
├── APPLICATION_COMPLETE
├── EVENT_LOG_1
└── SPARK_VERSION_1.1.0

0 directories, 3 files

However, when I run my pyspark job, no APPLICATION_COMPLETE file gets created.

bash-4.1$ tree pyspark-1420628130353
pyspark-1420628130353
├── EVENT_LOG_1
└── SPARK_VERSION_1.1.0

0 directories, 2 files

If I touch the file in this directory myself, the job just appears as 
‘<Not Started>’ in the history server UI.

I am submitting jobs using spark-submit for now:

bin/spark-submit --master yarn-client --executor-memory 4G --executor-cores 12 
--num-executors 10 --queue highpriority 


Is there a setting I am missing for this APPLICATION_COMPLETE file to be 
created when a pyspark job completes?

Thanks,
Michael

