[jira] [Created] (SPARK-20981) Add --repositories equivalent configuration for Spark

2017-06-04 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20981:
---

 Summary: Add --repositories equivalent configuration for Spark
 Key: SPARK-20981
 URL: https://issues.apache.org/jira/browse/SPARK-20981
 Project: Spark
  Issue Type: Improvement
  Components: Spark Submit
Affects Versions: 2.2.0
Reporter: Saisai Shao
Priority: Minor


In our use case of launching Spark applications via REST APIs, there is no way 
for the user to specify command-line arguments; everything has to be passed as 
Spark configurations. Because there is no equivalent Spark configuration for 
"--repositories", we cannot specify a custom repository through configuration.

So here I propose to add a "--repositories" equivalent configuration in Spark.
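A rough sketch of how this could look from the user side, assuming a hypothetical 
configuration key such as {{spark.jars.repositories}} mirroring "--repositories" 
(the final name is not decided here):

{code}
// Hypothetical usage sketch: adding a custom resolver purely through configuration,
// so REST-based submission (no command-line arguments) can still add repositories.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.jars.packages", "com.example:my-lib:1.0.0")            // example coordinate
  .set("spark.jars.repositories", "https://repo.example.com/maven")  // assumed new key
{code}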






[jira] [Commented] (SPARK-20943) Correct BypassMergeSortShuffleWriter's comment

2017-06-02 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034296#comment-16034296
 ] 

Saisai Shao commented on SPARK-20943:
-

I think the original purpose of the comment is to say that {{Aggregator}} and 
{{Ordering}} are not used in the map-side shuffle write; the {{Aggregator}} and 
{{Ordering}} set on a ShuffledRDD are only used on the shuffle reader side.
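A small sketch to illustrate the point, assuming an existing {{SparkContext}} {{sc}} 
(this is just an illustration, not the code under discussion):

{code}
import org.apache.spark.{Aggregator, HashPartitioner}
import org.apache.spark.rdd.ShuffledRDD

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// The aggregator set here is consumed by the shuffle *reader*; the writer only
// needs it when map-side combine is enabled. With mapSideCombine = false the
// bypass-merge-sort write path can still be chosen (when the partition count is
// below spark.shuffle.sort.bypassMergeThreshold).
val shuffled = new ShuffledRDD[String, Int, Int](pairs, new HashPartitioner(4))
  .setAggregator(new Aggregator[String, Int, Int](v => v, _ + _, _ + _))
  .setMapSideCombine(false)
{code}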

> Correct BypassMergeSortShuffleWriter's comment
> --
>
> Key: SPARK-20943
> URL: https://issues.apache.org/jira/browse/SPARK-20943
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Shuffle
>Affects Versions: 2.1.1
>Reporter: CanBin Zheng
>Priority: Trivial
>  Labels: starter
>
> There are some comments written in BypassMergeSortShuffleWriter.java about 
> when to select this write path, the three required conditions are described 
> as follows:  
> 1. no Ordering is specified, and
> 2. no Aggregator is specified, and
> 3. the number of partitions is less than 
>  spark.shuffle.sort.bypassMergeThreshold
> Obviously, the conditions written are partially wrong and misleading, the 
> right conditions should be:
> 1. map-side combine is false, and
> 2. the number of partitions is less than 
>  spark.shuffle.sort.bypassMergeThreshold






[jira] [Commented] (SPARK-20943) Correct BypassMergeSortShuffleWriter's comment

2017-06-01 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033016#comment-16033016
 ] 

Saisai Shao commented on SPARK-20943:
-

I don't think the previous comment is incorrect. On the shuffle write side there 
is no explicit map-side-combine concept; it is generalized into aggregator and 
ordering, which means map-side combine is just one case with an aggregator.

> Correct BypassMergeSortShuffleWriter's comment
> --
>
> Key: SPARK-20943
> URL: https://issues.apache.org/jira/browse/SPARK-20943
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Shuffle
>Affects Versions: 2.1.1
>Reporter: CanBin Zheng
>Priority: Trivial
>  Labels: starter
>
> There are some comments written in BypassMergeSortShuffleWriter.java about 
> when to select this write path, the three required conditions are described 
> as follows:  
> 1. no Ordering is specified, and
> 2. no Aggregator is specified, and
> 3. the number of partitions is less than 
>  spark.shuffle.sort.bypassMergeThreshold
> Obviously, the conditions written are partially wrong and misleading, the 
> right conditions should be:
> 1. map-side combine is false, and
> 2. the number of partitions is less than 
>  spark.shuffle.sort.bypassMergeThreshold






Re: [VOTE] Livy to enter Apache Incubator

2017-05-31 Thread Saisai Shao
+1 (non-binding)




[jira] [Commented] (SPARK-20898) spark.blacklist.killBlacklistedExecutors doesn't work in YARN

2017-05-30 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030710#comment-16030710
 ] 

Saisai Shao commented on SPARK-20898:
-

[~tgraves], I addressed this issue in 
https://github.com/apache/spark/pull/17113 

> spark.blacklist.killBlacklistedExecutors doesn't work in YARN
> -
>
> Key: SPARK-20898
> URL: https://issues.apache.org/jira/browse/SPARK-20898
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Thomas Graves
>
> I was trying out the new spark.blacklist.killBlacklistedExecutors on YARN but 
> it doesn't appear to work.  Every time I get:
> 17/05/26 16:28:12 WARN BlacklistTracker: Not attempting to kill blacklisted 
> executor id 4 since allocation client is not defined
> Even though dynamic allocation is on.  Taking a quick look, I think the way 
> it creates the blacklisttracker and passes the allocation client is wrong. 
> The scheduler backend is 
>  not set yet so it never passes the allocation client to the blacklisttracker 
> correctly.  Thus it will never kill.






[jira] [Commented] (SPARK-19569) could not get APP ID and cause failed to connect to spark driver on yarn-client mode

2017-05-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017066#comment-16017066
 ] 

Saisai Shao commented on SPARK-19569:
-

[~ouyangxc.zte] In your code above you directly call 
{{client.submitApplication()}} to launch the Spark application; I assume this 
client is {{org.apache.spark.deploy.yarn.Client}}. From my understanding you are 
not supposed to call this class directly. Also, if you use yarn#client directly 
to launch a Spark on YARN application, I suspect you will have to redo a lot of 
the preparation work that SparkSubmit normally does.
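For reference, a minimal sketch of launching Spark on YARN programmatically 
through the public {{SparkLauncher}} API instead of calling the YARN {{Client}} 
class directly (paths and class names below are placeholders):

{code}
import org.apache.spark.launcher.SparkLauncher

// SparkLauncher goes through spark-submit under the hood, so the usual
// preparation (classpath, staging, configuration) is handled for you.
val handle = new SparkLauncher()
  .setAppResource("/path/to/your-app.jar")   // placeholder jar
  .setMainClass("com.example.YourApp")       // placeholder main class
  .setMaster("yarn")
  .setDeployMode("client")
  .startApplication()                        // returns a SparkAppHandle for monitoring

println(handle.getState)
{code}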

> could not  get APP ID and cause failed to connect to spark driver on 
> yarn-client mode
> -
>
> Key: SPARK-19569
> URL: https://issues.apache.org/jira/browse/SPARK-19569
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
> Environment: hadoop2.7.1
> spark2.0.2
> hive2.2
>Reporter: KaiXu
>
> When I run Hive queries on Spark, I get the error below in the console; after 
> checking the container's log, I found it failed to connect to the Spark driver. 
> I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job hasn't been 
> submitted after 3601s'. During such a long period it is unlikely that no resource 
> was available, and I also did not see any network-related issue, so the cause is 
> not clear from the message "Possible reasons include network issues, errors in 
> remote driver or the cluster has no available resources, etc.".
> From Hive's log, it failed to get the APP ID, so this might be the reason why the 
> driver did not start up.
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_02
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:436

[jira] [Commented] (SPARK-20772) Add support for query parameters in redirects on Yarn

2017-05-16 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013431#comment-16013431
 ] 

Saisai Shao commented on SPARK-20772:
-

I'm guessing this is an issue with {{AmIpFilter}}; if so, it should be a YARN 
issue, not related to Spark?

> Add support for query parameters in redirects on Yarn
> -
>
> Key: SPARK-20772
> URL: https://issues.apache.org/jira/browse/SPARK-20772
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.1.0
>Reporter: Bjorn Jonsson
>Priority: Minor
>
> Spark uses rewrites of query parameters to paths 
> (http://:4040/jobs/job?id=0 --> http://:4040/jobs/job/?id=0). 
> This works fine in local or standalone mode, but does not work on Yarn (with 
> the org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter filter), where 
> the query parameter is dropped.
> The repro steps are:
> - Start up the spark-shell and run a job
> - Try to access the job details through http://:4040/jobs/job?id=0
> - A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)
> Going directly through the RM proxy works (does not cause query parameters to 
> be dropped).






Re: Review Request 59305: Fix Livy service check and alerts script with SSL enabled

2017-05-16 Thread Saisai Shao


> On May 16, 2017, 3:01 p.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/params.py
> > Line 258 (original), 259 (patched)
> > <https://reviews.apache.org/r/59305/diff/1/?file=1720721#file1720721line259>
> >
> > Does livy have an HTTPS port?

Currently Livy HTTPS uses the same port as HTTP, and Livy cannot start both 
HTTPS and HTTP at the same time; the user can only configure one of them.


- Saisai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59305/#review175106
-------


On May 16, 2017, 5:58 a.m., Saisai Shao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59305/
> ---
> 
> (Updated May 16, 2017, 5:58 a.m.)
> 
> 
> Review request for Ambari and Sumit Mohanty.
> 
> 
> Bugs: AMBARI-21012
> https://issues.apache.org/jira/browse/AMBARI-21012
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> The current Ambari Livy script doesn't handle the HTTPS/SSL-enabled scenario; it 
> always uses the "http" scheme to check Livy. We should switch the scheme to 
> HTTPS if SSL is enabled.
> 
> 
> Diffs
> -
> 
>   
> ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_livy_port.py
>  746a98e 
>   
> ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/params.py
>  42396bd 
>   
> ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/service_check.py
>  9d74779 
>   
> ambari-server/src/main/resources/common-services/SPARK2/2.0.0/package/scripts/alerts/alert_spark2_livy_port.py
>  44c284f 
>   
> ambari-server/src/main/resources/common-services/SPARK2/2.0.0/package/scripts/params.py
>  1df3f2f 
>   
> ambari-server/src/main/resources/common-services/SPARK2/2.0.0/package/scripts/service_check.py
>  8e7a766 
> 
> 
> Diff: https://reviews.apache.org/r/59305/diff/1/
> 
> 
> Testing
> ---
> 
> Local verification is done.
> 
> 
> Thanks,
> 
> Saisai Shao
> 
>



[jira] [Assigned] (AMBARI-21012) Livy service check fails with wire encryption setup

2017-05-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMBARI-21012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned AMBARI-21012:


Assignee: Saisai Shao

> Livy service check fails with wire encryption setup
> ---
>
> Key: AMBARI-21012
> URL: https://issues.apache.org/jira/browse/AMBARI-21012
> Project: Ambari
>  Issue Type: Bug
>Affects Versions: 2.5.1
>Reporter: Yesha Vora
>    Assignee: Saisai Shao
>
> Livy service check fails after enabling wire encryption. This issue exists for 
> both Livy_server and Livy2_server.
> STR:
> 1) Set up below properties in livy.conf to enable WE.
> {code}
> livy.ssl.trustStore 
> livy.ssl.trustStorePassword 
> livy.key-password 
> livy.keystore 
> livy.keystore.password {code}
> 2) Run Spark Service check.
> The Spark service check will fail to validate Livy. It uses the HTTP port to 
> connect to Livy; when wire encryption is enabled, it should use the HTTPS 
> protocol instead.
> {code:title=stderr}
> Traceback (most recent call last):
>   File 
> "/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/service_check.py",
>  line 62, in 
> SparkServiceCheck().execute()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 322, in execute
> method(env)
>   File 
> "/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/service_check.py",
>  line 59, in service_check
> raise Fail(format("Connection to all Livy servers failed"))
> resource_management.core.exceptions.Fail: Connection to all Livy servers 
> failed{code}
> {code:title=stdout}
> 2017-05-12 21:21:08,531 - Using hadoop conf dir: 
> /usr/hdp/current/hadoop-client/conf
> 2017-05-12 21:21:08,551 - Execute['kinit -kt 
> /etc/security/keytabs/xxx.headless.keytab xxx@XXX; '] {'user': 'spark'}
> 2017-05-12 21:21:08,683 - Execute['kinit -kt 
> /etc/security/keytabs/smokeuser.headless.keytab xxx@XXX; '] {'user': 'livy'}
> 2017-05-12 21:21:08,809 - Execute['curl -s -o /dev/null -w'%{http_code}' 
> --negotiate -u: -k https://:18481 | grep 200'] {'logoutput': True, 
> 'tries': 5, 'try_sleep': 3}
> 200
> 2017-05-12 21:21:09,010 - Execute['curl -s -o /dev/null -w'%{http_code}' 
> --negotiate -u: -k http://:8999/sessions | grep 200'] {'logoutput': True, 
> 'tries': 3, 'user': 'livy', 'try_sleep': 1}
> 2017-05-12 21:21:09,149 - Retrying after 1 seconds. Reason: Execution of 
> 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k 
> http://xxx:8999/sessions | grep 200' returned 1. 
> 2017-05-12 21:21:10,286 - Retrying after 1 seconds. Reason: Execution of 
> 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k 
> http://:8999/sessions | grep 200' returned 1.
> Command failed after 1 tries{code}





[jira] [Commented] (AMBARI-21012) Livy service check fails with wire encryption setup

2017-05-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/AMBARI-21012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011798#comment-16011798
 ] 

Saisai Shao commented on AMBARI-21012:
--

[~sumitmohanty], please help to review, thanks!

> Livy service check fails with wire encryption setup
> ---
>
> Key: AMBARI-21012
> URL: https://issues.apache.org/jira/browse/AMBARI-21012
> Project: Ambari
>  Issue Type: Bug
>Affects Versions: 2.5.1
>Reporter: Yesha Vora
>
> Livy service check fails after enabling wire encryption. This issue exists for 
> both Livy_server and Livy2_server.
> STR:
> 1) Set up below properties in livy.conf to enable WE.
> {code}
> livy.ssl.trustStore 
> livy.ssl.trustStorePassword 
> livy.key-password 
> livy.keystore 
> livy.keystore.password {code}
> 2) Run Spark Service check.
> The Spark service check will fail to validate Livy. It uses the HTTP port to 
> connect to Livy; when wire encryption is enabled, it should use the HTTPS 
> protocol instead.
> {code:title=stderr}
> Traceback (most recent call last):
>   File 
> "/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/service_check.py",
>  line 62, in 
> SparkServiceCheck().execute()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 322, in execute
> method(env)
>   File 
> "/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/service_check.py",
>  line 59, in service_check
> raise Fail(format("Connection to all Livy servers failed"))
> resource_management.core.exceptions.Fail: Connection to all Livy servers 
> failed{code}
> {code:title=stdout}
> 2017-05-12 21:21:08,531 - Using hadoop conf dir: 
> /usr/hdp/current/hadoop-client/conf
> 2017-05-12 21:21:08,551 - Execute['kinit -kt 
> /etc/security/keytabs/xxx.headless.keytab xxx@XXX; '] {'user': 'spark'}
> 2017-05-12 21:21:08,683 - Execute['kinit -kt 
> /etc/security/keytabs/smokeuser.headless.keytab xxx@XXX; '] {'user': 'livy'}
> 2017-05-12 21:21:08,809 - Execute['curl -s -o /dev/null -w'%{http_code}' 
> --negotiate -u: -k https://:18481 | grep 200'] {'logoutput': True, 
> 'tries': 5, 'try_sleep': 3}
> 200
> 2017-05-12 21:21:09,010 - Execute['curl -s -o /dev/null -w'%{http_code}' 
> --negotiate -u: -k http://:8999/sessions | grep 200'] {'logoutput': True, 
> 'tries': 3, 'user': 'livy', 'try_sleep': 1}
> 2017-05-12 21:21:09,149 - Retrying after 1 seconds. Reason: Execution of 
> 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k 
> http://xxx:8999/sessions | grep 200' returned 1. 
> 2017-05-12 21:21:10,286 - Retrying after 1 seconds. Reason: Execution of 
> 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k 
> http://:8999/sessions | grep 200' returned 1.
> Command failed after 1 tries{code}





Review Request 59305: Fix Livy service check and alerts script with SSL enabled

2017-05-15 Thread Saisai Shao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59305/
---

Review request for Ambari and Sumit Mohanty.


Bugs: AMBARI-21012
https://issues.apache.org/jira/browse/AMBARI-21012


Repository: ambari


Description
---

The current Ambari Livy script doesn't handle the HTTPS/SSL-enabled scenario; it 
always uses the "http" scheme to check Livy. We should switch the scheme to 
HTTPS if SSL is enabled.


Diffs
-

  
ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_livy_port.py
 746a98e 
  
ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/params.py
 42396bd 
  
ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/service_check.py
 9d74779 
  
ambari-server/src/main/resources/common-services/SPARK2/2.0.0/package/scripts/alerts/alert_spark2_livy_port.py
 44c284f 
  
ambari-server/src/main/resources/common-services/SPARK2/2.0.0/package/scripts/params.py
 1df3f2f 
  
ambari-server/src/main/resources/common-services/SPARK2/2.0.0/package/scripts/service_check.py
 8e7a766 


Diff: https://reviews.apache.org/r/59305/diff/1/


Testing
---

Local verification is done.


Thanks,

Saisai Shao



Re: How about the fetch the shuffle data in one same machine?

2017-05-10 Thread Saisai Shao
There is a JIRA about this (
https://issues.apache.org/jira/browse/SPARK-6521). Currently the Spark
shuffle fetch still goes through Netty even when the two executors are on the
same node, but according to the test on that JIRA the performance is close
whether or not the network is bypassed. From my understanding, the kernel will
not push the data out to the NIC if it is just loopback communication (please
correct me if I'm wrong).
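Roughly, the decision today looks like the following (an illustrative sketch, not 
the actual {{ShuffleBlockFetcherIterator}} code):

{code}
// Only blocks owned by the *same executor* take the local read path today; a
// different executor on the same host still goes through the Netty fetch path.
// SPARK-6521 discusses short-circuiting the same-host case as well.
def chooseFetchPath(blockExecutorId: String, blockHost: String,
                    myExecutorId: String, myHost: String): String = {
  if (blockExecutorId == myExecutorId) "local-read"            // same executor: read the file directly
  else if (blockHost == myHost) "netty-fetch (over loopback)"  // same host, different executor
  else "netty-fetch (over the network)"
}
{code}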

On Wed, May 10, 2017 at 5:53 PM, raintung li  wrote:

> Hi all,
>
> Right now Spark only fetches the shuffle file locally when the executorId is the
> same; for the same IP but a different executorId it fetches over the network,
> even though it could actually be fetched locally or over loopback.
>
> Apparently fetching the local file is faster, since it can use the LVS
> cache.
>
> How do you think?
>
> Regards
> -Raintung
>


[jira] [Commented] (SPARK-20658) spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect

2017-05-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001917#comment-16001917
 ] 

Saisai Shao commented on SPARK-20658:
-

It mainly depends on YARN to measure the failure validity interval and to decide 
what counts as a failed AM attempt; Spark just proxies this parameter to YARN. So 
if there is any unexpected behavior I think we should investigate the YARN side to 
see the actual behavior.
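Conceptually the hand-off is just the following (a simplified sketch, not the 
actual YARN {{Client}} code); the retry accounting itself happens inside the YARN 
ResourceManager:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext

// intervalMs is the parsed value of spark.yarn.am.attemptFailuresValidityInterval
// (e.g. "1h" -> 3600000). Spark only forwards it; YARN decides which AM failures
// fall inside the validity window.
def applyValidityInterval(appContext: ApplicationSubmissionContext,
                          intervalMs: Option[Long]): Unit = {
  intervalMs.foreach(ms => appContext.setAttemptFailuresValidityInterval(ms))
}
{code}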

> spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect
> 
>
> Key: SPARK-20658
> URL: https://issues.apache.org/jira/browse/SPARK-20658
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0
>Reporter: Paul Jones
>Priority: Minor
>
> I'm running a job in YARN cluster mode using 
> `spark.yarn.am.attemptFailuresValidityInterval=1h` specified in both 
> spark-default.conf and in my spark-submit command. (This flag shows up in the 
> environment tab of spark history server, so it seems that it's specified 
> correctly). 
> However, I just had a job die with four AM failures (three of the four 
> failures were over an hour apart). So, I'm confused as to what could be going 
> on. I haven't figured out the cause of the individual failures, so is it 
> possible that we always count certain types of failures? E.g. jobs that are 
> killed due to memory issues always count? 






[jira] [Created] (SPARK-20605) Deprecate not used AM and executor port configuration

2017-05-04 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20605:
---

 Summary: Deprecate not used AM and executor port configuration
 Key: SPARK-20605
 URL: https://issues.apache.org/jira/browse/SPARK-20605
 Project: Spark
  Issue Type: Bug
  Components: Mesos, Spark Core, YARN
Affects Versions: 2.2.0
Reporter: Saisai Shao
Priority: Minor


After SPARK-10997, the client-mode Netty RpcEnv no longer needs to bind a port to 
start a server, so these port configurations are not used any more. Here I propose 
to remove these two configurations: "spark.executor.port" and "spark.am.port".






Re: Kerberos impersonation of a Spark Context at runtime

2017-05-04 Thread Saisai Shao
Current Spark doesn't support impersonating different users at runtime.
Spark's proxy-user support is at the application level, which means that when it
is set through --proxy-user, the whole application runs as that user.
On Thu, May 4, 2017 at 5:13 PM, matd  wrote:

> Hi folks,
>
> I have a Spark application executing various jobs for different users
> simultaneously, via several Spark sessions on several threads.
>
> My customer would like to kerberize his hadoop cluster. I wonder if there
> is
> a way to configure impersonation such as each of these jobs would be ran
> with the different proxy users. From what I see in spark conf and code,
> it's
> not possible to do that at runtime for a specific context, but I'm not
> familiar with Kerberos nor with this part of Spark.
>
> Can anyone confirm or refute this?
>
> Mathieu
>
> (also on S.O
> http://stackoverflow.com/questions/43765044/kerberos-
> impersonation-of-a-spark-context-at-runtime)
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Kerberos-impersonation-of-a-
> Spark-Context-at-runtime-tp28651.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>


[jira] [Commented] (SPARK-20421) Mark JobProgressListener (and related classes) as deprecated

2017-05-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996301#comment-15996301
 ] 

Saisai Shao commented on SPARK-20421:
-

[~vanzin], what is the plan for {{StorageStatus}}? I saw you marked it as 
deprecated; are you going to remove it or just make it private?

Several different places use this class, including the UI and REST APIs. If you 
are going to remove it, will you develop an alternative?

> Mark JobProgressListener (and related classes) as deprecated
> 
>
> Key: SPARK-20421
> URL: https://issues.apache.org/jira/browse/SPARK-20421
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.2.1
>
>
> This class (and others) were made {{@DeveloperApi}} as part of 
> https://github.com/apache/spark/pull/648. But as part of the work in 
> SPARK-18085, I plan to get rid of a lot of that code, so we should mark these 
> as deprecated in case anyone is still trying to use them.






[jira] [Created] (SPARK-20517) Download link in history server UI is not correct

2017-04-27 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20517:
---

 Summary: Download link in history server UI is not correct
 Key: SPARK-20517
 URL: https://issues.apache.org/jira/browse/SPARK-20517
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0, 2.2.0
Reporter: Saisai Shao
Priority: Minor


The download link in history server UI is concatenated with:

{code}
  Download
{code}

Here the {{num}} field represents the number of attempts; this is equal to the 
REST API. In the REST API, if the attempt id does not exist, then the {{num}} 
field should be empty; otherwise this {{num}} field should actually be 
{{attemptId}}.

This leads to a "no such app" error instead of correctly downloading the event log.






[jira] [Updated] (SPARK-20517) Download link in history server UI is not correct

2017-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-20517:

Description: 
The download link in history server UI is concatenated with:

{code}
  Download
{code}

Here the {{num}} field represents the number of attempts; this is not equal to the 
REST API. In the REST API, if the attempt id does not exist, then the {{num}} 
field should be empty; otherwise this {{num}} field should actually be 
{{attemptId}}.

This leads to a "no such app" error instead of correctly downloading the event log.

  was:
The download link in history server UI is concatenated with:

{code}
  Download
{code}

Here the {{num}} field represents the number of attempts; this is equal to the 
REST API. In the REST API, if the attempt id does not exist, then the {{num}} 
field should be empty; otherwise this {{num}} field should actually be 
{{attemptId}}.

This leads to a "no such app" error instead of correctly downloading the event log.


> Download link in history server UI is not correct
> -
>
> Key: SPARK-20517
> URL: https://issues.apache.org/jira/browse/SPARK-20517
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Saisai Shao
>Priority: Minor
>
> The download link in history server UI is concatenated with:
> {code}
>class="btn btn-info btn-mini">Download
> {code}
> Here the {{num}} field represents the number of attempts; this is not equal to 
> the REST API. In the REST API, if the attempt id does not exist, then the 
> {{num}} field should be empty; otherwise this {{num}} field should actually be 
> {{attemptId}}.
> This leads to a "no such app" error instead of correctly downloading the event 
> log.






[jira] [Commented] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-04-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986546#comment-15986546
 ] 

Saisai Shao commented on SPARK-19688:
-

Does this issue exist in the latest master code? From the PR it seems you submitted 
a patch based on 1.6. I guess this issue may already have been fixed in Spark 2.1+.

> Spark on Yarn Credentials File set to different application directory
> -
>
> Key: SPARK-19688
> URL: https://issues.apache.org/jira/browse/SPARK-19688
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, YARN
>Affects Versions: 1.6.3
>Reporter: Devaraj Jonnadula
>Priority: Minor
>
> The spark.yarn.credentials.file property is set to a different application Id 
> instead of the actual Application Id.






Re: Off heap memory settings and Tungsten

2017-04-24 Thread Saisai Shao
AFAIK, the off-heap memory setting is not enabled automatically; there are two
configurations that control the Tungsten off-heap memory usage:

1. spark.memory.offHeap.enabled
2. spark.memory.offHeap.size
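For example (a minimal sketch; the size value is arbitrary):

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true") // off by default
  .set("spark.memory.offHeap.size", "2g")      // must be a positive size when enabled
{code}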



On Sat, Apr 22, 2017 at 7:44 PM, geoHeil  wrote:

> Hi,
> I wonder when to enable spark's off heap settings. Shouldn't tungsten
> enable
> these automatically in 2.1?
> http://stackoverflow.com/questions/43330902/spark-off-
> heap-memory-config-and-tungsten
>
> Regards,
> Georg
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Off-heap-memory-settings-and-Tungsten-tp28621.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>


[jira] [Commented] (SPARK-20426) OneForOneStreamManager occupies too much memory.

2017-04-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978415#comment-15978415
 ] 

Saisai Shao commented on SPARK-20426:
-

Currently I don't have a clear fix for this issue. I'm not sure whether your 
proposal is the best choice, or whether there are other options such as lazy 
initialization of {{FileSegmentManagedBuffer}}. You could go ahead if you have a 
concrete plan.

> OneForOneStreamManager occupies too much memory.
> 
>
> Key: SPARK-20426
> URL: https://issues.apache.org/jira/browse/SPARK-20426
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.1.0
>Reporter: jin xing
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Spark jobs are running on a YARN cluster in my warehouse. We enabled the 
> external shuffle service (*--conf spark.shuffle.service.enabled=true*). 
> Recently the NodeManager runs OOM now and then. Dumping heap memory, we find that 
> *OneForOneStreamManager*'s footprint is huge. The NodeManager is configured with 
> 5G heap memory, while *OneForOneStreamManager* costs 2.5G and there are 5503233 
> *FileSegmentManagedBuffer* objects. Are there any suggestions to avoid this 
> other than just increasing the NodeManager's memory? Is it possible to stop 
> *registerStream* in OneForOneStreamManager, so we don't need to cache so 
> much metadata (i.e. StreamState)?






[jira] [Commented] (SPARK-20426) OneForOneStreamManager occupies too much memory.

2017-04-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978411#comment-15978411
 ] 

Saisai Shao commented on SPARK-20426:
-

My simple guess is that your large cluster / application generates a much larger 
number of shuffle blocks; in that case the NM requires a lot of memory to keep all 
this metadata.

> OneForOneStreamManager occupies too much memory.
> 
>
> Key: SPARK-20426
> URL: https://issues.apache.org/jira/browse/SPARK-20426
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.1.0
>Reporter: jin xing
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Spark jobs are running on a YARN cluster in my warehouse. We enabled the 
> external shuffle service (*--conf spark.shuffle.service.enabled=true*). 
> Recently the NodeManager runs OOM now and then. Dumping heap memory, we find that 
> *OneForOneStreamManager*'s footprint is huge. The NodeManager is configured with 
> 5G heap memory, while *OneForOneStreamManager* costs 2.5G and there are 5503233 
> *FileSegmentManagedBuffer* objects. Are there any suggestions to avoid this 
> other than just increasing the NodeManager's memory? Is it possible to stop 
> *registerStream* in OneForOneStreamManager, so we don't need to cache so 
> much metadata (i.e. StreamState)?






[jira] [Commented] (SPARK-20426) OneForOneStreamManager occupies too much memory.

2017-04-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978391#comment-15978391
 ] 

Saisai Shao commented on SPARK-20426:
-

Is your YARN cluster a large cluster with a bunch of Spark applications running on 
it? Or is your Spark application itself a very large one? One NM's shuffle service 
may serve several Spark applications simultaneously, which requires keeping a large 
amount of metadata in memory.

> OneForOneStreamManager occupies too much memory.
> 
>
> Key: SPARK-20426
> URL: https://issues.apache.org/jira/browse/SPARK-20426
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.1.0
>Reporter: jin xing
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Spark jobs are running on a YARN cluster in my warehouse. We enabled the 
> external shuffle service (*--conf spark.shuffle.service.enabled=true*). 
> Recently the NodeManager runs OOM now and then. Dumping heap memory, we find that 
> *OneForOneStreamManager*'s footprint is huge. The NodeManager is configured with 
> 5G heap memory, while *OneForOneStreamManager* costs 2.5G and there are 5503233 
> *FileSegmentManagedBuffer* objects. Are there any suggestions to avoid this 
> other than just increasing the NodeManager's memory? Is it possible to stop 
> *registerStream* in OneForOneStreamManager, so we don't need to cache so 
> much metadata (i.e. StreamState)?






[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-20 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976319#comment-15976319
 ] 

Saisai Shao commented on SPARK-20391:
-

I'm in favor of using a new REST API object to expose memory-related metrics for 
executors rather than adding more fields to {{ExecutorSummary}}.

So here I will only rename these 4 newly added fields:

{code}
val onHeapMemoryUsed: Option[Long],
val offHeapMemoryUsed: Option[Long],
val maxOnHeapMemory: Option[Long],
val maxOffHeapMemory: Option[Long]
{code}

{{maxMemory}} and {{memoryUsed}} I will leave as they are.

We could then define a new API object, {{ExecutorMemoryMetrics}}, that includes 
all the memory usage mentioned above.
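A rough sketch of what such a class could look like (field names are illustrative 
only, not a committed API):

{code}
class ExecutorMemoryMetrics(
    val usedOnHeapStorageMemory: Long,
    val usedOffHeapStorageMemory: Long,
    val totalOnHeapStorageMemory: Long,
    val totalOffHeapStorageMemory: Long)
{code}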

> Properly rename the memory related fields in ExecutorSummary REST API
> -
>
> Key: SPARK-20391
> URL: https://issues.apache.org/jira/browse/SPARK-20391
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Saisai Shao
>Priority: Blocker
>
> Currently in Spark we could get executor summary through REST API 
> {{/api/v1/applications//executors}}. The format of executor summary 
> is:
> {code}
> class ExecutorSummary private[spark](
> val id: String,
> val hostPort: String,
> val isActive: Boolean,
> val rddBlocks: Int,
> val memoryUsed: Long,
> val diskUsed: Long,
> val totalCores: Int,
> val maxTasks: Int,
> val activeTasks: Int,
> val failedTasks: Int,
> val completedTasks: Int,
> val totalTasks: Int,
> val totalDuration: Long,
> val totalGCTime: Long,
> val totalInputBytes: Long,
> val totalShuffleRead: Long,
> val totalShuffleWrite: Long,
> val isBlacklisted: Boolean,
> val maxMemory: Long,
> val executorLogs: Map[String, String],
> val onHeapMemoryUsed: Option[Long],
> val offHeapMemoryUsed: Option[Long],
> val maxOnHeapMemory: Option[Long],
> val maxOffHeapMemory: Option[Long])
> {code}
> Here are 6 memory related fields: {{memoryUsed}}, {{maxMemory}}, 
> {{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}}, 
> {{maxOffHeapMemory}}.
> All 6 of these fields reflect the *storage* memory usage in Spark, but from the 
> names of these 6 fields the user doesn't really know whether they refer to 
> *storage* memory or to the total memory (storage memory + execution memory). 
> This is misleading.
> So I think we should properly rename these fields to reflect their real 
> meanings, or we should document it.






[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-20 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976298#comment-15976298
 ] 

Saisai Shao commented on SPARK-20391:
-

bq. I assume managed memory here is spark.memory.fraction on heap + 
spark.memory.offHeap.size?

{{totalManagedMemory}} should be equal to the spark.memory.fraction portion of the 
heap plus spark.memory.offHeap.size, and {{totalStorageMemory}} is never larger 
than {{totalManagedMemory}}. At the beginning, when no job is running, 
{{totalStorageMemory}} == {{totalManagedMemory}}; once execution memory is 
consumed, {{totalStorageMemory}} < {{totalManagedMemory}}.
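To make the relation concrete (a toy illustration, all numbers made up):

{code}
val usableHeap     = 4L * 1024 * 1024 * 1024   // executor heap minus Spark's reserved memory
val memoryFraction = 0.6                       // spark.memory.fraction
val offHeapSize    = 0L                        // spark.memory.offHeap.size

val totalManagedMemory = (usableHeap * memoryFraction).toLong + offHeapSize
// totalStorageMemory starts out equal to totalManagedMemory and shrinks as
// execution memory is consumed, so at any time:
//   totalStorageMemory <= totalManagedMemory
{code}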

Here we have two problems in block manager:

1. All the memory tracked in the block manager is storage memory, so we should 
clarify the naming, which is the purpose of this JIRA.
2. The block manager only gets the initial snapshot of storage memory 
({{totalStorageMemory}} == {{totalManagedMemory}}). Since {{totalStorageMemory}} 
varies during runtime, the {{memRemaining}} tracked in {{StorageStatus}} is not 
accurate. This could be addressed in another JIRA.


> Properly rename the memory related fields in ExecutorSummary REST API
> -
>
> Key: SPARK-20391
> URL: https://issues.apache.org/jira/browse/SPARK-20391
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Saisai Shao
>Priority: Blocker
>
> Currently in Spark we could get executor summary through REST API 
> {{/api/v1/applications//executors}}. The format of executor summary 
> is:
> {code}
> class ExecutorSummary private[spark](
> val id: String,
> val hostPort: String,
> val isActive: Boolean,
> val rddBlocks: Int,
> val memoryUsed: Long,
> val diskUsed: Long,
> val totalCores: Int,
> val maxTasks: Int,
> val activeTasks: Int,
> val failedTasks: Int,
> val completedTasks: Int,
> val totalTasks: Int,
> val totalDuration: Long,
> val totalGCTime: Long,
> val totalInputBytes: Long,
> val totalShuffleRead: Long,
> val totalShuffleWrite: Long,
> val isBlacklisted: Boolean,
> val maxMemory: Long,
> val executorLogs: Map[String, String],
> val onHeapMemoryUsed: Option[Long],
> val offHeapMemoryUsed: Option[Long],
> val maxOnHeapMemory: Option[Long],
> val maxOffHeapMemory: Option[Long])
> {code}
> Here are 6 memory related fields: {{memoryUsed}}, {{maxMemory}}, 
> {{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}}, 
> {{maxOffHeapMemory}}.
> All 6 of these fields reflect the *storage* memory usage in Spark, but from the 
> names of these 6 fields the user doesn't really know whether they refer to 
> *storage* memory or to the total memory (storage memory + execution memory). 
> This is misleading.
> So I think we should properly rename these fields to reflect their real 
> meanings, or we should document it.






[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976145#comment-15976145
 ] 

Saisai Shao commented on SPARK-20391:
-

Thanks [~irashid] [~tgraves] for your comments.

bq. with the current naming:
bq. maxMemory is (5): the amount of memory managed by spark
bq. maxOnHeapMemory & maxOffHeapMemory are (5) divided into onheap & offheap

In the current Spark code, {{maxMemory}} actually reflects the 
{{totalStorageMemory}}, not the total managed memory; there is still an amount of 
memory left for execution (shuffle, Tungsten) that is not counted in. So I think 
it is more precise to rename it to {{totalStorageMemory}}, not 
{{totalManagedMemory}}.

Also, {{maxOnHeapMemory}} and {{maxOffHeapMemory}} would be better renamed to 
{{totalOnHeapStorageMemory}} and {{totalOffHeapStorageMemory}}.

> Properly rename the memory related fields in ExecutorSummary REST API
> -
>
> Key: SPARK-20391
> URL: https://issues.apache.org/jira/browse/SPARK-20391
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Saisai Shao
>Priority: Blocker
>
> Currently in Spark we could get executor summary through REST API 
> {{/api/v1/applications//executors}}. The format of executor summary 
> is:
> {code}
> class ExecutorSummary private[spark](
> val id: String,
> val hostPort: String,
> val isActive: Boolean,
> val rddBlocks: Int,
> val memoryUsed: Long,
> val diskUsed: Long,
> val totalCores: Int,
> val maxTasks: Int,
> val activeTasks: Int,
> val failedTasks: Int,
> val completedTasks: Int,
> val totalTasks: Int,
> val totalDuration: Long,
> val totalGCTime: Long,
> val totalInputBytes: Long,
> val totalShuffleRead: Long,
> val totalShuffleWrite: Long,
> val isBlacklisted: Boolean,
> val maxMemory: Long,
> val executorLogs: Map[String, String],
> val onHeapMemoryUsed: Option[Long],
> val offHeapMemoryUsed: Option[Long],
> val maxOnHeapMemory: Option[Long],
> val maxOffHeapMemory: Option[Long])
> {code}
> Here are 6 memory related fields: {{memoryUsed}}, {{maxMemory}}, 
> {{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}}, 
> {{maxOffHeapMemory}}.
> All 6 of these fields reflect the *storage* memory usage in Spark, but from the 
> names of these 6 fields the user doesn't really know whether they refer to 
> *storage* memory or to the total memory (storage memory + execution memory). 
> This is misleading.
> So I think we should properly rename these fields to reflect their real 
> meanings, or we should document it.






[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974685#comment-15974685
 ] 

Saisai Shao commented on SPARK-20391:
-

[~irashid], I would be grateful to hear your suggestions.

> Properly rename the memory related fields in ExecutorSummary REST API
> -
>
> Key: SPARK-20391
> URL: https://issues.apache.org/jira/browse/SPARK-20391
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Saisai Shao
>Priority: Minor
>
> Currently in Spark we could get executor summary through REST API 
> {{/api/v1/applications//executors}}. The format of executor summary 
> is:
> {code}
> class ExecutorSummary private[spark](
> val id: String,
> val hostPort: String,
> val isActive: Boolean,
> val rddBlocks: Int,
> val memoryUsed: Long,
> val diskUsed: Long,
> val totalCores: Int,
> val maxTasks: Int,
> val activeTasks: Int,
> val failedTasks: Int,
> val completedTasks: Int,
> val totalTasks: Int,
> val totalDuration: Long,
> val totalGCTime: Long,
> val totalInputBytes: Long,
> val totalShuffleRead: Long,
> val totalShuffleWrite: Long,
> val isBlacklisted: Boolean,
> val maxMemory: Long,
> val executorLogs: Map[String, String],
> val onHeapMemoryUsed: Option[Long],
> val offHeapMemoryUsed: Option[Long],
> val maxOnHeapMemory: Option[Long],
> val maxOffHeapMemory: Option[Long])
> {code}
> Here are 6 memory related fields: {{memoryUsed}}, {{maxMemory}}, 
> {{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}}, 
> {{maxOffHeapMemory}}.
> All 6 of these fields reflect the *storage* memory usage in Spark, but from the 
> names of these 6 fields the user doesn't really know whether they refer to 
> *storage* memory or to the total memory (storage memory + execution memory). 
> This is misleading.
> So I think we should properly rename these fields to reflect their real 
> meanings, or we should document it.






[jira] [Created] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20391:
---

 Summary: Properly rename the memory related fields in 
ExecutorSummary REST API
 Key: SPARK-20391
 URL: https://issues.apache.org/jira/browse/SPARK-20391
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Saisai Shao
Priority: Minor


Currently in Spark we could get executor summary through REST API 
{{/api/v1/applications//executors}}. The format of executor summary is:

{code}
class ExecutorSummary private[spark](
val id: String,
val hostPort: String,
val isActive: Boolean,
val rddBlocks: Int,
val memoryUsed: Long,
val diskUsed: Long,
val totalCores: Int,
val maxTasks: Int,
val activeTasks: Int,
val failedTasks: Int,
val completedTasks: Int,
val totalTasks: Int,
val totalDuration: Long,
val totalGCTime: Long,
val totalInputBytes: Long,
val totalShuffleRead: Long,
val totalShuffleWrite: Long,
val isBlacklisted: Boolean,
val maxMemory: Long,
val executorLogs: Map[String, String],
val onHeapMemoryUsed: Option[Long],
val offHeapMemoryUsed: Option[Long],
val maxOnHeapMemory: Option[Long],
val maxOffHeapMemory: Option[Long])
{code}

Here are 6 memory related fields: {{memoryUsed}}, {{maxMemory}}, 
{{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}}, 
{{maxOffHeapMemory}}.

All 6 of these fields reflect the *storage* memory usage in Spark, but from the 
names of these 6 fields the user doesn't really know whether they refer to 
*storage* memory or to the total memory (storage memory + execution memory). This 
is misleading.

So I think we should properly rename these fields to reflect their real 
meanings, or we should document it.






[jira] [Updated] (SPARK-20365) Not so accurate classpath format for AM and Containers

2017-04-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-20365:

Summary: Not so accurate classpath format for AM and Containers  (was: 
Inaccurate classpath format for AM and Containers)

> Not so accurate classpath format for AM and Containers
> --
>
> Key: SPARK-20365
> URL: https://issues.apache.org/jira/browse/SPARK-20365
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> In Spark on YARN, when configuring "spark.yarn.jars" with local jars (jars 
> using the "local" scheme), we get an inaccurate classpath for the AM and 
> containers. This is because we don't remove the "local" scheme when concatenating 
> the classpath. It still runs because the classpath is separated with ":" and Java 
> treats the "local" part as just another (nonexistent) entry, but we could improve 
> it by removing the scheme.
> {code}
> java.class.path = 
> /tmp/hadoop-sshao/nm-local-dir/usercache/sshao/appcache/application_1492057593145_0009/container_1492057593145_0009_01_03:/tmp/hadoop-sshao/nm-local-dir/usercache/sshao/appcache/application_1492057593145_0009/container_1492057593145_0009_01_03/__spark_conf__:/tmp/hadoop-sshao/nm-local-dir/usercache/sshao/appcache/application_1492057593145_0009/container_1492057593145_0009_01_03/__spark_libs__/*:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/activation-1.1.1.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/antlr-2.7.7.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/antlr-runtime-3.4.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/antlr4-runtime-4.5.3.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/aopalliance-1.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/aopalliance-repackaged-2.4.0-b34.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/apache-log4j-extras-1.2.17.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/apacheds-i18n-2.0.0-M15.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/apacheds-kerberos-codec-2.0.0-M15.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/api-asn1-api-1.0.0-M20.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/api-util-1.0.0-M20.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/arpack_combined_all-0.1.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-1.7.7.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-ipc-1.7.7-tests.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-ipc-1.7.7.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-mapred-1.7.7-hadoop2.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/base64-2.3.8.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/bcprov-jdk15on-1.51.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/bonecp-0.8.0.RELEASE.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/breeze-macros_2.11-0.12.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/breeze_2.11-0.12.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/calcite-avatica-1.2.0-incubating.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/calcite-core-1.2.0-incubating.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/calcite-linq4j-1.2.0-incubating.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/cglib-2.2.1-v20090111.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/chill-java-0.8.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/chill_2.11-0.8.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-beanutils-1.7.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-beanutils-core-1.8.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/comm
ons-cli-1.2.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-codec-1.10.jar:local:///Users/sshao/projects/apache-spark/assem

[jira] [Created] (SPARK-20365) Inaccurate classpath format for AM and Containers

2017-04-18 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20365:
---

 Summary: Inaccurate classpath format for AM and Containers
 Key: SPARK-20365
 URL: https://issues.apache.org/jira/browse/SPARK-20365
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 2.2.0
Reporter: Saisai Shao
Priority: Minor


In Spark on YARN, when configuring "spark.yarn.jars" with local jars (jars 
using the "local" scheme), we get an inaccurate classpath for the AM and 
containers. This is because we don't remove the "local" scheme when concatenating 
the classpath. It still runs because the classpath is separated with ":" and Java 
treats the "local" part as just another (nonexistent) entry, but we could improve 
it by removing the scheme.

{code}
java.class.path = 
/tmp/hadoop-sshao/nm-local-dir/usercache/sshao/appcache/application_1492057593145_0009/container_1492057593145_0009_01_03:/tmp/hadoop-sshao/nm-local-dir/usercache/sshao/appcache/application_1492057593145_0009/container_1492057593145_0009_01_03/__spark_conf__:/tmp/hadoop-sshao/nm-local-dir/usercache/sshao/appcache/application_1492057593145_0009/container_1492057593145_0009_01_03/__spark_libs__/*:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/activation-1.1.1.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/antlr-2.7.7.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/antlr-runtime-3.4.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/antlr4-runtime-4.5.3.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/aopalliance-1.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/aopalliance-repackaged-2.4.0-b34.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/apache-log4j-extras-1.2.17.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/apacheds-i18n-2.0.0-M15.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/apacheds-kerberos-codec-2.0.0-M15.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/api-asn1-api-1.0.0-M20.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/api-util-1.0.0-M20.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/arpack_combined_all-0.1.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-1.7.7.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-ipc-1.7.7-tests.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-ipc-1.7.7.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/avro-mapred-1.7.7-hadoop2.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/base64-2.3.8.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/bcprov-jdk15on-1.51.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/bonecp-0.8.0.RELEASE.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/breeze-macros_2.11-0.12.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/breeze_2.11-0.12.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/calcite-avatica-1.2.0-incubating.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/calcite-core-1.2.0-incubating.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/calcite-linq4j-1.2.0-incubating.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/cglib-2.2.1-v20090111.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/chill-java-0.8.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/chill_2.11-0.8.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-beanutils-1.7.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-beanutils-core-1.8.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/common
s-cli-1.2.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-codec-1.10.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-collections-3.2.2.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-compiler-3.0.0.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-compress-1.4.1.jar:local:///Users/sshao/projects/apache-spark/assembly/target/scala-2.11/jars/commons-configuration-1.6.jar:local:///Users

Re: Spark API authentication

2017-04-14 Thread Saisai Shao
IIUC, an auth filter on the live UI REST API should already be supported; the
fix in SPARK-19652 is mainly for the History UI, to support per-app ACLs.

For the application submission REST API in standalone mode, I think it is
currently not supported; it is not a bug.

On Fri, Apr 14, 2017 at 6:56 PM, Sergey Grigorev 
wrote:

> Thanks for help!
>
> I've found the ticket with a similar problem https://issues.apache.org/
> jira/browse/SPARK-19652. It looks like this fix did not hit to 2.1.0
> release.
> You said that for the second example custom filter is not supported. It
> is a bug or expected behavior?
>
> On 14.04.2017 13:22, Saisai Shao wrote:
>
> AFAIK, For the first line, custom filter should be worked. But for the
> latter it is not supported.
>
> On Fri, Apr 14, 2017 at 6:17 PM, Sergey Grigorev 
> wrote:
>
>> GET requests like *http://worker:4040/api/v1/applications* or
>> *http://master:6066/v1/submissions/status/driver-20170414025324-* return a
>> successful result. But if I open the Spark master web UI then it requests a
>> username and password.
>>
>>
>> On 14.04.2017 12:46, Saisai Shao wrote:
>>
>> Hi,
>>
>> What specifically are you referring to "Spark API endpoint"?
>>
>> Filter can only be worked with Spark Live and History web UI.
>>
>> On Fri, Apr 14, 2017 at 5:18 PM, Sergey < 
>> grigorev-...@yandex.ru> wrote:
>>
>>> Hello all,
>>>
>>> I've added own spark.ui.filters to enable basic authentication to access
>>> to
>>> Spark web UI. It works fine, but I still can do requests to spark API
>>> without any authentication.
>>> Is there any way to enable authentication for API endpoints?
>>>
>>> P.S. spark version is 2.1.0, deploy mode is standalone.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-API-authentication-tp28601.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe e-mail: 
>>> user-unsubscr...@spark.apache.org
>>>
>>>
>>
>>
>
>


Re: Spark API authentication

2017-04-14 Thread Saisai Shao
AFAIK, for the first one the custom filter should work, but for the latter it
is not supported.

On Fri, Apr 14, 2017 at 6:17 PM, Sergey Grigorev 
wrote:

> GET requests like *http://worker:4040/api/v1/applications* or
> *http://master:6066/v1/submissions/status/driver-20170414025324-* return a
> successful result. But if I open the Spark master web UI then it requests a
> username and password.
>
>
> On 14.04.2017 12:46, Saisai Shao wrote:
>
> Hi,
>
> What specifically are you referring to "Spark API endpoint"?
>
> Filter can only be worked with Spark Live and History web UI.
>
> On Fri, Apr 14, 2017 at 5:18 PM, Sergey  wrote:
>
>> Hello all,
>>
>> I've added own spark.ui.filters to enable basic authentication to access
>> to
>> Spark web UI. It works fine, but I still can do requests to spark API
>> without any authentication.
>> Is there any way to enable authentication for API endpoints?
>>
>> P.S. spark version is 2.1.0, deploy mode is standalone.
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/Spark-API-authentication-tp28601.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>


Re: Spark API authentication

2017-04-14 Thread Saisai Shao
Hi,

What specifically are you referring to by "Spark API endpoint"?

Filters only work with the Spark live and history web UIs.
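
For reference, a minimal sketch of how such a filter is typically wired in.
The filter class name and its parameter below are placeholders, not something
shipped with Spark:

import org.apache.spark.SparkConf

// The filter must be a standard javax.servlet.Filter on the driver classpath;
// com.example.BasicAuthFilter and its "realm" parameter are placeholders.
// Filter init parameters can typically be passed as
// spark.<FilterClass>.param.<name>=<value>.
val conf = new SparkConf()
  .set("spark.ui.filters", "com.example.BasicAuthFilter")
  .set("spark.com.example.BasicAuthFilter.param.realm", "spark-ui")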

On Fri, Apr 14, 2017 at 5:18 PM, Sergey  wrote:

> Hello all,
>
> I've added own spark.ui.filters to enable basic authentication to access to
> Spark web UI. It works fine, but I still can do requests to spark API
> without any authentication.
> Is there any way to enable authentication for API endpoints?
>
> P.S. spark version is 2.1.0, deploy mode is standalone.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Spark-API-authentication-tp28601.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-13 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968545#comment-15968545
 ] 

Saisai Shao commented on SPARK-16742:
-

[~mgummelt], do you have a design doc for the Kerberos support for Spark on 
Mesos, so that my work on SPARK-19143 could be based on yours?

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20286) dynamicAllocation.executorIdleTimeout is ignored after unpersist

2017-04-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965411#comment-15965411
 ] 

Saisai Shao commented on SPARK-20286:
-

bq. Maybe the best approach is to change from cachedExecutorIdleTimeout to 
executorIdleTimeout on all cached executors when the last RDD has been 
unpersisted, and then restart the time counter (unpersist will then count as an 
action).

Yes, I think this is a feasible solution. I can help out if you're not 
familiar with that code.
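
For reference, a minimal sketch of the two timeouts involved (the values are
illustrative only):

{code}
import org.apache.spark.SparkConf

// Illustrative values only. Executors idle with no cached blocks are released
// after executorIdleTimeout, while executors still holding cached blocks fall
// under cachedExecutorIdleTimeout (infinite by default).
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")
{code}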

> dynamicAllocation.executorIdleTimeout is ignored after unpersist
> 
>
> Key: SPARK-20286
> URL: https://issues.apache.org/jira/browse/SPARK-20286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1
>Reporter: Miguel Pérez
>
> With dynamic allocation enabled, it seems that executors with cached data 
> which are unpersisted are still being killed using the 
> {{dynamicAllocation.cachedExecutorIdleTimeout}} configuration, instead of 
> {{dynamicAllocation.executorIdleTimeout}}. Assuming the default configuration 
> ({{dynamicAllocation.cachedExecutorIdleTimeout = Infinity}}), an executor 
> with unpersisted data won't be released until the job ends.
> *How to reproduce*
> - Set different values for {{dynamicAllocation.executorIdleTimeout}} and 
> {{dynamicAllocation.cachedExecutorIdleTimeout}}
> - Load a file into a RDD and persist it
> - Execute an action on the RDD (like a count) so some executors are activated.
> - When the action has finished, unpersist the RDD
> - The application UI removes correctly the persisted data from the *Storage* 
> tab, but if you look in the *Executors* tab, you will find that the executors 
> remain *active* until ({{dynamicAllocation.cachedExecutorIdleTimeout}} is 
> reached.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20286) dynamicAllocation.executorIdleTimeout is ignored after unpersist

2017-04-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965306#comment-15965306
 ] 

Saisai Shao commented on SPARK-20286:
-

So the fix is that if no RDD remains persisted, the executors' idle timeout 
should be switched back to {{dynamicAllocation.executorIdleTimeout}}.

> dynamicAllocation.executorIdleTimeout is ignored after unpersist
> 
>
> Key: SPARK-20286
> URL: https://issues.apache.org/jira/browse/SPARK-20286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1
>Reporter: Miguel Pérez
>
> With dynamic allocation enabled, it seems that executors with cached data 
> which are unpersisted are still being killed using the 
> {{dynamicAllocation.cachedExecutorIdleTimeout}} configuration, instead of 
> {{dynamicAllocation.executorIdleTimeout}}. Assuming the default configuration 
> ({{dynamicAllocation.cachedExecutorIdleTimeout = Infinity}}), an executor 
> with unpersisted data won't be released until the job ends.
> *How to reproduce*
> - Set different values for {{dynamicAllocation.executorIdleTimeout}} and 
> {{dynamicAllocation.cachedExecutorIdleTimeout}}
> - Load a file into a RDD and persist it
> - Execute an action on the RDD (like a count) so some executors are activated.
> - When the action has finished, unpersist the RDD
> - The application UI removes correctly the persisted data from the *Storage* 
> tab, but if you look in the *Executors* tab, you will find that the executors 
> remain *active* until ({{dynamicAllocation.cachedExecutorIdleTimeout}} is 
> reached.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20244) Incorrect input size in UI with pyspark

2017-04-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964422#comment-15964422
 ] 

Saisai Shao commented on SPARK-20244:
-

This actually is not a UI problem; it is a FileSystem thread-local statistics 
problem. Because PythonRDD creates another thread to read the data, the bytes 
read are recorded against that thread, and the value reported from the task 
thread is wrong. There is no problem when using spark-shell, since everything 
is processed in one thread.

This is a general problem whenever a child RDD's computation creates another 
thread to consume the parent RDD's (HadoopRDD's) iterator. I tried several 
different ways to handle it, but they still have some small issues. The 
multi-threaded processing inside the RDD makes the fix quite complex.
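
To illustrate the thread-local issue, here is a simplified sketch (not Spark
code): a per-thread counter updated on a separate reader thread is invisible
to the thread that later reports the metric.

{code}
// The reader thread records 1024 bytes into its own per-thread copy, so the
// task thread that reads the metric afterwards still sees 0.
val bytesRead = new ThreadLocal[Long] {
  override def initialValue(): Long = 0L
}

val reader = new Thread(new Runnable {
  override def run(): Unit = bytesRead.set(bytesRead.get + 1024L)
})
reader.start()
reader.join()

println(bytesRead.get)  // prints 0 on the calling thread
{code}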

> Incorrect input size in UI with pyspark
> ---
>
> Key: SPARK-20244
> URL: https://issues.apache.org/jira/browse/SPARK-20244
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: pyspark_incorrect_inputsize.png, 
> sparkshell_correct_inputsize.png
>
>
> In Spark UI (Details for Stage) Input Size is  64.0 KB when running in 
> PySparkShell. 
> Also it is incorrect in Tasks table:
> 64.0 KB / 132120575 in pyspark
> 252.0 MB / 132120575 in spark-shell
> I will attach screenshots.
> Reproduce steps:
> Run this  to generate big file (press Ctrl+C after 5-6 seconds)
> $ yes > /tmp/yes.txt
> $ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
> $ ./bin/pyspark
> {code}
> Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.1.0
>   /_/
> Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
> SparkSession available as 'spark'.{code}
> >>> a = sc.textFile("/tmp/yes.txt")
> >>> a.count()
> Open Spark UI and check Stage 0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20275) HistoryServer page shows incorrect complete date of inprogress apps

2017-04-10 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-20275:

Description: 
The HistoryServer's incomplete-applications page shows an in-progress 
application's completed date as {{1969-12-31 23:59:59}}, which is not 
meaningful and could be improved.

!https://issues.apache.org/jira/secure/attachment/12862656/screenshot-1.png!

So instead of showing this date, it is proposed here to not display this column, 
since it is not required for in-progress applications.

  was:
The HistoryServer's incomplete page shows in-progress application's completed 
date as {{1969-12-31 23:59:59}}, which is not meaningful and could be improved.

So instead of showing this date, here proposed to not display this column since 
it is not required for in-progress applications.


> HistoryServer page shows incorrect complete date of inprogress apps
> ---
>
> Key: SPARK-20275
> URL: https://issues.apache.org/jira/browse/SPARK-20275
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Saisai Shao
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> The HistoryServer's incomplete page shows in-progress application's completed 
> date as {{1969-12-31 23:59:59}}, which is not meaningful and could be 
> improved.
> !https://issues.apache.org/jira/secure/attachment/12862656/screenshot-1.png!
> So instead of showing this date, here proposed to not display this column 
> since it is not required for in-progress applications.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20275) HistoryServer page shows incorrect complete date of inprogress apps

2017-04-10 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-20275:

Attachment: screenshot-1.png

> HistoryServer page shows incorrect complete date of inprogress apps
> ---
>
> Key: SPARK-20275
> URL: https://issues.apache.org/jira/browse/SPARK-20275
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>    Reporter: Saisai Shao
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> The HistoryServer's incomplete page shows in-progress application's completed 
> date as {{1969-12-31 23:59:59}}, which is not meaningful and could be 
> improved.
> So instead of showing this date, here proposed to not display this column 
> since it is not required for in-progress applications.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20275) HistoryServer page shows incorrect complete date of inprogress apps

2017-04-10 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20275:
---

 Summary: HistoryServer page shows incorrect complete date of 
inprogress apps
 Key: SPARK-20275
 URL: https://issues.apache.org/jira/browse/SPARK-20275
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Saisai Shao
Priority: Minor


The HistoryServer's incomplete-applications page shows an in-progress 
application's completed date as {{1969-12-31 23:59:59}}, which is not 
meaningful and could be improved.

So instead of showing this date, it is proposed here to not display this column, 
since it is not required for in-progress applications.
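
For context, a small sketch of why that particular timestamp shows up,
assuming (as seems likely) that the missing end time is represented as -1:

{code}
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Assuming the unset end time is stored as -1 ms since the epoch, formatting
// it in UTC yields exactly the placeholder shown on the page.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
println(fmt.format(new Date(-1L)))  // 1969-12-31 23:59:59
{code}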



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-09 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962455#comment-15962455
 ] 

Saisai Shao commented on SPARK-16742:
-

Hi [~mgummelt], I'm working on the design of SPARK-19143. Looking at your 
comments, I think parts of the work overlap, especially the RPC part that 
propagates credentials. Here is my current WIP design 
(https://docs.google.com/document/d/1Y8CY3XViViTYiIQO9ySoid0t9q3H163fmroCV1K3NTk/edit?usp=sharing).
 In my current design I offer a standard RPC solution that supports different 
cluster managers.

It would be great if we could collaborate to meet the same goal. My main 
concern is that if Mesos's implementation is quite different from YARN's, it 
will require more effort to align the different cluster managers; if your 
proposal is similar to what I proposed here, then my work can be based on yours.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20239) Improve HistoryServer ACL mechanism

2017-04-06 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-20239:

Description: 
The current SHS (Spark History Server) has two different ACLs. 

* ACL of the base URL. It is controlled by "spark.acls.enabled" or 
"spark.ui.acls.enabled"; with this enabled, only users configured in 
"spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user 
who started the SHS, can list all the applications, otherwise none of them can 
be listed. This also affects the REST APIs that list the summary of all apps 
and of one app.

* Per-application ACL. This is controlled by "spark.history.ui.acls.enabled". 
With this enabled, only the history admin users and the user/group who ran an 
app can access the details of that app. 

With these two ACLs, we may encounter several unexpected behaviors:

1. If the base URL's ACL is enabled but user "A" has no view permission, user 
"A" cannot see the app list but can still access the details of its own apps.
2. If the base URL's ACL is disabled, then user "A" can see the summary of all 
the apps, even those not run by user "A", but cannot access their details.
3. A history admin user has no permission to list all apps if this admin user 
is not added to the base URL's ACL.

These unexpected behaviors arise mainly because we have two different ACLs; 
ideally we should have only one to manage everything.

So to improve the SHS's ACL mechanism, we should:

* Unify the two different ACLs into one, and always honor that one (both for 
the base URL and for app details).
* Let a user partially list and display the apps run by him, according to the 
ACLs in the event log.

  was:
Current SHS (Spark History Server) two different ACLs. 

* ACL of base URL, it is controlled by "spark.acls.enabled" or 
"spark.ui.acls.enabled", and with this enabled, only user configured with 
"spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user 
who started STS could list all the applications, otherwise none of them can be 
listed.

* Per application ACL. This is controlled by "spark.history.ui.acls.enabled". 
With this enabled only history admin user and user/group for this app when it 
was run can access the details of this app. 

With this two ACLs, we may encounter several unexpected behaviors:

1. if base URL's ACL is enabled but user A has no view permission. User "A" 
cannot see the app list but could access details of it's own app.
2. if ACLs of base URL is disabled. Then user "A" could the summary of all the 
apps, even some are not ran by user "A", but cannot access the details.
3. history admin ACL has no permission to list all apps if this admin user is 
not added base URL's ACL.

The unexpected behavior of ACLs is mainly because we have two different ACLs, 
ideally we should have only one to manage all.

So to improve SHS's ACL mechanism, we should:

* Unify two different ACLs into one, and always honor this (both is base URL 
and app details).
* User could partially list and display apps which ran by him according to the 
ACLs in event log.


> Improve HistoryServer ACL mechanism
> ---
>
> Key: SPARK-20239
> URL: https://issues.apache.org/jira/browse/SPARK-20239
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Saisai Shao
>
> Current SHS (Spark History Server) two different ACLs. 
> * ACL of base URL, it is controlled by "spark.acls.enabled" or 
> "spark.ui.acls.enabled", and with this enabled, only user configured with 
> "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user 
> who started SHS could list all the applications, otherwise none of them can 
> be listed. This will also affect REST APIs which listing the summary of all 
> apps and one app.
> * Per application ACL. This is controlled by "spark.history.ui.acls.enabled". 
> With this enabled only history admin user and user/group who ran this app can 
> access the details of this app. 
> With this two ACLs, we may encounter several unexpected behaviors:
> 1. if base URL's ACL is enabled but user A has no view permission. User "A" 
> cannot see the app list but could still access details of it's own app.
> 2. if ACLs of base URL is disabled. Then user "A" could see the summary of 
> all the apps, even some didn't run by user "A", but cannot access the details.
> 3. history admin ACL has no permission to list all apps if this admin user is 
> not added to base URL's ACL.

[jira] [Created] (SPARK-20239) Improve HistoryServer ACL mechanism

2017-04-06 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20239:
---

 Summary: Improve HistoryServer ACL mechanism
 Key: SPARK-20239
 URL: https://issues.apache.org/jira/browse/SPARK-20239
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Saisai Shao


The current SHS (Spark History Server) has two different ACLs. 

* ACL of the base URL. It is controlled by "spark.acls.enabled" or 
"spark.ui.acls.enabled"; with this enabled, only users configured in 
"spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user 
who started the SHS, can list all the applications, otherwise none of them can 
be listed.

* Per-application ACL. This is controlled by "spark.history.ui.acls.enabled". 
With this enabled, only the history admin users and the user/group an app was 
run as can access the details of that app. 

With these two ACLs, we may encounter several unexpected behaviors:

1. If the base URL's ACL is enabled but user "A" has no view permission, user 
"A" cannot see the app list but can still access the details of its own apps.
2. If the base URL's ACL is disabled, then user "A" can see the summary of all 
the apps, even those not run by user "A", but cannot access their details.
3. A history admin user has no permission to list all apps if this admin user 
is not added to the base URL's ACL.

These unexpected behaviors arise mainly because we have two different ACLs; 
ideally we should have only one to manage everything.

So to improve the SHS's ACL mechanism, we should:

* Unify the two different ACLs into one, and always honor that one (both for 
the base URL and for app details).
* Let a user partially list and display the apps run by him, according to the 
ACLs in the event log.
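
For reference, the two ACL families described above correspond to settings
roughly like the following sketch (user names are placeholders):

{code}
import org.apache.spark.SparkConf

// Illustrative placeholders only, showing the two separate ACL families this
// description refers to and which this proposal would unify.
val conf = new SparkConf()
  .set("spark.acls.enabled", "true")             // base URL / UI ACLs
  .set("spark.admin.acls", "history_admin")      // admin users for the base ACLs
  .set("spark.ui.view.acls", "alice,bob")        // view users for the base ACLs
  .set("spark.history.ui.acls.enabled", "true")  // per-application history ACLs
{code}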



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: spark kafka consumer with kerberos

2017-03-31 Thread Saisai Shao
Hi Bill,

Normally the Kerberos principal and keytab should be enough, because the
keytab effectively stands in for the password. Did you configure SASL/GSSAPI
or SASL/PLAIN for KafkaClient?
http://kafka.apache.org/documentation.html#security_sasl
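
For reference, a KafkaClient JAAS entry for SASL/GSSAPI usually looks roughly
like the following sketch; the keytab path, principal, and service name are
placeholders you would adjust for your environment:

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="./user.keytab"
  principal="user@EXAMPLE.COM"
  serviceName="kafka";
};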

Actually this is more of a Kafka question, and it is most likely a
configuration issue; I would suggest asking it on the Kafka mailing
list.

Thanks
Saisai


On Fri, Mar 31, 2017 at 10:28 PM, Bill Schwanitz  wrote:

> Saisai,
>
> Yea that seems to have helped. Looks like the kerberos ticket when I
> submit does not get passed to the executor?
>
> ... 3 more
> Caused by: org.apache.kafka.common.KafkaException:
> javax.security.auth.login.LoginException: Unable to obtain password from
> user
>
> at org.apache.kafka.common.network.SaslChannelBuilder.
> configure(SaslChannelBuilder.java:86)
> at org.apache.kafka.common.network.ChannelBuilders.
> create(ChannelBuilders.java:70)
> at org.apache.kafka.clients.ClientUtils.createChannelBuilder(
> ClientUtils.java:83)
> at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:623)
> ... 14 more
> Caused by: javax.security.auth.login.LoginException: Unable to obtain
> password from user
>
>
> On Fri, Mar 31, 2017 at 9:08 AM, Saisai Shao 
> wrote:
>
>> Hi Bill,
>>
>> The exception is from executor side. From the gist you provided, looks
>> like the issue is that you only configured java options in driver side, I
>> think you should also configure this in executor side. You could refer to
>> here (https://github.com/hortonworks-spark/skc#running-on-a-
>> kerberos-enabled-cluster).
>>
>> --files key.conf#key.conf,v.keytab#v.keytab
>> --driver-java-options "-Djava.security.auth.login.config=./key.conf"
>> --conf 
>> "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf"
>>
>>
>> On Fri, Mar 31, 2017 at 1:58 AM, Bill Schwanitz 
>> wrote:
>>
>>> I'm working on a poc spark job to pull data from a kafka topic with
>>> kerberos enabled ( required ) brokers.
>>>
>>> The code seems to connect to kafka and enter a polling mode. When I toss
>>> something onto the topic I get an exception which I just can't seem to
>>> figure out. Any ideas?
>>>
>>> I have a full gist up at https://gist.github.com/bil
>>> sch/17f4a4c4303ed3e004e2234a5904f0de with a lot of details. If I use
>>> the hdfs/spark client code for just normal operations everything works fine
>>> but for some reason the streaming code is having issues. I have verified
>>> the KafkaClient object is in the jaas config. The keytab is good etc.
>>>
>>> Guessing I'm doing something wrong I just have not figured out what yet!
>>> Any thoughts?
>>>
>>> The exception:
>>>
>>> 17/03/30 12:54:00 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID
>>> 0, host5.some.org.net): org.apache.kafka.common.KafkaException: Failed
>>> to construct kafka consumer
>>> at org.apache.kafka.clients.consumer.KafkaConsumer.(Kafka
>>> Consumer.java:702)
>>> at org.apache.kafka.clients.consumer.KafkaConsumer.(Kafka
>>> Consumer.java:557)
>>> at org.apache.kafka.clients.consumer.KafkaConsumer.(Kafka
>>> Consumer.java:540)
>>> at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.<init>(CachedKafkaConsumer.scala:47)
>>> at org.apache.spark.streaming.kafka010.CachedKafkaConsumer$.get
>>> (CachedKafkaConsumer.scala:157)
>>> at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterato
>>> r.(KafkaRDD.scala:210)
>>> at org.apache.spark.streaming.kafka010.KafkaRDD.compute(KafkaRD
>>> D.scala:185)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:86)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>> Executor.java:1142)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>> lExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: org.apache.kafka.common.KafkaException:
>>> org.apache.kafka.common.KafkaException: Jaas configuration not found
>>> at org.apache.kafka.common.network.SaslChannelBuilder.configure
>>> (SaslChannelBuilder.java:86)
>>> at org.apache

Re: spark kafka consumer with kerberos

2017-03-31 Thread Saisai Shao
Hi Bill,

The exception is from the executor side. From the gist you provided, it looks
like the issue is that you only configured the Java options on the driver side;
I think you should also configure them on the executor side. You could refer to
this (
https://github.com/hortonworks-spark/skc#running-on-a-kerberos-enabled-cluster
).

--files key.conf#key.conf,v.keytab#v.keytab
--driver-java-options "-Djava.security.auth.login.config=./key.conf"
--conf 
"spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf"


On Fri, Mar 31, 2017 at 1:58 AM, Bill Schwanitz  wrote:

> I'm working on a poc spark job to pull data from a kafka topic with
> kerberos enabled ( required ) brokers.
>
> The code seems to connect to kafka and enter a polling mode. When I toss
> something onto the topic I get an exception which I just can't seem to
> figure out. Any ideas?
>
> I have a full gist up at https://gist.github.com/bilsch/
> 17f4a4c4303ed3e004e2234a5904f0de with a lot of details. If I use the
> hdfs/spark client code for just normal operations everything works fine but
> for some reason the streaming code is having issues. I have verified the
> KafkaClient object is in the jaas config. The keytab is good etc.
>
> Guessing I'm doing something wrong I just have not figured out what yet!
> Any thoughts?
>
> The exception:
>
> 17/03/30 12:54:00 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> host5.some.org.net): org.apache.kafka.common.KafkaException: Failed to
> construct kafka consumer
> at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:702)
> at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:557)
> at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:540)
> at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.<
> init>(CachedKafkaConsumer.scala:47)
> at org.apache.spark.streaming.kafka010.CachedKafkaConsumer$.
> get(CachedKafkaConsumer.scala:157)
> at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.(
> KafkaRDD.scala:210)
> at org.apache.spark.streaming.kafka010.KafkaRDD.compute(
> KafkaRDD.scala:185)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.KafkaException:
> org.apache.kafka.common.KafkaException: Jaas configuration not found
> at org.apache.kafka.common.network.SaslChannelBuilder.
> configure(SaslChannelBuilder.java:86)
> at org.apache.kafka.common.network.ChannelBuilders.
> create(ChannelBuilders.java:70)
> at org.apache.kafka.clients.ClientUtils.createChannelBuilder(
> ClientUtils.java:83)
> at org.apache.kafka.clients.consumer.KafkaConsumer.(
> KafkaConsumer.java:623)
> ... 14 more
> Caused by: org.apache.kafka.common.KafkaException: Jaas configuration not
> found
> at org.apache.kafka.common.security.kerberos.KerberosLogin.getServiceName(
> KerberosLogin.java:299)
> at org.apache.kafka.common.security.kerberos.KerberosLogin.configure(
> KerberosLogin.java:103)
> at org.apache.kafka.common.security.authenticator.LoginManager.(
> LoginManager.java:45)
> at org.apache.kafka.common.security.authenticator.LoginManager.
> acquireLoginManager(LoginManager.java:68)
> at org.apache.kafka.common.network.SaslChannelBuilder.
> configure(SaslChannelBuilder.java:78)
> ... 17 more
> Caused by: java.io.IOException: Could not find a 'KafkaClient' entry in
> this configuration.
> at org.apache.kafka.common.security.JaasUtils.jaasConfig(
> JaasUtils.java:50)
> at org.apache.kafka.common.security.kerberos.KerberosLogin.getServiceName(
> KerberosLogin.java:297)
> ... 21 more
>


[jira] [Created] (SPARK-20172) Event log without read permission should be filtered out before actually reading it

2017-03-31 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20172:
---

 Summary: Event log without read permission should be filtered out 
before actually reading it
 Key: SPARK-20172
 URL: https://issues.apache.org/jira/browse/SPARK-20172
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Saisai Shao
Priority: Minor


In the current Spark HistoryServer, we are expected to check file permissions 
when listing all the files and to filter out the files with no read permission. 
That does not work because we don't actually check the access permission, so 
the permission check is deferred until the files are read; that is unnecessary, 
and the resulting exception is printed every 10 seconds by default.

So to avoid this problem, we should add an access check to the file listing logic.
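
A rough sketch of the kind of check meant here, using Hadoop's
FileSystem.access (names are illustrative, not the actual patch):

{code}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.security.AccessControlException

// Keep only the event logs the history server user can actually read, so
// unreadable files are skipped at listing time instead of failing repeatedly
// when we later try to replay them.
def readableLogs(fs: FileSystem, logs: Seq[Path]): Seq[Path] = logs.filter { path =>
  try {
    fs.access(path, FsAction.READ)
    true
  } catch {
    case _: AccessControlException => false
  }
}
{code}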



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20128) MetricsSystem not always killed in SparkContext.stop()

2017-03-28 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946493#comment-15946493
 ] 

Saisai Shao commented on SPARK-20128:
-

Sorry I cannot access the logs. What I could see from the link provided above 
is:

{noformat}
[info] - internal accumulators in multiple stages (185 milliseconds)
3/24/17 2:02:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 2:22:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 2:42:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 3:02:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 3:22:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 3:42:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 4:02:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 4:22:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 4:42:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 5:02:19 PM =

-- Gauges --
master.aliveWorkers
3/24/17 5:22:19 PM =

-- Gauges --
{noformat}

From the console output, what I can see is that after the {{internal 
accumulators in multiple stages}} unit test finishes, the whole test hangs and 
just prints some metrics information.

> MetricsSystem not always killed in SparkContext.stop()
> --
>
> Key: SPARK-20128
> URL: https://issues.apache.org/jira/browse/SPARK-20128
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, Tests
>Affects Versions: 2.2.0
>Reporter: Imran Rashid
>  Labels: flaky-test
>
> One Jenkins run failed due to the MetricsSystem never getting killed after a 
> failed test, which led that test to hang and the tests to timeout:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75176
> {noformat}
> 17/03/24 13:44:19.537 dag-scheduler-event-loop ERROR 
> DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting 
> down SparkContext
> java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.spark.MapOutputTrackerMaster$$anonfun$getEpochForMapOutput$1.apply(MapOutputTracker.scala:431)
> at 
> org.apache.spark.MapOutputTrackerMaster$$anonfun$getEpochForMapOutput$1.apply(MapOutputTracker.scala:430)
> at scala.Option.flatMap(Option.scala:171)
> at 
> org.apache.spark.MapOutputTrackerMaster.getEpochForMapOutput(MapOutputTracker.scala:430)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1298)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1731)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1689)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1678)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 17/03/24 13:44:19.540 dispatcher-event-loop-11 INFO 
> MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 17/03/24 13:44:19.546 stop-spark-context INFO MemoryStore: MemoryStore cleared
> 17/03/24 13:44:19.546 stop-spark-context INFO BlockManager: BlockManager 
> stopped
> 17/03/24 13:44:19.546 stop-spark-context INFO BlockManagerMaster: 
> BlockManagerMaster stopped
> 17/03/24 13:44:19.546 dispatcher-event-loop-16 INFO 
> Outpu

[jira] [Comment Edited] (SPARK-20128) MetricsSystem not always killed in SparkContext.stop()

2017-03-28 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946457#comment-15946457
 ] 

Saisai Shao edited comment on SPARK-20128 at 3/29/17 3:20 AM:
--

Here the exception is from MasterSource, which only exists in the standalone 
Master, so I think it should not be related to SparkContext; maybe the Master 
was not cleanly stopped. --Also, as I remembered, by default we do not enable 
the ConsoleReporter, so I was not sure how this could happen.--

Looks like we have a metrics properties file in the test resources; that's why 
the console sink is enabled in the unit tests.
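
For reference, a console sink gets turned on by a metrics configuration roughly
like this (values are illustrative; a 20-minute period would match the
reporting interval seen in the log):

{noformat}
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=20
*.sink.console.unit=minutes
{noformat}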


was (Author: jerryshao):
Here the exception is from MasterSource, which only exists in Standalone 
Master, I think it should not be related to SparkContext, may be the Master is 
not cleanly stopped. Also as I remembered by default we will not enable 
ConsoleReporter, not sure how this could be happened.

> MetricsSystem not always killed in SparkContext.stop()
> --
>
> Key: SPARK-20128
> URL: https://issues.apache.org/jira/browse/SPARK-20128
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, Tests
>Affects Versions: 2.2.0
>Reporter: Imran Rashid
>  Labels: flaky-test
>
> One Jenkins run failed due to the MetricsSystem never getting killed after a 
> failed test, which led that test to hang and the tests to timeout:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75176
> {noformat}
> 17/03/24 13:44:19.537 dag-scheduler-event-loop ERROR 
> DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting 
> down SparkContext
> java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.spark.MapOutputTrackerMaster$$anonfun$getEpochForMapOutput$1.apply(MapOutputTracker.scala:431)
> at 
> org.apache.spark.MapOutputTrackerMaster$$anonfun$getEpochForMapOutput$1.apply(MapOutputTracker.scala:430)
> at scala.Option.flatMap(Option.scala:171)
> at 
> org.apache.spark.MapOutputTrackerMaster.getEpochForMapOutput(MapOutputTracker.scala:430)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1298)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1731)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1689)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1678)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 17/03/24 13:44:19.540 dispatcher-event-loop-11 INFO 
> MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 17/03/24 13:44:19.546 stop-spark-context INFO MemoryStore: MemoryStore cleared
> 17/03/24 13:44:19.546 stop-spark-context INFO BlockManager: BlockManager 
> stopped
> 17/03/24 13:44:19.546 stop-spark-context INFO BlockManagerMaster: 
> BlockManagerMaster stopped
> 17/03/24 13:44:19.546 dispatcher-event-loop-16 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 17/03/24 13:44:19.547 stop-spark-context INFO SparkContext: Successfully 
> stopped SparkContext
> 17/03/24 14:02:19.934 metrics-console-reporter-1-thread-1 ERROR 
> ScheduledReporter: RuntimeException thrown from ConsoleReporter#report. 
> Exception was suppressed.
> java.lang.NullPointerException
> at 
> org.apache.spark.deploy.master.MasterSource$$anon$2.getValue(MasterSource.scala:35)
> at 
> org.apache.spark.deploy.master.MasterSource$$anon$2.getValue(MasterSource.scala:34)
> at 
> com.codahale.metrics.ConsoleReporter.printGauge(ConsoleReporter.java:239)
> ...
> {noformat}
> unfortunately I didn't save the entire test logs, but what happens is the 
> initial IndexOutOfBoundsException is a real bug, which causes the 
> SparkContext to stop, and the test to fail.  However, the MetricsSystem 
> somehow stays alive, and since its not a daemon thread, it just hangs, and 
> every 20 mins we get that NPE from within the metrics system as it tries to 
> report.
> I am totally perplexed at how this can happen, it looks like the metric 
> system should always get stopped by the time we see
> {noformat}
> 17/03/24 13:44:19.547 stop-spark-context INFO SparkContext: Successfully 
> stopped SparkContext
> {noformat}
> I don't think I've ever seen this in a real spark use, but it doesn't look 
> like something which is limited to tests, whatever the cause.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20128) MetricsSystem not always killed in SparkContext.stop()

2017-03-28 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946457#comment-15946457
 ] 

Saisai Shao commented on SPARK-20128:
-

Here the exception is from MasterSource, which only exists in the standalone 
Master, so I think it should not be related to SparkContext; maybe the Master 
was not cleanly stopped. Also, as I remember, by default we do not enable the 
ConsoleReporter, so I am not sure how this could happen.

> MetricsSystem not always killed in SparkContext.stop()
> --
>
> Key: SPARK-20128
> URL: https://issues.apache.org/jira/browse/SPARK-20128
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, Tests
>Affects Versions: 2.2.0
>Reporter: Imran Rashid
>  Labels: flaky-test
>
> One Jenkins run failed due to the MetricsSystem never getting killed after a 
> failed test, which led that test to hang and the tests to timeout:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75176
> {noformat}
> 17/03/24 13:44:19.537 dag-scheduler-event-loop ERROR 
> DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting 
> down SparkContext
> java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.spark.MapOutputTrackerMaster$$anonfun$getEpochForMapOutput$1.apply(MapOutputTracker.scala:431)
> at 
> org.apache.spark.MapOutputTrackerMaster$$anonfun$getEpochForMapOutput$1.apply(MapOutputTracker.scala:430)
> at scala.Option.flatMap(Option.scala:171)
> at 
> org.apache.spark.MapOutputTrackerMaster.getEpochForMapOutput(MapOutputTracker.scala:430)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1298)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1731)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1689)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1678)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 17/03/24 13:44:19.540 dispatcher-event-loop-11 INFO 
> MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 17/03/24 13:44:19.546 stop-spark-context INFO MemoryStore: MemoryStore cleared
> 17/03/24 13:44:19.546 stop-spark-context INFO BlockManager: BlockManager 
> stopped
> 17/03/24 13:44:19.546 stop-spark-context INFO BlockManagerMaster: 
> BlockManagerMaster stopped
> 17/03/24 13:44:19.546 dispatcher-event-loop-16 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 17/03/24 13:44:19.547 stop-spark-context INFO SparkContext: Successfully 
> stopped SparkContext
> 17/03/24 14:02:19.934 metrics-console-reporter-1-thread-1 ERROR 
> ScheduledReporter: RuntimeException thrown from ConsoleReporter#report. 
> Exception was suppressed.
> java.lang.NullPointerException
> at 
> org.apache.spark.deploy.master.MasterSource$$anon$2.getValue(MasterSource.scala:35)
> at 
> org.apache.spark.deploy.master.MasterSource$$anon$2.getValue(MasterSource.scala:34)
> at 
> com.codahale.metrics.ConsoleReporter.printGauge(ConsoleReporter.java:239)
> ...
> {noformat}
> unfortunately I didn't save the entire test logs, but what happens is the 
> initial IndexOutOfBoundsException is a real bug, which causes the 
> SparkContext to stop, and the test to fail.  However, the MetricsSystem 
> somehow stays alive, and since its not a daemon thread, it just hangs, and 
> every 20 mins we get that NPE from within the metrics system as it tries to 
> report.
> I am totally perplexed at how this can happen, it looks like the metric 
> system should always get stopped by the time we see
> {noformat}
> 17/03/24 13:44:19.547 stop-spark-context INFO SparkContext: Successfully 
> stopped SparkContext
> {noformat}
> I don't think I've ever seen this in a real spark use, but it doesn't look 
> like something which is limited to tests, whatever the cause.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20079) Re registration of AM hangs spark cluster in yarn-client mode

2017-03-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944629#comment-15944629
 ] 

Saisai Shao commented on SPARK-20079:
-

What is the specific symptom you hit? I believe there are a bunch of corner 
cases regarding the RPC back and forth in the yarn-client + AM reattempt 
scenario, and sometimes these scenarios are quite hard to fix, so I would 
usually suggest setting the max attempts to 1 in yarn-client mode.
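
A minimal sketch of that workaround:

{code}
import org.apache.spark.SparkConf

// Cap YARN application attempts at 1 so that, in yarn-client mode, losing the
// AM fails the application fast instead of triggering an AM re-registration.
val conf = new SparkConf().set("spark.yarn.maxAppAttempts", "1")
{code}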

> Re registration of AM hangs spark cluster in yarn-client mode
> -
>
> Key: SPARK-20079
> URL: https://issues.apache.org/jira/browse/SPARK-20079
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0
>Reporter: Guoqiang Li
>
> 1. Start cluster
> echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | 
> ./bin/spark-shell  --master yarn-client --executor-cores 1 --conf 
> spark.shuffle.service.enabled=true --conf 
> spark.dynamicAllocation.enabled=true --conf 
> spark.dynamicAllocation.maxExecutors=2 
> 2.  Kill the AM process when a stage is scheduled. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19143) API in Spark for distributing new delegation tokens (Improve delegation token handling in secure clusters)

2017-03-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943289#comment-15943289
 ] 

Saisai Shao commented on SPARK-19143:
-

Thanks [~tgraves], let me see how to propose a SPIP.

> API in Spark for distributing new delegation tokens (Improve delegation token 
> handling in secure clusters)
> --
>
> Key: SPARK-19143
> URL: https://issues.apache.org/jira/browse/SPARK-19143
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Ruslan Dautkhanov
>
> Spin off from SPARK-14743 and comments chain in [recent comments| 
> https://issues.apache.org/jira/browse/SPARK-5493?focusedCommentId=15802179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15802179]
>  in SPARK-5493.
> Spark currently doesn't have a way for distribution new delegation tokens. 
> Quoting [~vanzin] from SPARK-5493 
> {quote}
> IIRC Livy doesn't yet support delegation token renewal. Once it reaches the 
> TTL, the session is unusable.
> There might be ways to hack support for that without changes in Spark, but 
> I'd like to see a proper API in Spark for distributing new delegation tokens. 
> I mentioned that in SPARK-14743, but although that bug is closed, that 
> particular feature hasn't been implemented yet.
> {quote}
> Other thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19143) API in Spark for distributing new delegation tokens (Improve delegation token handling in secure clusters)

2017-03-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943303#comment-15943303
 ] 

Saisai Shao commented on SPARK-19143:
-

[~tgraves], can I add you as a *SPIP Shepherd*? :)

> API in Spark for distributing new delegation tokens (Improve delegation token 
> handling in secure clusters)
> --
>
> Key: SPARK-19143
> URL: https://issues.apache.org/jira/browse/SPARK-19143
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Ruslan Dautkhanov
>
> Spin off from SPARK-14743 and comments chain in [recent comments| 
> https://issues.apache.org/jira/browse/SPARK-5493?focusedCommentId=15802179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15802179]
>  in SPARK-5493.
> Spark currently doesn't have a way for distribution new delegation tokens. 
> Quoting [~vanzin] from SPARK-5493 
> {quote}
> IIRC Livy doesn't yet support delegation token renewal. Once it reaches the 
> TTL, the session is unusable.
> There might be ways to hack support for that without changes in Spark, but 
> I'd like to see a proper API in Spark for distributing new delegation tokens. 
> I mentioned that in SPARK-14743, but although that bug is closed, that 
> particular feature hasn't been implemented yet.
> {quote}
> Other thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19143) API in Spark for distributing new delegation tokens (Improve delegation token handling in secure clusters)

2017-03-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943243#comment-15943243
 ] 

Saisai Shao commented on SPARK-19143:
-

Attaching a WIP branch 
(https://github.com/jerryshao/apache-spark/tree/SPARK-19143).

> API in Spark for distributing new delegation tokens (Improve delegation token 
> handling in secure clusters)
> --
>
> Key: SPARK-19143
> URL: https://issues.apache.org/jira/browse/SPARK-19143
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Ruslan Dautkhanov
>
> Spin off from SPARK-14743 and comments chain in [recent comments| 
> https://issues.apache.org/jira/browse/SPARK-5493?focusedCommentId=15802179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15802179]
>  in SPARK-5493.
> Spark currently doesn't have a way for distribution new delegation tokens. 
> Quoting [~vanzin] from SPARK-5493 
> {quote}
> IIRC Livy doesn't yet support delegation token renewal. Once it reaches the 
> TTL, the session is unusable.
> There might be ways to hack support for that without changes in Spark, but 
> I'd like to see a proper API in Spark for distributing new delegation tokens. 
> I mentioned that in SPARK-14743, but although that bug is closed, that 
> particular feature hasn't been implemented yet.
> {quote}
> Other thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20059) HbaseCredentialProvider uses wrong classloader

2017-03-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942976#comment-15942976
 ] 

Saisai Shao edited comment on SPARK-20059 at 3/27/17 10:02 AM:
---

[~sowen], it probably is not the same issue.

In SPARK-20019, jars are added through a SQL command into Hive's SessionState; 
the code path is different.

And in SPARK-11421, AFAIK, the target is to add jars to the current 
classloader at runtime.

Here the issue is that, during startup, jars specified through {{--jars}} should 
be added to Spark's child classloader in yarn cluster mode. 

They're probably all classloader-related issues, but I think they target 
different areas and touch different code paths.



was (Author: jerryshao):
[~sowen], it probably is not the same issue.

In SPARK-20019, jars are added through SQL command in to Hive's SessionState, 
the code path is different.

And in SPARK-11421, AFAIK,  the target is to add jars to the current 
classloader in the runtime.

Here the issue is during start, jars specified through {{--jars}} should be 
added to Spark's child classloader in yarn cluster mode. 

They're all probably all classloader related issues, but I think they're 
targeted to the different area, and touched the different code path.


> HbaseCredentialProvider uses wrong classloader
> --
>
> Key: SPARK-20059
> URL: https://issues.apache.org/jira/browse/SPARK-20059
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Saisai Shao
>
> {{HBaseCredentialProvider}} uses system classloader instead of child 
> classloader, which will make HBase jars specified with {{--jars}} fail to 
> work, so here we should use the right class loader.
> Besides in yarn cluster mode jars specified with {{--jars}} is not added into 
> client's class path, which will make it fail to load HBase jars and issue 
> tokens in our scenario. Also some customized credential provider cannot be 
> registered into client.
> So here I will fix this two issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20059) HbaseCredentialProvider uses wrong classloader

2017-03-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942976#comment-15942976
 ] 

Saisai Shao commented on SPARK-20059:
-

[~sowen], it probably is not the same issue.

In SPARK-20019, jars are added through a SQL command into Hive's SessionState; 
the code path is different.

And in SPARK-11421, AFAIK, the target is to add jars to the current 
classloader at runtime.

Here the issue is that, during startup, jars specified through {{--jars}} should 
be added to Spark's child classloader in yarn cluster mode. 

They're probably all classloader-related issues, but I think they target 
different areas and touch different code paths.


> HbaseCredentialProvider uses wrong classloader
> --
>
> Key: SPARK-20059
> URL: https://issues.apache.org/jira/browse/SPARK-20059
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Saisai Shao
>
> {{HBaseCredentialProvider}} uses system classloader instead of child 
> classloader, which will make HBase jars specified with {{--jars}} fail to 
> work, so here we should use the right class loader.
> Besides in yarn cluster mode jars specified with {{--jars}} is not added into 
> client's class path, which will make it fail to load HBase jars and issue 
> tokens in our scenario. Also some customized credential provider cannot be 
> registered into client.
> So here I will fix this two issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20059) HbaseCredentialProvider uses wrong classloader

2017-03-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-20059:

Description: 
{{HBaseCredentialProvider}} uses system classloader instead of child 
classloader, which will make HBase jars specified with {{--jars}} fail to work, 
so here we should use the right class loader.

Besides in yarn cluster mode jars specified with {{--jars}} is not added into 
client's class path, which will make it fail to load HBase jars and issue 
tokens in our scenario. Also some customized credential provider cannot be 
registered into client.

So here I will fix this two issues.



  was:
{{HBaseCredentialProvider}} uses system classloader instead of child 
classloader, which will make HBase jars specified with {{--jars}} fail to work, 
so here we should use the right class loader.

Besides in yarn client mode jars specified with {{--jars}} is not added into 
client's class path, which will make it fail to load HBase jars and issue 
tokens in our scenario. Also some customized credential provider cannot be 
registered into client.

So here I will fix this two issues.




> HbaseCredentialProvider uses wrong classloader
> --
>
> Key: SPARK-20059
> URL: https://issues.apache.org/jira/browse/SPARK-20059
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Saisai Shao
>
> {{HBaseCredentialProvider}} uses system classloader instead of child 
> classloader, which will make HBase jars specified with {{--jars}} fail to 
> work, so here we should use the right class loader.
> Besides in yarn cluster mode jars specified with {{--jars}} is not added into 
> client's class path, which will make it fail to load HBase jars and issue 
> tokens in our scenario. Also some customized credential provider cannot be 
> registered into client.
> So here I will fix this two issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: spark-submit config via file

2017-03-26 Thread Saisai Shao
Your HDFS URL is not complete; please look at the exception, your HDFS URI has
no host or port. Normally a host-less URI would be fine if HDFS were your
default FS.

I think the problem is that you're running on HDI, where the default FS is
wasb, so a short hdfs:/// URI without host:port leads to this error. This looks
like an HDI-specific issue, so you'd better ask the HDI team.

Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no
host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz

at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(
DistributedFileSystem.java:154)

at org.apache.hadoop.fs.FileSystem.createFileSystem(
FileSystem.java:2791)

at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)

at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
FileSystem.java:2825)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)

at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)




On Fri, Mar 24, 2017 at 9:18 PM, Yong Zhang  wrote:

> Of course it is possible.
>
>
> You can always to set any configurations in your application using API,
> instead of pass in through the CLI.
>
>
> val sparkConf = new SparkConf().setAppName(properties.get("appName"))
>   .setMaster(properties.get("master")).set(xxx, properties.get("xxx"))
>
> Your error is your environment problem.
>
> Yong
> --
> *From:* , Roy 
> *Sent:* Friday, March 24, 2017 7:38 AM
> *To:* user
> *Subject:* spark-submit config via file
>
> Hi,
>
> I am trying to deploy spark job by using spark-submit which has bunch of
> parameters like
>
> spark-submit --class StreamingEventWriterDriver --master yarn
> --deploy-mode cluster --executor-memory 3072m --executor-cores 4 --files
> streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf
> "streaming.conf"
>
> I was looking a way to put all these flags in the file to pass to
> spark-submit to make my spark-submitcommand simple like this
>
> spark-submit --class StreamingEventWriterDriver --master yarn
> --deploy-mode cluster --properties-file properties.conf --files
> streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf
> "streaming.conf"
>
> properties.conf has following contents
>
>
> spark.executor.memory 3072m
>
> spark.executor.cores 4
>
>
> But I am getting following error
>
>
> 17/03/24 11:36:26 INFO Client: Use hdfs cache file as spark.yarn.archive
> for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.6.0.0-403/spark2/
> spark2-hdp-yarn-archive.tar.gz
>
> 17/03/24 11:36:26 WARN AzureFileSystemThreadPoolExecutor: Disabling
> threads for Delete operation as thread count 0 is <= 1
>
> 17/03/24 11:36:26 INFO AzureFileSystemThreadPoolExecutor: Time taken for
> Delete operation is: 1 ms with threads: 0
>
> 17/03/24 11:36:27 INFO Client: Deleted staging directory wasb://
> a...@abc.blob.core.windows.net/user/sshuser/.sparkStaging/application_
> 1488402758319_0492
>
> Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no
> host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
>
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(
> DistributedFileSystem.java:154)
>
> at org.apache.hadoop.fs.FileSystem.createFileSystem(
> FileSystem.java:2791)
>
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
>
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
> FileSystem.java:2825)
>
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)
>
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
>
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>
> at org.apache.spark.deploy.yarn.Client.copyFileToRemote(
> Client.scala:364)
>
> at org.apache.spark.deploy.yarn.Client.org$apache$spark$
> deploy$yarn$Client$$distribute$1(Client.scala:480)
>
> at org.apache.spark.deploy.yarn.Client.prepareLocalResources(
> Client.scala:552)
>
> at org.apache.spark.deploy.yarn.Client.
> createContainerLaunchContext(Client.scala:881)
>
> at org.apache.spark.deploy.yarn.Client.submitApplication(
> Client.scala:170)
>
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1218)
>
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1277)
>
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
>
> at org.apache.spark.deploy.SparkSubmi

[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster

2017-03-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939867#comment-15939867
 ] 

Saisai Shao commented on SPARK-19992:
-

Sorry, I cannot give you valid suggestions without knowing your actual 
environment. Basically, when running Spark on YARN you don't have to configure 
anything except HADOOP_CONF_DIR in spark-env.sh; other than that, the default 
configurations should be enough.

You could also send your problem to the user mail list; there will likely be 
other users who have met the same problem before. JIRA is mainly used to track 
Spark dev work, not for questions.

> spark-submit on deployment-mode cluster
> ---
>
> Key: SPARK-19992
> URL: https://issues.apache.org/jira/browse/SPARK-19992
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.2
> Environment: spark version 2.0.2
> hadoop version 2.6.0
>Reporter: narendra maru
>
> spark version 2.0.2
> hadoop version 2.6.0
> spark -submit command
> "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode 
> cluster --jars /home/ec2-user/jars/hgmongonew.jar, 
> /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"
> after adding following in
> 1 Spark-default.conf
> spark.executor.extraJavaOptions -Dconfig.fuction.conf 
> spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
> spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
> spark.eventLog.enabled=true
> 2yarn-site.xml
> 
> yarn.application.classpath
> 
> /usr/local/hadoop-2.6.0/etc/hadoop,
> /usr/local/hadoop-2.6.0/,
> /usr/local/hadoop-2.6.0/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/lib/
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
> /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar
> 
> 
> Error on log:-
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ApplicationMaster
> Error on terminal:-
> diagnostics: Application application_1489673977198_0002 failed 2 times due to 
> AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 
> For more detailed output, check application tracking 
> page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then,
>  click on links to logs of each attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20050) Kafka 0.10 DirectStream doesn't commit last processed batch's offset when graceful shutdown

2017-03-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937845#comment-15937845
 ] 

Saisai Shao commented on SPARK-20050:
-

I think you could register a commit callback in {{commitAsync}}; the callback 
is invoked once the offsets are committed to Kafka, so you could use it to 
decide when it is safe to stop the application.
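
A minimal sketch of that approach, assuming the 0-10 integration's 
{{CanCommitOffsets.commitAsync(offsetRanges, callback)}} overload and the 
{{kafkaStream}} variable from the code in the description:

{code}
import org.apache.kafka.clients.consumer.{OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

kafkaStream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  kafkaStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges,
    new OffsetCommitCallback {
      override def onComplete(
          offsets: java.util.Map[TopicPartition, OffsetAndMetadata],
          exception: Exception): Unit = {
        if (exception == null) {
          // Offsets for this batch are now stored in Kafka; it is safe to
          // trigger a graceful shutdown from here if this was the last batch.
        }
      }
    })
}
{code}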

> Kafka 0.10 DirectStream doesn't commit last processed batch's offset when 
> graceful shutdown
> ---
>
> Key: SPARK-20050
> URL: https://issues.apache.org/jira/browse/SPARK-20050
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.2.0
>Reporter: Sasaki Toru
>
> I use Kafka 0.10 DirectStream with properties 'enable.auto.commit=false' and 
> call 'DirectKafkaInputDStream#commitAsync' finally in each batches,  such 
> below
> {code}
> val kafkaStream = KafkaUtils.createDirectStream[String, String](...)
> kafkaStream.map { input =>
>   "key: " + input.key.toString + " value: " + input.value.toString + " 
> offset: " + input.offset.toString
>   }.foreachRDD { rdd =>
> rdd.foreach { input =>
> println(input)
>   }
> }
> kafkaStream.foreachRDD { rdd =>
>   val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
>   kafkaStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
> }
> {code}
> Some records which processed in the last batch before Streaming graceful 
> shutdown reprocess in the first batch after Spark Streaming restart.
> It may cause offsets specified in commitAsync will commit in the head of next 
> batch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20058) the running application status changed from running to waiting when a master is down and it change to another standy by master

2017-03-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937841#comment-15937841
 ] 

Saisai Shao commented on SPARK-20058:
-

I opened a PR for this bug a very long time ago 
(https://github.com/apache/spark/pull/10506).

I've already brought it up to date with the latest master, but still no one is 
reviewing it. I have seen this issue brought up on the mail list and in JIRA 
several times. [~srowen] Could you or someone else help to review it?

> the running application status changed  from running to waiting when a master 
> is down and it change to another standy by master
> ---
>
> Key: SPARK-20058
> URL: https://issues.apache.org/jira/browse/SPARK-20058
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: wangweidong
>Priority: Minor
>
> 1.I deployed the spark with cluster mode, test186 is master, test171 is a 
> backup  master, workers is test137, test155 and test138.
> 2, Start spark with command sbin/start-all.sh
> 3, submit my task with command bin/spark-submit --supervise --deployed-mode 
> cluster --master spark://test186:7077 etc.
> 4, view web ui ,enter test186:8080 , I can see my application is running 
> normally.
> 5, Stop the master of test186, after a period of time, view web ui with 
> test171(standby master), I see my application is waiting and it can not 
> change to run, but type one application and enter the detail page, i can see 
> it is actually runing.
> Is it an bug ? or i start spark with incorrect setting?
> Help!!!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20058) the running application status changed from running to waiting when a master is down and it change to another standy by master

2017-03-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937841#comment-15937841
 ] 

Saisai Shao edited comment on SPARK-20058 at 3/23/17 7:02 AM:
--

I opened a PR for this bug a very long time ago 
(https://github.com/apache/spark/pull/10506).

I've already brought it up to date with the latest master, but still no one is 
reviewing it. I have seen this issue brought up on the mail list and in JIRA 
several times. [~srowen] Could you or someone else help to review it?


was (Author: jerryshao):
I have a PR for this bug very very long ago 
(https://github.com/apache/spark/pull/10506).

I've already bring out to latest, but still not one is reviewing it. I have 
seen this issue is brought up either in mail list or in jira several times. 
[~srowen] Can you and someone else could help to review it?

> the running application status changed  from running to waiting when a master 
> is down and it change to another standy by master
> ---
>
> Key: SPARK-20058
> URL: https://issues.apache.org/jira/browse/SPARK-20058
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: wangweidong
>Priority: Minor
>
> 1.I deployed the spark with cluster mode, test186 is master, test171 is a 
> backup  master, workers is test137, test155 and test138.
> 2, Start spark with command sbin/start-all.sh
> 3, submit my task with command bin/spark-submit --supervise --deployed-mode 
> cluster --master spark://test186:7077 etc.
> 4, view web ui ,enter test186:8080 , I can see my application is running 
> normally.
> 5, Stop the master of test186, after a period of time, view web ui with 
> test171(standby master), I see my application is waiting and it can not 
> change to run, but type one application and enter the detail page, i can see 
> it is actually runing.
> Is it an bug ? or i start spark with incorrect setting?
> Help!!!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20058) the running application status changed from running to waiting when a master is down and it change to another standy by master

2017-03-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936255#comment-15936255
 ] 

Saisai Shao commented on SPARK-20058:
-

Please subscribe to the Spark user mail list and send the question there.

> the running application status changed  from running to waiting when a master 
> is down and it change to another standy by master
> ---
>
> Key: SPARK-20058
> URL: https://issues.apache.org/jira/browse/SPARK-20058
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: wangweidong
>Priority: Minor
>
> 1.I deployed the spark with cluster mode, test186 is master, test171 is a 
> backup  master, workers is test137, test155 and test138.
> 2, Start spark with command sbin/start-all.sh
> 3, submit my task with command bin/spark-submit --supervise --deployed-mode 
> cluster --master spark://test186:7077 etc.
> 4, view web ui ,enter test186:8080 , I can see my application is running 
> normally.
> 5, Stop the master of test186, after a period of time, view web ui with 
> test171(standby master), I see my application is waiting and it can not 
> change to run, but type one application and enter the detail page, i can see 
> it is actually runing.
> Is it an bug ? or i start spark with incorrect setting?
> Help!!!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20058) the running application status changed from running to waiting when a master is down and it change to another standy by master

2017-03-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936255#comment-15936255
 ] 

Saisai Shao edited comment on SPARK-20058 at 3/22/17 1:03 PM:
--

Please subscribe to the Spark user mail list and send the question there.


was (Author: jerryshao):
Please subscribe this spark user mail list and set the question to this mail 
list.

> the running application status changed  from running to waiting when a master 
> is down and it change to another standy by master
> ---
>
> Key: SPARK-20058
> URL: https://issues.apache.org/jira/browse/SPARK-20058
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: wangweidong
>Priority: Minor
>
> 1.I deployed the spark with cluster mode, test186 is master, test171 is a 
> backup  master, workers is test137, test155 and test138.
> 2, Start spark with command sbin/start-all.sh
> 3, submit my task with command bin/spark-submit --supervise --deployed-mode 
> cluster --master spark://test186:7077 etc.
> 4, view web ui ,enter test186:8080 , I can see my application is running 
> normally.
> 5, Stop the master of test186, after a period of time, view web ui with 
> test171(standby master), I see my application is waiting and it can not 
> change to run, but type one application and enter the detail page, i can see 
> it is actually runing.
> Is it an bug ? or i start spark with incorrect setting?
> Help!!!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20059) HbaseCredentialProvider uses wrong classloader

2017-03-22 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-20059:
---

 Summary: HbaseCredentialProvider uses wrong classloader
 Key: SPARK-20059
 URL: https://issues.apache.org/jira/browse/SPARK-20059
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 2.1.0, 2.2.0
Reporter: Saisai Shao


{{HBaseCredentialProvider}} uses the system classloader instead of the child 
classloader, which makes HBase jars specified with {{--jars}} fail to work, so 
we should use the right classloader here.

Besides, in yarn client mode jars specified with {{--jars}} are not added to 
the client's class path, which makes it fail to load the HBase jars and issue 
tokens in our scenario. Also, some customized credential providers cannot be 
registered in the client.

So here I will fix these two issues.
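
A hedged sketch of the direction for the first fix (class and variable names 
are illustrative, not the actual patch): resolve HBase classes through the 
context/child classloader rather than the system one.

{code}
// Prefer the thread context classloader (which points at Spark's child/mutable
// classloader when user jars are added), falling back to this class's loader.
val loader = Option(Thread.currentThread().getContextClassLoader)
  .getOrElse(getClass.getClassLoader)

// Loading HBase classes through `loader` lets classes shipped via --jars be
// found; a bare Class.forName(name) would use the caller's defining loader.
val hbaseConfClass =
  Class.forName("org.apache.hadoop.hbase.HBaseConfiguration", true, loader)
{code}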





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19992) spark-submit on deployment-mode cluster

2017-03-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936241#comment-15936241
 ] 

Saisai Shao edited comment on SPARK-19992 at 3/22/17 12:51 PM:
---

Oh, I see. Checking the code again, it looks like "/*" cannot be used with the 
"local" scheme; only Hadoop-supported schemes like hdfs and file support glob 
paths.

I agree this is probably just a setup/env problem.


was (Author: jerryshao):
Oh, I see. Check the code again, looks like "/*" cannot be worked with "local" 
schema, only hadoop support schema like hdfs, file could support glob path.

> spark-submit on deployment-mode cluster
> ---
>
> Key: SPARK-19992
> URL: https://issues.apache.org/jira/browse/SPARK-19992
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.2
> Environment: spark version 2.0.2
> hadoop version 2.6.0
>Reporter: narendra maru
>
> spark version 2.0.2
> hadoop version 2.6.0
> spark -submit command
> "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode 
> cluster --jars /home/ec2-user/jars/hgmongonew.jar, 
> /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"
> after adding following in
> 1 Spark-default.conf
> spark.executor.extraJavaOptions -Dconfig.fuction.conf 
> spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
> spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
> spark.eventLog.enabled=true
> 2yarn-site.xml
> 
> yarn.application.classpath
> 
> /usr/local/hadoop-2.6.0/etc/hadoop,
> /usr/local/hadoop-2.6.0/,
> /usr/local/hadoop-2.6.0/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/lib/
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
> /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar
> 
> 
> Error on log:-
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ApplicationMaster
> Error on terminal:-
> diagnostics: Application application_1489673977198_0002 failed 2 times due to 
> AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 
> For more detailed output, check application tracking 
> page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then,
>  click on links to logs of each attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster

2017-03-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936241#comment-15936241
 ] 

Saisai Shao commented on SPARK-19992:
-

Oh, I see. Checking the code again, it looks like "/*" cannot be used with the 
"local" scheme; only Hadoop-supported schemes like hdfs and file support glob 
paths.

> spark-submit on deployment-mode cluster
> ---
>
> Key: SPARK-19992
> URL: https://issues.apache.org/jira/browse/SPARK-19992
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.2
> Environment: spark version 2.0.2
> hadoop version 2.6.0
>Reporter: narendra maru
>
> spark version 2.0.2
> hadoop version 2.6.0
> spark -submit command
> "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode 
> cluster --jars /home/ec2-user/jars/hgmongonew.jar, 
> /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"
> after adding following in
> 1 Spark-default.conf
> spark.executor.extraJavaOptions -Dconfig.fuction.conf 
> spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
> spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
> spark.eventLog.enabled=true
> 2yarn-site.xml
> 
> yarn.application.classpath
> 
> /usr/local/hadoop-2.6.0/etc/hadoop,
> /usr/local/hadoop-2.6.0/,
> /usr/local/hadoop-2.6.0/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/lib/
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
> /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar
> 
> 
> Error on log:-
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ApplicationMaster
> Error on terminal:-
> diagnostics: Application application_1489673977198_0002 failed 2 times due to 
> AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 
> For more detailed output, check application tracking 
> page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then,
>  click on links to logs of each attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19941) Spark should not schedule tasks on executors on decommissioning YARN nodes

2017-03-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932088#comment-15932088
 ] 

Saisai Shao commented on SPARK-19941:
-

I think this scenario is quite similar to container preemption. In the 
container preemption scenario, the AM can be informed by the RM which 
containers will be preempted in the next 15 seconds (by default), and the AM 
can react based on that information.

I made a similar PR to avoid scheduling tasks on executors that are about to be 
preempted. It was ultimately rejected, mainly because letting to-be-preempted 
executors sit idle for 15 seconds is too long and wastes resources. In your 
description the executors will be idle for 60 seconds before decommissioning, 
so this really wastes resources if most of the work could have been done within 
that minute on those executors.

Also, I'm not sure why the job would hang as you mentioned before; I think the 
failed tasks will simply be rerun.

So IMHO it is better not to handle this scenario unless we hit some real 
problems. Sometimes the cost of rerunning tasks is smaller than the cost of 
wasting resources.

> Spark should not schedule tasks on executors on decommissioning YARN nodes
> --
>
> Key: SPARK-19941
> URL: https://issues.apache.org/jira/browse/SPARK-19941
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, YARN
>Affects Versions: 2.1.0
> Environment: Hadoop 2.8.0-rc1
>Reporter: Karthik Palaniappan
>
> Hadoop 2.8 added a mechanism to gracefully decommission Node Managers in 
> YARN: https://issues.apache.org/jira/browse/YARN-914
> Essentially you can mark nodes to be decommissioned, and let them a) finish 
> work in progress and b) finish serving shuffle data. But no new work will be 
> scheduled on the node.
> Spark should respect when NMs are set to decommissioned, and similarly 
> decommission executors on those nodes by not scheduling any more tasks on 
> them.
> It looks like in the future YARN may inform the app master when containers 
> will be killed: https://issues.apache.org/jira/browse/YARN-3784. However, I 
> don't think Spark should schedule based on a timeout. We should gracefully 
> decommission the executor as fast as possible (which is the spirit of 
> YARN-914). The app master can query the RM for NM statuses (if it doesn't 
> already have them) and stop scheduling on executors on NMs that are 
> decommissioning.
> Stretch feature: The timeout may be useful in determining whether running 
> further tasks on the executor is even helpful. Spark may be able to tell that 
> shuffle data will not be consumed by the time the node is decommissioned, so 
> it is not worth computing. The executor can be killed immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19995) Using real user to connect HiveMetastore in HiveClientImpl

2017-03-17 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-19995:
---

 Summary: Using real user to connect HiveMetastore in HiveClientImpl
 Key: SPARK-19995
 URL: https://issues.apache.org/jira/browse/SPARK-19995
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Saisai Shao


If a user specifies "--proxy-user" in a kerberized environment with the Hive 
catalog implementation, HiveClientImpl will try to connect to the Hive 
metastore as the current (proxy) user. Since it is the real user that does the 
kinit, this makes the connection fail. We should change it, as we did earlier 
in the YARN code, to use the real user.
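
A hedged sketch of the intended direction (not the actual patch; the client 
factory below is a hypothetical placeholder): open the metastore connection 
inside a doAs of the real (kinit'ed) user rather than the proxy user.

{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical placeholder for whatever actually builds the metastore client.
def openMetastoreClient(): AnyRef = ???

val currentUser = UserGroupInformation.getCurrentUser
// getRealUser is null when we are not running as a proxy user.
val realUser = Option(currentUser.getRealUser).getOrElse(currentUser)

// Open the connection as the real (kinit'ed) user, not the proxy user.
val client = realUser.doAs(new PrivilegedExceptionAction[AnyRef] {
  override def run(): AnyRef = openMetastoreClient()
})
{code}

The stack trace below is the SASL/GSS failure seen when the connection is 
attempted as the proxy user without valid Kerberos credentials.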

{noformat}
ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
at 
org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:188)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
at 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.(HiveExternalCatalog.scala:65)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173)
at 
org.apache.spark.sql.internal.SharedState.(SharedState.scala:86)
at 
org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession

[jira] [Comment Edited] (SPARK-19992) spark-submit on deployment-mode cluster

2017-03-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929547#comment-15929547
 ] 

Saisai Shao edited comment on SPARK-19992 at 3/17/17 7:48 AM:
--

Looks like I guessed wrong from the URL you provided.

If you're trying to use 
"spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*", make sure 
the Spark-related jars exist under the same path on every node.


was (Author: jerryshao):
Looks like I guess wrong from the url you provided.

If you're trying to use 
"spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*", make sure 
these jars existed in every node.

> spark-submit on deployment-mode cluster
> ---
>
> Key: SPARK-19992
> URL: https://issues.apache.org/jira/browse/SPARK-19992
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.2
> Environment: spark version 2.0.2
> hadoop version 2.6.0
>Reporter: narendra maru
>
> spark version 2.0.2
> hadoop version 2.6.0
> spark -submit command
> "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode 
> cluster --jars /home/ec2-user/jars/hgmongonew.jar, 
> /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"
> after adding following in
> 1 Spark-default.conf
> spark.executor.extraJavaOptions -Dconfig.fuction.conf 
> spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
> spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
> spark.eventLog.enabled=true
> 2yarn-site.xml
> 
> yarn.application.classpath
> 
> /usr/local/hadoop-2.6.0/etc/hadoop,
> /usr/local/hadoop-2.6.0/,
> /usr/local/hadoop-2.6.0/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/lib/
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
> /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar
> 
> 
> Error on log:-
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ApplicationMaster
> Error on terminal:-
> diagnostics: Application application_1489673977198_0002 failed 2 times due to 
> AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 
> For more detailed output, check application tracking 
> page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then,
>  click on links to logs of each attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster

2017-03-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929547#comment-15929547
 ] 

Saisai Shao commented on SPARK-19992:
-

Looks like I guessed wrong from the URL you provided.

If you're trying to use 
"spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*", make sure 
these jars exist on every node.

> spark-submit on deployment-mode cluster
> ---
>
> Key: SPARK-19992
> URL: https://issues.apache.org/jira/browse/SPARK-19992
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.2
> Environment: spark version 2.0.2
> hadoop version 2.6.0
>Reporter: narendra maru
>
> spark version 2.0.2
> hadoop version 2.6.0
> spark -submit command
> "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode 
> cluster --jars /home/ec2-user/jars/hgmongonew.jar, 
> /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"
> after adding following in
> 1 Spark-default.conf
> spark.executor.extraJavaOptions -Dconfig.fuction.conf 
> spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
> spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
> spark.eventLog.enabled=true
> 2yarn-site.xml
> 
> yarn.application.classpath
> 
> /usr/local/hadoop-2.6.0/etc/hadoop,
> /usr/local/hadoop-2.6.0/,
> /usr/local/hadoop-2.6.0/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/lib/
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
> /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar
> 
> 
> Error on log:-
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ApplicationMaster
> Error on terminal:-
> diagnostics: Application application_1489673977198_0002 failed 2 times due to 
> AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 
> For more detailed output, check application tracking 
> page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then,
>  click on links to logs of each attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster

2017-03-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929544#comment-15929544
 ] 

Saisai Shao commented on SPARK-19992:
-

Are you using an HDP environment? If so, I guess you need to configure 
hdp.version in Spark; you could google it.

> spark-submit on deployment-mode cluster
> ---
>
> Key: SPARK-19992
> URL: https://issues.apache.org/jira/browse/SPARK-19992
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.2
> Environment: spark version 2.0.2
> hadoop version 2.6.0
>Reporter: narendra maru
>
> spark version 2.0.2
> hadoop version 2.6.0
> spark -submit command
> "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode 
> cluster --jars /home/ec2-user/jars/hgmongonew.jar, 
> /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"
> after adding following in
> 1 Spark-default.conf
> spark.executor.extraJavaOptions -Dconfig.fuction.conf 
> spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
> spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
> spark.eventLog.enabled=true
> 2yarn-site.xml
> 
> yarn.application.classpath
> 
> /usr/local/hadoop-2.6.0/etc/hadoop,
> /usr/local/hadoop-2.6.0/,
> /usr/local/hadoop-2.6.0/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/lib/
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
> /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar
> 
> 
> Error on log:-
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ApplicationMaster
> Error on terminal:-
> diagnostics: Application application_1489673977198_0002 failed 2 times due to 
> AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 
> For more detailed output, check application tracking 
> page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then,
>  click on links to logs of each attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19143) API in Spark for distributing new delegation tokens (Improve delegation token handling in secure clusters)

2017-03-09 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904497#comment-15904497
 ] 

Saisai Shao commented on SPARK-19143:
-

Hi all, I wrote a rough design doc based on the comments above; here is the 
link: 
https://docs.google.com/document/d/1DFWGHu4_GJapbbfXGWsot_z_W9Wka_39DFNmg9r9SAI/edit?usp=sharing

[~tgraves] [~vanzin] [~mridulm80] please review and comment; I would greatly 
appreciate your suggestions.

> API in Spark for distributing new delegation tokens (Improve delegation token 
> handling in secure clusters)
> --
>
> Key: SPARK-19143
> URL: https://issues.apache.org/jira/browse/SPARK-19143
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Ruslan Dautkhanov
>
> Spin off from SPARK-14743 and comments chain in [recent comments| 
> https://issues.apache.org/jira/browse/SPARK-5493?focusedCommentId=15802179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15802179]
>  in SPARK-5493.
> Spark currently doesn't have a way for distribution new delegation tokens. 
> Quoting [~vanzin] from SPARK-5493 
> {quote}
> IIRC Livy doesn't yet support delegation token renewal. Once it reaches the 
> TTL, the session is unusable.
> There might be ways to hack support for that without changes in Spark, but 
> I'd like to see a proper API in Spark for distributing new delegation tokens. 
> I mentioned that in SPARK-14743, but although that bug is closed, that 
> particular feature hasn't been implemented yet.
> {quote}
> Other thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: question on Write Ahead Log (Spark Streaming )

2017-03-08 Thread Saisai Shao
IIUC, your scenario is quite like what ReliableKafkaReceiver currently does.
You should only send an ack to the upstream source after the data is persisted
to the WAL; otherwise, because data receiving and data processing are
asynchronous, there's still a chance data could be lost if you ack before the
WAL write completes.

You could refer to ReliableKafkaReceiver.
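
A minimal sketch of that pattern under stated assumptions: the upstream
pull/ack calls below are hypothetical placeholders, and
spark.streaming.receiver.writeAheadLog.enable=true so that store(ArrayBuffer)
blocks until the block is reliably stored.

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class AckAfterStoreReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

  def onStart(): Unit = {
    new Thread("ack-after-store-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = { /* close the upstream connection here */ }

  private def receive(): Unit = {
    while (!isStopped()) {
      val (records, ackToken) = pullBatch()      // hypothetical upstream call
      // store(ArrayBuffer) blocks until the block is stored (and written to
      // the WAL when it is enabled), so acking afterwards is safe.
      store(ArrayBuffer(records: _*))
      ackUpstream(ackToken)                      // hypothetical upstream call
    }
  }

  // Placeholders for the real upstream client:
  private def pullBatch(): (Seq[String], Long) = ???
  private def ackUpstream(token: Long): Unit = ???
}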

On Thu, Mar 9, 2017 at 12:58 AM, kant kodali  wrote:

> Hi All,
>
> I am using a Receiver based approach. And I understand that spark
> streaming API's will convert the received data from receiver into blocks
> and these blocks that are in memory are also stored in WAL if one enables
> it. my upstream source which is not Kafka can also replay by which I mean
> if I don't send an ack to my upstream source it will resend it so I don't
> have to write the received data to WAL however I still need to enable WAL
> correct? because there are blocks that are in memory that needs to written
> to WAL so they can be recovered later.
>
> Thanks,
> kant
>


[jira] [Comment Edited] (SPARK-19812) YARN shuffle service fails to relocate recovery DB directories

2017-03-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900821#comment-15900821
 ] 

Saisai Shao edited comment on SPARK-19812 at 3/8/17 7:42 AM:
-

[~tgraves], I'm not quite sure what you mean here?

bq. The tests are using files rather then directories so it didn't catch. We 
need to fix the test also.

From my understanding, this issue happens when the dest dir is not empty and we 
try to move with REPLACE_EXISTING. It can also happen when rename fails and the 
source dir is not an empty directory.

But I cannot imagine how that happened here; from the log it looks more like a 
`rename` failure, since the path in the exception points to the source dir.



was (Author: jerryshao):
[~tgraves], I'm not quite sure what you mean here?

bq. The tests are using files rather then directories so it didn't catch. We 
need to fix the test also.

From my understanding this issues happens when dest dir is not empty and try 
to move with REPLACE_EXISTING. Also be happened when calling rename failed and 
the source dir is not empty directory.

But I cannot imagine how this happened, because if dest dir is not empty, then 
it should be returned before, will not go to check old NM local dirs.



> YARN shuffle service fails to relocate recovery DB directories
> --
>
> Key: SPARK-19812
> URL: https://issues.apache.org/jira/browse/SPARK-19812
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> The yarn shuffle service tries to switch from the yarn local directories to 
> the real recovery directory but can fail to move the existing recovery db's.  
> It fails due to Files.move not doing directories that have contents.
> 2017-03-03 14:57:19,558 [main] ERROR yarn.YarnShuffleService: Failed to move 
> recovery file sparkShuffleRecovery.ldb to the path 
> /mapred/yarn-nodemanager/nm-aux-services/spark_shuffle
> java.nio.file.DirectoryNotEmptyException:/yarn-local/sparkShuffleRecovery.ldb
> at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:498)
> at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
> at java.nio.file.Files.move(Files.java:1395)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.initRecoveryDb(YarnShuffleService.java:369)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.createSecretManager(YarnShuffleService.java:200)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:174)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:262)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)
> This used to use f.renameTo and we switched it in the pr due to review 
> comments and it looks like didn't do a final real test. The tests are using 
> files rather then directories so it didn't catch. We need to fix the test 
> also.
> history: 
> https://github.com/apache/spark/pull/14999/commits/65de8531ccb91287f5a8a749c7819e99533b9440



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19812) YARN shuffle service fails to relocate recovery DB directories

2017-03-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900821#comment-15900821
 ] 

Saisai Shao edited comment on SPARK-19812 at 3/8/17 7:40 AM:
-

[~tgraves], I'm not quite sure what you mean here?

bq. The tests are using files rather then directories so it didn't catch. We 
need to fix the test also.

From my understanding, this issue happens when the dest dir is not empty and we 
try to move with REPLACE_EXISTING. It can also happen when rename fails and the 
source dir is not an empty directory.

But I cannot imagine how that happened, because if the dest dir were not empty 
it should have returned earlier and never gone on to check the old NM local 
dirs.




was (Author: jerryshao):
[~tgraves], I'm not quite sure what you mean here?

bq. The tests are using files rather then directories so it didn't catch. We 
need to fix the test also.

From my understanding this issues happens when dest dir is not empty and try 
to move with REPLACE_EXISTING, but I cannot imagine how this happened, because 
if dest dir is not empty, then it should be returned before, will not go to 
check old NM local dirs.

> YARN shuffle service fails to relocate recovery DB directories
> --
>
> Key: SPARK-19812
> URL: https://issues.apache.org/jira/browse/SPARK-19812
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> The yarn shuffle service tries to switch from the yarn local directories to 
> the real recovery directory but can fail to move the existing recovery db's.  
> It fails due to Files.move not doing directories that have contents.
> 2017-03-03 14:57:19,558 [main] ERROR yarn.YarnShuffleService: Failed to move 
> recovery file sparkShuffleRecovery.ldb to the path 
> /mapred/yarn-nodemanager/nm-aux-services/spark_shuffle
> java.nio.file.DirectoryNotEmptyException:/yarn-local/sparkShuffleRecovery.ldb
> at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:498)
> at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
> at java.nio.file.Files.move(Files.java:1395)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.initRecoveryDb(YarnShuffleService.java:369)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.createSecretManager(YarnShuffleService.java:200)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:174)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:262)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)
> This used to use f.renameTo and we switched it in the pr due to review 
> comments and it looks like didn't do a final real test. The tests are using 
> files rather then directories so it didn't catch. We need to fix the test 
> also.
> history: 
> https://github.com/apache/spark/pull/14999/commits/65de8531ccb91287f5a8a749c7819e99533b9440



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19812) YARN shuffle service fails to relocate recovery DB directories

2017-03-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900821#comment-15900821
 ] 

Saisai Shao commented on SPARK-19812:
-

[~tgraves], I'm not quite sure what you mean here?

bq. The tests are using files rather then directories so it didn't catch. We 
need to fix the test also.

From my understanding, this issue happens when the dest dir is not empty and we 
try to move with REPLACE_EXISTING, but I cannot imagine how that happened, 
because if the dest dir were not empty it should have returned earlier and 
never gone on to check the old NM local dirs.
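
For reference, a minimal sketch of the failure mode being discussed (the paths 
are taken from the DirectoryNotEmptyException in the quoted description below 
and are purely illustrative):

{code}
import java.nio.file.{Files, Paths, StandardCopyOption}

val src = Paths.get("/yarn-local/sparkShuffleRecovery.ldb")
val dst = Paths.get(
  "/mapred/yarn-nodemanager/nm-aux-services/spark_shuffle/sparkShuffleRecovery.ldb")

// Files.move can rename a directory atomically on the same filesystem, but
// when the move has to fall back to copying entries (e.g. across filesystems)
// or the destination is a non-empty directory, it throws
// DirectoryNotEmptyException instead of moving the contents.
Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING)
{code}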

> YARN shuffle service fails to relocate recovery DB directories
> --
>
> Key: SPARK-19812
> URL: https://issues.apache.org/jira/browse/SPARK-19812
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> The yarn shuffle service tries to switch from the yarn local directories to 
> the real recovery directory but can fail to move the existing recovery db's.  
> It fails due to Files.move not doing directories that have contents.
> 2017-03-03 14:57:19,558 [main] ERROR yarn.YarnShuffleService: Failed to move 
> recovery file sparkShuffleRecovery.ldb to the path 
> /mapred/yarn-nodemanager/nm-aux-services/spark_shuffle
> java.nio.file.DirectoryNotEmptyException:/yarn-local/sparkShuffleRecovery.ldb
> at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:498)
> at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
> at java.nio.file.Files.move(Files.java:1395)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.initRecoveryDb(YarnShuffleService.java:369)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.createSecretManager(YarnShuffleService.java:200)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:174)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:262)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)
> This used to use f.renameTo and we switched it in the PR due to review 
> comments, and it looks like we didn't do a final real test. The tests are 
> using files rather than directories, so they didn't catch this. We need to 
> fix the tests also.
> history: 
> https://github.com/apache/spark/pull/14999/commits/65de8531ccb91287f5a8a749c7819e99533b9440



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19802) Remote History Server

2017-03-02 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893580#comment-15893580
 ] 

Saisai Shao commented on SPARK-19802:
-

Spark's {{ApplicationHistoryProvider}} is pluggable: users can implement their 
own provider and plug it into Spark's history server, so you could implement 
the {{HistoryProvider}} you want outside of Spark.

From your description, this sounds more like a Hadoop ATS (Hadoop Application 
Timeline Server). We have an implementation of a Timeline-based history 
provider for Spark's history server. Its main features are what you mentioned: 
query through TCP, get the events and display them on the UI.
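
Roughly, the plug-in point looks like this (paraphrasing the HistoryServer 
startup logic from memory rather than quoting it; the custom class just has to 
be on the history server's classpath and expose a SparkConf constructor):

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()

// The provider class name is read from spark.history.provider and instantiated
// reflectively; the default is the file-system based FsHistoryProvider.
val providerName = conf
  .getOption("spark.history.provider")
  .getOrElse("org.apache.spark.deploy.history.FsHistoryProvider")

val provider = Class.forName(providerName)
  .getConstructor(classOf[SparkConf])
  .newInstance(conf)
{code}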

> Remote History Server
> -
>
> Key: SPARK-19802
> URL: https://issues.apache.org/jira/browse/SPARK-19802
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Ben Barnard
>
> Currently the history server expects to find history in a filesystem 
> somewhere. It would be nice to have a history server that listens for 
> application events on a TCP port, and have an EventLoggingListener that sends 
> events to the listening history server instead of writing to a file. This 
> would allow the history server to show up-to-date history for past and 
> running jobs in a cluster environment that lacks a shared filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: How to use ManualClock with Spark streaming

2017-02-28 Thread Saisai Shao
I don't think using ManualClock is the right way to fix your problem here in
Spark Streaming.

ManualClock in Spark is mainly used for unit tests; the test has to manually
advance the time to make the unit test work. That usage is quite different
from the scenario you mentioned.
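
For reference, here is a rough sketch of that unit-test pattern (adapted from
memory of Spark's own streaming test suites, so the access path shown in the
comments is approximate; ManualClock and the scheduler's clock are private
Spark internals, not public API):

// Rough sketch only -- this pattern works from inside Spark's own test packages.
val conf = new org.apache.spark.SparkConf()
  .setMaster("local[2]")
  .setAppName("manual-clock-sketch")
  .set("spark.streaming.clock", "org.apache.spark.util.ManualClock")

val ssc = new org.apache.spark.streaming.StreamingContext(
  conf, org.apache.spark.streaming.Seconds(1))
// ... wire up input DStreams and output operations, then ssc.start() ...

// The test helpers then grab the scheduler's ManualClock and drive time
// forward explicitly instead of waiting on wall-clock time, roughly:
//   val clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
//   clock.advance(batchDuration.milliseconds)  // triggers the next batch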

Thanks
Jerry

On Tue, Feb 28, 2017 at 10:53 PM, Hemalatha A <
hemalatha.amru...@googlemail.com> wrote:

>
> Hi,
>
> I am running streaming application reading data from kafka and performing
> window operations on it. I have a usecase where  all incoming events have a
> fixed latency of 10s, which means data belonging to minute 10:00:00 will
> arrive 10s late at 10:00:10.
>
> I want to set the spark clock to "Manualclock" and set the time behind by
> 10s so that the batch calculation triggers at 10:00:10, during which time
> all the events for the previous minute has arrived.
>
> But, I see that "spark.streaming.clock" is hardcoded to "
> org.apache.spark.util.SystemClock" in the code.
>
> Is there a way to easily  hack this property to use Manual clock.
> --
>
>
> Regards
> Hemalatha
>


[jira] [Commented] (SPARK-19750) Spark UI http -> https redirect error

2017-02-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887204#comment-15887204
 ] 

Saisai Shao commented on SPARK-19750:
-

This issue was found by [~yeshavora], credits to her.

> Spark UI http -> https redirect error
> -
>
> Key: SPARK-19750
> URL: https://issues.apache.org/jira/browse/SPARK-19750
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Saisai Shao
>
> Spark's HTTP redirect uses port 0 as the secure port for the redirect if the 
> port is not set explicitly; this introduces {{ java.net.NoRouteToHostException: 
> Can't assign requested address }}, so the fix is to use the bound port for 
> the redirect.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: spark.speculation setting support on standalone mode?

2017-02-27 Thread Saisai Shao
I think it should be. These configurations don't depend on which cluster
manager the user chooses.
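
For example (plain Spark-core settings; the tuning values below are just the
usual defaults, shown for illustration):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.speculation", "true")
  .set("spark.speculation.interval", "100ms")  // how often to check for slow tasks
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish first
  .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow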



On Tue, Feb 28, 2017 at 4:42 AM, satishl  wrote:

> Are spark.speculation and related settings supported on standalone mode?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/spark-speculation-setting-support-
> on-standalone-mode-tp28433.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


[jira] [Created] (SPARK-19750) Spark UI http -> https redirect error

2017-02-27 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-19750:
---

 Summary: Spark UI http -> https redirect error
 Key: SPARK-19750
 URL: https://issues.apache.org/jira/browse/SPARK-19750
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.1.0, 2.0.2
Reporter: Saisai Shao


Spark's HTTP redirect uses port 0 as the secure port for the redirect if the 
port is not set explicitly; this introduces {{ java.net.NoRouteToHostException: 
Can't assign requested address }}, so the fix is to use the bound port for the 
redirect.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-02-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882027#comment-15882027
 ] 

Saisai Shao commented on SPARK-19688:
-

According to my test, "spark.yarn.credentials.file" will be overwritten in 
yarn#client to point to the correct path when launching the application 
(https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L737).
So even if the Spark Streaming checkpoint still keeps the old configuration, it 
will be overwritten when the new application is started. So I don't see an 
issue here other than this weird-looking setting.

> Spark on Yarn Credentials File set to different application directory
> -
>
> Key: SPARK-19688
> URL: https://issues.apache.org/jira/browse/SPARK-19688
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, YARN
>Affects Versions: 1.6.3
>Reporter: Devaraj Jonnadula
>Priority: Minor
>
> The spark.yarn.credentials.file property is set to a different application ID 
> instead of the actual application ID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19707) Improve the invalid path check for sc.addJar

2017-02-23 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-19707:

Summary: Improve the invalid path check for sc.addJar  (was: Improve the 
invalid path handling for sc.addJar)

> Improve the invalid path check for sc.addJar
> 
>
> Key: SPARK-19707
> URL: https://issues.apache.org/jira/browse/SPARK-19707
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>    Reporter: Saisai Shao
>
> Currently in Spark there are two issues when we add jars with an invalid path:
> * If the jar path is an empty string ({--jar ",dummy.jar"}), then Spark will 
> resolve it to the current directory path and add it to the classpath / file 
> server, which is unwanted.
> * If the jar path is invalid (the file doesn't exist), the file server doesn't 
> check this and will still add it; the exception will not be thrown until the 
> job is running. This local path could be checked immediately, with no need to 
> wait until the task is running. We have a similar check in {{addFile}}, but 
> lack one in {{addJar}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19707) Improve the invalid path handling for sc.addJar

2017-02-23 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-19707:
---

 Summary: Improve the invalid path handling for sc.addJar
 Key: SPARK-19707
 URL: https://issues.apache.org/jira/browse/SPARK-19707
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Saisai Shao


Currently in Spark there are two issues when we add jars with an invalid path:

* If the jar path is an empty string ({--jar ",dummy.jar"}), then Spark will 
resolve it to the current directory path and add it to the classpath / file 
server, which is unwanted.
* If the jar path is invalid (the file doesn't exist), the file server doesn't 
check this and will still add it; the exception will not be thrown until the 
job is running. This local path could be checked immediately, with no need to 
wait until the task is running. We have a similar check in {{addFile}}, but 
lack one in {{addJar}}. A sketch of the proposed check follows below.
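
The check could look roughly like the sketch below (an illustration of the idea 
with a hypothetical helper, not the actual patch): reject empty strings and 
verify that local files exist before handing them to the file server.

{code}
import java.io.File
import java.net.URI

// Hypothetical helper illustrating the proposed up-front validation.
def validateJarPath(path: String): String = {
  require(path != null && path.trim.nonEmpty, "Jar path must not be an empty string")
  val uri = new URI(path)
  Option(uri.getScheme).getOrElse("file") match {
    case "file" | "local" =>
      // Local paths can be verified immediately instead of failing at task time.
      val file = new File(uri.getPath)
      require(file.isFile, s"Jar $path does not exist or is not a file")
    case _ => // remote schemes (hdfs, http, ...) are validated elsewhere
  }
  path
}
{code}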



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-02-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879893#comment-15879893
 ] 

Saisai Shao commented on SPARK-19688:
-

I see. So what issue did you encounter when you restarted the application 
manually, or did you just notice the abnormal credential configuration?

From my understanding, this credential configuration will be overwritten when 
you restart the application, so it should be fine.

> Spark on Yarn Credentials File set to different application directory
> -
>
> Key: SPARK-19688
> URL: https://issues.apache.org/jira/browse/SPARK-19688
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, YARN
>Affects Versions: 1.6.3
>Reporter: Devaraj Jonnadula
>Priority: Minor
>
> The spark.yarn.credentials.file property is set to a different application ID 
> instead of the actual application ID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-02-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879883#comment-15879883
 ] 

Saisai Shao commented on SPARK-19688:
-

[~j.devaraj], when you say the Spark application is restarted, are you referring 
to YARN's reattempt mechanism, or do you manually restart the application?

> Spark on Yarn Credentials File set to different application directory
> -
>
> Key: SPARK-19688
> URL: https://issues.apache.org/jira/browse/SPARK-19688
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, YARN
>Affects Versions: 1.6.3
>Reporter: Devaraj Jonnadula
>Priority: Minor
>
> The spark.yarn.credentials.file property is set to a different application ID 
> instead of the actual application ID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Saisai Shao
It is too verbose, and would significantly increase the size of the event log.

Here is the comment in the code:

// No-op because logging every update would be overkill
> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>
>
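
If you do need that information, here is a sketch of a custom listener that
captures it yourself (assuming the Spark 2.x listener API; register it via the
spark.extraListeners configuration or SparkContext#addSparkListener):

import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}

class BlockUpdateLogger extends SparkListener {
  // EventLoggingListener leaves this callback as a no-op; here we just print it.
  override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
    val info = event.blockUpdatedInfo
    println(s"block ${info.blockId} storage=${info.storageLevel} " +
      s"mem=${info.memSize} disk=${info.diskSize}")
  }
}
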
On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari 
wrote:

> Thanks a lot the information!
>
> Is there any reason why EventLoggingListener ignore this event?
>
> *Thanks,*
>
>
> *​Parag​*
>
> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao 
> wrote:
>
>> AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so it
>> will not be written into the event log; I think that's why you cannot see
>> such info in the history server.
>>
>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari 
>> wrote:
>>
>>> Hi,
>>>
>>> I am running spark shell in spark version 2.0.2. Here is my program,
>>>
>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>> myrdd.setName("test")
>>> myrdd.cache
>>> myrdd.collect
>>>
>>> But I am not able to see any RDD info in "storage" tab in spark history
>>> server.
>>>
>>> I looked at this
>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>> but it is not helping as I have exact similar program mentioned there. Can
>>> anyone help?
>>>
>>>
>>> *Thanks,*
>>>
>>> *​Parag​*
>>>
>>
>>
>


Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Saisai Shao
AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so it will
not be written into the event log; I think that's why you cannot see such info
in the history server.

On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari 
wrote:

> Hi,
>
> I am running spark shell in spark version 2.0.2. Here is my program,
>
> var myrdd = sc.parallelize(Array.range(1, 10))
> myrdd.setName("test")
> myrdd.cache
> myrdd.collect
>
> But I am not able to see any RDD info in "storage" tab in spark history
> server.
>
> I looked at this
> 
> but it is not helping as I have exact similar program mentioned there. Can
> anyone help?
>
>
> *Thanks,*
>
> *​Parag​*
>


[jira] [Commented] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-02-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879564#comment-15879564
 ] 

Saisai Shao commented on SPARK-19688:
-

I see, so we should exclude this configuration from the checkpoint and have it 
re-configured after the restart.

> Spark on Yarn Credentials File set to different application directory
> -
>
> Key: SPARK-19688
> URL: https://issues.apache.org/jira/browse/SPARK-19688
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, YARN
>Affects Versions: 1.6.3
>Reporter: Devaraj Jonnadula
>Priority: Minor
>
> The spark.yarn.credentials.file property is set to a different application ID 
> instead of the actual application ID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-02-22 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-19688:

Component/s: DStreams

> Spark on Yarn Credentials File set to different application directory
> -
>
> Key: SPARK-19688
> URL: https://issues.apache.org/jira/browse/SPARK-19688
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, YARN
>Affects Versions: 1.6.3
>Reporter: Devaraj Jonnadula
>Priority: Minor
>
> The spark.yarn.credentials.file property is set to a different application ID 
> instead of the actual application ID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-02-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877587#comment-15877587
 ] 

Saisai Shao commented on SPARK-19688:
-

Can you please elaborate on the problem you met? Otherwise it is hard for 
others to identify.

Also, "spark.yarn.credentials.file" is an internal configuration; usually the 
user should not configure it.

> Spark on Yarn Credentials File set to different application directory
> -
>
> Key: SPARK-19688
> URL: https://issues.apache.org/jira/browse/SPARK-19688
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.3
>Reporter: Devaraj Jonnadula
>Priority: Minor
>
> The spark.yarn.credentials.file property is set to a different application ID 
> instead of the actual application ID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19649) Spark YARN client throws exception if job succeeds and max-completed-applications=0

2017-02-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874101#comment-15874101
 ] 

Saisai Shao commented on SPARK-19649:
-

MapReduce can delegate to its history server to query the state again; that 
approach may not apply to Spark.

> Spark YARN client throws exception if job succeeds and 
> max-completed-applications=0
> ---
>
> Key: SPARK-19649
> URL: https://issues.apache.org/jira/browse/SPARK-19649
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.3
> Environment: EMR release label 4.8.x
>Reporter: Joshua Caplan
>Priority: Minor
>
> I believe the patch in SPARK-3877 created a new race condition between YARN 
> and the Spark client.
> I typically configure YARN not to keep *any* recent jobs in memory, as some 
> of my jobs get pretty large.
> {code}
> yarn-site yarn.resourcemanager.max-completed-applications 0
> {code}
> The once-per-second call to getApplicationReport may thus encounter a RUNNING 
> application followed by a not found application, and report a false negative.
> (typical) Executor log:
> {code}
> 17/01/09 19:31:23 INFO ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0
> 17/01/09 19:31:23 INFO SparkContext: Invoking stop() from shutdown hook
> 17/01/09 19:31:24 INFO SparkUI: Stopped Spark web UI at 
> http://10.0.0.168:37046
> 17/01/09 19:31:24 INFO YarnClusterSchedulerBackend: Shutting down all 
> executors
> 17/01/09 19:31:24 INFO YarnClusterSchedulerBackend: Asking each executor to 
> shut down
> 17/01/09 19:31:24 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 17/01/09 19:31:24 INFO MemoryStore: MemoryStore cleared
> 17/01/09 19:31:24 INFO BlockManager: BlockManager stopped
> 17/01/09 19:31:24 INFO BlockManagerMaster: BlockManagerMaster stopped
> 17/01/09 19:31:24 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 17/01/09 19:31:24 INFO SparkContext: Successfully stopped SparkContext
> 17/01/09 19:31:24 INFO ApplicationMaster: Unregistering ApplicationMaster 
> with SUCCEEDED
> 17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Shutting 
> down remote daemon.
> 17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Remote 
> daemon shut down; proceeding with flushing remote transports.
> 17/01/09 19:31:24 INFO AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Remoting 
> shut down.
> {code}
> Client log:
> {code}
> 17/01/09 19:31:23 INFO Client: Application report for 
> application_1483983939941_0056 (state: RUNNING)
> 17/01/09 19:31:24 ERROR Client: Application application_1483983939941_0056 
> not found.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1483983939941_0056 is killed
>   at org.apache.spark.deploy.yarn.Client.run(Client.scala:1038)
>   at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
>   at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19649) Spark YARN client throws exception if job succeeds and max-completed-applications=0

2017-02-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874098#comment-15874098
 ] 

Saisai Shao commented on SPARK-19649:
-

From my understanding, this looks like a deliberately triggered exception: since 
you configured {{max-completed-applications}} to 0, there is a good chance the 
client will not catch the finished state for this application in the RM, because 
Spark queries the state from the RM with a polling mechanism. This does not look 
like a race condition issue; unless the RM could actively push the finished 
state to Spark, it is hard to catch the finished state from the Spark side.
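
To illustrate the polling behavior, here is a hedged sketch using the plain YARN 
client API (not Spark's actual {{Client.monitorApplication}} code):

{code}
import org.apache.hadoop.yarn.api.records.{ApplicationId, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException

def waitForApp(yarnClient: YarnClient, appId: ApplicationId): YarnApplicationState = {
  while (true) {
    try {
      val state = yarnClient.getApplicationReport(appId).getYarnApplicationState
      if (state == YarnApplicationState.FINISHED ||
          state == YarnApplicationState.FAILED ||
          state == YarnApplicationState.KILLED) {
        return state
      }
    } catch {
      // With max-completed-applications = 0 the RM can drop the finished app
      // between two polls, so a successful run may only ever be seen as
      // "not found" here -- which is roughly why the client log above reports
      // the application as killed.
      case _: ApplicationNotFoundException => return YarnApplicationState.KILLED
    }
    Thread.sleep(1000)
  }
  throw new IllegalStateException("unreachable")
}
{code}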

> Spark YARN client throws exception if job succeeds and 
> max-completed-applications=0
> ---
>
> Key: SPARK-19649
> URL: https://issues.apache.org/jira/browse/SPARK-19649
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.3
> Environment: EMR release label 4.8.x
>Reporter: Joshua Caplan
>Priority: Minor
>
> I believe the patch in SPARK-3877 created a new race condition between YARN 
> and the Spark client.
> I typically configure YARN not to keep *any* recent jobs in memory, as some 
> of my jobs get pretty large.
> {code}
> yarn-site yarn.resourcemanager.max-completed-applications 0
> {code}
> The once-per-second call to getApplicationReport may thus encounter a RUNNING 
> application followed by a not found application, and report a false negative.
> (typical) Executor log:
> {code}
> 17/01/09 19:31:23 INFO ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0
> 17/01/09 19:31:23 INFO SparkContext: Invoking stop() from shutdown hook
> 17/01/09 19:31:24 INFO SparkUI: Stopped Spark web UI at 
> http://10.0.0.168:37046
> 17/01/09 19:31:24 INFO YarnClusterSchedulerBackend: Shutting down all 
> executors
> 17/01/09 19:31:24 INFO YarnClusterSchedulerBackend: Asking each executor to 
> shut down
> 17/01/09 19:31:24 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 17/01/09 19:31:24 INFO MemoryStore: MemoryStore cleared
> 17/01/09 19:31:24 INFO BlockManager: BlockManager stopped
> 17/01/09 19:31:24 INFO BlockManagerMaster: BlockManagerMaster stopped
> 17/01/09 19:31:24 INFO 
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 17/01/09 19:31:24 INFO SparkContext: Successfully stopped SparkContext
> 17/01/09 19:31:24 INFO ApplicationMaster: Unregistering ApplicationMaster 
> with SUCCEEDED
> 17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Shutting 
> down remote daemon.
> 17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Remote 
> daemon shut down; proceeding with flushing remote transports.
> 17/01/09 19:31:24 INFO AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Remoting 
> shut down.
> {code}
> Client log:
> {code}
> 17/01/09 19:31:23 INFO Client: Application report for 
> application_1483983939941_0056 (state: RUNNING)
> 17/01/09 19:31:24 ERROR Client: Application application_1483983939941_0056 
> not found.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1483983939941_0056 is killed
>   at org.apache.spark.deploy.yarn.Client.run(Client.scala:1038)
>   at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
>   at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19588) Allow putting keytab file to HDFS location specified in spark.yarn.keytab

2017-02-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865434#comment-15865434
 ] 

Saisai Shao commented on SPARK-19588:
-

Putting the keytab on HDFS still requires downloading it to local disk for the 
driver/yarn#client, since the driver/yarn#client is not controlled by YARN, so 
there's no real difference between putting it locally or on HDFS.



> Allow putting keytab file to HDFS location specified in spark.yarn.keytab
> -
>
> Key: SPARK-19588
> URL: https://issues.apache.org/jira/browse/SPARK-19588
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, Spark Submit
>Affects Versions: 2.0.2, 2.1.0
> Environment: kerberized cluster, Spark 2
>Reporter: Ruslan Dautkhanov
>  Labels: authentication, kerberos, security, yarn-client
>
> As a workaround for SPARK-19038 tried putting keytab in user's home directory 
> in HDFS but this fails with 
> {noformat}
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> hdfs:///user/svc_odiprd/.kt does not exist
> at 
> org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:555)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:158)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {noformat}
> This is yarn-client mode, so the driver probably can't see HDFS while 
> submitting a job, although I suspect this is not limited to yarn-client.
> Would be great to support reading keytab for kerberos ticket renewals 
> directly from HDFS.
> We think that in some scenarios it's more secure than referencing a keytab 
> from a local fs on a client machine that does a spark-submit.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19579) spark-submit fails to run Kafka Stream python script

2017-02-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865360#comment-15865360
 ] 

Saisai Shao commented on SPARK-19579:
-

Only this specific part is not supported; all the other Python APIs are still 
supported.

> spark-submit fails to run Kafka Stream python script
> 
>
> Key: SPARK-19579
> URL: https://issues.apache.org/jira/browse/SPARK-19579
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.1.0
> Environment: Linux Ubuntu 16.10 64bit
>Reporter: Piotr Nestorow
>
> Kafka Stream python script is executed but it fails with:
> TypeError: 'JavaPackage' object is not callable
> The Spark Kafka streaming jar is provided:
> spark-streaming-kafka-0-10_2.11-2.1.0.jar
> Kafka version: kafka_2.11-0.10.1.1
> In the conf/spark-defaults.conf:
> spark.jars.packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.0
> Also at runtime:
> bin/spark-submit --jars 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar  
> --driver-class-path 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar 
> /home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py  
> localhost:2181 example_topic
> Also the Spark Kafka streaming example:
> 'examples/src/main/python/streaming/kafka_wordcount.py'
> can be used to check this problem.
> Nothing helps...
> Detailed output:
> _
>   Spark Streaming's Kafka libraries not found in class path. Try one of the 
> following.
>   1. Include the Kafka library and its dependencies with in the
>  spark-submit command as
>  $ bin/spark-submit --packages 
> org.apache.spark:spark-streaming-kafka-0-8:2.1.0 ...
>   2. Download the JAR of the artifact from Maven Central 
> http://search.maven.org/,
>  Group Id = org.apache.spark, Artifact Id = 
> spark-streaming-kafka-0-8-assembly, Version = 2.1.0.
>  Then, include the jar in the spark-submit command as
>  $ bin/spark-submit --jars  ...
> 
> Traceback (most recent call last):
>   File "/home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py", line 
> 18, in 
> kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", 
> {topic: 1})
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 69, in createStream
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 195, in _get_helper
> TypeError: 'JavaPackage' object is not callable



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-19579) spark-submit fails to run Kafka Stream python script

2017-02-14 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao closed SPARK-19579.
---
Resolution: Won't Fix

> spark-submit fails to run Kafka Stream python script
> 
>
> Key: SPARK-19579
> URL: https://issues.apache.org/jira/browse/SPARK-19579
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.1.0
> Environment: Linux Ubuntu 16.10 64bit
>Reporter: Piotr Nestorow
>
> Kafka Stream python script is executed but it fails with:
> TypeError: 'JavaPackage' object is not callable
> The Spark Kafka streaming jar is provided:
> spark-streaming-kafka-0-10_2.11-2.1.0.jar
> Kafka version: kafka_2.11-0.10.1.1
> In the conf/spark-defaults.conf:
> spark.jars.packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.0
> Also at runtime:
> bin/spark-submit --jars 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar  
> --driver-class-path 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar 
> /home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py  
> localhost:2181 example_topic
> Also the Spark Kafka streaming example:
> 'examples/src/main/python/streaming/kafka_wordcount.py'
> can be used to check this problem.
> Nothing helps...
> Detailed output:
> _
>   Spark Streaming's Kafka libraries not found in class path. Try one of the 
> following.
>   1. Include the Kafka library and its dependencies with in the
>  spark-submit command as
>  $ bin/spark-submit --packages 
> org.apache.spark:spark-streaming-kafka-0-8:2.1.0 ...
>   2. Download the JAR of the artifact from Maven Central 
> http://search.maven.org/,
>  Group Id = org.apache.spark, Artifact Id = 
> spark-streaming-kafka-0-8-assembly, Version = 2.1.0.
>  Then, include the jar in the spark-submit command as
>  $ bin/spark-submit --jars  ...
> 
> Traceback (most recent call last):
>   File "/home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py", line 
> 18, in 
> kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", 
> {topic: 1})
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 69, in createStream
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 195, in _get_helper
> TypeError: 'JavaPackage' object is not callable



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19579) spark-submit fails to run Kafka Stream python script

2017-02-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865360#comment-15865360
 ] 

Saisai Shao edited comment on SPARK-19579 at 2/14/17 8:33 AM:
--

Only this specific part is not supported; all the other Python APIs are still 
supported.


was (Author: jerryshao):
Only this specific part is not supported, all the that Python APIs are still 
supported.

> spark-submit fails to run Kafka Stream python script
> 
>
> Key: SPARK-19579
> URL: https://issues.apache.org/jira/browse/SPARK-19579
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.1.0
> Environment: Linux Ubuntu 16.10 64bit
>Reporter: Piotr Nestorow
>
> Kafka Stream python script is executed but it fails with:
> TypeError: 'JavaPackage' object is not callable
> The Spark Kafka streaming jar is provided:
> spark-streaming-kafka-0-10_2.11-2.1.0.jar
> Kafka version: kafka_2.11-0.10.1.1
> In the conf/spark-defaults.conf:
> spark.jars.packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.0
> Also at runtime:
> bin/spark-submit --jars 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar  
> --driver-class-path 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar 
> /home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py  
> localhost:2181 example_topic
> Also the Spark Kafka streaming example:
> 'examples/src/main/python/streaming/kafka_wordcount.py'
> can be used to check this problem.
> Nothing helps...
> Detailed output:
> _
>   Spark Streaming's Kafka libraries not found in class path. Try one of the 
> following.
>   1. Include the Kafka library and its dependencies with in the
>  spark-submit command as
>  $ bin/spark-submit --packages 
> org.apache.spark:spark-streaming-kafka-0-8:2.1.0 ...
>   2. Download the JAR of the artifact from Maven Central 
> http://search.maven.org/,
>  Group Id = org.apache.spark, Artifact Id = 
> spark-streaming-kafka-0-8-assembly, Version = 2.1.0.
>  Then, include the jar in the spark-submit command as
>  $ bin/spark-submit --jars  ...
> 
> Traceback (most recent call last):
>   File "/home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py", line 
> 18, in 
> kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", 
> {topic: 1})
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 69, in createStream
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 195, in _get_helper
> TypeError: 'JavaPackage' object is not callable



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19579) spark-submit fails to run Kafka Stream python script

2017-02-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865346#comment-15865346
 ] 

Saisai Shao commented on SPARK-19579:
-

Support for the Python API was rejected by the community, so there will be no 
roadmap for it.

> spark-submit fails to run Kafka Stream python script
> 
>
> Key: SPARK-19579
> URL: https://issues.apache.org/jira/browse/SPARK-19579
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.1.0
> Environment: Linux Ubuntu 16.10 64bit
>Reporter: Piotr Nestorow
>
> Kafka Stream python script is executed but it fails with:
> TypeError: 'JavaPackage' object is not callable
> The Spark Kafka streaming jar is provided:
> spark-streaming-kafka-0-10_2.11-2.1.0.jar
> Kafka version: kafka_2.11-0.10.1.1
> In the conf/spark-defaults.conf:
> spark.jars.packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.0
> Also at runtime:
> bin/spark-submit --jars 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar  
> --driver-class-path 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar 
> /home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py  
> localhost:2181 example_topic
> Also the Spark Kafka streaming example:
> 'examples/src/main/python/streaming/kafka_wordcount.py'
> can be used to check this problem.
> Nothing helps...
> Detailed output:
> _
>   Spark Streaming's Kafka libraries not found in class path. Try one of the 
> following.
>   1. Include the Kafka library and its dependencies with in the
>  spark-submit command as
>  $ bin/spark-submit --packages 
> org.apache.spark:spark-streaming-kafka-0-8:2.1.0 ...
>   2. Download the JAR of the artifact from Maven Central 
> http://search.maven.org/,
>  Group Id = org.apache.spark, Artifact Id = 
> spark-streaming-kafka-0-8-assembly, Version = 2.1.0.
>  Then, include the jar in the spark-submit command as
>  $ bin/spark-submit --jars  ...
> 
> Traceback (most recent call last):
>   File "/home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py", line 
> 18, in 
> kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", 
> {topic: 1})
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 69, in createStream
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 195, in _get_helper
> TypeError: 'JavaPackage' object is not callable



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19579) spark-submit fails to run Kafka Stream python script

2017-02-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865330#comment-15865330
 ] 

Saisai Shao commented on SPARK-19579:
-

If you're using the Python API to write a streaming application against the 
Kafka 0-10 connector, then it is not supported.

> spark-submit fails to run Kafka Stream python script
> 
>
> Key: SPARK-19579
> URL: https://issues.apache.org/jira/browse/SPARK-19579
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.1.0
> Environment: Linux Ubuntu 16.10 64bit
>Reporter: Piotr Nestorow
>
> Kafka Stream python script is executed but it fails with:
> TypeError: 'JavaPackage' object is not callable
> The Spark Kafka streaming jar is provided:
> spark-streaming-kafka-0-10_2.11-2.1.0.jar
> Kafka version: kafka_2.11-0.10.1.1
> In the conf/spark-defaults.conf:
> spark.jars.packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.0
> Also at runtime:
> bin/spark-submit --jars 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar  
> --driver-class-path 
> /usr/local/spark/jars/spark-streaming-kafka-0-10_2.11-2.1.0.jar 
> /home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py  
> localhost:2181 example_topic
> Also the Spark Kafka streaming example:
> 'examples/src/main/python/streaming/kafka_wordcount.py'
> can be used to check this problem.
> Nothing helps...
> Detailed output:
> _
>   Spark Streaming's Kafka libraries not found in class path. Try one of the 
> following.
>   1. Include the Kafka library and its dependencies with in the
>  spark-submit command as
>  $ bin/spark-submit --packages 
> org.apache.spark:spark-streaming-kafka-0-8:2.1.0 ...
>   2. Download the JAR of the artifact from Maven Central 
> http://search.maven.org/,
>  Group Id = org.apache.spark, Artifact Id = 
> spark-streaming-kafka-0-8-assembly, Version = 2.1.0.
>  Then, include the jar in the spark-submit command as
>  $ bin/spark-submit --jars  ...
> 
> Traceback (most recent call last):
>   File "/home/sysveradmin/work/Programs/ApacheSpark/ex_kafka_stream.py", line 
> 18, in 
> kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", 
> {topic: 1})
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 69, in createStream
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 195, in _get_helper
> TypeError: 'JavaPackage' object is not callable



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Review Request 56641: Change Livy recovery folder permission to 0700

2017-02-13 Thread Saisai Shao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56641/
---

Review request for Ambari, Jayush Luniya and Sumit Mohanty.


Bugs: AMBARI-17
https://issues.apache.org/jira/browse/AMBARI-17


Repository: ambari


Description
---

Change Livy recovery folder permission to 0700


Diffs
-

  
ambari-server/src/main/resources/common-services/SPARK/1.2.1/package/scripts/setup_livy.py
 32615c3 
  
ambari-server/src/main/resources/common-services/SPARK2/2.0.0/package/scripts/setup_livy2.py
 2e92509 
  ambari-server/src/test/python/stacks/2.5/SPARK/test_spark_livy.py b9199c7 
  ambari-server/src/test/python/stacks/2.6/SPARK2/test_spark_livy2.py 75aec84 

Diff: https://reviews.apache.org/r/56641/diff/


Testing
---

Manual verification.


Thanks,

Saisai Shao


