[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Component/s: Spark Core

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos ticket renewal is failing for a long-running Spark job. I added the 
> two Kerberos delegation-token properties below to the HDFS configuration and 
> ran a Spark Streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=1800000 (30min)
> dfs.namenode.delegation.token.renew-interval=900000 (15min)
> {noformat}
>  
> The Spark job failed after 15 minutes with the error below:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.

[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Component/s: (was: Spark Core)
 YARN

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos ticket renewal is failing for a long-running Spark job. I added the 
> two Kerberos delegation-token properties below to the HDFS configuration and 
> ran a Spark Streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=1800000 (30min)
> dfs.namenode.delegation.token.renew-interval=900000 (15min)
> {noformat}
>  
> The Spark job failed after 15 minutes with the error below:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>

[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Priority: Major  (was: Blocker)

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos ticket renewal is failing for a long-running Spark job. I added the 
> two Kerberos delegation-token properties below to the HDFS configuration and 
> ran a Spark Streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=1800000 (30min)
> dfs.namenode.delegation.token.renew-interval=900000 (15min)
> {noformat}
>  
> The Spark job failed after 15 minutes with the error below:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.api.
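
For a job that must run longer than the delegation-token renew interval, Spark on
YARN needs a Kerberos principal and keytab so it can log in again and obtain fresh
HDFS tokens instead of relying on renewal of the original token. A minimal sketch
is below; it assumes YARN mode, and the principal, keytab path, and input directory
are placeholders rather than values from this report.

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: give Spark a principal/keytab so it can periodically re-login
// and fetch new HDFS delegation tokens for a long-running streaming job.
// Principal, keytab path, and input directory are placeholders.
val conf = new SparkConf()
  .setAppName("hdfs-wordcount")
  .set("spark.yarn.principal", "user@EXAMPLE.COM")
  .set("spark.yarn.keytab", "/etc/security/keytabs/user.keytab")

val ssc = new StreamingContext(conf, Seconds(10))
ssc.textFileStream("hdfs:///tmp/stream-in")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .print()

ssc.start()
ssc.awaitTermination()
{code}

The same can be expressed on the command line with spark-submit's --principal and
--keytab options.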

[jira] [Commented] (SPARK-24487) Add support for RabbitMQ.

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505608#comment-16505608
 ] 

Saisai Shao commented on SPARK-24487:
-

What is the use case and purpose of integrating RabbitMQ with Spark?

> Add support for RabbitMQ.
> -
>
> Key: SPARK-24487
> URL: https://issues.apache.org/jira/browse/SPARK-24487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Michał Jurkiewicz
>Priority: Major
>
> Add support for RabbitMQ.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505607#comment-16505607
 ] 

Saisai Shao commented on HIVE-16391:


Any comment [~vanzin] [~ste...@apache.org]?

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (LIVY-476) Changing log-level INFO

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reopened LIVY-476:
--

> Changing log-level INFO
> ---
>
> Key: LIVY-476
> URL: https://issues.apache.org/jira/browse/LIVY-476
> Project: Livy
>  Issue Type: Question
>  Components: API
>Affects Versions: 0.4.0
> Environment: Azure 
>Reporter: Antoine Ly
>Priority: Major
> Attachments: scrren.jpg
>
>
> I am using the Livy API with a web application. We use it to submit Spark jobs 
> and then collect the results to print them on the front page. To do so, we have 
> to interpret the Livy callback.
> However, despite changing the log level (the log4j parameters were changed to 
> ERROR instead of INFO), we still get INFO messages that prevent the application 
> from interpreting the Livy output properly (see image below). It happens only 
> with RSpark sessions.
> We also tried to change the log4j log level directly in HDFS and Spark, but with 
> no success.
> Could you please help us remove the underlined lines below?
>  
> Thanks a lot.
> !scrren.jpg!
>  
>  
>  
>  
>  
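
Since the report is about log4j verbosity leaking into the session output, a
generic workaround sketch is shown below. It uses the log4j 1.x API bundled with
Spark 2.x and is only illustrative; it is not a Livy-specific fix, and the logger
names are assumptions.

{code:scala}
import org.apache.log4j.{Level, Logger}

// Sketch only: raise the threshold so INFO lines are suppressed in this JVM.
Logger.getRootLogger.setLevel(Level.ERROR)
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)

// From a SparkContext, the equivalent convenience call is:
// sc.setLogLevel("ERROR")
{code}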



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (LIVY-476) Changing log-level INFO

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-476.
--
Resolution: Not A Problem

> Changing log-level INFO
> ---
>
> Key: LIVY-476
> URL: https://issues.apache.org/jira/browse/LIVY-476
> Project: Livy
>  Issue Type: Question
>  Components: API
>Affects Versions: 0.4.0
> Environment: Azure 
>Reporter: Antoine Ly
>Priority: Major
> Attachments: scrren.jpg
>
>
> I am using the Livy API with a web application. We use it to submit Spark jobs 
> and then collect the results to print them on the front page. To do so, we have 
> to interpret the Livy callback.
> However, despite changing the log level (the log4j parameters were changed to 
> ERROR instead of INFO), we still get INFO messages that prevent the application 
> from interpreting the Livy output properly (see image below). It happens only 
> with RSpark sessions.
> We also tried to change the log4j log level directly in HDFS and Spark, but with 
> no success.
> Could you please help us remove the underlined lines below?
>  
> Thanks a lot.
> !scrren.jpg!
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-476) Changing log-level INFO

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504545#comment-16504545
 ] 

Saisai Shao commented on LIVY-476:
--

Not super sure about your issues, can you please explain more?

> Changing log-level INFO
> ---
>
> Key: LIVY-476
> URL: https://issues.apache.org/jira/browse/LIVY-476
> Project: Livy
>  Issue Type: Question
>  Components: API
>Affects Versions: 0.4.0
> Environment: Azure 
>Reporter: Antoine Ly
>Priority: Major
> Attachments: scrren.jpg
>
>
> I am using the Livy API with a web application. We use it to submit Spark jobs 
> and then collect the results to print them on the front page. To do so, we have 
> to interpret the Livy callback.
> However, despite changing the log level (the log4j parameters were changed to 
> ERROR instead of INFO), we still get INFO messages that prevent the application 
> from interpreting the Livy output properly (see image below). It happens only 
> with RSpark sessions.
> We also tried to change the log4j log level directly in HDFS and Spark, but with 
> no success.
> Could you please help us remove the underlined lines below?
>  
> Thanks a lot.
> !scrren.jpg!
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503188#comment-16503188
 ] 

Saisai Shao commented on HIVE-16391:


Uploaded a new patch [^HIVE-16391.1.patch] that uses the solution mentioned by 
Marcelo.

It simply adds two new Maven modules and renames the original "hive-exec" module. 
One added module is the new "hive-exec", which stays compatible with existing Hive 
consumers; the other added module, "hive-exec-spark", is specifically for Spark.
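
From the consumer side, the split could look like the sbt (Scala) sketch below. The
artifact name "hive-exec-spark" follows the naming above and 1.2.3 is the version
targeted by this JIRA; whether the published POM declares the remaining
dependencies is an assumption, so treat this as illustrative only.

{code:scala}
// build.sbt sketch. Existing Hive consumers keep the current coordinates:
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.2.3"

// A Spark build would instead depend on the Spark-oriented module, which shades
// kryo and protobuf-java while declaring its other dependencies in the POM:
libraryDependencies += "org.apache.hive" % "hive-exec-spark" % "1.2.3"
{code}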

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated HIVE-16391:
---
Attachment: HIVE-16391.1.patch

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502976#comment-16502976
 ] 

Saisai Shao commented on HIVE-16391:


[~vanzin] one problem with your proposed solution: the hive-exec test jar is no 
longer valid, because we changed the artifact name of the current "hive-exec" pom. 
This might affect users who rely on that test jar.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502756#comment-16502756
 ] 

Saisai Shao commented on HIVE-16391:


{quote}The problem with that is that it changes the meaning of Hive's 
artifacts, so anybody currently importing hive-exec would see a breakage, and 
that's probably not desired.
{quote}
 
This might not be acceptable to the Hive community, because it would break 
current users, as you mentioned.

As [~joshrosen] mentioned, Spark wants the hive-exec jar that shades kryo and 
protobuf-java, not a pure non-shaded jar.
{quote}Another option is to change the artifact name of the current "hive-exec" 
pom. Then you'd publish the normal jar under the new artifact name, then have a 
separate module that imports that jar, shades dependencies, and publishes the 
result as "hive-exec". That would maintain compatibility with existing 
artifacts.
{quote}
I can try this approach, but it does not seem like a small change for Hive, and I'm 
not sure the Hive community will accept such an approach (at least for branch 1.2).

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated HIVE-16391:
---
Fix Version/s: 1.2.3
Affects Version/s: 1.2.2
   Attachment: HIVE-16391.patch
   Status: Patch Available  (was: Open)

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501667#comment-16501667
 ] 

Saisai Shao commented on HIVE-16391:


I see, thanks. Will upload the patch.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501561#comment-16501561
 ] 

Saisai Shao commented on HIVE-16391:


It seems I don't have permission to upload a file.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501561#comment-16501561
 ] 

Saisai Shao edited comment on HIVE-16391 at 6/5/18 10:15 AM:
-

It seems I don't have permission to upload a file; there's no such button.


was (Author: jerryshao):
Seems there's no permission for me to upload a file.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501415#comment-16501415
 ] 

Saisai Shao commented on HIVE-16391:


I'm not sure whether submitting a PR is the right way to get a review in the Hive 
community; waiting for feedback.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-04 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501285#comment-16501285
 ] 

Saisai Shao commented on HIVE-16391:


Hi [~joshrosen], I'm trying to make the Hive changes you mentioned above using the 
new classifier {{core-spark}}. I found one problem with releasing two shaded jars 
(one is hive-exec, the other is hive-exec-core-spark): the published pom file is 
still the reduced pom corresponding to hive-exec, so when Spark uses the 
hive-exec-core-spark jar, it has to explicitly declare all the transitive 
dependencies of hive-exec.

I'm not sure whether there's a way to publish two pom files mapped to the two 
different shaded jars, or whether it is acceptable for Spark to explicitly declare 
all the transitive dependencies, like the {{core}} classifier you used before.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2018-06-04 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501225#comment-16501225
 ] 

Saisai Shao commented on SPARK-20202:
-

OK, for the first option, I've already started working on it locally. It looks 
like it is not a big change; some POM changes are enough. I will submit a patch to 
the Hive community.

> Remove references to org.spark-project.hive
> ---
>
> Key: SPARK-20202
> URL: https://issues.apache.org/jira/browse/SPARK-20202
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 1.6.4, 2.0.3, 2.1.1
>Reporter: Owen O'Malley
>Priority: Major
>
> Spark can't continue to depend on its fork of Hive and must move to 
> standard Hive versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2018-06-03 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499707#comment-16499707
 ] 

Saisai Shao commented on SPARK-20202:
-

What is our plan to to fix this issue, are we going to use new Hive version, or 
we are still stick to 1.2?

If we're still stick to 1.2, [~ste...@apache.org] and I will take this issue 
and make the ball rolling in Hive community.

> Remove references to org.spark-project.hive
> ---
>
> Key: SPARK-20202
> URL: https://issues.apache.org/jira/browse/SPARK-20202
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 1.6.4, 2.0.3, 2.1.1
>Reporter: Owen O'Malley
>Priority: Major
>
> Spark can't continue to depend on its fork of Hive and must move to 
> standard Hive versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2018-06-01 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497931#comment-16497931
 ] 

Saisai Shao commented on SPARK-18673:
-

Thanks Steve, looking forward to your inputs.

> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
> --
>
> Key: SPARK-18673
> URL: https://issues.apache.org/jira/browse/SPARK-18673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Spark built with -Dhadoop.version=3.0.0-alpha2-SNAPSHOT 
>Reporter: Steve Loughran
>Priority: Major
>
> Spark Dataframes fail to run on Hadoop 3.0.x, because hive.jar's shimloader 
> considers 3.x to be an unknown Hadoop version.
> Hive itself will have to fix this; as Spark uses its own hive 1.2.x JAR, it 
> will need to be updated to match.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24355) Improve Spark shuffle server responsiveness to non-ChunkFetch requests

2018-05-31 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497626#comment-16497626
 ] 

Saisai Shao commented on SPARK-24355:
-

Do we have a test result before and after?

> Improve Spark shuffle server responsiveness to non-ChunkFetch requests
> --
>
> Key: SPARK-24355
> URL: https://issues.apache.org/jira/browse/SPARK-24355
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.3.0
> Environment: Hadoop-2.7.4
> Spark-2.3.0
>Reporter: Min Shen
>Priority: Major
>
> We run Spark on YARN and deploy the Spark external shuffle service as part of 
> the YARN NM aux services.
> One issue we saw with the Spark external shuffle service is the various timeouts 
> experienced by clients, either when registering an executor with the local 
> shuffle server or when establishing a connection to a remote shuffle server.
> Example of a timeout for establishing connection with remote shuffle server:
> {code:java}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout 
> waiting for task.
>   at 
> org.spark_project.guava.base.Throwables.propagate(Throwables.java:160)
>   at 
> org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:288)
>   at 
> org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:80)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:248)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:106)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:115)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:182)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.org$apache$spark$storage$ShuffleBlockFetcherIterator$$send$1(ShuffleBlockFetcherIterator.scala:396)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:391)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:345)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:57)
> {code}
> Example of a timeout for registering executor with local shuffle server:
> {code:java}
> ava.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout 
> waiting for task.
>   at 
> org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
>   at 
> org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:278)
>   at 
> org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:80)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:181)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:141)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:218)
> {code}
> While patches such as SPARK-20640 and config parameters such as 
> spark.shuffle.registration.timeout and spark.shuffle.sasl.timeout (when 
> spark.authenticate is set to true) can help alleviate this type of problem, 
> they do not solve the fundamental issue.
> We have observed that, when the shuffle workload gets very busy in peak 
> hours, the client requests can time out even after configuring these 
> parameters with very high values. Further investigation revealed the following:
> Right now, the default number of server-side Netty handler threads is 2 * # 
> cores, and it can be further configured with the parameter 
> spark.shuffle.io.serverThreads.
> Processing a client request requires one available server Netty handler thread.
> However, when the server Netty handler threads start to process 
> ChunkFetchRequests, they will be blocked on disk I/O, mostly due to disk 
> contention from the random read operations initiated by all the 
> ChunkFetchRequests received from clients.
> As a result, when the shuffle server is serving many concurre
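
The core observation is that control-plane requests (executor registration,
SASL/RPC handshakes) are cheap but get starved when every handler thread is blocked
on disk reads for chunk fetches. A generic Scala sketch of that isolation idea is
below; it is not Spark's actual shuffle-server code, just an illustration of
keeping a dedicated pool for blocking work.

{code:scala}
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Sketch only: one small pool for control messages that must never block on disk,
// and a separate pool that absorbs the slow, disk-bound chunk reads.
val controlPool    = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))
val chunkFetchPool = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(8))

def handleRegistration(executorId: String): Future[Unit] = Future {
  println(s"registered $executorId")           // cheap bookkeeping only
}(controlPool)

def handleChunkFetch(blockId: String): Future[Array[Byte]] = Future {
  // stands in for a potentially slow random disk read
  java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(s"/tmp/$blockId"))
}(chunkFetchPool)
{code}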

[jira] [Commented] (SPARK-24448) File not found on the address SparkFiles.get returns on standalone cluster

2018-05-31 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497548#comment-16497548
 ] 

Saisai Shao commented on SPARK-24448:
-

Does it only happen in standalone cluster mode? Have you tried client mode?

> File not found on the address SparkFiles.get returns on standalone cluster
> --
>
> Key: SPARK-24448
> URL: https://issues.apache.org/jira/browse/SPARK-24448
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Pritpal Singh
>Priority: Major
>
> I want to upload a file to all worker nodes in a standalone cluster and 
> retrieve the location of the file. Here is my code:
>  
> val tempKeyStoreLoc = System.getProperty("java.io.tmpdir") + "/keystore.jks"
> val file = new File(tempKeyStoreLoc)
> sparkContext.addFile(file.getAbsolutePath)
> val keyLoc = SparkFiles.get("keystore.jks")
>  
> SparkFiles.get returns a random location where keystore.jks does not exist. I 
> submit the job in cluster mode. In fact, the location SparkFiles returns does 
> not exist on any of the worker nodes (including the driver node). 
> I observed that Spark does load keystore.jks files on worker nodes at 
> /work///keystore.jks. The partition_id 
> changes from one worker node to another.
> My requirement is to upload a file to all nodes of a cluster and retrieve its 
> location. I expect the location to be common across all worker nodes.
>  
>  
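
One thing worth noting here: SparkFiles.get resolves against the local files
directory of whichever JVM calls it, so the returned path is process-local rather
than a single cluster-wide location. A small sketch of resolving the path inside
tasks (where each executor sees its own copy) is below; the keystore name follows
the report, everything else is illustrative.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

// Sketch only: resolve the distributed file's path inside each task, where
// SparkFiles.get points at that executor's own copy of the file.
val sc = new SparkContext(new SparkConf().setAppName("sparkfiles-demo")) // master from spark-submit
sc.addFile("/tmp/keystore.jks")                 // illustrative path on the submitter

val pathsSeenByExecutors = sc.parallelize(1 to 4, 4).map { _ =>
  SparkFiles.get("keystore.jks")                // per-executor local path
}.collect()

pathsSeenByExecutors.foreach(println)
{code}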



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (LIVY-475) Support of Hadoop CredentialProvider API for livy.keystore.password

2018-05-31 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-475:


Assignee: Ivan Dzikovsky

> Support of Hadoop CredentialProvider API for livy.keystore.password
> ---
>
> Key: LIVY-475
> URL: https://issues.apache.org/jira/browse/LIVY-475
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Ivan Dzikovsky
>Assignee: Ivan Dzikovsky
>Priority: Major
> Fix For: 0.6.0
>
>
> It would be good to have an option to get "livy.keystore.password" and 
> "livy.key-password" by using Hadoop CredentialProvider API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (LIVY-475) Support of Hadoop CredentialProvider API for livy.keystore.password

2018-05-31 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-475.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

Issue resolved by pull request 99
[https://github.com/apache/incubator-livy/pull/99]

> Support of Hadoop CredentialProvider API for livy.keystore.password
> ---
>
> Key: LIVY-475
> URL: https://issues.apache.org/jira/browse/LIVY-475
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Ivan Dzikovsky
>Priority: Major
> Fix For: 0.6.0
>
>
> It would be good to have an option to get "livy.keystore.password" and 
> "livy.key-password" by using Hadoop CredentialProvider API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2018-05-31 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496187#comment-16496187
 ] 

Saisai Shao commented on SPARK-18673:
-

I created a PR for this issue [https://github.com/JoshRosen/hive/pull/2]

Actually, a one-line fix is enough; most of the other changes in Hive are related 
to HBase, which we don't need.

I'm not sure what our plan is for Hive support: are we still planning to use 
1.2.1spark2 as the built-in version for Spark in the future, or do we plan to 
upgrade to the latest Hive version? If we plan to upgrade to the latest one, then 
this PR is not necessary.

> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
> --
>
> Key: SPARK-18673
> URL: https://issues.apache.org/jira/browse/SPARK-18673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Spark built with -Dhadoop.version=3.0.0-alpha2-SNAPSHOT 
>Reporter: Steve Loughran
>Priority: Major
>
> Spark Dataframes fail to run on Hadoop 3.0.x, because hive.jar's shimloader 
> considers 3.x to be an unknown Hadoop version.
> Hive itself will have to fix this; as Spark uses its own hive 1.2.x JAR, it 
> will need to be updated to match.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23991) data loss when allocateBlocksToBatch

2018-05-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-23991.
-
   Resolution: Fixed
Fix Version/s: 2.3.1
   2.4.0

Issue resolved by pull request 21430
[https://github.com/apache/spark/pull/21430]

> data loss when allocateBlocksToBatch
> 
>
> Key: SPARK-23991
> URL: https://issues.apache.org/jira/browse/SPARK-23991
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, Input/Output
>Affects Versions: 2.2.0
> Environment: spark 2.11
>Reporter: kevin fu
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> With checkpointing and the WAL enabled, the driver writes each batch's block 
> allocation into HDFS. However, if that write fails as shown below, the blocks 
> of the batch cannot be computed by the DAG, because they have already been 
> dequeued from the receivedBlockQueue and are lost.
> {panel:title=error log}
> 18/04/15 11:11:25 WARN ReceivedBlockTracker: Exception thrown while writing 
> record: BatchAllocationEvent(152376548 ms,AllocatedBlocks(Map(0 -> 
> ArrayBuffer( to the WriteAheadLog. org.apache.spark.SparkException: 
> Exception thrown in awaitResult: at 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:83)
>  at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:234)
>  at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.allocateBlocksToBatch(ReceivedBlockTracker.scala:118)
>  at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.allocateBlocksToBatch(ReceiverTracker.scala:213)
>  at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
>  at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
>  at scala.util.Try$.apply(Try.scala:192) at 
> org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
>  at 
> org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
>  at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
>  at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) Caused 
> by: java.util.concurrent.TimeoutException: Futures timed out after [5000 
> milliseconds] at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at 
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>  at scala.concurrent.Await$.result(package.scala:190) at 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190) ... 12 
> more 18/04/15 11:11:25 INFO ReceivedBlockTracker: Possibly processed batch 
> 152376548 ms needs to be processed again in WAL recovery{panel}
> the relevant code is shown below:
> {code}
>   /**
>* Allocate all unallocated blocks to the given batch.
>* This event will get written to the write ahead log (if enabled).
>*/
>   def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
> if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) 
> {
>   val streamIdToBlocks = streamIds.map { streamId =>
>   (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
>   }.toMap
>   val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
>   if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
> timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
> lastAllocatedBatchTime = batchTime
>   } else {
> logInfo(s"Possibly processed batch $batchTime needs to be processed 
> again in WAL recovery")
>   }
> } else {
>   // This situation occurs when:
>   // 1. WAL is ended with BatchAllocationEvent, but without 
> BatchCleanupEvent,
>   // possibly processed batch job or half-processed batch job need to be 
> processed again,
>   // so the batchTime will be equal to lastAllocatedBatchTime.
>   // 2. Slow checkpointing makes recovered batch time older than WAL 
> recovered
>   // lastAllocatedBatchTime.
>   // This situation will only occurs in recovery time.
>   logInfo(s"Possibly processed batch $batchTime needs to be processed 
> again in WAL recovery")
> }
>   }
> {code}
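
The quoted code shows where the blocks are lost: dequeueAll removes them from the
queue before writeToLog has succeeded, so a WAL timeout drops them permanently. As
a purely illustrative pattern (not the code from the eventual fix), the general
idea is to discard queued items only after the durable write is known to have
succeeded:

{code:scala}
import scala.collection.mutable

// Sketch only: snapshot the queue, attempt the durable write, and clear the queue
// only on success so a failed write leaves the items available for a retry.
def allocate[T](queue: mutable.Queue[T])(writeToLog: Seq[T] => Boolean): Seq[T] = {
  val batch = queue.toList                // immutable snapshot, nothing removed yet
  if (writeToLog(batch)) {
    queue.clear()                         // safe to drop only after a successful write
    batch
  } else {
    Seq.empty                             // blocks stay queued for WAL recovery/retry
  }
}
{code}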



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe,

[jira] [Assigned] (SPARK-23991) data loss when allocateBlocksToBatch

2018-05-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-23991:
---

Assignee: Gabor Somogyi

> data loss when allocateBlocksToBatch
> 
>
> Key: SPARK-23991
> URL: https://issues.apache.org/jira/browse/SPARK-23991
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, Input/Output
>Affects Versions: 2.2.0
> Environment: spark 2.11
>Reporter: kevin fu
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> With checkpointing and the WAL enabled, the driver writes the allocation of blocks to a batch into HDFS. However, if that write fails as shown below, the blocks of this batch can no longer be computed by the DAG, because they have already been dequeued from the receivedBlockQueue and are lost.
> {panel:title=error log}
> 18/04/15 11:11:25 WARN ReceivedBlockTracker: Exception thrown while writing record: BatchAllocationEvent(152376548 ms,AllocatedBlocks(Map(0 -> ArrayBuffer( to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult:
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194)
>   at org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:83)
>   at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:234)
>   at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.allocateBlocksToBatch(ReceivedBlockTracker.scala:118)
>   at org.apache.spark.streaming.scheduler.ReceiverTracker.allocateBlocksToBatch(ReceiverTracker.scala:213)
>   at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
>   at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
>   at scala.util.Try$.apply(Try.scala:192)
>   at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
>   at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
>   at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
>   at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [5000 milliseconds]
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>   at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.result(package.scala:190)
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
>   ... 12 more
> 18/04/15 11:11:25 INFO ReceivedBlockTracker: Possibly processed batch 152376548 ms needs to be processed again in WAL recovery{panel}
> The relevant code is shown below:
> {code}
>   /**
>    * Allocate all unallocated blocks to the given batch.
>    * This event will get written to the write ahead log (if enabled).
>    */
>   def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
>     if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
>       val streamIdToBlocks = streamIds.map { streamId =>
>         (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
>       }.toMap
>       val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
>       if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
>         timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
>         lastAllocatedBatchTime = batchTime
>       } else {
>         logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
>       }
>     } else {
>       // This situation occurs when:
>       // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
>       // possibly processed batch job or half-processed batch job need to be processed again,
>       // so the batchTime will be equal to lastAllocatedBatchTime.
>       // 2. Slow checkpointing makes recovered batch time older than WAL recovered
>       // lastAllocatedBatchTime.
>       // This situation will only occurs in recovery time.
>       logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
>     }
>   }
> {code}
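
For illustration, the failure mode can be modeled outside of Spark: the blocks are dequeued before the WAL write, so a failed write loses them. Below is a minimal, self-contained sketch of one possible mitigation, re-enqueueing the blocks when the write fails so a later batch can still allocate them. This is a simplified toy model, not Spark's actual ReceivedBlockTracker code and not necessarily the fix that was merged; all names are illustrative.

{code}
import scala.collection.mutable

// Toy stand-in for the tracker state: a single queue of received block ids.
class ToyBlockTracker(writeToLog: Seq[String] => Boolean) {
  private val receivedBlockQueue = mutable.Queue[String]()
  private val timeToAllocatedBlocks = mutable.Map[Long, Seq[String]]()

  def addBlock(blockId: String): Unit = synchronized {
    receivedBlockQueue.enqueue(blockId)
  }

  // Dequeue everything, but put the blocks back if the WAL write fails,
  // instead of silently dropping them.
  def allocateBlocksToBatch(batchTime: Long): Unit = synchronized {
    val blocks = receivedBlockQueue.dequeueAll(_ => true).toList
    if (writeToLog(blocks)) {
      timeToAllocatedBlocks.put(batchTime, blocks)
    } else {
      blocks.foreach(b => receivedBlockQueue.enqueue(b))  // undo the dequeue on failure
    }
  }

  def allocated(batchTime: Long): Seq[String] =
    timeToAllocatedBlocks.getOrElse(batchTime, Seq.empty)
}
{code}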




[jira] [Resolved] (LIVY-473) Minor refactor of integration test to remove some old codes

2018-05-28 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-473.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

Issue resolved by pull request 97
[https://github.com/apache/incubator-livy/pull/97]

> Minor refactor of integration test to remove some old codes
> ---
>
> Key: LIVY-473
> URL: https://issues.apache.org/jira/browse/LIVY-473
> Project: Livy
>  Issue Type: Improvement
>Reporter: Saisai Shao
>Priority: Minor
> Fix For: 0.6.0
>
>
> The integration tests contain some legacy code related to the different cluster types (mini or real). Since we now use a real Spark package for testing and have already partially removed that code, we should refactor to remove all the remaining unused legacy code.





[jira] [Assigned] (LIVY-473) Minor refactor of integration test to remove some old codes

2018-05-28 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-473:


Assignee: Saisai Shao

> Minor refactor of integration test to remove some old codes
> ---
>
> Key: LIVY-473
> URL: https://issues.apache.org/jira/browse/LIVY-473
> Project: Livy
>  Issue Type: Improvement
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 0.6.0
>
>
> The integration tests contain some legacy code related to the different cluster types (mini or real). Since we now use a real Spark package for testing and have already partially removed that code, we should refactor to remove all the remaining unused legacy code.





[jira] [Created] (LIVY-473) Minor refactor of integration test to remove some old codes

2018-05-28 Thread Saisai Shao (JIRA)
Saisai Shao created LIVY-473:


 Summary: Minor refactor of integration test to remove some old 
codes
 Key: LIVY-473
 URL: https://issues.apache.org/jira/browse/LIVY-473
 Project: Livy
  Issue Type: Improvement
Reporter: Saisai Shao


The integration tests contain some legacy code related to the different cluster types (mini or real). Since we now use a real Spark package for testing and have already partially removed that code, we should refactor to remove all the remaining unused legacy code.





[jira] [Resolved] (LIVY-472) Improve the logs for fail-to-create session

2018-05-24 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-472.
--
   Resolution: Fixed
Fix Version/s: 0.6.0
   0.5.1

Issue resolved by pull request 96
[https://github.com/apache/incubator-livy/pull/96]

> Improve the logs for fail-to-create session
> ---
>
> Key: LIVY-472
> URL: https://issues.apache.org/jira/browse/LIVY-472
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Priority: Minor
> Fix For: 0.5.1, 0.6.0
>
>
> Livy currently doesn't log a clear message when session creation fails; it only says that the session-related app tag cannot be found in the RM, and doesn't tell the user how to find the true root cause. This change makes the logs clearer.





[jira] [Assigned] (LIVY-472) Improve the logs for fail-to-create session

2018-05-24 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-472:


Assignee: Saisai Shao

> Improve the logs for fail-to-create session
> ---
>
> Key: LIVY-472
> URL: https://issues.apache.org/jira/browse/LIVY-472
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 0.5.1, 0.6.0
>
>
> Livy currently doesn't log a clear message when session creation fails; it only says that the session-related app tag cannot be found in the RM, and doesn't tell the user how to find the true root cause. This change makes the logs clearer.





[jira] [Created] (SPARK-24377) Make --py-files work in non pyspark application

2018-05-23 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24377:
---

 Summary: Make --py-files work in non pyspark application
 Key: SPARK-24377
 URL: https://issues.apache.org/jira/browse/SPARK-24377
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.3.0
Reporter: Saisai Shao


Some Spark applications, though they are Java/Scala programs, require not only jar dependencies but also Python dependencies. One example is Livy's remote SparkContext application: it is essentially an embedded REPL for Scala/Python/R, so it needs to load not only jar dependencies but also Python and R dependencies.

Currently --py-files only works for a pyspark application, so it does not work in the case above. This proposes to remove that restriction.

We also found that "spark.submit.pyFiles" only supports a quite limited scenario (client mode with local dependencies), so this also expands "spark.submit.pyFiles" to be an alternative to --py-files.






[jira] [Commented] (LIVY-471) New session creation API set to support resource uploading

2018-05-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486896#comment-16486896
 ] 

Saisai Shao commented on LIVY-471:
--

{quote}Using file name provided by user is bad idea. Maybe it is not so 
significant in Livy but I had many unreadable files just because invalid 
characters or simply very long file name.
{quote}
I don't think your proposal solves this problem either.

 
{quote}Storing files locally is against possible future HA Livy (see LIVY-11).

If server crash (without cleaning) there is possibility to read old session's 
files.
{quote}
This should not be a problem if you look closely at Livy's current session recovery mechanism and my implementation. As for the HA feature, we can plan it in the future.

 

Frankly, I don't want to upload resources to HDFS for Spark, because:
 # Spark knows how and when to upload resources for each cluster manager; we don't have to make that decision for Spark.
 # If in the future we want to support other cluster managers for Livy, such as standalone or k8s, it would be better to let Spark decide how to handle resources.

> New session creation API set to support resource uploading
> --
>
> Key: LIVY-471
> URL: https://issues.apache.org/jira/browse/LIVY-471
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Priority: Major
>
> Already posted on the mailing list.
> In the current API design for creating an interactive / batch session, we assume the end user uploads jars, pyFiles and related dependencies to HDFS before creating the session, and we use one POST request to create the session. But end users often don't have permission to access HDFS from their submission machine, which makes it hard for them to create new sessions. So the requirement here is for Livy to offer APIs to upload resources during session creation. One implementation is proposed in
> [https://github.com/apache/incubator-livy/pull/91]
> It adds a field to the session creation request to delay session creation, then adds a set of APIs to support resource upload, and finally an API to start the actual session creation. This seems feasible, but it is also a bit of a hack on the existing APIs. So I was wondering if we could add a set of new APIs for this scenario rather than hacking the existing ones.
> To borrow the concept from YARN application submission, we could have three APIs to create a session (see the sketch below):
>  * request a new session id from the Livy Server.
>  * upload resources associated with this session id.
>  * submit the request to actually create the session.
> This is similar to YARN's process for submitting an application, and we can bump the supported API version for the newly added APIs.
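
A rough sketch of the contract implied by the three-step flow above, written as a plain Scala trait. The method names, parameters and return types are purely illustrative; the proposal does not fix any endpoint names or payloads.

{code}
// Illustrative only: models the proposed three-step session creation flow.
// None of these names correspond to actual Livy APIs.
trait SessionCreationFlow {
  /** Step 1: reserve a session id without creating any session state. */
  def requestSessionId(): Long

  /** Step 2: upload a resource (jar, pyFile, ...) associated with the reserved id. */
  def uploadResource(sessionId: Long, fileName: String, content: Array[Byte]): Unit

  /** Step 3: submit the actual creation request, referencing the uploaded resources. */
  def createSession(sessionId: Long, conf: Map[String, String]): Unit
}
{code}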





[jira] [Created] (LIVY-472) Improve the logs for fail-to-create session

2018-05-22 Thread Saisai Shao (JIRA)
Saisai Shao created LIVY-472:


 Summary: Improve the logs for fail-to-create session
 Key: LIVY-472
 URL: https://issues.apache.org/jira/browse/LIVY-472
 Project: Livy
  Issue Type: Improvement
  Components: Server
Affects Versions: 0.5.0
Reporter: Saisai Shao


Livy currently doesn't log a clear message when session creation fails; it only says that the session-related app tag cannot be found in the RM, and doesn't tell the user how to find the true root cause. This change makes the logs clearer.





[jira] [Assigned] (LIVY-468) Reverse proxy support for webui

2018-05-22 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-468:


Assignee: Michal Wcislo

> Reverse proxy support for webui
> ---
>
> Key: LIVY-468
> URL: https://issues.apache.org/jira/browse/LIVY-468
> Project: Livy
>  Issue Type: New Feature
>  Components: Server
>Affects Versions: 0.6.0
>Reporter: Michal Wcislo
>Assignee: Michal Wcislo
>Priority: Minor
> Fix For: 0.6.0
>
>
> Support for making Livy work behind a reverse proxy (e.g. Kong). The problem is that the Livy web UI is served from the "/" path; the idea is to make it work from, for example, a "/my-livy/" path.
> It does not work by default because the HTML/CSS/JS generates URLs rooted at "/". That code runs in the user's browser, which then issues redirects/requests to non-existent locations.
>  
> The proposed solution is to add a parameter in livy.conf to set the base path at Livy startup.





[jira] [Commented] (LIVY-468) Reverse proxy support for webui

2018-05-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486615#comment-16486615
 ] 

Saisai Shao commented on LIVY-468:
--

Ok, I will do it.

> Reverse proxy support for webui
> ---
>
> Key: LIVY-468
> URL: https://issues.apache.org/jira/browse/LIVY-468
> Project: Livy
>  Issue Type: New Feature
>  Components: Server
>Affects Versions: 0.6.0
>Reporter: Michal Wcislo
>Priority: Minor
> Fix For: 0.6.0
>
>
> Support for making Livy work behind reverse proxy (e.g.: Kong). The problem 
> is that Livy webui is working on "/" path. The idea is to make it work on for 
> example "/my-livy/" path.
> It is not working by default because html/css/js is generating urls using 
> "/". Code is executed in user browser which then make redirection/requests to 
> non-existing sources.
>  
> Proposed solution is to add parameter in livy.conf to setup base path on livy 
> startup.





[jira] [Commented] (LIVY-452) Differentiate FAILED and KILLED states

2018-05-20 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482107#comment-16482107
 ] 

Saisai Shao commented on LIVY-452:
--

Issue resolved by pull request 92
[https://github.com/apache/incubator-livy/pull/92]

> Differentiate FAILED and KILLED states
> --
>
> Key: LIVY-452
> URL: https://issues.apache.org/jira/browse/LIVY-452
> Project: Livy
>  Issue Type: Improvement
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 0.6.0
>
>
> Right now it's not possible to distinguish between two states, _SparkApp.State.KILLED_ and _SparkApp.State.FAILED._ In both cases the session state will be _SessionState.Dead()._
> In our use case it's important to distinguish whether a job failed or was killed, so I propose to add a new _SessionState.Killed()_ to be used when the job was actually killed by the user.
> If this idea is approved I can submit a PR for it.
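
A minimal sketch of the distinction being proposed, using a standalone sealed trait rather than Livy's real SessionState hierarchy (these names only loosely mirror it and are illustrative):

{code}
// Simplified model, not Livy's actual session state classes.
sealed trait ToySessionState
object ToySessionState {
  case object Running extends ToySessionState
  // Today both outcomes collapse into a single "dead" state...
  case class Dead(reason: String) extends ToySessionState
  // ...the proposal adds a distinct state for user-initiated kills.
  case object Killed extends ToySessionState
}

// A caller can then tell the two outcomes apart:
def describe(s: ToySessionState): String = s match {
  case ToySessionState.Running      => "still running"
  case ToySessionState.Dead(reason) => s"failed: $reason"
  case ToySessionState.Killed       => "killed by user"
}
{code}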





[jira] [Resolved] (LIVY-452) Differentiate FAILED and KILLED states

2018-05-20 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-452.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

> Differentiate FAILED and KILLED states
> --
>
> Key: LIVY-452
> URL: https://issues.apache.org/jira/browse/LIVY-452
> Project: Livy
>  Issue Type: Improvement
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 0.6.0
>
>
> Now it's not possible to distinguish between two states - 
> _SparkApp.State.KILLED_ and  _SparkApp.State.FAILED._ In both cases the 
> session state will be _SessionState.Dead()._
> In our use case it's important to distinguish whether job was failed or 
> killed. So, I propose to add new _SessionState.Killed()_ which will be used 
> when job was actually killed by user. 
> If this idea will be approved then I can submit PR about that.





[jira] [Assigned] (LIVY-452) Differentiate FAILED and KILLED states

2018-05-20 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-452:


Assignee: Alexey Romanenko

> Differentiate FAILED and KILLED states
> --
>
> Key: LIVY-452
> URL: https://issues.apache.org/jira/browse/LIVY-452
> Project: Livy
>  Issue Type: Improvement
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 0.6.0
>
>
> Now it's not possible to distinguish between two states - 
> _SparkApp.State.KILLED_ and  _SparkApp.State.FAILED._ In both cases the 
> session state will be _SessionState.Dead()._
> In our use case it's important to distinguish whether job was failed or 
> killed. So, I propose to add new _SessionState.Killed()_ which will be used 
> when job was actually killed by user. 
> If this idea will be approved then I can submit PR about that.





[jira] [Commented] (LIVY-471) New session creation API set to support resource uploading

2018-05-18 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481379#comment-16481379
 ] 

Saisai Shao commented on LIVY-471:
--

Requesting a new session id doesn't involve any session creation, so there is no footprint for a new session or session state, which means the session-state issue you're concerned about doesn't apply.

> New session creation API set to support resource uploading
> --
>
> Key: LIVY-471
> URL: https://issues.apache.org/jira/browse/LIVY-471
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Priority: Major
>
> Already post in mail list.
> In our current API design to create interactive / batch session, we assume 
> end user should upload jars, pyFiles and related dependencies to HDFS before 
> creating the session, and we use one POST request to create session. But 
> usually end user may not have the permission to access the HDFS in their 
> submission machine, so it makes them hard to create new sessions. So the 
> requirement here is that if Livy could offer APIs to upload resources during 
> session creation. One implementation is proposed in
> [https://github.com/apache/incubator-livy/pull/91|https://github.com/apache/incubator-livy/pull/91.]
> This add a field in session creation request to delay the session creation, 
> then adding a bunch of APIs to support resource upload, finally adding an API 
> to start creating the session. This seems a feasible solution, but also a 
> little hack to support such scenario. So I was thinking if we could a set of 
> new APIs to support such scenarios, rather than hack the existing APIs.
> To borrow the concept from yarn application submission, we can have 3 APIs to 
> create session.
>  * requesting a new session id from Livy Server.
>  * uploading resources associate with this session id.
>  * submitting request to create session.
> This is similar to YARN's process to submit application, and we can bump the 
> supported API version for newly added APIs.
>  





[jira] [Commented] (LIVY-471) New session creation API set to support resource uploading

2018-05-18 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481375#comment-16481375
 ] 

Saisai Shao commented on LIVY-471:
--

I have a local PR for solution 2, which is simpler and more straightforward. It only adds new APIs while keeping the old API consistent, and it handles both batch and interactive sessions. I can submit a WIP PR so you can see the implementation more clearly.

> New session creation API set to support resource uploading
> --
>
> Key: LIVY-471
> URL: https://issues.apache.org/jira/browse/LIVY-471
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Priority: Major
>
> Already post in mail list.
> In our current API design to create interactive / batch session, we assume 
> end user should upload jars, pyFiles and related dependencies to HDFS before 
> creating the session, and we use one POST request to create session. But 
> usually end user may not have the permission to access the HDFS in their 
> submission machine, so it makes them hard to create new sessions. So the 
> requirement here is that if Livy could offer APIs to upload resources during 
> session creation. One implementation is proposed in
> [https://github.com/apache/incubator-livy/pull/91|https://github.com/apache/incubator-livy/pull/91.]
> This add a field in session creation request to delay the session creation, 
> then adding a bunch of APIs to support resource upload, finally adding an API 
> to start creating the session. This seems a feasible solution, but also a 
> little hack to support such scenario. So I was thinking if we could a set of 
> new APIs to support such scenarios, rather than hack the existing APIs.
> To borrow the concept from yarn application submission, we can have 3 APIs to 
> create session.
>  * requesting a new session id from Livy Server.
>  * uploading resources associate with this session id.
>  * submitting request to create session.
> This is similar to YARN's process to submit application, and we can bump the 
> supported API version for newly added APIs.
>  





[jira] [Updated] (LIVY-471) New session creation API set to support resource uploading

2018-05-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-471:
-
Description: 
Already post in mail list.

In our current API design to create interactive / batch session, we assume end 
user should upload jars, pyFiles and related dependencies to HDFS before 
creating the session, and we use one POST request to create session. But 
usually end user may not have the permission to access the HDFS in their 
submission machine, so it makes them hard to create new sessions. So the 
requirement here is that if Livy could offer APIs to upload resources during 
session creation. One implementation is proposed in

[https://github.com/apache/incubator-livy/pull/91|https://github.com/apache/incubator-livy/pull/91.]

This add a field in session creation request to delay the session creation, 
then adding a bunch of APIs to support resource upload, finally adding an API 
to start creating the session. This seems a feasible solution, but also a 
little hack to support such scenario. So I was thinking if we could a set of 
new APIs to support such scenarios, rather than hack the existing APIs.

To borrow the concept from yarn application submission, we can have 3 APIs to 
create session.
 * requesting a new session id from Livy Server.
 * uploading resources associate with this session id.
 * submitting request to create session.

This is similar to YARN's process to submit application, and we can bump the 
supported API version for newly added APIs.

 

  was:
Already post in mail list.

In our current API design to create interactive / batch session, we assume end 
user should upload jars, pyFiles and related dependencies to HDFS before 
creating the session, and we use one POST request to create session. But 
usually end user may not have the permission to access the HDFS in their 
submission machine, so it makes them hard to create new sessions. So the 
requirement here is that if Livy could offer APIs to upload resources during 
session creation. One implementation is proposed in

[https://github.com/apache/incubator-livy/pull/91.]

This add a field in session creation request to delay the session creation, 
then adding a bunch of APIs to support resource upload, finally adding an API 
to start creating the session. This seems a feasible solution, but also a 
little hack to support such scenario. So I was thinking if we could a set of 
new APIs to support such scenarios, rather than hack the existing APIs.

To borrow the concept from yarn application submission, we can have 3 APIs to 
create session.
 * requesting a new session id from Livy Server.
 * uploading resources associate with this session id.
 * submitting request to create session.

This is similar to YARN's process to submit application, and we can bump the 
supported API version for newly added APIs.

 


> New session creation API set to support resource uploading
> --
>
> Key: LIVY-471
> URL: https://issues.apache.org/jira/browse/LIVY-471
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Priority: Major
>
> Already post in mail list.
> In our current API design to create interactive / batch session, we assume 
> end user should upload jars, pyFiles and related dependencies to HDFS before 
> creating the session, and we use one POST request to create session. But 
> usually end user may not have the permission to access the HDFS in their 
> submission machine, so it makes them hard to create new sessions. So the 
> requirement here is that if Livy could offer APIs to upload resources during 
> session creation. One implementation is proposed in
> [https://github.com/apache/incubator-livy/pull/91|https://github.com/apache/incubator-livy/pull/91.]
> This add a field in session creation request to delay the session creation, 
> then adding a bunch of APIs to support resource upload, finally adding an API 
> to start creating the session. This seems a feasible solution, but also a 
> little hack to support such scenario. So I was thinking if we could a set of 
> new APIs to support such scenarios, rather than hack the existing APIs.
> To borrow the concept from yarn application submission, we can have 3 APIs to 
> create session.
>  * requesting a new session id from Livy Server.
>  * uploading resources associate with this session id.
>  * submitting request to create session.
> This is similar to YARN's process to submit application, and we can bump the 
> supported API version for newly added APIs.
>  





[jira] [Updated] (LIVY-471) New session creation API set to support resource uploading

2018-05-17 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-471:
-
Description: 
Already post in mail list.

In our current API design to create interactive / batch session, we assume end 
user should upload jars, pyFiles and related dependencies to HDFS before 
creating the session, and we use one POST request to create session. But 
usually end user may not have the permission to access the HDFS in their 
submission machine, so it makes them hard to create new sessions. So the 
requirement here is that if Livy could offer APIs to upload resources during 
session creation. One implementation is proposed in

[https://github.com/apache/incubator-livy/pull/91.]

This add a field in session creation request to delay the session creation, 
then adding a bunch of APIs to support resource upload, finally adding an API 
to start creating the session. This seems a feasible solution, but also a 
little hack to support such scenario. So I was thinking if we could a set of 
new APIs to support such scenarios, rather than hack the existing APIs.

To borrow the concept from yarn application submission, we can have 3 APIs to 
create session.
 * requesting a new session id from Livy Server.
 * uploading resources associate with this session id.
 * submitting request to create session.

This is similar to YARN's process to submit application, and we can bump the 
supported API version for newly added APIs.

 

  was:
Already post in mail list.

In our current API design to create interactive / batch session, we assume end 
user should upload jars, pyFiles and related dependencies to HDFS before 
creating the session, and we use one POST request to create session. But 
usually end user may not have the permission to access the HDFS in their 
submission machine, so it makes them hard to create new sessions. So the 
requirement here is that if Livy could offer APIs to upload resources during 
session creation. One implementation is proposed in

[https://github.com/apache/incubator-livy/pull/91.]

This add a field in session creation request to delay the session creation, 
then adding a bunch of APIs to support resource upload, finally adding an API 
to start creating the session. This seems a feasible solution, but also a 
little hack to support such scenario. So I was thinking if we could a set of 
new APIs to support such scenarios, rather than hack the existing APIs.

To borrow the concept from yarn application submission, we can have 3 APIs to 
create session.
 * requesting a new session id from Livy Server.
 * uploading resources associate with this session id.
 * submitting request to create session. This is similar to YARN's process to 
submit application, and we can bump the supported API version for newly added 
APIs.

 


> New session creation API set to support resource uploading
> --
>
> Key: LIVY-471
> URL: https://issues.apache.org/jira/browse/LIVY-471
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Priority: Major
>
> Already post in mail list.
> In our current API design to create interactive / batch session, we assume 
> end user should upload jars, pyFiles and related dependencies to HDFS before 
> creating the session, and we use one POST request to create session. But 
> usually end user may not have the permission to access the HDFS in their 
> submission machine, so it makes them hard to create new sessions. So the 
> requirement here is that if Livy could offer APIs to upload resources during 
> session creation. One implementation is proposed in
> [https://github.com/apache/incubator-livy/pull/91.]
> This add a field in session creation request to delay the session creation, 
> then adding a bunch of APIs to support resource upload, finally adding an API 
> to start creating the session. This seems a feasible solution, but also a 
> little hack to support such scenario. So I was thinking if we could a set of 
> new APIs to support such scenarios, rather than hack the existing APIs.
> To borrow the concept from yarn application submission, we can have 3 APIs to 
> create session.
>  * requesting a new session id from Livy Server.
>  * uploading resources associate with this session id.
>  * submitting request to create session.
> This is similar to YARN's process to submit application, and we can bump the 
> supported API version for newly added APIs.
>  





[jira] [Created] (LIVY-471) New session creation API set to support resource uploading

2018-05-17 Thread Saisai Shao (JIRA)
Saisai Shao created LIVY-471:


 Summary: New session creation API set to support resource uploading
 Key: LIVY-471
 URL: https://issues.apache.org/jira/browse/LIVY-471
 Project: Livy
  Issue Type: Improvement
  Components: Server
Affects Versions: 0.5.0
Reporter: Saisai Shao


Already post in mail list.

In our current API design to create interactive / batch session, we assume end 
user should upload jars, pyFiles and related dependencies to HDFS before 
creating the session, and we use one POST request to create session. But 
usually end user may not have the permission to access the HDFS in their 
submission machine, so it makes them hard to create new sessions. So the 
requirement here is that if Livy could offer APIs to upload resources during 
session creation. One implementation is proposed in

[https://github.com/apache/incubator-livy/pull/91.]

This add a field in session creation request to delay the session creation, 
then adding a bunch of APIs to support resource upload, finally adding an API 
to start creating the session. This seems a feasible solution, but also a 
little hack to support such scenario. So I was thinking if we could a set of 
new APIs to support such scenarios, rather than hack the existing APIs.

To borrow the concept from yarn application submission, we can have 3 APIs to 
create session.
 * requesting a new session id from Livy Server.
 * uploading resources associate with this session id.
 * submitting request to create session. This is similar to YARN's process to 
submit application, and we can bump the supported API version for newly added 
APIs.

 





[jira] [Commented] (SPARK-24241) Do not fail fast when dynamic resource allocation enabled with 0 executor

2018-05-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475480#comment-16475480
 ] 

Saisai Shao commented on SPARK-24241:
-

Issue resolved by pull request 21290
https://github.com/apache/spark/pull/21290

> Do not fail fast when dynamic resource allocation enabled with 0 executor
> -
>
> Key: SPARK-24241
> URL: https://issues.apache.org/jira/browse/SPARK-24241
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 2.4.0
>
>
> {code:java}
> ~/spark-2.3.0-bin-hadoop2.7$ bin/spark-sql --num-executors 0 --conf 
> spark.dynamicAllocation.enabled=true
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=1024m; 
> support was removed in 8.0
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024m; 
> support was removed in 8.0
> Error: Number of executors must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
> Actually, we could start up with min executor number with 0 before 
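
For reference, a simplified model of the relaxed check being asked for (illustrative only, not the actual SparkSubmitArguments validation): with dynamic allocation enabled, an initial/minimum executor count of 0 is legitimate, so the hard "must be positive" check can be loosened.

{code}
// Illustrative validation logic only.
def validateNumExecutors(numExecutors: Int, dynamicAllocationEnabled: Boolean): Unit = {
  if (dynamicAllocationEnabled) {
    // With dynamic allocation the initial/min executor count may be 0.
    require(numExecutors >= 0, s"Number of executors must be >= 0, got $numExecutors")
  } else {
    require(numExecutors > 0, s"Number of executors must be a positive number, got $numExecutors")
  }
}
{code}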






[jira] [Resolved] (SPARK-24241) Do not fail fast when dynamic resource allocation enabled with 0 executor

2018-05-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24241.
-
Resolution: Fixed

> Do not fail fast when dynamic resource allocation enabled with 0 executor
> -
>
> Key: SPARK-24241
> URL: https://issues.apache.org/jira/browse/SPARK-24241
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> {code:java}
> ~/spark-2.3.0-bin-hadoop2.7$ bin/spark-sql --num-executors 0 --conf 
> spark.dynamicAllocation.enabled=true
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=1024m; 
> support was removed in 8.0
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024m; 
> support was removed in 8.0
> Error: Number of executors must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
> Actually, we could start up with min executor number with 0 before 






[jira] [Updated] (SPARK-24241) Do not fail fast when dynamic resource allocation enabled with 0 executor

2018-05-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24241:

Fix Version/s: 2.4.0

> Do not fail fast when dynamic resource allocation enabled with 0 executor
> -
>
> Key: SPARK-24241
> URL: https://issues.apache.org/jira/browse/SPARK-24241
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 2.4.0
>
>
> {code:java}
> ~/spark-2.3.0-bin-hadoop2.7$ bin/spark-sql --num-executors 0 --conf 
> spark.dynamicAllocation.enabled=true
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=1024m; 
> support was removed in 8.0
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024m; 
> support was removed in 8.0
> Error: Number of executors must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
> Actually, we could start up with min executor number with 0 before 






[jira] [Assigned] (SPARK-24241) Do not fail fast when dynamic resource allocation enabled with 0 executor

2018-05-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24241:
---

Assignee: Kent Yao

> Do not fail fast when dynamic resource allocation enabled with 0 executor
> -
>
> Key: SPARK-24241
> URL: https://issues.apache.org/jira/browse/SPARK-24241
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> {code:java}
> ~/spark-2.3.0-bin-hadoop2.7$ bin/spark-sql --num-executors 0 --conf 
> spark.dynamicAllocation.enabled=true
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=1024m; 
> support was removed in 8.0
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024m; 
> support was removed in 8.0
> Error: Number of executors must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
> Actually, we could start up with min executor number with 0 before 






[jira] [Resolved] (LIVY-431) Livy batch mode ignore proxyUser parameter

2018-05-14 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-431.
--
Resolution: Not A Problem

> Livy batch mode ignore proxyUser parameter
> --
>
> Key: LIVY-431
> URL: https://issues.apache.org/jira/browse/LIVY-431
> Project: Livy
>  Issue Type: Bug
>  Components: API, Core
>Affects Versions: 0.4.0
> Environment: Cloudera
>Reporter: Benoit de Rancourt
>Priority: Major
>
> Hello,
> We test Livy at work using it only in batch mode in order to have a spark 
> gateway. We use the last tagged version 0.4.0-incubating. We use the REST 
> API, as described here : 
> [http://livy.incubator.apache.org/docs/latest/rest-api.html]
> In the POST /batches request, we set the proxyUser parameter but Livy seems 
> to ignore it.
> I don't speak scala so I'm a little lost to see if there is a bug in the 
> source code.





[jira] [Commented] (LIVY-431) Livy batch mode ignore proxyUser parameter

2018-05-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475255#comment-16475255
 ] 

Saisai Shao commented on LIVY-431:
--

I think you should enable impersonation first in Livy 
(livy.impersonation.enabled).

> Livy batch mode ignore proxyUser parameter
> --
>
> Key: LIVY-431
> URL: https://issues.apache.org/jira/browse/LIVY-431
> Project: Livy
>  Issue Type: Bug
>  Components: API, Core
>Affects Versions: 0.4.0
> Environment: Cloudera
>Reporter: Benoit de Rancourt
>Priority: Major
>
> Hello,
> We test Livy at work using it only in batch mode in order to have a spark 
> gateway. We use the last tagged version 0.4.0-incubating. We use the REST 
> API, as described here : 
> [http://livy.incubator.apache.org/docs/latest/rest-api.html]
> In the POST /batches request, we set the proxyUser parameter but Livy seems 
> to ignore it.
> I don't speak scala so I'm a little lost to see if there is a bug in the 
> source code.





[jira] [Commented] (LIVY-469) "shared" session kind is undocumented

2018-05-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475221#comment-16475221
 ] 

Saisai Shao commented on LIVY-469:
--

It is intentional. Starting from 0.5 the session kind is effectively unused; we should encourage users not to specify a session kind and instead set the code kind when submitting a statement.

> "shared" session kind is undocumented
> -
>
> Key: LIVY-469
> URL: https://issues.apache.org/jira/browse/LIVY-469
> Project: Livy
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: 0.5.0
>Reporter: Tim Harsch
>Priority: Major
>
> From the docs:
> {quote}Starting with version 0.5.0-incubating, each session can support all 
> four Scala, Python and R interpreters with newly added SQL interpreter. The 
> {{kind}} field in session creation is no longer required, instead users should
> specify code kind (spark, pyspark, sparkr or sql) during statement submission.
> To be compatible with previous versions, users can still specify {{kind}} in 
> session creation, while ignoring {{kind}} in statement submission. Livy will 
> then use this session {{kind}} as default kind for all the submitted 
> statements.{quote}
>  
> 1. I've found a 5th value for session kind (only 4 are documented: https://livy.incubator.apache.org/docs/latest/rest-api.html#session-kind).
> 2. In the 'shared' case, users *cannot* "specify {{kind}} in session creation, while ignoring {{kind}}", due to the error demonstrated below.
> {code}
> harsch@mint64 ~ $ curl -s -X POST --data '{}'   -H "Content-Type: 
> application/json" localhost:8998/sessions | python -m json.tool
> {
>     "appId": null,
>     "appInfo": {
>         "driverLogUrl": null,
>         "sparkUiUrl": null
>     },
>     "id": 0,
>     "kind": "shared",
>     "log": [
>         "stdout: ",
>         "\nstderr: "
>     ],
>     "owner": null,
>     "proxyUser": null,
>     "state": "starting"
> }
> {code}
> Executing this:
> {code}
>  curl -s -X POST --data '{"code":"1 + 1"}'  -H "Content-Type: 
> application/json" localhost:8998/sessions/0/statements| python -m json.tool 
> {code}
> will produce this in the logs:
> {noformat}
> Caused by: org.apache.livy.rsc.rpc.RpcException: 
> java.lang.IllegalArgumentException: Code type should be specified if session 
> kind is shared
> {noformat}
> I do not know how a 'shared' session kind is different from "spark" session 
> kind, or I would suggest some further documentation that would distinguish 
> them.  Perhaps one of the dev team could come up with some useful 
> documentation in this regard.





[jira] [Assigned] (SPARK-24182) Improve error message for client mode when AM fails

2018-05-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24182:
---

Assignee: Marcelo Vanzin

> Improve error message for client mode when AM fails
> ---
>
> Key: SPARK-24182
> URL: https://issues.apache.org/jira/browse/SPARK-24182
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 2.4.0
>
>
> Today, when the client AM fails, there's not a lot of useful information 
> printed on the output. Depending on the type of failure, the information 
> provided by the YARN AM is also not very useful. For example, you'd see this 
> in the Spark shell:
> {noformat}
> 18/05/04 11:07:38 ERROR spark.SparkContext: Error initializing SparkContext.
> org.apache.spark.SparkException: Yarn application has already ended! It might 
> have been killed or unable to launch application master.
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:86)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
> at org.apache.spark.SparkContext.(SparkContext.scala:500)
>  [long stack trace]
> {noformat}
> Similarly, on the YARN RM, for certain failures you see a generic error like 
> this:
> {noformat}
> ExitCodeException exitCode=10: at 
> org.apache.hadoop.util.Shell.runCommand(Shell.java:543) at 
> org.apache.hadoop.util.Shell.run(Shell.java:460) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720) at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:366)
>  at 
> [blah blah blah]
> {noformat}
> It would be nice if we could provide a more accurate description of what went 
> wrong when possible.






[jira] [Resolved] (SPARK-24182) Improve error message for client mode when AM fails

2018-05-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24182.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21243
[https://github.com/apache/spark/pull/21243]

> Improve error message for client mode when AM fails
> ---
>
> Key: SPARK-24182
> URL: https://issues.apache.org/jira/browse/SPARK-24182
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 2.4.0
>
>
> Today, when the client AM fails, there's not a lot of useful information 
> printed on the output. Depending on the type of failure, the information 
> provided by the YARN AM is also not very useful. For example, you'd see this 
> in the Spark shell:
> {noformat}
> 18/05/04 11:07:38 ERROR spark.SparkContext: Error initializing SparkContext.
> org.apache.spark.SparkException: Yarn application has already ended! It might 
> have been killed or unable to launch application master.
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:86)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
> at org.apache.spark.SparkContext.(SparkContext.scala:500)
>  [long stack trace]
> {noformat}
> Similarly, on the YARN RM, for certain failures you see a generic error like 
> this:
> {noformat}
> ExitCodeException exitCode=10: at 
> org.apache.hadoop.util.Shell.runCommand(Shell.java:543) at 
> org.apache.hadoop.util.Shell.run(Shell.java:460) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720) at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:366)
>  at 
> [blah blah blah]
> {noformat}
> It would be nice if we could provide a more accurate description of what went 
> wrong when possible.






[jira] [Created] (SPARK-24219) Improve the docker build script to avoid copying everything in example

2018-05-08 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24219:
---

 Summary: Improve the docker build script to avoid copying 
everything in example
 Key: SPARK-24219
 URL: https://issues.apache.org/jira/browse/SPARK-24219
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Saisai Shao


The current docker build script copies everything under the examples folder into the docker image when invoked from a dev path; this unnecessarily copies many files, such as temporary build files, into the image. So I propose to improve the script.






[jira] [Resolved] (SPARK-24188) /api/v1/version not working

2018-05-07 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24188.
-
   Resolution: Fixed
Fix Version/s: 2.3.1
   2.4.0

Issue resolved by pull request 21245
[https://github.com/apache/spark/pull/21245]

> /api/v1/version not working
> ---
>
> Key: SPARK-24188
> URL: https://issues.apache.org/jira/browse/SPARK-24188
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 2.4.0, 2.3.1
>
>
> That URI from the REST API is currently returning a 404.






[jira] [Assigned] (SPARK-24188) /api/v1/version not working

2018-05-07 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24188:
---

Assignee: Marcelo Vanzin

> /api/v1/version not working
> ---
>
> Key: SPARK-24188
> URL: https://issues.apache.org/jira/browse/SPARK-24188
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
>
> That URI from the REST API is currently returning a 404.






[jira] [Commented] (SPARK-24174) Expose Hadoop config as part of /environment API

2018-05-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463575#comment-16463575
 ] 

Saisai Shao commented on SPARK-24174:
-

I believe the Hadoop web UI already exposes such configurations. It doesn't seem proper or necessary to expose them on the Spark side as well; doing so potentially mixes things up.

> Expose Hadoop config as part of /environment API
> 
>
> Key: SPARK-24174
> URL: https://issues.apache.org/jira/browse/SPARK-24174
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Nikolay Sokolov
>Priority: Minor
>  Labels: features, usability
>
> Currently, /environment API call exposes only system properties and 
> SparkConf. However, in some cases when Spark is used in conjunction with 
> Hadoop, it is useful to know Hadoop configuration properties. For example, 
> HDFS or GS buffer sizes, hive metastore settings, and so on.
> So it would be good to have Hadoop properties exposed in the /environment API, for example:
> {code:none}
> GET .../application_1525395994996_5/environment
> {
>    "runtime": {"javaVersion": "1.8.0_131 (Oracle Corporation)", ...},
>    "sparkProperties": [["spark.yarn.jars", "local:/usr/lib/spark/jars/*"], ...],
>    "systemProperties": [["java.io.tmpdir", "/tmp"], ...],
>    "classpathEntries": [["/usr/lib/hadoop/hadoop-annotations.jar", "System Classpath"], ...],
>    "hadoopProperties": [["dfs.stream-buffer-size", "4096"], ...]
> }
> {code}






[jira] [Assigned] (SPARK-24136) MemoryStreamDataReader.next should skip sleeping if record is available

2018-05-04 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24136:
---

Assignee: Arun Mahadevan

> MemoryStreamDataReader.next should skip sleeping if record is available
> ---
>
> Key: SPARK-24136
> URL: https://issues.apache.org/jira/browse/SPARK-24136
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Arun Mahadevan
>Assignee: Arun Mahadevan
>Priority: Minor
> Fix For: 2.4.0
>
>
> Currently the code sleeps 10 ms on each invocation of next() even if a record is already available.
> {code:java}
> override def next(): Boolean = {
>   current = None
>   while (current.isEmpty) {
>     Thread.sleep(10)
>     current = endpoint.askSync[Option[Row]](
>       GetRecord(ContinuousMemoryStreamPartitionOffset(partition, currentOffset)))
>   }
> {code}
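
For illustration, the suggested change amounts to polling first and sleeping only when nothing came back. A self-contained sketch using a generic helper (not the actual ContinuousMemoryStream reader; names are illustrative):

{code}
// Illustrative only: demonstrates "check before sleeping" in a polling loop.
def pollUntilAvailable[T](poll: () => Option[T], sleepMs: Long = 10): T = {
  var current: Option[T] = None
  while (current.isEmpty) {
    current = poll()          // try immediately, no unconditional sleep
    if (current.isEmpty) {
      Thread.sleep(sleepMs)   // back off only when no record was available
    }
  }
  current.get
}
{code}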






[jira] [Resolved] (SPARK-24136) MemoryStreamDataReader.next should skip sleeping if record is available

2018-05-04 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24136.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21207
[https://github.com/apache/spark/pull/21207]

> MemoryStreamDataReader.next should skip sleeping if record is available
> ---
>
> Key: SPARK-24136
> URL: https://issues.apache.org/jira/browse/SPARK-24136
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Arun Mahadevan
>Priority: Minor
> Fix For: 2.4.0
>
>
> Currently the code sleeps for 10 ms on each invocation of next(), even if a 
> record is already available.
> {code:java}
> override def next(): Boolean = {
>   current = None
>   while (current.isEmpty) {
>     Thread.sleep(10)
>     current = endpoint.askSync[Option[Row]](
>       GetRecord(ContinuousMemoryStreamPartitionOffset(partition, currentOffset)))
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (LIVY-464) Cannot connect to Datastax standalone cluster DSE 5.1.5

2018-05-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463465#comment-16463465
 ] 

Saisai Shao commented on LIVY-464:
--

For now Livy only supports local mode and YARN mode. You can configure it to 
run against a Standalone or Mesos cluster manager, but we don't verify that, 
so correct behavior is not guaranteed.

From the description, it seems you're using a customized cluster manager 
(dse://xxx); I don't think Livy supports it very well currently.
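
For reference, a {{livy.conf}} that stays within the cluster managers Livy 
exercises itself would pin the master to YARN; the values below are 
illustrative:
{noformat}
# livy.conf -- only local and YARN masters are exercised by the project
livy.spark.master = yarn
livy.spark.deploy-mode = cluster
{noformat}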

> Cannot connect to Datastax standalone cluster DSE 5.1.5
> ---
>
> Key: LIVY-464
> URL: https://issues.apache.org/jira/browse/LIVY-464
> Project: Livy
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.5.0, 0.6.0
> Environment: OSX and Ubuntu 16.04 in docker
>Reporter: Brian Konzman
>Priority: Major
>
> Using 
> livy.spark.master = dse://:9042 
>  The new way to define a datastax cluster as the spark master in livy.conf 
> returns
> {{Error: Master must either be yarn or start with spark, mesos, local}}
> livy.spark.master = spark://:9042 
>  
> {{18/04/30 17:21:51 INFO StandaloneAppClient$ClientEndpoint: Connecting to 
> master spark://:9042...}}
> {{18/04/30 17:21:51 INFO TransportClientFactory: Successfully created 
> connection to :9042 after 19 ms (0 ms spent in bootstraps)}}
> {{18/04/30 17:21:51 WARN TransportChannelHandler: Exception in connection 
> from :9042 java.lang.IllegalArgumentException: Frame length should be 
> positive: -8858580467037765640}}
>  
> livy.spark.master = spark://:7077
>  
> {{18/04/30 17:23:31 INFO StandaloneAppClient$ClientEndpoint: Connecting to 
> master spark://:7077... }}
> {{18/04/30 17:23:31 INFO TransportClientFactory: Successfully created 
> connection to :7077 after 15 ms (0 ms spent in bootstraps) 
> }}{{18/04/30 17:23:51 INFO StandaloneAppClient$ClientEndpoint: Connecting to 
> master spark://ccwa-csssp-dev1-02:7077... }}
> {{18/04/30 17:24:11 INFO StandaloneAppClient$ClientEndpoint: Connecting to 
> master spark://:7077... }}
> {{18/04/30 17:24:31 ERROR StandaloneSchedulerBackend: Application has been 
> killed. Reason: All masters are unresponsive! Giving up.}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (LIVY-466) RSCDriver throws exception during RPC shutdown

2018-05-02 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-466.
--
   Resolution: Fixed
Fix Version/s: 0.6.0
   0.5.1

Issue resolved by pull request 90
[https://github.com/apache/incubator-livy/pull/90]

> RSCDriver throws exception during RPC shutdown
> --
>
> Key: LIVY-466
> URL: https://issues.apache.org/jira/browse/LIVY-466
> Project: Livy
>  Issue Type: Bug
>  Components: RSC
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
> Fix For: 0.5.1, 0.6.0
>
>
> During RSCDriver's shutdown, it first shuts down the RPC server and then all 
> the RPC clients. When an RPC client is closed, it registers a timeout to 
> avoid an orphaned RSCDriver. This is unnecessary during RSCDriver's own 
> shutdown and throws the exception shown below, so this issue fixes that.
> {noformat}
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: 18/05/02 14:03:53 WARN 
> DefaultPromise: An exception was thrown by 
> org.apache.livy.rsc.Utils$2.operationComplete()
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: 
> java.util.concurrent.RejectedExecutionException: event executor terminated
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:821)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:327)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:320)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:746)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:195)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:140)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.AbstractEventExecutorGroup.schedule(AbstractEventExecutorGroup.java:50)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> org.apache.livy.rsc.driver.RSCDriver.setupIdleTimeout(RSCDriver.java:238)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> org.apache.livy.rsc.driver.RSCDriver.access$100(RSCDriver.java:70)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> org.apache.livy.rsc.driver.RSCDriver$2.onSuccess(RSCDriver.java:220)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> org.apache.livy.rsc.driver.RSCDriver$2.onSuccess(RSCDriver.java:216)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> org.apache.livy.rsc.Utils$2.operationComplete(Utils.java:108)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1148)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:764)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:740)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:611)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1301)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
> 18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
> io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
> 18/05/02 14:03:53 INFO u
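
The general shape of the fix is to skip registering the idle timeout once 
shutdown has begun. The sketch below only illustrates that guard pattern with 
made-up names ({{shuttingDown}}, {{scheduleTimeout}}); it is not the actual 
RSCDriver code:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative stand-in for the driver; the real class manages RPC channels.
class DriverLike {
  private val shuttingDown = new AtomicBoolean(false)

  def shutdown(): Unit = {
    shuttingDown.set(true)
    // ... close the RPC server and clients here ...
  }

  def setupIdleTimeout(): Unit = {
    // Skip registering the timeout once shutdown has started, so the
    // scheduler is never touched after its event executor terminates.
    if (shuttingDown.get()) {
      return
    }
    scheduleTimeout()
  }

  private def scheduleTimeout(): Unit = ()
}
{code}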

[jira] [Resolved] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer

2018-05-02 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24110.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21178
[https://github.com/apache/spark/pull/21178]

> Avoid calling UGI loginUserFromKeytab in ThriftServer
> -
>
> Key: SPARK-24110
> URL: https://issues.apache.org/jira/browse/SPARK-24110
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark ThriftServer calls UGI.loginUserFromKeytab twice during initialization. 
> This is unnecessary and can cause various potential problems, such as Hadoop 
> IPC failures after 7 days or RM failover issues.
> So we need to remove the unnecessary login logic and make sure the UGI in the 
> context is never re-created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer

2018-05-02 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24110:
---

Assignee: Saisai Shao

> Avoid calling UGI loginUserFromKeytab in ThriftServer
> -
>
> Key: SPARK-24110
> URL: https://issues.apache.org/jira/browse/SPARK-24110
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark ThriftServer calls UGI.loginUserFromKeytab twice during initialization. 
> This is unnecessary and can cause various potential problems, such as Hadoop 
> IPC failures after 7 days or RM failover issues.
> So we need to remove the unnecessary login logic and make sure the UGI in the 
> context is never re-created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (LIVY-466) RSCDriver throws exception during RPC shutdown

2018-05-02 Thread Saisai Shao (JIRA)
Saisai Shao created LIVY-466:


 Summary: RSCDriver throws exception during RPC shutdown
 Key: LIVY-466
 URL: https://issues.apache.org/jira/browse/LIVY-466
 Project: Livy
  Issue Type: Bug
  Components: RSC
Affects Versions: 0.5.0
Reporter: Saisai Shao
Assignee: Saisai Shao


During RSCDriver's shutdown, it first shuts down the RPC server and then all the 
RPC clients. When an RPC client is closed, it registers a timeout to avoid an 
orphaned RSCDriver. This is unnecessary during RSCDriver's own shutdown and 
throws the exception shown below, so this issue fixes that.
{noformat}
18/05/02 14:03:53 INFO utils.LineBufferedStream: 18/05/02 14:03:53 WARN 
DefaultPromise: An exception was thrown by 
org.apache.livy.rsc.Utils$2.operationComplete()
18/05/02 14:03:53 INFO utils.LineBufferedStream: 
java.util.concurrent.RejectedExecutionException: event executor terminated
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:821)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:327)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:320)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:746)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:195)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:140)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.AbstractEventExecutorGroup.schedule(AbstractEventExecutorGroup.java:50)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
org.apache.livy.rsc.driver.RSCDriver.setupIdleTimeout(RSCDriver.java:238)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
org.apache.livy.rsc.driver.RSCDriver.access$100(RSCDriver.java:70)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
org.apache.livy.rsc.driver.RSCDriver$2.onSuccess(RSCDriver.java:220)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
org.apache.livy.rsc.driver.RSCDriver$2.onSuccess(RSCDriver.java:216)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
org.apache.livy.rsc.Utils$2.operationComplete(Utils.java:108)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1148)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:764)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:740)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:611)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1301)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:465)
18/05/02 14:03:53 INFO utils.LineBufferedStream: at 
io.netty.channel.DefaultChannelPipeline.clos

[jira] [Resolved] (LIVY-461) Cannot upload Jar file using LivyClientBuilder in Scala

2018-05-01 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-461.
--
Resolution: Cannot Reproduce

> Cannot upload Jar file using LivyClientBuilder in Scala
> ---
>
> Key: LIVY-461
> URL: https://issues.apache.org/jira/browse/LIVY-461
> Project: Livy
>  Issue Type: Question
>  Components: Batch
>Reporter: Liana Napalkova
>Priority: Major
>
> I am using Livy on Docker and then I submit Spark job to Livy server from 
> Scala:
> {{scalaClient = new LivyClientBuilder()}}
> {{ .setURI("http://0.0.0.0:8998";).  // checked localhost as well}}
> {{ .build()}}
> {{println("> Uploading Jar file")}}
> {{scalaClient.uploadJar(new File(myLocalJarPath)).get()}}
>  
> The step "Uploading Jar file" takes forever.
> How can I figure out what's happening?
>  
> I checked that [http://localhost:8998|http://localhost:8998/] outputs the 
> following. So, Apache Livy seems to be up and running. Spark master and 
> workers are up as well.
> h1. Operational Menu
>  * [Metrics|http://localhost:8998/metrics?pretty=true]
>  * [Ping|http://localhost:8998/ping]
>  * [Threads|http://localhost:8998/threads]
>  * [Healthcheck|http://localhost:8998/healthcheck?pretty=true]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-461) Cannot upload Jar file using LivyClientBuilder in Scala

2018-05-01 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460558#comment-16460558
 ] 

Saisai Shao commented on LIVY-461:
--

I just verified with the PiApp example included in Livy; it seems to work fine. 

> Cannot upload Jar file using LivyClientBuilder in Scala
> ---
>
> Key: LIVY-461
> URL: https://issues.apache.org/jira/browse/LIVY-461
> Project: Livy
>  Issue Type: Question
>  Components: Batch
>Reporter: Liana Napalkova
>Priority: Major
>
> I am using Livy on Docker and then I submit Spark job to Livy server from 
> Scala:
> {{scalaClient = new LivyClientBuilder()}}
> {{ .setURI("http://0.0.0.0:8998";).  // checked localhost as well}}
> {{ .build()}}
> {{println("> Uploading Jar file")}}
> {{scalaClient.uploadJar(new File(myLocalJarPath)).get()}}
>  
> The step "Uploading Jar file" takes forever.
> How can I figure out what's happening?
>  
> I checked that [http://localhost:8998|http://localhost:8998/] outputs the 
> following. So, Apache Livy seems to be up and running. Spark master and 
> workers are up as well.
> h1. Operational Menu
>  * [Metrics|http://localhost:8998/metrics?pretty=true]
>  * [Ping|http://localhost:8998/ping]
>  * [Threads|http://localhost:8998/threads]
>  * [Healthcheck|http://localhost:8998/healthcheck?pretty=true]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-465) Client closed before SASL negotiation finished

2018-05-01 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460370#comment-16460370
 ] 

Saisai Shao commented on LIVY-465:
--

# RSC configuration should be set through the session creation request's 
{{conf}} field (a request sketch is shown below). It cannot be set through the 
static Livy conf, because RSC configuration is a per-session configuration.
 # You can set a very large RPC size, but we cannot guarantee whether it will 
work or whether it will lead to unexpected behavior.
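
For example, a session-creation request carrying a per-session RSC setting 
might look like the following; the size value is illustrative:
{code:none}
POST /sessions
{
  "kind": "spark",
  "conf": {
    "livy.rsc.rpc.max.size": "104857600"
  }
}
{code}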

> Client closed before SASL negotiation finished
> --
>
> Key: LIVY-465
> URL: https://issues.apache.org/jira/browse/LIVY-465
> Project: Livy
>  Issue Type: Question
>  Components: Core, RSC, Server
>Affects Versions: 0.4.0
> Environment: Ubuntu 16
> Hadoop 2.7.3
> Hive 2.3.2
> Spark 2.3
>Reporter: Ozioma Ihekwoaba
>Priority: Minor
>
> Hi all,
> I have been trying to get Livy working for over a week with no results...yet.
> First stump was this error:
> {{Message (nnn bytes) exceeds maximum allowed size (nnn bytes).}}
>  
> I set the following in livy.conf and was amazed to discover that Livy DOES 
> NOT load config values from the config file! Or am I missing something?
>  {{livy.rsc.rpc.max.size = 102400}}
> {{livy.rpc.max.size = 102400}}
> {{rpc.max.size = 102400}}
>  
> NONE of the above was picked up.
> I had to manually edit RSCConf.java and modified thus
>  
> {{RPC_MAX_MESSAGE_SIZE("rpc.max.size", 50 * 10 * 1024 * 1024),}}
>  
> That took care of that error, now I'm stuck with this
> {{utils.LineBufferedStream: stdout: Exception in thread "main" 
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException:}}
> {{ Client closed before SASL negotiation finished.}}
> I have 3 builds now for Spark 1.6 and 2.2, still with the same endless cycle 
> of exceptions.
> What could be wrong?
> Is it really hard to get Livy working on a machine?
> And WHY is Livy not picking up config values from livy.conf???
>  
> Thanks,
> Ozioma



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-465) Client closed before SASL negotiation finished

2018-05-01 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-465:
-
Priority: Minor  (was: Blocker)

> Client closed before SASL negotiation finished
> --
>
> Key: LIVY-465
> URL: https://issues.apache.org/jira/browse/LIVY-465
> Project: Livy
>  Issue Type: Bug
>  Components: Core, RSC, Server
>Affects Versions: 0.4.0
> Environment: Ubuntu 16
> Hadoop 2.7.3
> Hive 2.3.2
> Spark 2.3
>Reporter: Ozioma Ihekwoaba
>Priority: Minor
>
> Hi all,
> I have been trying to get Livy working for over a week with no results...yet.
> First stump was this error:
> {{Message (nnn bytes) exceeds maximum allowed size (nnn bytes).}}
>  
> I set the following in livy.conf and was amazed to discover that Livy DOES 
> NOT load config values from the config file! Or am I missing something?
>  {{livy.rsc.rpc.max.size = 102400}}
> {{livy.rpc.max.size = 102400}}
> {{rpc.max.size = 102400}}
>  
> NONE of the above was picked up.
> I had to manually edit RSCConf.java and modified thus
>  
> {{RPC_MAX_MESSAGE_SIZE("rpc.max.size", 50 * 10 * 1024 * 1024),}}
>  
> That took care of that error, now I'm stuck with this
> {{utils.LineBufferedStream: stdout: Exception in thread "main" 
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException:}}
> {{ Client closed before SASL negotiation finished.}}
> I have 3 builds now for Spark 1.6 and 2.2, still with the same endless cycle 
> of exceptions.
> What could be wrong?
> Is it really hard to get Livy working on a machine?
> And WHY is Livy not picking up config values from livy.conf???
>  
> Thanks,
> Ozioma



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-465) Client closed before SASL negotiation finished

2018-05-01 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-465:
-
Issue Type: Question  (was: Bug)

> Client closed before SASL negotiation finished
> --
>
> Key: LIVY-465
> URL: https://issues.apache.org/jira/browse/LIVY-465
> Project: Livy
>  Issue Type: Question
>  Components: Core, RSC, Server
>Affects Versions: 0.4.0
> Environment: Ubuntu 16
> Hadoop 2.7.3
> Hive 2.3.2
> Spark 2.3
>Reporter: Ozioma Ihekwoaba
>Priority: Minor
>
> Hi all,
> I have been trying to get Livy working for over a week with no results...yet.
> First stump was this error:
> {{Message (nnn bytes) exceeds maximum allowed size (nnn bytes).}}
>  
> I set the following in livy.conf and was amazed to discover that Livy DOES 
> NOT load config values from the config file! Or am I missing something?
>  {{livy.rsc.rpc.max.size = 102400}}
> {{livy.rpc.max.size = 102400}}
> {{rpc.max.size = 102400}}
>  
> NONE of the above was picked up.
> I had to manually edit RSCConf.java and modified thus
>  
> {{RPC_MAX_MESSAGE_SIZE("rpc.max.size", 50 * 10 * 1024 * 1024),}}
>  
> That took care of that error, now I'm stuck with this
> {{utils.LineBufferedStream: stdout: Exception in thread "main" 
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException:}}
> {{ Client closed before SASL negotiation finished.}}
> I have 3 builds now for Spark 1.6 and 2.2, still with the same endless cycle 
> of exceptions.
> What could be wrong?
> Is it really hard to get Livy working on a machine?
> And WHY is Livy not picking up config values from livy.conf???
>  
> Thanks,
> Ozioma



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-463) Port for accessing livy server

2018-05-01 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460333#comment-16460333
 ] 

Saisai Shao commented on LIVY-463:
--

Such information is already written to the log file, so users can also get it 
from there. If we want to print it, I think we should do it from the Livy 
Server Java process, not the launching script. But I also think it is not 
strictly necessary.
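
If it were done from the server process, the rough idea is sketched below; the 
names ({{boundPort}}, the default host) are illustrative, not actual LivyServer 
internals:
{code:java}
// Illustrative only: a startup banner printed by the server process itself.
def logStartupBanner(livyConf: Map[String, String], boundPort: Int): Unit = {
  val host = livyConf.getOrElse("livy.server.host", "0.0.0.0")
  println(s"Livy server started; listening on http://$host:$boundPort")
}
{code}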

> Port for accessing livy server 
> ---
>
> Key: LIVY-463
> URL: https://issues.apache.org/jira/browse/LIVY-463
> Project: Livy
>  Issue Type: Documentation
>  Components: Docs
>Affects Versions: 0.5.0
>Reporter: Appunni M
>Priority: Trivial
>
> A better way to improve the experience for a newcomer to this project would 
> be for the Livy server, when started, to show the port number and a message 
> saying that the server has started. Before detaching from the terminal, it 
> could display a message like Apache Solr does. This would definitely go a 
> long way, as the user would know if the port had been changed in the config, 
> and could avoid going back and forth to the configs or documentation to get 
> the port. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-463) Port for accessing livy server

2018-05-01 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-463:
-
Fix Version/s: (was: 0.5.1)

> Port for accessing livy server 
> ---
>
> Key: LIVY-463
> URL: https://issues.apache.org/jira/browse/LIVY-463
> Project: Livy
>  Issue Type: Documentation
>  Components: Docs
>Affects Versions: 0.5.0
>Reporter: Appunni M
>Priority: Trivial
>
> A better way to improve the experience for a newcomer to this project would 
> be for the Livy server, when started, to show the port number and a message 
> saying that the server has started. Before detaching from the terminal, it 
> could display a message like Apache Solr does. This would definitely go a 
> long way, as the user would know if the port had been changed in the config, 
> and could avoid going back and forth to the configs or documentation to get 
> the port. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (SPARK-23688) Refactor tests away from rate source

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-23688:
---

Assignee: Jungtaek Lim

> Refactor tests away from rate source
> 
>
> Key: SPARK-23688
> URL: https://issues.apache.org/jira/browse/SPARK-23688
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 2.4.0
>
>
> Most continuous processing tests currently use a rate source, since that was 
> what was available at the time of implementation. This forces us to do a lot 
> of awkward things to work around the fact that the data in the sink is not 
> perfectly predictable. We should refactor to use a memory stream once it's 
> implemented.
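
For illustration, a memory-stream-driven test gives fully deterministic sink 
contents. The sketch below uses the existing micro-batch {{MemoryStream}} test 
helper (an internal class under {{org.apache.spark.sql.execution.streaming}}); 
a continuous-processing equivalent would follow the same shape:
{code:java}
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

object MemoryStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("memory-stream-sketch")
      .getOrCreate()
    import spark.implicits._
    implicit val sqlContext: SQLContext = spark.sqlContext

    // Data fed through MemoryStream is exactly what the test adds,
    // unlike the time-driven output of a rate source.
    val input = MemoryStream[Int]
    input.addData(1, 2, 3)

    val query = input.toDF()
      .writeStream
      .format("memory")
      .queryName("sketch_output")
      .outputMode("append")
      .start()

    query.processAllAvailable()
    spark.table("sketch_output").show()

    query.stop()
    spark.stop()
  }
}
{code}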



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23688) Refactor tests away from rate source

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-23688.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21152
[https://github.com/apache/spark/pull/21152]

> Refactor tests away from rate source
> 
>
> Key: SPARK-23688
> URL: https://issues.apache.org/jira/browse/SPARK-23688
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
> Fix For: 2.4.0
>
>
> Most continuous processing tests currently use a rate source, since that was 
> what was available at the time of implementation. This forces us to do a lot 
> of awkward things to work around the fact that the data in the sink is not 
> perfectly predictable. We should refactor to use a memory stream once it's 
> implemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23830) Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-23830:
---

Assignee: Eric Maynard

> Spark on YARN in cluster deploy mode fail with NullPointerException when a 
> Spark application is a Scala class not object
> 
>
> Key: SPARK-23830
> URL: https://issues.apache.org/jira/browse/SPARK-23830
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Jacek Laskowski
>Assignee: Eric Maynard
>Priority: Trivial
> Fix For: 2.4.0
>
>
> As reported on StackOverflow in [Why does Spark on YARN fail with “Exception 
> in thread ”Driver“ 
> java.lang.NullPointerException”?|https://stackoverflow.com/q/49564334/1305344]
>  the following Spark application fails with {{Exception in thread "Driver" 
> java.lang.NullPointerException}} with Spark on YARN in cluster deploy mode:
> {code}
> class MyClass {
>   def main(args: Array[String]): Unit = {
> val c = new MyClass()
> c.process()
>   }
>   def process(): Unit = {
> val sparkConf = new SparkConf().setAppName("my-test")
> val sparkSession: SparkSession = 
> SparkSession.builder().config(sparkConf).getOrCreate()
> import sparkSession.implicits._
> 
>   }
>   ...
> }
> {code}
> The exception is as follows:
> {code}
> 18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a 
> separate Thread
> 18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context 
> initialization...
> Exception in thread "Driver" java.lang.NullPointerException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> {code}
> I think the reason for the exception {{Exception in thread "Driver" 
> java.lang.NullPointerException}} is due to [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L700-L701]:
> {code}
> val mainMethod = userClassLoader.loadClass(args.userClass)
>   .getMethod("main", classOf[Array[String]])
> {code}
> So when {{mainMethod}} is used in [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L706]
>  it simply gives NPE.
> {code}
> mainMethod.invoke(null, userArgs.toArray)
> {code}
> That could easily be avoided with an extra check on whether {{mainMethod}} is 
> properly initialized, together with a message telling the user what the 
> likely reason is.
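
A sketch of the suggested check, with {{loader}}, {{userClass}} and 
{{userArgs}} standing in for ApplicationMaster's state; this is not the actual 
patch:
{code:java}
import java.lang.reflect.Modifier

// Illustrative helper: fail fast with a clear message instead of an NPE.
def invokeUserMain(loader: ClassLoader, userClass: String, userArgs: Seq[String]): Unit = {
  val mainMethod = loader.loadClass(userClass).getMethod("main", classOf[Array[String]])
  if (!Modifier.isStatic(mainMethod.getModifiers)) {
    // A Scala `class` (rather than an `object`) yields an instance `main` method,
    // so invoking it with a null receiver throws the NullPointerException seen above.
    throw new IllegalStateException(
      s"$userClass must be a Scala object (or define a static main method) to run in cluster mode.")
  }
  mainMethod.invoke(null, userArgs.toArray)
}
{code}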



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23830) Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object

2018-04-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-23830.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21168
[https://github.com/apache/spark/pull/21168]

> Spark on YARN in cluster deploy mode fail with NullPointerException when a 
> Spark application is a Scala class not object
> 
>
> Key: SPARK-23830
> URL: https://issues.apache.org/jira/browse/SPARK-23830
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Jacek Laskowski
>Priority: Trivial
> Fix For: 2.4.0
>
>
> As reported on StackOverflow in [Why does Spark on YARN fail with “Exception 
> in thread ”Driver“ 
> java.lang.NullPointerException”?|https://stackoverflow.com/q/49564334/1305344]
>  the following Spark application fails with {{Exception in thread "Driver" 
> java.lang.NullPointerException}} with Spark on YARN in cluster deploy mode:
> {code}
> class MyClass {
>   def main(args: Array[String]): Unit = {
> val c = new MyClass()
> c.process()
>   }
>   def process(): Unit = {
> val sparkConf = new SparkConf().setAppName("my-test")
> val sparkSession: SparkSession = 
> SparkSession.builder().config(sparkConf).getOrCreate()
> import sparkSession.implicits._
> 
>   }
>   ...
> }
> {code}
> The exception is as follows:
> {code}
> 18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a 
> separate Thread
> 18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context 
> initialization...
> Exception in thread "Driver" java.lang.NullPointerException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> {code}
> I think the reason for the exception {{Exception in thread "Driver" 
> java.lang.NullPointerException}} is due to [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L700-L701]:
> {code}
> val mainMethod = userClassLoader.loadClass(args.userClass)
>   .getMethod("main", classOf[Array[String]])
> {code}
> So when {{mainMethod}} is used in [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L706]
>  it simply gives NPE.
> {code}
> mainMethod.invoke(null, userArgs.toArray)
> {code}
> That could easily be avoided with an extra check on whether {{mainMethod}} is 
> properly initialized, together with a message telling the user what the 
> likely reason is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer

2018-04-27 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24110:
---

 Summary: Avoid calling UGI loginUserFromKeytab in ThriftServer
 Key: SPARK-24110
 URL: https://issues.apache.org/jira/browse/SPARK-24110
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Saisai Shao


Spark ThriftServer calls UGI.loginUserFromKeytab twice during initialization. 
This is unnecessary and can cause various potential problems, such as Hadoop 
IPC failures after 7 days or RM failover issues.

So we need to remove the unnecessary login logic and make sure the UGI in the 
context is never re-created.
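
One way to express the idea is to reuse an existing keytab-based login instead 
of logging in again. The helper below is only a sketch (the principal/keytab 
arguments are placeholders), not the actual change:
{code:java}
import org.apache.hadoop.security.UserGroupInformation

// Sketch: log in from the keytab only when the current login is not already
// keytab based, then hand back the (single) current UGI.
def ensureKeytabLogin(principal: String, keytab: String): UserGroupInformation = {
  if (UserGroupInformation.isSecurityEnabled && !UserGroupInformation.isLoginKeytabBased) {
    UserGroupInformation.loginUserFromKeytab(principal, keytab)
  }
  UserGroupInformation.getCurrentUser
}
{code}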



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23830) Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object

2018-04-26 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16455621#comment-16455621
 ] 

Saisai Shao commented on SPARK-23830:
-

I agree [~emaynard].

> Spark on YARN in cluster deploy mode fail with NullPointerException when a 
> Spark application is a Scala class not object
> 
>
> Key: SPARK-23830
> URL: https://issues.apache.org/jira/browse/SPARK-23830
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> As reported on StackOverflow in [Why does Spark on YARN fail with “Exception 
> in thread ”Driver“ 
> java.lang.NullPointerException”?|https://stackoverflow.com/q/49564334/1305344]
>  the following Spark application fails with {{Exception in thread "Driver" 
> java.lang.NullPointerException}} with Spark on YARN in cluster deploy mode:
> {code}
> class MyClass {
>   def main(args: Array[String]): Unit = {
> val c = new MyClass()
> c.process()
>   }
>   def process(): Unit = {
> val sparkConf = new SparkConf().setAppName("my-test")
> val sparkSession: SparkSession = 
> SparkSession.builder().config(sparkConf).getOrCreate()
> import sparkSession.implicits._
> 
>   }
>   ...
> }
> {code}
> The exception is as follows:
> {code}
> 18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a 
> separate Thread
> 18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context 
> initialization...
> Exception in thread "Driver" java.lang.NullPointerException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> {code}
> I think the reason for the exception {{Exception in thread "Driver" 
> java.lang.NullPointerException}} is due to [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L700-L701]:
> {code}
> val mainMethod = userClassLoader.loadClass(args.userClass)
>   .getMethod("main", classOf[Array[String]])
> {code}
> So when {{mainMethod}} is used in [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L706]
>  it simply gives NPE.
> {code}
> mainMethod.invoke(null, userArgs.toArray)
> {code}
> That could easily be avoided with an extra check on whether {{mainMethod}} is 
> properly initialized, together with a message telling the user what the 
> likely reason is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24062) SASL encryption cannot be worked in ThriftServer

2018-04-25 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453521#comment-16453521
 ] 

Saisai Shao commented on SPARK-24062:
-

Issue resolved by pull request 21138
[https://github.com/apache/spark/pull/21138|https://github.com/apache/spark/pull/21138]

> SASL encryption cannot be worked in ThriftServer
> 
>
> Key: SPARK-24062
> URL: https://issues.apache.org/jira/browse/SPARK-24062
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> Spark thrift server will throw an exception when SASL encryption is used.
>  
> {noformat}
> 18/04/16 14:36:46 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() on RPC id 8384069538832556183
> java.lang.IllegalArgumentException: A secret key must be specified via the 
> spark.authenticate.secret config
> at 
> org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
> at 
> org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:509)
> at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:551)
> at 
> org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
> at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
> at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
> at 
> org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
> at 
> org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:103)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:187)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111){noformat}
> After investigation, the issue is:
> Spark on YARN stores the SASL secret in the current UGI's credentials; these 
> credentials are distributed to the AM and executors so that the executors and 
> the driver share the same secret for communication. But STS/Hive library code 
> refreshes the current UGI via UGI's loginFromKeytab(), which creates a new 
> UGI in the current context with empty tokens and secret keys. The secret key 
> is therefore lost from the current context's UGI, which is why the Spark 
> driver throws the "secret key not found" exception.
> In Spark 2.2, Spark also stored this secret key in a class variable of 
> {{SecurityManager}}, so even if the UGI was refreshed, the secret still 
> existed in the object and STS with SASL still worked. But in Spark 2.3 we 
> always look up the key from the current UGI, which makes it fail to work.
> To fix this issue, there are two possible solutions:
> 1. Fix it in the STS/Hive library: when a new UGI is created on refresh, copy 
> the secret key from the original UGI to the new one. The difficulty is that 
> some of the code that refreshes the UGI lives in the Hive library, which 
> makes it hard for us to change.
> 2. Roll back the logic in SecurityManager to match Spark 2.2, so that this 
> issue is fixed.
> The 2nd solution seems simpler, so I will propose a PR with it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24062) SASL encryption cannot be worked in ThriftServer

2018-04-25 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24062.
-
   Resolution: Fixed
 Assignee: Saisai Shao
Fix Version/s: 2.4.0
   2.3.1

> SASL encryption cannot be worked in ThriftServer
> 
>
> Key: SPARK-24062
> URL: https://issues.apache.org/jira/browse/SPARK-24062
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> Spark thrift server will throw an exception when SASL encryption is used.
>  
> {noformat}
> 18/04/16 14:36:46 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() on RPC id 8384069538832556183
> java.lang.IllegalArgumentException: A secret key must be specified via the 
> spark.authenticate.secret config
> at 
> org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
> at 
> org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:509)
> at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:551)
> at 
> org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
> at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
> at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
> at 
> org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
> at 
> org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:103)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:187)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111){noformat}
> After investigation, the issue is:
> Spark on YARN stores the SASL secret in the current UGI's credentials; these 
> credentials are distributed to the AM and executors so that the executors and 
> the driver share the same secret for communication. But STS/Hive library code 
> refreshes the current UGI via UGI's loginFromKeytab(), which creates a new 
> UGI in the current context with empty tokens and secret keys. The secret key 
> is therefore lost from the current context's UGI, which is why the Spark 
> driver throws the "secret key not found" exception.
> In Spark 2.2, Spark also stored this secret key in a class variable of 
> {{SecurityManager}}, so even if the UGI was refreshed, the secret still 
> existed in the object and STS with SASL still worked. But in Spark 2.3 we 
> always look up the key from the current UGI, which makes it fail to work.
> To fix this issue, there are two possible solutions:
> 1. Fix it in the STS/Hive library: when a new UGI is created on refresh, copy 
> the secret key from the original UGI to the new one. The difficulty is that 
> some of the code that refreshes the UGI lives in the Hive library, which 
> makes it hard for us to change.
> 2. Roll back the logic in SecurityManager to match Spark 2.2, so that this 
> issue is fixed.
> The 2nd solution seems simpler, so I will propose a PR with it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (LIVY-460) when using livy with spark2.3, something wrong with the stdout and stderr

2018-04-25 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-460:


Assignee: Saisai Shao

> when using livy with spark2.3, something wrong with the stdout and stderr
> -
>
> Key: LIVY-460
> URL: https://issues.apache.org/jira/browse/LIVY-460
> Project: Livy
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Albert Cheng
>Assignee: Saisai Shao
>Priority: Trivial
> Fix For: 0.6.0
>
> Attachments: image-2018-04-24-16-19-42-211.png
>
>
> When using Livy with Spark 2.3, something is wrong with stdout and stderr: 
> the stderr log of Spark is redirected to stdout in Livy.
> !image-2018-04-24-16-19-42-211.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (LIVY-460) when using livy with spark2.3, something wrong with the stdout and stderr

2018-04-25 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-460.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

Issue resolved by pull request 89
[https://github.com/apache/incubator-livy/pull/89]

> when using livy with spark2.3, something wrong with the stdout and stderr
> -
>
> Key: LIVY-460
> URL: https://issues.apache.org/jira/browse/LIVY-460
> Project: Livy
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Albert Cheng
>Priority: Trivial
> Fix For: 0.6.0
>
> Attachments: image-2018-04-24-16-19-42-211.png
>
>
> When using Livy with Spark 2.3, something is wrong with stdout and stderr: 
> the stderr log of Spark is redirected to stdout in Livy.
> !image-2018-04-24-16-19-42-211.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-460) when using livy with spark2.3, something wrong with the stdout and stderr

2018-04-24 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-460:
-
Priority: Trivial  (was: Major)

> when using livy with spark2.3, something wrong with the stdout and stderr
> -
>
> Key: LIVY-460
> URL: https://issues.apache.org/jira/browse/LIVY-460
> Project: Livy
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Albert Cheng
>Priority: Trivial
> Attachments: image-2018-04-24-16-19-42-211.png
>
>
> When using Livy with Spark 2.3, something is wrong with stdout and stderr: 
> the stderr log of Spark is redirected to stdout in Livy.
> !image-2018-04-24-16-19-42-211.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-460) when using livy with spark2.3, something wrong with the stdout and stderr

2018-04-24 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-460:
-
Issue Type: Improvement  (was: Bug)

> when using livy with spark2.3, something wrong with the stdout and stderr
> -
>
> Key: LIVY-460
> URL: https://issues.apache.org/jira/browse/LIVY-460
> Project: Livy
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Albert Cheng
>Priority: Major
> Attachments: image-2018-04-24-16-19-42-211.png
>
>
> When using Livy with Spark 2.3, something is wrong with stdout and stderr: 
> the stderr log of Spark is redirected to stdout in Livy.
> !image-2018-04-24-16-19-42-211.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-460) when using livy with spark2.3, something wrong with the stdout and stderr

2018-04-24 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451522#comment-16451522
 ] 

Saisai Shao commented on LIVY-460:
--

I think the word "stdout" is misleading; the output is actually still captured 
from stderr. I think we can improve the wording.

> when using livy with spark2.3, something wrong with the stdout and stderr
> -
>
> Key: LIVY-460
> URL: https://issues.apache.org/jira/browse/LIVY-460
> Project: Livy
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Albert Cheng
>Priority: Major
> Attachments: image-2018-04-24-16-19-42-211.png
>
>
> When using Livy with Spark 2.3, something is wrong with stdout and stderr: 
> the stderr log of Spark is redirected to stdout in Livy.
> !image-2018-04-24-16-19-42-211.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SPARK-24062) SASL encryption cannot be worked in ThriftServer

2018-04-24 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24062:
---

 Summary: SASL encryption cannot be worked in ThriftServer
 Key: SPARK-24062
 URL: https://issues.apache.org/jira/browse/SPARK-24062
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 2.3.0
Reporter: Saisai Shao


Spark thrift server will throw an exception when SASL encryption is used.

 
{noformat}
18/04/16 14:36:46 ERROR TransportRequestHandler: Error while invoking 
RpcHandler#receive() on RPC id 8384069538832556183
java.lang.IllegalArgumentException: A secret key must be specified via the 
spark.authenticate.secret config
at 
org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
at 
org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:509)
at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:551)
at 
org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
at 
com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
at 
com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
at 
org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:103)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:187)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111){noformat}
After investigation, the issue is:

Spark on YARN stores the SASL secret in the current UGI's credentials; these 
credentials are distributed to the AM and executors so that the executors and 
the driver share the same secret for communication. But STS/Hive library code 
refreshes the current UGI via UGI's loginFromKeytab(), which creates a new UGI 
in the current context with empty tokens and secret keys. The secret key is 
therefore lost from the current context's UGI, which is why the Spark driver 
throws the "secret key not found" exception.

In Spark 2.2, Spark also stored this secret key in a class variable of 
{{SecurityManager}}, so even if the UGI was refreshed, the secret still existed 
in the object and STS with SASL still worked. But in Spark 2.3 we always look 
up the key from the current UGI, which makes it fail to work.

To fix this issue, there are two possible solutions:

1. Fix it in the STS/Hive library: when a new UGI is created on refresh, copy 
the secret key from the original UGI to the new one. The difficulty is that 
some of the code that refreshes the UGI lives in the Hive library, which makes 
it hard for us to change.
2. Roll back the logic in SecurityManager to match Spark 2.2, so that this 
issue is fixed.

The 2nd solution seems simpler, so I will propose a PR with it.
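
A sketch of what the 2nd solution looks like in spirit: keep a copy of the 
secret in the object itself and fall back to it when the current UGI no longer 
carries it. The class and method names below are illustrative, not the real 
{{SecurityManager}}:
{code:java}
import java.security.SecureRandom
import java.util.Base64

// Illustrative only: cache the secret locally so a UGI refresh by Hive/STS
// code cannot make it unreachable.
class SecretHolder {
  @volatile private var cachedSecret: Option[String] = None

  def getOrCreateSecret(lookupFromUgi: () => Option[String]): String = {
    // Prefer whatever the current UGI's credentials hold, but fall back to
    // the locally cached copy when the UGI has been replaced.
    lookupFromUgi().orElse(cachedSecret).getOrElse {
      val bytes = new Array[Byte](32)
      new SecureRandom().nextBytes(bytes)
      val secret = Base64.getEncoder.encodeToString(bytes)
      cachedSecret = Some(secret)
      secret
    }
  }
}
{code}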



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22372) Make YARN client extend SparkApplication

2018-04-22 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-22372:
---

Assignee: Marcelo Vanzin

> Make YARN client extend SparkApplication
> 
>
> Key: SPARK-22372
> URL: https://issues.apache.org/jira/browse/SPARK-22372
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 2.3.0
>
>
> For SPARK-11035 to work well, at least in cluster mode, YARN needs to 
> implement {{SparkApplication}} so that it doesn't use system properties to 
> propagate Spark configuration from spark-submit.
> There is a second complication, that YARN uses system properties to propagate 
> {{SPARK_YARN_MODE}} on top of other Spark configs. We should take a look at 
> either change that to a configuration, or remove {{SPARK_YARN_MODE}} 
> altogether if possible.
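
For illustration, the entry-point shape involved looks like the sketch below. 
The real trait is {{private[spark]}} as of Spark 2.3, so its signature is 
reproduced here to keep the sketch self-contained, and the client class is a 
made-up stand-in for the YARN client:
{code:java}
import org.apache.spark.SparkConf

// Same shape as org.apache.spark.deploy.SparkApplication in Spark 2.3.
trait SparkApplicationLike {
  def start(args: Array[String], conf: SparkConf): Unit
}

// Sketch of a cluster-manager client entry point: configuration arrives as an
// explicit SparkConf, so nothing has to travel through JVM system properties.
class ExampleYarnLikeClient extends SparkApplicationLike {
  override def start(args: Array[String], conf: SparkConf): Unit = {
    println(s"master=${conf.get("spark.master", "yarn")}, userArgs=${args.mkString(" ")}")
  }
}
{code}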



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24016) Yarn does not update node blacklist in static allocation

2018-04-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445252#comment-16445252
 ] 

Saisai Shao commented on SPARK-24016:
-

I think this can be useful if "spark.blacklist.killBlacklistedExecutors" is 
enabled; the NM could then avoid relaunching executors on the bad nodes.
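
For reference, the settings involved would be passed like this (flags shown for 
illustration only):
{noformat}
spark-submit \
  --conf spark.blacklist.enabled=true \
  --conf spark.blacklist.killBlacklistedExecutors=true \
  ...
{noformat}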

> Yarn does not update node blacklist in static allocation
> 
>
> Key: SPARK-24016
> URL: https://issues.apache.org/jira/browse/SPARK-24016
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, YARN
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Task-based blacklisting keeps track of bad nodes and updates YARN with that 
> set of nodes so that Spark will not receive more containers on them.  
> However, that only happens with dynamic allocation.  Though it's far more 
> important with dynamic allocation, this matters even with static allocation: 
> if executors die, or if the cluster was too busy at the original resource 
> request to grant all the containers, the Spark application will add new 
> containers mid-run, and we want an updated node blacklist for that.






[jira] [Commented] (LIVY-459) First submission via livy-python-api fails with KryoException, subsequent retries succeed

2018-04-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445239#comment-16445239
 ] 

Saisai Shao commented on LIVY-459:
--

Thanks for the report. Are you going to fix this issue, or just report it?

> First submission via livy-python-api fails with KryoException, subsequent 
> retries succeed
> -
>
> Key: LIVY-459
> URL: https://issues.apache.org/jira/browse/LIVY-459
> Project: Livy
>  Issue Type: Bug
>Affects Versions: 0.5.0, 0.6.0
> Environment: Red Hat Enterprise Linux Server release 7.3 (Maipo)
> java version "1.8.0_141"
>Reporter: Matthias Wolf
>Priority: Major
>
> I have the following code:
> {code}
> from livy.client import HttpClient
>
> def foobar(ctx):
>     return ctx.sc.parallelize(range(101)).mean()
>
> client = HttpClient('http://r2i0n33:8998')
> try:
>     print(client.submit(foobar).result())
> except Exception as e:
>     print(e)
>     print(client.submit(foobar).result())
> client.stop(True)
> {code}
> failing with the following error:
> {code:java}
> org.apache.livy.shaded.kryo.kryo.KryoException: Encountered unregistered 
> class ID: 510
> org.apache.livy.shaded.kryo.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
> org.apache.livy.shaded.kryo.kryo.Kryo.readClass(Kryo.java:656)
> org.apache.livy.shaded.kryo.kryo.Kryo.readClassAndObject(Kryo.java:767)
> org.apache.livy.client.common.Serializer.deserialize(Serializer.java:63)
> org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:39)
> org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:27)
> org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:57)
> org.apache.livy.rsc.driver.BypassJobWrapper.call(BypassJobWrapper.java:42)
> org.apache.livy.rsc.driver.BypassJobWrapper.call(BypassJobWrapper.java:27)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> 50.0
> {code}
> Is this to be expected? I would have thought that the first submission should 
> work already. I did not find this behavior online/in the mailing list 
> archives. Any pointers how to resolve this?
> This happens in both the livy-0.5.0 binaries and the github master, on RedHat 
> EL 7.3.





[jira] [Commented] (LIVY-424) Session memory keeps growing as session persists

2018-04-19 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/LIVY-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443705#comment-16443705
 ] 

Saisai Shao commented on LIVY-424:
--

This seems like a Scala issue, and it looks like there is no way to fix it on the Livy side.

> Session memory keeps growing as session persists
> 
>
> Key: LIVY-424
> URL: https://issues.apache.org/jira/browse/LIVY-424
> Project: Livy
>  Issue Type: Bug
>  Components: REPL
>Affects Versions: 0.4.0
>Reporter: Lauren Spiegel
>Priority: Major
> Attachments: commonrootpath.png, leaksuspects.png, suspect1.png, 
> suspect2.png
>
>
> I am using Spark 2.1.1 and Scala 2.11 with Livy. When I have a session open 
> for about an hour or so, I see the memory of the Spark driver growing out of 
> control.  
> I think it is related to this since a session is a spark shell/scala REPL: 
> https://issues.scala-lang.org/browse/SI-4331
> See attached analysis from Eclipse Memory Analyzer Tool.





[jira] [Commented] (SPARK-24008) SQL/Hive Context fails with NullPointerException

2018-04-18 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443495#comment-16443495
 ] 

Saisai Shao commented on SPARK-24008:
-

The case you provided does not seem valid: you're trying to broadcast the SQL 
entry point and RDDs, which should not be broadcast. I'm not sure whether the 
issue here is just a result of that invalid usage pattern. If you hit this in a 
realistic scenario, would you please provide a more meaningful reproduction?
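For what it's worth, a self-contained sketch of the distinction (names and 
values are illustrative, not taken from the attached repro): broadcast plain 
data, and keep the SQL entry point on the driver.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastUsageSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]"))
    try {
      // Fine: broadcast small immutable data needed by tasks.
      val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

      // Anti-pattern: sc.broadcast(sqlContext) / sc.broadcast(someRdd) -- contexts
      // and RDDs are driver-side handles, not data to ship to executors.

      val withIds = sc.parallelize(Seq("a", "b", "c")).map(k => (k, lookup.value.getOrElse(k, -1)))
      println(withIds.collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
{code}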

> SQL/Hive Context fails with NullPointerException 
> -
>
> Key: SPARK-24008
> URL: https://issues.apache.org/jira/browse/SPARK-24008
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.3
>Reporter: Prabhu Joseph
>Priority: Major
> Attachments: Repro
>
>
> SQL / Hive Context fails with a NullPointerException while getting 
> configuration from SQLConf. This happens when the MemoryStore is filled with 
> lots of broadcasts and has started dropping blocks, and then a SQL / Hive 
> Context is created and broadcast. Using this Context to access a table then 
> fails with the NullPointerException below.
> Repro is attached - the Spark Example which fills the MemoryStore with 
> broadcasts and then creates and accesses a SQL Context.
> {code}
> java.lang.NullPointerException
> at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638)
> at 
> org.apache.spark.sql.SQLConf.defaultDataSourceName(SQLConf.scala:558)
> at 
> org.apache.spark.sql.DataFrameReader.(DataFrameReader.scala:362)
> at org.apache.spark.sql.SQLContext.read(SQLContext.scala:623)
> at SparkHiveExample$.main(SparkHiveExample.scala:76)
> at SparkHiveExample.main(SparkHiveExample.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> 18/04/06 14:17:42 ERROR ApplicationMaster: User class threw exception: 
> java.lang.NullPointerException 
> java.lang.NullPointerException 
> at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) 
> at org.apache.spark.sql.SQLContext.getConf(SQLContext.scala:153) 
> at 
> org.apache.spark.sql.hive.HiveContext.hiveMetastoreVersion(HiveContext.scala:166)
>  
> at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258)
>  
> at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255) 
> at 
> org.apache.spark.sql.hive.HiveContext$$anon$2.(HiveContext.scala:475) 
> at 
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:475)
>  
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:474) 
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:90) 
> at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831) 
> at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) 
> {code}
> MemoryStore got filled and started dropping the blocks.
> {code}
> 18/04/17 08:03:43 INFO MemoryStore: 2 blocks selected for dropping
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14 stored as values in 
> memory (estimated size 78.1 MB, free 64.4 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes 
> in memory (estimated size 1522.0 B, free 64.4 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15 stored as values in 
> memory (estimated size 350.9 KB, free 64.1 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes 
> in memory (estimated size 29.9 KB, free 64.0 MB)
> 18/04/17 08:03:43 INFO MemoryStore: 10 blocks selected for dropping
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16 stored as values in 
> memory (estimated size 78.1 MB, free 64.7 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes 
> in memory (estimated size 1522.0 B, free 64.7 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 136.0 B, free 64.7 MB)
> 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared
> 18/04/17 08:03:20 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
> 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared
> 18/04/17 08:03:57 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
> 18/04/17 08:04:23 INFO MemoryStore: MemoryStore cleared
> {code}






[jira] [Commented] (SPARK-24008) SQL/Hive Context fails with NullPointerException

2018-04-18 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443475#comment-16443475
 ] 

Saisai Shao commented on SPARK-24008:
-

Why do you need to broadcast {{SQLContext}} or {{HiveContext}}?

> SQL/Hive Context fails with NullPointerException 
> -
>
> Key: SPARK-24008
> URL: https://issues.apache.org/jira/browse/SPARK-24008
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.3
>Reporter: Prabhu Joseph
>Priority: Major
> Attachments: Repro
>
>
> SQL / Hive Context fails with a NullPointerException while getting 
> configuration from SQLConf. This happens when the MemoryStore is filled with 
> lots of broadcasts and has started dropping blocks, and then a SQL / Hive 
> Context is created and broadcast. Using this Context to access a table then 
> fails with the NullPointerException below.
> Repro is attached - the Spark Example which fills the MemoryStore with 
> broadcasts and then creates and accesses a SQL Context.
> {code}
> java.lang.NullPointerException
> at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638)
> at 
> org.apache.spark.sql.SQLConf.defaultDataSourceName(SQLConf.scala:558)
> at 
> org.apache.spark.sql.DataFrameReader.(DataFrameReader.scala:362)
> at org.apache.spark.sql.SQLContext.read(SQLContext.scala:623)
> at SparkHiveExample$.main(SparkHiveExample.scala:76)
> at SparkHiveExample.main(SparkHiveExample.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> 18/04/06 14:17:42 ERROR ApplicationMaster: User class threw exception: 
> java.lang.NullPointerException 
> java.lang.NullPointerException 
> at org.apache.spark.sql.SQLConf.getConf(SQLConf.scala:638) 
> at org.apache.spark.sql.SQLContext.getConf(SQLContext.scala:153) 
> at 
> org.apache.spark.sql.hive.HiveContext.hiveMetastoreVersion(HiveContext.scala:166)
>  
> at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258)
>  
> at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255) 
> at 
> org.apache.spark.sql.hive.HiveContext$$anon$2.(HiveContext.scala:475) 
> at 
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:475)
>  
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:474) 
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:90) 
> at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831) 
> at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827) 
> {code}
> MemoryStore got filled and started dropping the blocks.
> {code}
> 18/04/17 08:03:43 INFO MemoryStore: 2 blocks selected for dropping
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14 stored as values in 
> memory (estimated size 78.1 MB, free 64.4 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes 
> in memory (estimated size 1522.0 B, free 64.4 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15 stored as values in 
> memory (estimated size 350.9 KB, free 64.1 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes 
> in memory (estimated size 29.9 KB, free 64.0 MB)
> 18/04/17 08:03:43 INFO MemoryStore: 10 blocks selected for dropping
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16 stored as values in 
> memory (estimated size 78.1 MB, free 64.7 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes 
> in memory (estimated size 1522.0 B, free 64.7 MB)
> 18/04/17 08:03:43 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 136.0 B, free 64.7 MB)
> 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared
> 18/04/17 08:03:20 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
> 18/04/17 08:03:44 INFO MemoryStore: MemoryStore cleared
> 18/04/17 08:03:57 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
> 18/04/17 08:04:23 INFO MemoryStore: MemoryStore cleared
> {code}






[jira] [Resolved] (SPARK-24014) Add onStreamingStarted method to StreamingListener

2018-04-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24014.
-
   Resolution: Fixed
Fix Version/s: 2.3.1
   2.4.0

Issue resolved by pull request 21098
[https://github.com/apache/spark/pull/21098]

> Add onStreamingStarted method to StreamingListener
> --
>
> Key: SPARK-24014
> URL: https://issues.apache.org/jira/browse/SPARK-24014
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Trivial
> Fix For: 2.4.0, 2.3.1
>
>
> The {{StreamingListener}} on the PySpark side seems to lack an 
> {{onStreamingStarted}} method.






[jira] [Assigned] (SPARK-24014) Add onStreamingStarted method to StreamingListener

2018-04-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24014:
---

Assignee: Liang-Chi Hsieh

> Add onStreamingStarted method to StreamingListener
> --
>
> Key: SPARK-24014
> URL: https://issues.apache.org/jira/browse/SPARK-24014
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Trivial
>
> The {{StreamingListener}} on the PySpark side seems to lack an 
> {{onStreamingStarted}} method.






[jira] [Resolved] (LIVY-458) Upgrade jackson version from 2.9.2 to 2.9.5

2018-04-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-458.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

Issue resolved by pull request 87
[https://github.com/apache/incubator-livy/pull/87]

> Upgrade jackson version from 2.9.2 to 2.9.5
> ---
>
> Key: LIVY-458
> URL: https://issues.apache.org/jira/browse/LIVY-458
> Project: Livy
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
> Fix For: 0.6.0
>
>
> Due to several security issues in the jackson-databind module (CVE-2018-5968, 
> CVE-2017-17485, CVE-2018-7489), this proposes upgrading the jackson version to 
> 2.9.5.





[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441957#comment-16441957
 ] 

Saisai Shao commented on SPARK-23989:
-

I have no idea what you are trying to express.

What specific issue are you seeing? I don't think you can comment out some code 
and then say "hey, this unit test fails"...

> When using `SortShuffleWriter`, the data will be overwritten
> 
>
> Key: SPARK-23989
> URL: https://issues.apache.org/jira/browse/SPARK-23989
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: liuxian
>Priority: Critical
> Attachments: 无标题2.png
>
>
> When using `SortShuffleWriter`, we only insert 'AnyRef' into 
> 'PartitionedAppendOnlyMap' or 'PartitionedPairBuffer'.
> For this function:
> override def write(records: Iterator[Product2[K, V]])
> the value of 'records' is `UnsafeRow`, so the value will be overwritten.
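If the concern is the usual object-reuse hazard, here is a generic sketch of it 
(this is not Spark's shuffle code): when an iterator keeps handing back the same 
mutable object, buffering the reference instead of a copy makes every buffered 
entry show the last value, which is why callers copy rows before inserting them 
(UnsafeRow has a copy() method for exactly this reason).

{code:scala}
object ReuseHazardSketch {
  final class Reused(var value: Int)

  def main(args: Array[String]): Unit = {
    val reused = new Reused(0)
    // The iterator mutates and returns the *same* object each time.
    val records = Iterator.tabulate(3) { i => reused.value = i; reused }

    val buffered = records.toArray                  // three references to one object
    println(buffered.map(_.value).mkString(","))    // prints 2,2,2 -- earlier values "overwritten"
  }
}
{code}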






[jira] [Commented] (SPARK-23843) Deploy yarn meets incorrect LOCALIZED_CONF_DIR

2018-04-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441930#comment-16441930
 ] 

Saisai Shao commented on SPARK-23843:
-

I think this issue is due to your "new Hadoop-compatible filesystem"; I cannot 
reproduce it with HDFS.

> Deploy yarn meets incorrect LOCALIZED_CONF_DIR
> --
>
> Key: SPARK-23843
> URL: https://issues.apache.org/jira/browse/SPARK-23843
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.0
> Environment: spark-2.3.0-bin-hadoop2.7
>Reporter: zhoutai.zt
>Priority: Major
>
> We have implemented a new Hadoop-compatible filesystem and run Spark on it. 
> The command is:
> {quote}./bin/spark-submit --class org.apache.spark.examples.SparkPi --master 
> yarn --deploy-mode cluster --executor-memory 1G --num-executors 1 
> /home/hadoop/app/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
>  10
> {quote}
> The result is:
> {quote}Exception in thread "main" org.apache.spark.SparkException: 
> Application application_1522399820301_0020 finishe
> d with failed status
>  at org.apache.spark.deploy.yarn.Client.run(Client.scala:1159)
> {quote}
> We set log level to DEBUG and find:
> {quote}2018-04-02 09:36:09,603 DEBUG org.apache.spark.deploy.yarn.Client: 
> __app__.jar -> resource \{ scheme: "dfs" host: 
> "f-63a47d43wh98.cn-neimeng-env10-d01.dfs.aliyuncs.com" port: 10290 file: 
> "/user/hadoop/.sparkStaging/application_1522399820301_0006/spark-examples_2.11-2.3.0.jar"
>  } size: 1997548 timestamp: 1522632978000 type: FILE visibility: PRIVATE
> 2018-04-02 09:36:09,603 DEBUG org.apache.spark.deploy.yarn.Client: 
> __spark_libs__ -> resource \{ scheme: "dfs" host: 
> "f-63a47d43wh98.cn-neimeng-env10-d01.dfs.aliyuncs.com" port: 10290 file: 
> "/user/hadoop/.sparkStaging/application_1522399820301_0006/__spark_libs__924155631753698276.zip"
>  } size: 242801307 timestamp: 1522632977000 type: ARCHIVE visibility: PRIVATE
> 2018-04-02 09:36:09,603 DEBUG org.apache.spark.deploy.yarn.Client: 
> __spark_conf__ -> resource \{ port: -1 file: 
> "/user/hadoop/.sparkStaging/application_1522399820301_0006/__spark_conf__.zip"
>  } size: 185531 timestamp: 1522632978000 type: ARCHIVE visibility: PRIVATE
> {quote}
> As shown, the information for __app__.jar and __spark_libs__ is correct, BUT 
> __spark_conf__ has no scheme or port.
> Exploring the source code, addResource appears twice in Client.scala:
> {code:java}
> val destPath = copyFileToRemote(destDir, localPath, replication, symlinkCache)
> val destFs = FileSystem.get(destPath.toUri(), hadoopConf)
> distCacheMgr.addResource(
> destFs, hadoopConf, destPath, localResources, resType, linkname, statCache,
> appMasterOnly = appMasterOnly)
> {code}
> {code:java}
> val remoteConfArchivePath = new Path(destDir, LOCALIZED_CONF_ARCHIVE)
> val remoteFs = FileSystem.get(remoteConfArchivePath.toUri(), hadoopConf)
> sparkConf.set(CACHED_CONF_ARCHIVE, remoteConfArchivePath.toString())
> val localConfArchive = new Path(createConfArchive().toURI())
> copyFileToRemote(destDir, localConfArchive, replication, symlinkCache,
>   force = true, destName = Some(LOCALIZED_CONF_ARCHIVE))
> // Manually add the config archive to the cache manager so that the AM is
> // launched with the proper files set up.
> distCacheMgr.addResource(remoteFs, hadoopConf, remoteConfArchivePath,
>   localResources, LocalResourceType.ARCHIVE, LOCALIZED_CONF_DIR, statCache,
>   appMasterOnly = false)
> {code}
> As shown in the source code, the destination paths are constructed 
> differently, which is confirmed by a debug log we added:
> {quote}2018-04-02 15:18:46,357 ERROR 
> org.apache.hadoop.yarn.util.ConverterUtils: getYarnUrlFromURI 
> URI:/user/root/.sparkStaging/application_1522399820301_0020/__spark_conf__.zip
> 2018-04-02 15:18:46,357 ERROR org.apache.hadoop.yarn.util.ConverterUtils: 
> getYarnUrlFromURI URL:null; 
> null;-1;null;/user/root/.sparkStaging/application_1522399820301_0020/__spark_conf__.zip{quote}
> Log messages on YARN NM:
> {quote}2018-04-02 09:36:11,958 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Failed to parse resource-request
> java.net.URISyntaxException: Expected scheme name at index 0: 
> :///user/hadoop/.sparkStaging/application_1522399820301_0006/__spark_conf__.zip
> {quote}
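One way to look at the missing scheme/port (paths below are placeholders, and 
the local filesystem is used so the sketch actually runs): a Path built from a 
bare string only gains a scheme and authority once it is qualified against the 
destination FileSystem, which is the kind of fully qualified URI the 
NodeManager can parse.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object QualifiedPathSketch {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.getLocal(new Configuration())

    val bare = new Path("/tmp/__spark_conf__.zip")
    val qualified = fs.makeQualified(bare)    // adds scheme (and authority, if any)

    println(bare.toUri)        // /tmp/__spark_conf__.zip -- no scheme, like the failing entry
    println(qualified.toUri)   // file:/tmp/__spark_conf__.zip
  }
}
{code}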






[jira] [Commented] (SPARK-23830) Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object

2018-04-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441919#comment-16441919
 ] 

Saisai Shao commented on SPARK-23830:
-

What is the reason to use {{class}} instead of {{object}}? This doesn't seem to 
follow our convention for writing a Spark application. I don't think we need to 
support this.
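For reference, the conventional shape the comment refers to, sketched with a 
placeholder app name: an {{object}}, so that {{main}} is a static method and 
{{mainMethod.invoke(null, ...)}} works.

{code:scala}
import org.apache.spark.sql.SparkSession

object MyApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("my-test").getOrCreate()
    try {
      // application logic goes here
    } finally {
      spark.stop()
    }
  }
}
{code}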

> Spark on YARN in cluster deploy mode fail with NullPointerException when a 
> Spark application is a Scala class not object
> 
>
> Key: SPARK-23830
> URL: https://issues.apache.org/jira/browse/SPARK-23830
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> As reported on StackOverflow in [Why does Spark on YARN fail with “Exception 
> in thread ”Driver“ 
> java.lang.NullPointerException”?|https://stackoverflow.com/q/49564334/1305344]
>  the following Spark application fails with {{Exception in thread "Driver" 
> java.lang.NullPointerException}} with Spark on YARN in cluster deploy mode:
> {code}
> class MyClass {
>   def main(args: Array[String]): Unit = {
> val c = new MyClass()
> c.process()
>   }
>   def process(): Unit = {
> val sparkConf = new SparkConf().setAppName("my-test")
> val sparkSession: SparkSession = 
> SparkSession.builder().config(sparkConf).getOrCreate()
> import sparkSession.implicits._
> 
>   }
>   ...
> }
> {code}
> The exception is as follows:
> {code}
> 18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a 
> separate Thread
> 18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context 
> initialization...
> Exception in thread "Driver" java.lang.NullPointerException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> {code}
> I think the reason for the exception {{Exception in thread "Driver" 
> java.lang.NullPointerException}} is due to [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L700-L701]:
> {code}
> val mainMethod = userClassLoader.loadClass(args.userClass)
>   .getMethod("main", classOf[Array[String]])
> {code}
> So when {{mainMethod}} is used in [the following 
> code|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L706]
>  it simply gives NPE.
> {code}
> mainMethod.invoke(null, userArgs.toArray)
> {code}
> That could easily be avoided with an extra check on {{mainMethod}}, giving the 
> user a message about what the likely reason may have been.






[jira] [Commented] (SPARK-24001) Multinode cluster

2018-04-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441918#comment-16441918
 ] 

Saisai Shao commented on SPARK-24001:
-

Questions should go to the mailing list.

> Multinode cluster 
> --
>
> Key: SPARK-24001
> URL: https://issues.apache.org/jira/browse/SPARK-24001
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Direselign
>Priority: Major
> Attachments: Screenshot from 2018-04-17 22-47-39.png
>
>
> I was trying to configure an Apache Spark cluster on two Ubuntu 16.04 machines 
> using YARN, but it does not work when I submit a task to the cluster.





