RE: Kicking off the process around Spark 2.2.1

2017-11-02 Thread Kevin Grealish
Any update on the expected 2.2.1 (or 2.3.0) release process?

From: Felix Cheung [mailto:felixcheun...@hotmail.com]
Sent: Thursday, October 26, 2017 10:04 AM
To: Sean Owen; Holden Karau
Cc: dev@spark.apache.org
Subject: Re: Kicking off the process around Spark 2.2.1

Yes! I can take on RM for 2.2.1.

We are still working out what to do about the temp files created by Hive and Java 
that cause the policy issue with CRAN, and will hopefully report back shortly.


From: Sean Owen
Sent: Wednesday, October 25, 2017 4:39:15 AM
To: Holden Karau
Cc: Felix Cheung; dev@spark.apache.org
Subject: Re: Kicking off the process around Spark 2.2.1

It would be reasonably consistent with the timing of other x.y.1 releases, and 
having more release managers sounds useful, yeah.

Note also that in theory the code freeze for 2.3.0 starts in about 2 weeks.

On Wed, Oct 25, 2017 at 12:29 PM Holden Karau wrote:
Now that Spark 2.1.2 is out, it seems like now is a good time to get started on 
the Spark 2.2.1 release. There are some streaming fixes I'm aware of that would 
be good to get into a release; is there anything else people are working on for 
2.2.1 that we should be tracking?

To switch it up, I'd like to suggest Felix as the RM for this one, since there are 
also likely some R packaging changes to be included in the release. This also 
gives us a chance to see if my updated release documentation is enough for a 
new RM to get started from.

What do folks think?
--
Twitter: https://twitter.com/holdenkarau


RE: regression: no longer able to use HDFS wasbs:// path for additional python files on LIVY batch submit

2016-10-03 Thread Kevin Grealish
Great. Thanks for the pointer. I see the fix is in 2.0.1-rc4.

Will there be a 1.6.3? If so, how are fixes considered for backporting?

From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: Monday, October 3, 2016 5:40 AM
To: Kevin Grealish <kevin...@microsoft.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: regression: no longer able to use HDFS wasbs:// path for 
additional python files on LIVY batch submit


On 1 Oct 2016, at 02:49, Kevin Grealish <kevin...@microsoft.com> wrote:

I’m seeing a regression when submitting a batch PySpark program with additional 
files using LIVY. This is YARN cluster mode. The program files are placed into 
the mounted Azure Storage before making the call to LIVY. This is happening 
from an application which has credentials for the storage and the LIVY 
endpoint, but not local file systems on the cluster. This previously worked but 
now I’m getting the error below.
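For illustration, a minimal sketch in Python of the kind of Livy batch submission 
being described; the endpoint, credentials, and storage paths below are 
placeholders, not values from this thread:

import json
import requests

# Placeholder Livy endpoint and credentials -- not values from this thread.
LIVY_URL = "https://example-cluster.azurehdinsight.net/livy/batches"

payload = {
    "file": "wasb:///apps/main.py",          # main PySpark program in Azure Storage
    "pyFiles": ["wasb:///apps/helpers.py"],  # additional files; these trip the local-file check
}

resp = requests.post(
    LIVY_URL,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
    auth=("admin", "example-password"),      # placeholder auth
)
print(resp.status_code, resp.json())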

Seems this restriction was introduced with 
https://github.com/apache/spark/commit/5081a0a9d47ca31900ea4de570de2cbb0e063105 
(new in 1.6.2 and 2.0.0).

How should the scenario above be achieved now? Am I missing something?

This has been fixed in 
https://issues.apache.org/jira/browse/SPARK-17512; I don't know if it's in 2.0.1 though.
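
One possible workaround until running a build with that fix is to pull the extra 
module in from the driver with SparkContext.addPyFile, which goes through Spark's 
file-distribution path rather than the spark-submit argument check. A minimal 
sketch, assuming a hypothetical helpers module (an assumption here, not something 
verified against 1.6.2/2.0.0 in this thread):

from pyspark import SparkContext

sc = SparkContext(appName="wasb-pyfiles-workaround")

# Hypothetical path; the real account/container come from the cluster config.
# addPyFile ships the file to executors and adds it to sys.path on the driver.
sc.addPyFile("wasb:///apps/helpers.py")

import helpers  # importable once addPyFile has run

# helpers.transform is a hypothetical function used only for illustration.
print(sc.parallelize([1, 2, 3]).map(helpers.transform).collect())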



Exception in thread "main" java.lang.IllegalArgumentException: Launching Python 
applications through spark-submit is currently only supported for local files: 
wasb://kevingreclust...@.blob.core.windows.net/x/xxx.py
    at org.apache.spark.deploy.PythonRunner$.formatPath(PythonRunner.scala:104)
    at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
    at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.deploy.PythonRunner$.formatPaths(PythonRunner.scala:136)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:639)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:637)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:637)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.Exception: spark-submit exited with code 1



regression: no longer able to use HDFS wasbs:// path for additional python files on LIVY batch submit

2016-09-30 Thread Kevin Grealish
I'm seeing a regression when submitting a batch PySpark program with additional 
files using LIVY. This is YARN cluster mode. The program files are placed into 
the mounted Azure Storage before making the call to LIVY. This is happening 
from an application which has credentials for the storage and the LIVY 
endpoint, but not local file systems on the cluster. This previously worked but 
now I'm getting the error below.

Seems this restriction was introduced with 
https://github.com/apache/spark/commit/5081a0a9d47ca31900ea4de570de2cbb0e063105 
(new in 1.6.2 and 2.0.0).

How should the scenario above be achieved now? Am I missing something?


Exception in thread "main" java.lang.IllegalArgumentException: Launching Python 
applications through spark-submit is currently only supported for local files: 
wasb://kevingreclust...@.blob.core.windows.net/x/xxx.py
    at org.apache.spark.deploy.PythonRunner$.formatPath(PythonRunner.scala:104)
    at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
    at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.deploy.PythonRunner$.formatPaths(PythonRunner.scala:136)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:639)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:637)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:637)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.Exception: spark-submit exited with code 1