[jira] [Resolved] (SPARK-23708) Comment of ShutdownHookManager.addShutdownHook is error

2018-03-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-23708.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 20845
[https://github.com/apache/spark/pull/20845]

> Comment of ShutdownHookManager.addShutdownHook is error
> ---
>
> Key: SPARK-23708
> URL: https://issues.apache.org/jira/browse/SPARK-23708
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Minor
> Fix For: 2.4.0
>
>
> The comment below is not right:
> {code:java}
> /**
>  * Adds a shutdown hook with the given priority. Hooks with lower priority
>  * values run first.
>  *
>  * @param hook The code to run during shutdown.
>  * @return A handle that can be used to unregister the shutdown hook.
>  */
>   def addShutdownHook(priority: Int)(hook: () => Unit): AnyRef = {
>     shutdownHooks.add(priority, hook)
>   }
> {code}
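
For reference, a sketch of how the scaladoc could read once corrected. This assumes the actual behavior is that hooks with higher priority values run first and that the priority parameter should be documented; it is an illustration, not necessarily the exact wording merged in the pull request.
{code:java}
/**
 * Adds a shutdown hook with the given priority. Hooks with higher priority
 * values run first.
 *
 * @param priority The priority of the shutdown hook; assumed here to mean
 *                 that higher values run earlier.
 * @param hook The code to run during shutdown.
 * @return A handle that can be used to unregister the shutdown hook.
 */
def addShutdownHook(priority: Int)(hook: () => Unit): AnyRef = {
  shutdownHooks.add(priority, hook)
}
{code}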



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23708) Comment of ShutdownHookManager.addShutdownHook is error

2018-03-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-23708:
---

Assignee: zhoukang

> Comment of ShutdownHookManager.addShutdownHook is error
> ---
>
> Key: SPARK-23708
> URL: https://issues.apache.org/jira/browse/SPARK-23708
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Minor
> Fix For: 2.4.0
>
>
> The comment below is not right:
> {code:java}
> /**
>  * Adds a shutdown hook with the given priority. Hooks with lower priority
>  * values run first.
>  *
>  * @param hook The code to run during shutdown.
>  * @return A handle that can be used to unregister the shutdown hook.
>  */
>   def addShutdownHook(priority: Int)(hook: () => Unit): AnyRef = {
>     shutdownHooks.add(priority, hook)
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23734) InvalidSchemaException While Saving ALSModel

2018-03-18 Thread Stanley Poon (JIRA)
Stanley Poon created SPARK-23734:


 Summary: InvalidSchemaException While Saving ALSModel
 Key: SPARK-23734
 URL: https://issues.apache.org/jira/browse/SPARK-23734
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 2.3.0
 Environment: macOS 10.13.2

Scala 2.11.8

Spark 2.3.0
Reporter: Stanley Poon


After fitting an ALSModel, we get the following error while saving the model:

Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can 
not be empty. Parquet does not support empty group without leaves. Empty group: 
spark_schema

Exactly the same code ran ok on 2.2.1.

The same issue also occurs on other ALSModels we have.
h2. *To reproduce*

Get ALSExample: 
[https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala]
 and add the following line to save the model right before "spark.stop".
{quote}   model.write.overwrite().save("SparkExampleALSModel") 
{quote}
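
For completeness, a minimal sketch of the modification described above, assuming the stock ALSExample has already produced a fitted model; the model itself comes from the linked example, and only the save call is the addition from this report.
{code:java}
import org.apache.spark.ml.recommendation.ALSModel

// Sketch only: `model` is the ALSModel fitted by the linked ALSExample.
// Persisting it right before spark.stop() triggers the reported
// InvalidSchemaException on Spark 2.3.0.
def saveModel(model: ALSModel): Unit = {
  model.write.overwrite().save("SparkExampleALSModel")
}
{code}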
h2. Stack Trace
Exception in thread "main" java.lang.ExceptionInInitializerError
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:444)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:444)
at scala.collection.immutable.List.foreach(List.scala:392)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$.setSchema(ParquetWriteSupport.scala:444)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.prepareWrite(ParquetFileFormat.scala:112)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:140)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:154)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
at 
org.apache.spark.ml.recommendation.ALSModel$ALSModelWriter.saveImpl(ALS.scala:510)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
at com.vitalmove.model.ALSExample$.main(ALSExample.scala:83)
at com.vitalmove.model.ALSExample.main(ALSExample.scala)
Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can 
not be empty. Parquet does not support empty group without leaves. Empty group: 
spark_schema
at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.<init>(ParquetSchemaConverter.scala:567)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.<clinit>(ParquetSchemaConverter.scala)
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23732) Broken link to scala source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404290#comment-16404290
 ] 

Yogesh Tewari commented on SPARK-23732:
---

Adding -sourcepath to the unidoc options seems to fix the issue.
{code:java}
scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
  "-groups", // Group similar methods together based on the @group annotation.
  "-skip-packages", "org.apache.hadoop",
  "-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required for relative source links in scaladoc
) ++ (
{code}
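
For context, a sketch of what the full unidoc setting might look like with that option added, assembled from the two SparkBuild.scala snippets quoted in the issue below; this is a hypothetical combination, not necessarily the committed fix.
{code:java}
// Hypothetical merge of the existing unidoc options with -sourcepath, so that
// €{FILE_PATH} is resolved relative to the repository root rather than the
// developer's absolute checkout path.
scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
  "-groups", // Group similar methods together based on the @group annotation.
  "-skip-packages", "org.apache.hadoop",
  "-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath
) ++ (
  // Add links to sources when generating Scaladoc for a non-snapshot release
  if (!isSnapshot.value) {
    Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
  } else {
    Seq()
  }
)
{code}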
 

> Broken link to scala source code in Spark Scala api Scaladoc
> 
>
> Key: SPARK-23732
> URL: https://issues.apache.org/jira/browse/SPARK-23732
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 2.3.0, 2.3.1
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
> ~/spark/docs$ gem -v 
> 2.5.2.1 
> ~/spark/docs$ jekyll -v 
> jekyll 3.7.3  
> ~/spark/docs$ java -version 
> java version "1.8.0_112" Java(TM) SE Runtime Environment (build 
> 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed 
> mode)
> {code}
>Reporter: Yogesh Tewari
>Priority: Trivial
>  Labels: build, documentation, scaladocs
>
> The Scala source code link in the Spark API scaladoc is broken.
> It turns out that, instead of the relative path to the Scala files, the 
> "€\{FILE_PATH}.scala" expression in 
> [https://github.com/apache/spark/blob/master/project/SparkBuild.scala] 
> generates the absolute path from the developer's computer. In this case, if I 
> try to access the source link on 
> [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
>  it takes me to 
> [https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala],
> where the "/Users/sameera/dev/spark" portion of the URL comes from the 
> developer's macOS home folder.
> The code responsible for generating this path during the build in 
> /project/SparkBuild.scala appears unchanged:
> Line # 252:
> {code:java}
> scalacOptions in Compile ++= Seq(
> s"-target:jvm-${scalacJVMVersion.value}",
> "-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
> for relative source links in scaladoc
> ),
> {code}
> Line # 726
> {code:java}
> // Use GitHub repository for Scaladoc source links
> unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",
> scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
> "-groups", // Group similar methods together based on the @group annotation.
> "-skip-packages", "org.apache.hadoop"
> ) ++ (
> // Add links to sources when generating Scaladoc for a non-snapshot release
> if (!isSnapshot.value) {
> Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
> } else {
> Seq()
> }
> ){code}
>  
> It seems more like an issue with the developer's environment.
> I was able to reproduce this in my dev environment. Environment details are 
> attached.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23665) Add adaptive algorithm to select query result collect method

2018-03-18 Thread zhoukang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang resolved SPARK-23665.
--
Resolution: Won't Fix

> Add adaptive algorithm to select query result collect method
> 
>
> Key: SPARK-23665
> URL: https://issues.apache.org/jira/browse/SPARK-23665
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1, 2.3.0
>Reporter: zhoukang
>Priority: Major
>
> Currently, we use configuration like 
> {code:java}
> spark.sql.thriftServer.incrementalCollect
> {code}
> to specify the query result collect method.
> Actually, we can estimate the size of the result and select the collect 
> method automatically, as sketched below.
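
A minimal sketch of what such an adaptive choice could look like. All names below are hypothetical illustrations of the idea, not existing Spark APIs; the threshold and size estimate would have to come from the planner or statistics.
{code:java}
// Hypothetical sketch: choose between incremental (iterator-based) collect and
// a single full collect based on an estimated result size.
def collectAdaptively[T](
    estimatedResultBytes: Long,
    thresholdBytes: Long,
    incrementalCollect: () => Iterator[T],
    fullCollect: () => Array[T]): Iterator[T] = {
  if (estimatedResultBytes > thresholdBytes) {
    // Large result: stream it back incrementally to avoid driver memory pressure.
    incrementalCollect()
  } else {
    // Small result: a single collect is simpler and usually faster.
    fullCollect().iterator
  }
}
{code}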



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23732) Broken link to scala source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogesh Tewari updated SPARK-23732:
--
Description: 
The Scala source code link in the Spark API scaladoc is broken.

It turns out that, instead of the relative path to the Scala files, the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] 
generates the absolute path from the developer's computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it takes me to 
[https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala],

where the "/Users/sameera/dev/spark" portion of the URL comes from the 
developer's macOS home folder.

The code responsible for generating this path during the build in 
/project/SparkBuild.scala appears unchanged:

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like an issue with the developer's environment.

I was able to reproduce this in my dev environment. Environment details are 
attached.

 

  was:
Scala source code link in Spark api scaladoc is broken.

Turns out instead of the relative path to the scala files the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] is 
generating the absolute path from the developers computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it tries to take me to 
[https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment. Environment 
details attached. 

 


> Broken link to scala source code in Spark Scala api Scaladoc
> 
>
> Key: SPARK-23732
> URL: https://issues.apache.org/jira/browse/SPARK-23732
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 2.3.0, 2.3.1
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
> ~/spark/docs$ gem -v 
> 2.5.2.1 
> ~/spark/docs$ jekyll -v 
> jekyll 3.7.3  
> ~/spark/docs$ java -version 
> java version "1.8.0_112" Java(TM) SE Runtime Environment (build 
> 1.8.0_112-b15) Java HotSpot(TM) 64-Bit 

[jira] [Updated] (SPARK-23732) Broken link to scala source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogesh Tewari updated SPARK-23732:
--
Description: 
Scala source code link in Spark api scaladoc is broken.

Turns out instead of the relative path to the scala files the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] is 
generating the absolute path from the developers computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it tries to take me to 
[https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment. Environment 
details attached. 

 

  was:
Scala source code link in Spark api scaladoc is broken.

Turns out instead of the relative path to the scala files the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] is 
generating the absolute path from the developers computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it tries to take me to 
[/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala"
 class="external-link" 
rel="nofollow">https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment. Environment 
details attached. 

 


> Broken link to scala source code in Spark Scala api Scaladoc
> 
>
> Key: SPARK-23732
> URL: https://issues.apache.org/jira/browse/SPARK-23732
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 2.3.0, 2.3.1
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
> ~/spark/docs$ 

[jira] [Updated] (SPARK-23733) Broken link to java source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogesh Tewari updated SPARK-23733:
--
Description: 
The Java source code link in the Spark API scaladoc is broken.

The relative path expression "€\{FILE_PATH}.scala" in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] has 
".scala" hardcoded at the end. If I try to access the source link on 
[https://spark.apache.org/docs/1.6.3/api/scala/index.html#org.apache.spark.api.java.function.Function2],
 it takes me to 
[https://github.com/apache/spark/tree/v2.2.0/core/src/main/java/org/apache/spark/api/java/function/Function2.java.scala]

This comes from /project/SparkBuild.scala:

Line # 720
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
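
To make the failure concrete, a small illustration of how the link is effectively composed for a Java source file when ".scala" is appended unconditionally; the URL and path are taken from the report above, and the behavior of €{FILE_PATH} for Java sources is inferred from the reported result.
{code:java}
// Illustration only: judging from the reported URL, €{FILE_PATH} keeps the
// ".java" extension for Java sources, so blindly appending ".scala" yields a
// dead ".java.scala" link.
val unidocSourceBase = "https://github.com/apache/spark/tree/v2.2.0"
val filePath = "/core/src/main/java/org/apache/spark/api/java/function/Function2.java"
val brokenLink = unidocSourceBase + filePath + ".scala"
// brokenLink == ".../Function2.java.scala", which does not exist on GitHub.
{code}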
 

 

  was:
Java source code link in Spark api scaladoc is broken.

The relative path expression "€\{FILE_PATH}.scala" in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] has 
".scala" hardcoded in the end. If I try to access the source link on 
[https://spark.apache.org/docs/1.6.3/api/scala/index.html#org.apache.spark.api.java.function.Function2],
 it tries to take me to 
[/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala"
 class="external-link" 
rel="nofollow">https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment. Environment 
details attached. 

 


> Broken link to java source code in Spark Scala api Scaladoc
> ---
>
> Key: SPARK-23733
> URL: https://issues.apache.org/jira/browse/SPARK-23733
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
> ~/spark/docs$ gem -v 
> 2.5.2.1 
> ~/spark/docs$ jekyll -v 
> jekyll 3.7.3  
> ~/spark/docs$ java -version 
> java version "1.8.0_112" Java(TM) SE Runtime Environment (build 
> 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed 
> mode)
> {code}
>Reporter: Yogesh Tewari
>Priority: Trivial
>  Labels: build, documentation, scaladocs
>
> Java source code link in Spark api scaladoc is broken.
> The relative path expression "€\{FILE_PATH}.scala" in 
> [https://github.com/apache/spark/blob/master/project/SparkBuild.scala] has 
> ".scala" hardcoded in the end. If I try to access the source link on 
> [https://spark.apache.org/docs/1.6.3/api/scala/index.html#org.apache.spark.api.java.function.Function2],
>  it tries to take me to 
> 

[jira] [Updated] (SPARK-23733) Broken link to java source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogesh Tewari updated SPARK-23733:
--
Affects Version/s: (was: 2.3.0)

> Broken link to java source code in Spark Scala api Scaladoc
> ---
>
> Key: SPARK-23733
> URL: https://issues.apache.org/jira/browse/SPARK-23733
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
> ~/spark/docs$ gem -v 
> 2.5.2.1 
> ~/spark/docs$ jekyll -v 
> jekyll 3.7.3  
> ~/spark/docs$ java -version 
> java version "1.8.0_112" Java(TM) SE Runtime Environment (build 
> 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed 
> mode)
> {code}
>Reporter: Yogesh Tewari
>Priority: Trivial
>  Labels: build, documentation, scaladocs
>
> Java source code link in Spark api scaladoc is broken.
> The relative path expression "€\{FILE_PATH}.scala" in 
> [https://github.com/apache/spark/blob/master/project/SparkBuild.scala] has 
> ".scala" hardcoded in the end. If I try to access the source link on 
> [https://spark.apache.org/docs/1.6.3/api/scala/index.html#org.apache.spark.api.java.function.Function2],
>  it tries to take me to 
> [/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala"
>  class="external-link" 
> rel="nofollow">https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]
> where "/Users/sameera/dev/spark" portion of the URL is coming from the 
> developers macos home folder.
> There seems to be no change in the code responsible for generating this path 
> during the build in /project/SparkBuild.scala :
> Line # 252:
> {code:java}
> scalacOptions in Compile ++= Seq(
> s"-target:jvm-${scalacJVMVersion.value}",
> "-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
> for relative source links in scaladoc
> ),
> {code}
> Line # 726
> {code:java}
> // Use GitHub repository for Scaladoc source links
> unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",
> scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
> "-groups", // Group similar methods together based on the @group annotation.
> "-skip-packages", "org.apache.hadoop"
> ) ++ (
> // Add links to sources when generating Scaladoc for a non-snapshot release
> if (!isSnapshot.value) {
> Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
> } else {
> Seq()
> }
> ){code}
>  
> It seems more like a developers dev environment issue.
> I was successfully able to reproduce this in my dev environment. Environment 
> details attached. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23733) Broken link to java source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogesh Tewari updated SPARK-23733:
--
Description: 
Java source code link in Spark api scaladoc is broken.

The relative path expression "€\{FILE_PATH}.scala" in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] has 
".scala" hardcoded in the end. If I try to access the source link on 
[https://spark.apache.org/docs/1.6.3/api/scala/index.html#org.apache.spark.api.java.function.Function2],
 it tries to take me to 
[/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala"
 class="external-link" 
rel="nofollow">https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment. Environment 
details attached. 

 

  was:
Scala source code link in Spark api scaladoc is broken.

Turns out instead of the relative path to the scala files the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] is 
generating the absolute path from the developers computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it tries to take me to 
[/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala"
 class="external-link" 
rel="nofollow">https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment. Environment 
details attached. 

 


> Broken link to java source code in Spark Scala api Scaladoc
> ---
>
> Key: SPARK-23733
> URL: https://issues.apache.org/jira/browse/SPARK-23733
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 

[jira] [Updated] (SPARK-23733) Broken link to java source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogesh Tewari updated SPARK-23733:
--
Affects Version/s: (was: 2.3.1)
   1.6.3
   2.0.2
   2.1.2
   2.2.0

> Broken link to java source code in Spark Scala api Scaladoc
> ---
>
> Key: SPARK-23733
> URL: https://issues.apache.org/jira/browse/SPARK-23733
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0, 2.3.0
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
> ~/spark/docs$ gem -v 
> 2.5.2.1 
> ~/spark/docs$ jekyll -v 
> jekyll 3.7.3  
> ~/spark/docs$ java -version 
> java version "1.8.0_112" Java(TM) SE Runtime Environment (build 
> 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed 
> mode)
> {code}
>Reporter: Yogesh Tewari
>Priority: Trivial
>  Labels: build, documentation, scaladocs
>
> Scala source code link in Spark api scaladoc is broken.
> Turns out instead of the relative path to the scala files the 
> "€\{FILE_PATH}.scala" expression in 
> [https://github.com/apache/spark/blob/master/project/SparkBuild.scala] is 
> generating the absolute path from the developers computer. In this case, if I 
> try to access the source link on 
> [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
>  it tries to take me to 
> [/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala"
>  class="external-link" 
> rel="nofollow">https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]
> where "/Users/sameera/dev/spark" portion of the URL is coming from the 
> developers macos home folder.
> There seems to be no change in the code responsible for generating this path 
> during the build in /project/SparkBuild.scala :
> Line # 252:
> {code:java}
> scalacOptions in Compile ++= Seq(
> s"-target:jvm-${scalacJVMVersion.value}",
> "-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
> for relative source links in scaladoc
> ),
> {code}
> Line # 726
> {code:java}
> // Use GitHub repository for Scaladoc source links
> unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",
> scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
> "-groups", // Group similar methods together based on the @group annotation.
> "-skip-packages", "org.apache.hadoop"
> ) ++ (
> // Add links to sources when generating Scaladoc for a non-snapshot release
> if (!isSnapshot.value) {
> Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
> } else {
> Seq()
> }
> ){code}
>  
> It seems more like a developers dev environment issue.
> I was successfully able to reproduce this in my dev environment. Environment 
> details attached. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23733) Broken link to java source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)
Yogesh Tewari created SPARK-23733:
-

 Summary: Broken link to java source code in Spark Scala api 
Scaladoc
 Key: SPARK-23733
 URL: https://issues.apache.org/jira/browse/SPARK-23733
 Project: Spark
  Issue Type: Bug
  Components: Build, Documentation, Project Infra
Affects Versions: 2.3.0, 2.3.1
 Environment: {code:java}
~/spark/docs$ cat /etc/*release*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
{code}
Using spark packaged sbt.

Other versions:
{code:java}
~/spark/docs$ ruby -v 
ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
~/spark/docs$ gem -v 
2.5.2.1 
~/spark/docs$ jekyll -v 
jekyll 3.7.3  
~/spark/docs$ java -version 
java version "1.8.0_112" Java(TM) SE Runtime Environment (build 1.8.0_112-b15) 
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
{code}
Reporter: Yogesh Tewari


The Scala source code link in the Spark API scaladoc is broken.

It turns out that, instead of the relative path to the Scala files, the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] 
generates the absolute path from the developer's computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it takes me to 
[https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala],

where the "/Users/sameera/dev/spark" portion of the URL comes from the 
developer's macOS home folder.

The code responsible for generating this path during the build in 
/project/SparkBuild.scala appears unchanged:

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like an issue with the developer's environment.

I was able to reproduce this in my dev environment. Environment details are 
attached.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23732) Broken link to scala source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogesh Tewari updated SPARK-23732:
--
Description: 
Scala source code link in Spark api scaladoc is broken.

Turns out instead of the relative path to the scala files the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] is 
generating the absolute path from the developers computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it tries to take me to 
[/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala"
 class="external-link" 
rel="nofollow">https://github.com/apache/spark/tree/v2.3.0{color:#ff}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala|https://github.com/apache/spark/tree/v2.3.0%3Cfont%20color=]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment. Environment 
details attached. 

 

  was:
Scala source code link in Spark api scaladoc is broken.

Turns out instead of the relative path to the scala files the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] is 
generating the absolute path from the developers computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it tries to take me to 
[https://github.com/apache/spark/tree/v2.3.0{color:#FF}/Users/sameera/dev/spark{color}/core/src/main/scala/org/apache/spark/Accumulable.scala]

where "/Users/sameera/dev/spark" portion of the URL is coming from the 
developers macos home folder.

There seems to be no change in the code responsible for generating this path 
during the build in /project/SparkBuild.scala :

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like a developers dev environment issue.

I was successfully able to reproduce this in my dev environment.

 


> Broken link to scala source code in Spark Scala api Scaladoc
> 
>
> Key: SPARK-23732
> URL: https://issues.apache.org/jira/browse/SPARK-23732
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation, Project Infra
>Affects Versions: 2.3.0, 2.3.1
> Environment: {code:java}
> ~/spark/docs$ cat /etc/*release*
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> {code}
> Using spark packaged sbt.
> Other versions:
> {code:java}
> ~/spark/docs$ ruby -v 
> ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
> ~/spark/docs$ gem -v 
> 2.5.2.1 
> ~/spark/docs$ jekyll -v 
> jekyll 3.7.3  
> ~/spark/docs$ java -version 

[jira] [Created] (SPARK-23732) Broken link to scala source code in Spark Scala api Scaladoc

2018-03-18 Thread Yogesh Tewari (JIRA)
Yogesh Tewari created SPARK-23732:
-

 Summary: Broken link to scala source code in Spark Scala api 
Scaladoc
 Key: SPARK-23732
 URL: https://issues.apache.org/jira/browse/SPARK-23732
 Project: Spark
  Issue Type: Bug
  Components: Build, Documentation, Project Infra
Affects Versions: 2.3.0, 2.3.1
 Environment: {code:java}
~/spark/docs$ cat /etc/*release*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
{code}
Using spark packaged sbt.

Other versions:
{code:java}
~/spark/docs$ ruby -v 
ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu] 
~/spark/docs$ gem -v 
2.5.2.1 
~/spark/docs$ jekyll -v 
jekyll 3.7.3  
~/spark/docs$ java -version 
java version "1.8.0_112" Java(TM) SE Runtime Environment (build 1.8.0_112-b15) 
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
{code}
Reporter: Yogesh Tewari


The Scala source code link in the Spark API scaladoc is broken.

It turns out that, instead of the relative path to the Scala files, the 
"€\{FILE_PATH}.scala" expression in 
[https://github.com/apache/spark/blob/master/project/SparkBuild.scala] 
generates the absolute path from the developer's computer. In this case, if I 
try to access the source link on 
[https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulable],
 it takes me to 
[https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/Accumulable.scala],

where the "/Users/sameera/dev/spark" portion of the URL comes from the 
developer's macOS home folder.

The code responsible for generating this path during the build in 
/project/SparkBuild.scala appears unchanged:

Line # 252:
{code:java}
scalacOptions in Compile ++= Seq(
s"-target:jvm-${scalacJVMVersion.value}",
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required 
for relative source links in scaladoc
),
{code}
Line # 726
{code:java}
// Use GitHub repository for Scaladoc source links
unidocSourceBase := s"https://github.com/apache/spark/tree/v${version.value}",

scalacOptions in (ScalaUnidoc, unidoc) ++= Seq(
"-groups", // Group similar methods together based on the @group annotation.
"-skip-packages", "org.apache.hadoop"
) ++ (
// Add links to sources when generating Scaladoc for a non-snapshot release
if (!isSnapshot.value) {
Opts.doc.sourceUrl(unidocSourceBase.value + "€{FILE_PATH}.scala")
} else {
Seq()
}
){code}
 

It seems more like an issue with the developer's environment.

I was able to reproduce this in my dev environment.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23650) Slow SparkR udf (dapply)

2018-03-18 Thread Deepansh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404213#comment-16404213
 ] 

Deepansh edited comment on SPARK-23650 at 3/18/18 10:16 PM:


I tried reading the model in the UDF, but for every new stream the model is 
read again, which adds an overhead (~2s). IMO the problem here is that the R 
environment inside the thread applying the UDF is not cached: it is created 
and destroyed with each query.
Attached - logs

To overcome the problem, I was using broadcast, as technically the broadcast 
is done only once to the executors.


was (Author: litup):
I tried reading the model in UDF, but for every new stream, the model is being 
read which is adding an overhead (~2s). IMO The problem here is the R 
environment is not getting cached. It is created and destroyed with each query.
Attached - logs 

To overcome the problem, I was using broadcast, as technically broadcast is 
done only once to the executors.

> Slow SparkR udf (dapply)
> 
>
> Key: SPARK-23650
> URL: https://issues.apache.org/jira/browse/SPARK-23650
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, SparkR, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Deepansh
>Priority: Major
> Attachments: read_model_in_udf.txt, sparkR_log2.txt, sparkRlag.txt
>
>
> For eg, I am getting streams from Kafka and I want to implement a model made 
> in R for those streams. For this, I am using dapply.
> My code is:
> iris_model <- readRDS("./iris_model.rds")
> randomBr <- SparkR:::broadcast(sc, iris_model)
> kafka <- read.stream("kafka",subscribe = "source", kafka.bootstrap.servers = 
> "localhost:9092", topic = "source")
> lines <- select(kafka, cast(kafka$value, "string"))
> schema <- schema(lines)
> df1 <- dapply(lines, function(x) {
>   i_model <- SparkR:::value(randomBr)
>   for (row in 1:nrow(x)) {
>     y <- fromJSON(as.character(x[row, "value"]))
>     y$predict <- predict(i_model, y)
>     y <- toJSON(y)
>     x[row, "value"] <- y
>   }
>   x
> }, schema)
> Every time when Kafka streams are fetched the dapply method creates new 
> runner thread and ships the variables again, which causes a huge lag(~2s for 
> shipping model) every time. I even tried without broadcast variables but it 
> takes same time to ship variables. Can some other techniques be applied to 
> improve its performance?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23650) Slow SparkR udf (dapply)

2018-03-18 Thread Deepansh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepansh updated SPARK-23650:
-
Attachment: read_model_in_udf.txt

> Slow SparkR udf (dapply)
> 
>
> Key: SPARK-23650
> URL: https://issues.apache.org/jira/browse/SPARK-23650
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, SparkR, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Deepansh
>Priority: Major
> Attachments: read_model_in_udf.txt, sparkR_log2.txt, sparkRlag.txt
>
>
> For eg, I am getting streams from Kafka and I want to implement a model made 
> in R for those streams. For this, I am using dapply.
> My code is:
> iris_model <- readRDS("./iris_model.rds")
> randomBr <- SparkR:::broadcast(sc, iris_model)
> kafka <- read.stream("kafka",subscribe = "source", kafka.bootstrap.servers = 
> "localhost:9092", topic = "source")
> lines <- select(kafka, cast(kafka$value, "string"))
> schema <- schema(lines)
> df1 <- dapply(lines, function(x) {
>   i_model <- SparkR:::value(randomBr)
>   for (row in 1:nrow(x)) {
>     y <- fromJSON(as.character(x[row, "value"]))
>     y$predict <- predict(i_model, y)
>     y <- toJSON(y)
>     x[row, "value"] <- y
>   }
>   x
> }, schema)
> Every time when Kafka streams are fetched the dapply method creates new 
> runner thread and ships the variables again, which causes a huge lag(~2s for 
> shipping model) every time. I even tried without broadcast variables but it 
> takes same time to ship variables. Can some other techniques be applied to 
> improve its performance?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23650) Slow SparkR udf (dapply)

2018-03-18 Thread Deepansh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404213#comment-16404213
 ] 

Deepansh commented on SPARK-23650:
--

I tried reading the model in the UDF, but for every new stream the model is 
read again, which adds an overhead (~2s). IMO the problem here is that the R 
environment is not cached: it is created and destroyed with each query.
Attached - logs

To overcome the problem, I was using broadcast, as technically the broadcast 
is done only once to the executors.

> Slow SparkR udf (dapply)
> 
>
> Key: SPARK-23650
> URL: https://issues.apache.org/jira/browse/SPARK-23650
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, SparkR, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Deepansh
>Priority: Major
> Attachments: sparkR_log2.txt, sparkRlag.txt
>
>
> For eg, I am getting streams from Kafka and I want to implement a model made 
> in R for those streams. For this, I am using dapply.
> My code is:
> iris_model <- readRDS("./iris_model.rds")
> randomBr <- SparkR:::broadcast(sc, iris_model)
> kafka <- read.stream("kafka",subscribe = "source", kafka.bootstrap.servers = 
> "localhost:9092", topic = "source")
> lines <- select(kafka, cast(kafka$value, "string"))
> schema <- schema(lines)
> df1 <- dapply(lines, function(x) {
>   i_model <- SparkR:::value(randomBr)
>   for (row in 1:nrow(x)) {
>     y <- fromJSON(as.character(x[row, "value"]))
>     y$predict <- predict(i_model, y)
>     y <- toJSON(y)
>     x[row, "value"] <- y
>   }
>   x
> }, schema)
> Every time when Kafka streams are fetched the dapply method creates new 
> runner thread and ships the variables again, which causes a huge lag(~2s for 
> shipping model) every time. I even tried without broadcast variables but it 
> takes same time to ship variables. Can some other techniques be applied to 
> improve its performance?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23650) Slow SparkR udf (dapply)

2018-03-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404198#comment-16404198
 ] 

Felix Cheung commented on SPARK-23650:
--

Is there a reason for the broadcast?

Could you instead distribute the .rds file to all the executors and then call 
readRDS from within your UDF?

I understand this approach has been used quite a bit.
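
A minimal SparkR sketch of the approach suggested above (illustrative only, not 
the reporter's exact pipeline; it assumes the model was saved with saveRDS as 
iris_model.rds, that fromJSON/toJSON come from jsonlite, and it reuses the 
topic/broker names from the issue description):

{code}
library(SparkR)
sparkR.session()

# Ship the serialized model file to every executor once, instead of
# broadcasting the deserialized object.
spark.addFile("./iris_model.rds")

kafka <- read.stream("kafka", subscribe = "source",
                     kafka.bootstrap.servers = "localhost:9092")
lines <- select(kafka, cast(kafka$value, "string"))

scored <- dapply(lines, function(x) {
  # Runs on the executor: resolve the local copy of the shipped file
  # and deserialize the model there instead of pulling a broadcast.
  i_model <- readRDS(SparkR::spark.getSparkFiles("iris_model.rds"))
  for (row in 1:nrow(x)) {
    y <- jsonlite::fromJSON(as.character(x[row, "value"]))
    y$predict <- predict(i_model, y)
    x[row, "value"] <- as.character(jsonlite::toJSON(y))
  }
  x
}, schema(lines))
{code}

Note that this sketch still pays the readRDS cost inside each task; it only 
avoids re-shipping the serialized model with every micro-batch.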



> Slow SparkR udf (dapply)
> 
>
> Key: SPARK-23650
> URL: https://issues.apache.org/jira/browse/SPARK-23650
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, SparkR, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Deepansh
>Priority: Major
> Attachments: sparkR_log2.txt, sparkRlag.txt
>
>
> For example, I am getting streams from Kafka and I want to apply a model built 
> in R to those streams. For this, I am using dapply.
> My code is:
> {code}
> iris_model <- readRDS("./iris_model.rds")
> randomBr <- SparkR:::broadcast(sc, iris_model)
> kafka <- read.stream("kafka", subscribe = "source",
>                      kafka.bootstrap.servers = "localhost:9092", topic = "source")
> lines <- select(kafka, cast(kafka$value, "string"))
> schema <- schema(lines)
> df1 <- dapply(lines, function(x) {
>   i_model <- SparkR:::value(randomBr)
>   for (row in 1:nrow(x)) {
>     y <- fromJSON(as.character(x[row, "value"]))
>     y$predict <- predict(i_model, y)
>     y <- toJSON(y)
>     x[row, "value"] <- y
>   }
>   x
> }, schema)
> {code}
> Every time Kafka streams are fetched, dapply creates a new runner thread and 
> ships the variables again, which causes a huge lag (~2s for shipping the model) 
> every time. I even tried without broadcast variables, but it takes the same time 
> to ship the variables. Can some other technique be applied to improve its 
> performance?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23650) Slow SparkR udf (dapply)

2018-03-18 Thread Deepansh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404183#comment-16404183
 ] 

Deepansh commented on SPARK-23650:
--

Is there any other way to implement my use case with minimal (millisecond-level) 
overhead?
Use case: read input data from Kafka streams, apply a native R model to it, and 
return the predictions to a Kafka sink (or any other sink).

> Slow SparkR udf (dapply)
> 
>
> Key: SPARK-23650
> URL: https://issues.apache.org/jira/browse/SPARK-23650
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, SparkR, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Deepansh
>Priority: Major
> Attachments: sparkR_log2.txt, sparkRlag.txt
>
>
> For example, I am getting streams from Kafka and I want to apply a model built 
> in R to those streams. For this, I am using dapply.
> My code is:
> {code}
> iris_model <- readRDS("./iris_model.rds")
> randomBr <- SparkR:::broadcast(sc, iris_model)
> kafka <- read.stream("kafka", subscribe = "source",
>                      kafka.bootstrap.servers = "localhost:9092", topic = "source")
> lines <- select(kafka, cast(kafka$value, "string"))
> schema <- schema(lines)
> df1 <- dapply(lines, function(x) {
>   i_model <- SparkR:::value(randomBr)
>   for (row in 1:nrow(x)) {
>     y <- fromJSON(as.character(x[row, "value"]))
>     y$predict <- predict(i_model, y)
>     y <- toJSON(y)
>     x[row, "value"] <- y
>   }
>   x
> }, schema)
> {code}
> Every time Kafka streams are fetched, dapply creates a new runner thread and 
> ships the variables again, which causes a huge lag (~2s for shipping the model) 
> every time. I even tried without broadcast variables, but it takes the same time 
> to ship the variables. Can some other technique be applied to improve its 
> performance?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23645) pandas_udf can not be called with keyword arguments

2018-03-18 Thread Stu (Michael Stewart) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404126#comment-16404126
 ] 

Stu (Michael Stewart) commented on SPARK-23645:
---

{quote}Sounds like a good thing to do if the change is minimal, but if the change 
is big, I doubt this is something we should support. Documenting this might be 
good enough for now.
{quote}
Definitely a nontrivial change after digging all the way down. I've updated the PR.

> pandas_udf can not be called with keyword arguments
> ---
>
> Key: SPARK-23645
> URL: https://issues.apache.org/jira/browse/SPARK-23645
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
> Environment: python 3.6 | pyspark 2.3.0 | Using Scala version 2.11.8, 
> OpenJDK 64-Bit Server VM, 1.8.0_141
>Reporter: Stu (Michael Stewart)
>Priority: Minor
>
> pandas_udf (and probably all Python UDFs) does not accept keyword arguments 
> because the `UserDefinedFunction` class in `pyspark/sql/udf.py` has a `__call__` 
> method, and also wrapper utility methods, that only accept positional args and 
> not kwargs:
> @ line 168:
> {code:java}
> ...
> def __call__(self, *cols):
> judf = self._judf
> sc = SparkContext._active_spark_context
> return Column(judf.apply(_to_seq(sc, cols, _to_java_column)))
> # This function is for improving the online help system in the interactive 
> interpreter.
> # For example, the built-in help / pydoc.help. It wraps the UDF with the 
> docstring and
> # argument annotation. (See: SPARK-19161)
> def _wrapped(self):
> """
> Wrap this udf with a function and attach docstring from func
> """
> # It is possible for a callable instance without __name__ attribute or/and
> # __module__ attribute to be wrapped here. For example, 
> functools.partial. In this case,
> # we should avoid wrapping the attributes from the wrapped function to 
> the wrapper
> # function. So, we take out these attribute names from the default names 
> to set and
> # then manually assign it after being wrapped.
> assignments = tuple(
> a for a in functools.WRAPPER_ASSIGNMENTS if a != '__name__' and a != 
> '__module__')
> @functools.wraps(self.func, assigned=assignments)
> def wrapper(*args):
> return self(*args)
> ...{code}
> as seen in:
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit
> spark = SparkSession.builder.getOrCreate()
> df = spark.range(12).withColumn('b', col('id') * 2)
> def ok(a,b): return a*b
> df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b')).show()  
> # no problems
> df.withColumn('ok', pandas_udf(f=ok, 
> returnType='bigint')(a='id',b='b')).show()  # fail with ~no stacktrace thanks 
> to wrapper helper
> ---
> TypeError Traceback (most recent call last)
>  in ()
> > 1 df.withColumn('ok', pandas_udf(f=ok, 
> returnType='bigint')(a='id',b='b')).show()
> TypeError: wrapper() got an unexpected keyword argument 'a'{code}
>  
>  
> *discourse*: it isn't difficult to swap the kwargs back in, allowing the UDF 
> to be called that way, but the cols tuple that gets passed in the call method:
> {code:java}
> _to_seq(sc, cols, _to_java_column)
> {code}
>  has to be in the right order based on the function's declared argument order, 
> or the function will return incorrect results. So, the challenge here is to:
> (a) make sure to reconstruct the proper order of the full args/kwargs
> --> args first, and then kwargs (not in the order passed but in the order 
> declared by the fn)
> (b) handle Python 2 and Python 3 `inspect` module inconsistencies



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23731) FileSourceScanExec throws NullPointerException in subexpression elimination

2018-03-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404067#comment-16404067
 ] 

Apache Spark commented on SPARK-23731:
--

User 'jaceklaskowski' has created a pull request for this issue:
https://github.com/apache/spark/pull/20856

> FileSourceScanExec throws NullPointerException in subexpression elimination
> ---
>
> Key: SPARK-23731
> URL: https://issues.apache.org/jira/browse/SPARK-23731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.3.0, 2.3.1
>Reporter: Jacek Laskowski
>Priority: Minor
>
> While working on a SQL query with many {{CASE WHEN}} expressions and 
> {{ScalarSubqueries}}, I faced the following exception (in Spark 2.3.0):
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.(DataSourceScanExec.scala:167)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:502)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:158)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:257)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.semanticEquals(subquery.scala:58)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$Expr.equals(EquivalentExpressions.scala:36)
>   at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:358)
>   at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:40)
>   at 
> scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:136)
>   at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:132)
>   at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.get(HashMap.scala:70)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:54)
>   at 
> 

[jira] [Assigned] (SPARK-23731) FileSourceScanExec throws NullPointerException in subexpression elimination

2018-03-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23731:


Assignee: Apache Spark

> FileSourceScanExec throws NullPointerException in subexpression elimination
> ---
>
> Key: SPARK-23731
> URL: https://issues.apache.org/jira/browse/SPARK-23731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.3.0, 2.3.1
>Reporter: Jacek Laskowski
>Assignee: Apache Spark
>Priority: Minor
>
> While working on a SQL query with many {{CASE WHEN}} expressions and 
> {{ScalarSubqueries}}, I faced the following exception (in Spark 2.3.0):
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.(DataSourceScanExec.scala:167)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:502)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:158)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:257)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.semanticEquals(subquery.scala:58)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$Expr.equals(EquivalentExpressions.scala:36)
>   at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:358)
>   at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:40)
>   at 
> scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:136)
>   at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:132)
>   at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.get(HashMap.scala:70)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:95)
>   at 
> 

[jira] [Assigned] (SPARK-23731) FileSourceScanExec throws NullPointerException in subexpression elimination

2018-03-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23731:


Assignee: (was: Apache Spark)

> FileSourceScanExec throws NullPointerException in subexpression elimination
> ---
>
> Key: SPARK-23731
> URL: https://issues.apache.org/jira/browse/SPARK-23731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.3.0, 2.3.1
>Reporter: Jacek Laskowski
>Priority: Minor
>
> While working on a SQL query with many {{CASE WHEN}} expressions and 
> {{ScalarSubqueries}}, I faced the following exception (in Spark 2.3.0):
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.(DataSourceScanExec.scala:167)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:502)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:158)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:257)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.semanticEquals(subquery.scala:58)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$Expr.equals(EquivalentExpressions.scala:36)
>   at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:358)
>   at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:40)
>   at 
> scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:136)
>   at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:132)
>   at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.get(HashMap.scala:70)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:95)
>   at 
> 

[jira] [Commented] (SPARK-23731) FileSourceScanExec throws NullPointerException in subexpression elimination

2018-03-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404062#comment-16404062
 ] 

Apache Spark commented on SPARK-23731:
--

User 'jaceklaskowski' has created a pull request for this issue:
https://github.com/apache/spark/pull/20855

> FileSourceScanExec throws NullPointerException in subexpression elimination
> ---
>
> Key: SPARK-23731
> URL: https://issues.apache.org/jira/browse/SPARK-23731
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1, 2.3.0, 2.3.1
>Reporter: Jacek Laskowski
>Priority: Minor
>
> While working on a SQL query with many {{CASE WHEN}} expressions and 
> {{ScalarSubqueries}}, I faced the following exception (in Spark 2.3.0):
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.(DataSourceScanExec.scala:167)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:502)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:158)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:257)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.semanticEquals(subquery.scala:58)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$Expr.equals(EquivalentExpressions.scala:36)
>   at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:358)
>   at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:40)
>   at 
> scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:136)
>   at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:132)
>   at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.get(HashMap.scala:70)
>   at 
> org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:54)
>   at 
> 

[jira] [Created] (SPARK-23731) FileSourceScanExec throws NullPointerException in subexpression elimination

2018-03-18 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-23731:
---

 Summary: FileSourceScanExec throws NullPointerException in 
subexpression elimination
 Key: SPARK-23731
 URL: https://issues.apache.org/jira/browse/SPARK-23731
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0, 2.2.1, 2.3.1
Reporter: Jacek Laskowski


While working on a SQL query with many {{CASE WHEN}} expressions and 
{{ScalarSubqueries}}, I faced the following exception (in Spark 2.3.0):
{code:java}
Caused by: java.lang.NullPointerException
  at 
org.apache.spark.sql.execution.FileSourceScanExec.(DataSourceScanExec.scala:167)
  at 
org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:502)
  at 
org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:158)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:224)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:224)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:210)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:209)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:257)
  at 
org.apache.spark.sql.execution.ScalarSubquery.semanticEquals(subquery.scala:58)
  at 
org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$Expr.equals(EquivalentExpressions.scala:36)
  at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:358)
  at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:40)
  at 
scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:136)
  at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:132)
  at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.get(HashMap.scala:70)
  at 
org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:54)
  at 
org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:95)
  at 
org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$$anonfun$addExprTree$1.apply(EquivalentExpressions.scala:96)
  at 
org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$$anonfun$addExprTree$1.apply(EquivalentExpressions.scala:96)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at 
org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:96)
  at 

[jira] [Assigned] (SPARK-23712) Investigate replacing Code Generated UnsafeRowJoiner with an Interpreted version

2018-03-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23712:


Assignee: Herman van Hovell  (was: Apache Spark)

> Investigate replacing Code Generated UnsafeRowJoiner with an Interpreted 
> version
> 
>
> Key: SPARK-23712
> URL: https://issues.apache.org/jira/browse/SPARK-23712
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
>
> We currently have a code generated UnsafeRowJoiner. This does not make a lot 
> of sense since we can write a perfectly good 'interpreted' version.
> We should definitely benchmark this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23712) Investigate replacing Code Generated UnsafeRowJoiner with an Interpreted version

2018-03-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23712:


Assignee: Apache Spark  (was: Herman van Hovell)

> Investigate replacing Code Generated UnsafeRowJoiner with an Interpreted 
> version
> 
>
> Key: SPARK-23712
> URL: https://issues.apache.org/jira/browse/SPARK-23712
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Apache Spark
>Priority: Major
>
> We currently have a code generated UnsafeRowJoiner. This does not make a lot 
> of sense since we can write a perfectly good 'interpreted' version.
> We should definitely benchmark this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23712) Investigate replacing Code Generated UnsafeRowJoiner with an Interpreted version

2018-03-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404001#comment-16404001
 ] 

Apache Spark commented on SPARK-23712:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/20854

> Investigate replacing Code Generated UnsafeRowJoiner with an Interpreted 
> version
> 
>
> Key: SPARK-23712
> URL: https://issues.apache.org/jira/browse/SPARK-23712
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
>
> We currently have a code generated UnsafeRowJoiner. This does not make a lot 
> of sense since we can write a perfectly good 'interpreted' version.
> We should definitely benchmark this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23646) pyspark DataFrameWriter ignores customized settings?

2018-03-18 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403985#comment-16403985
 ] 

Hyukjin Kwon commented on SPARK-23646:
--

Thanks for logging here. It should be helpful if anyone faces the same problem.

> pyspark DataFrameWriter ignores customized settings?
> 
>
> Key: SPARK-23646
> URL: https://issues.apache.org/jira/browse/SPARK-23646
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.1
>Reporter: Chuan-Heng Hsiao
>Priority: Major
>
> I am using spark-2.2.1-bin-hadoop2.7 in standalone mode.
> (Python version: 3.5.2 on Ubuntu 16.04.)
> I intended to have the DataFrame written to HDFS with a customized block size, 
> but that failed.
> However, the corresponding RDD can successfully write with the customized 
> block size.
>  
>  
> The following is the test code:
> (dfs.namenode.fs-limits.min-block-size has been set as 131072 in hdfs)
>  
>  
> ##
> # init
> ##
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
>  
> import hdfs
> from hdfs import InsecureClient
> import os
>  
> import numpy as np
> import pandas as pd
> import logging
>  
> os.environ['SPARK_HOME'] = '/opt/spark-2.2.1-bin-hadoop2.7'
>  
> block_size = 512 * 1024
>  
> conf = SparkConf().setAppName("DCSSpark").setMaster("spark://spark1:7077") \
>     .set('spark.cores.max', 20).set("spark.executor.cores", 10) \
>     .set("spark.executor.memory", "10g") \
>     .set("spark.hadoop.dfs.blocksize", str(block_size)) \
>     .set("spark.hadoop.dfs.block.size", str(block_size))
>  
> spark = SparkSession.builder.config(conf=conf).getOrCreate()
> spark.sparkContext._jsc.hadoopConfiguration().setInt("dfs.blocksize", 
> block_size)
> spark.sparkContext._jsc.hadoopConfiguration().setInt("dfs.block.size", 
> block_size)
>  
> ##
> # main
> ##
>  # create DataFrame
> df_txt = spark.createDataFrame([{'temp': "hello"}, {'temp': "world"}, 
> {'temp': "!"}])
>  
> # save using DataFrameWriter, resulting 128MB-block-size
> df_txt.write.mode('overwrite').format('parquet').save('hdfs://spark1/tmp/temp_with_df')
>  
> # save using rdd, resulting 512k-block-size
> client = InsecureClient('http://spark1:50070')
> client.delete('/tmp/temp_with_rrd', recursive=True)
> df_txt.rdd.saveAsTextFile('hdfs://spark1/tmp/temp_with_rrd')



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23706) spark.conf.get(value, default=None) should produce None in PySpark

2018-03-18 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-23706.
--
   Resolution: Fixed
Fix Version/s: 2.4.0
   2.3.1

Fixed in https://github.com/apache/spark/pull/20841

> spark.conf.get(value, default=None) should produce None in PySpark
> --
>
> Key: SPARK-23706
> URL: https://issues.apache.org/jira/browse/SPARK-23706
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.3.1, 2.4.0
>
>
> Scala:
> {code}
> scala> spark.conf.get("hey")
> java.util.NoSuchElementException: hey
>   at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1600)
>   at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1600)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1600)
>   at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
>   ... 49 elided
> scala> spark.conf.get("hey", null)
> res1: String = null
> scala> spark.conf.get("spark.sql.sources.partitionOverwriteMode", null)
> res2: String = null
> {code}
> Python:
> {code}
> >>> spark.conf.get("hey")
> ...
> py4j.protocol.Py4JJavaError: An error occurred while calling o30.get.
> : java.util.NoSuchElementException: hey
> ...
> >>> spark.conf.get("hey", None)
> ...
> py4j.protocol.Py4JJavaError: An error occurred while calling o30.get.
> : java.util.NoSuchElementException: hey
> ...
> >>> spark.conf.get("spark.sql.sources.partitionOverwriteMode", None)
> u'STATIC'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23730) Save and expose "in bag" tracking for random forest model

2018-03-18 Thread Julian King (JIRA)
Julian King created SPARK-23730:
---

 Summary: Save and expose "in bag" tracking for random forest model
 Key: SPARK-23730
 URL: https://issues.apache.org/jira/browse/SPARK-23730
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 2.3.0
Reporter: Julian King


In a random forest model, it is often useful to be able to keep track of which 
samples ended up in each of the bootstrap replications (and how many times this 
happened). For instance, in the R randomForest package this is accomplished 
through the option keep.inbag=TRUE.

Similar functionality in Spark ML's random forest would be helpful.
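
For reference, a small R sketch of the randomForest behaviour described above 
(a minimal example assuming the CRAN randomForest package and the built-in iris 
data; the ntree value is arbitrary):

{code}
library(randomForest)

# Fit a small forest and keep the per-tree in-bag bookkeeping.
rf <- randomForest(Species ~ ., data = iris, ntree = 50, keep.inbag = TRUE)

# rf$inbag is an nrow(iris) x ntree matrix recording which rows ended up in
# the bootstrap sample used for each tree.
dim(rf$inbag)        # 150 x 50
table(rf$inbag[, 1]) # in-bag indicators for the first tree
{code}

Exposing and persisting the equivalent bookkeeping in Spark ML would, for 
example, make out-of-bag style diagnostics possible on the Spark side as well.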



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23710) Upgrade Hive to 2.3.2

2018-03-18 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403886#comment-16403886
 ] 

Yuming Wang commented on SPARK-23710:
-

Without {{-Phive}} it works fine, because {{$SPARK_HOME/jars}} contains 
{{hive-storage-api-2.4.0.jar}}.
{{nohive}} is for compatibility with Hive 1.x; if we keep using {{nohive}} after 
upgrading Hive to 2.3.2, there will be a lot of conflicts.

> Upgrade Hive to 2.3.2
> -
>
> Key: SPARK-23710
> URL: https://issues.apache.org/jira/browse/SPARK-23710
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> h1. Mainly changes
>  * Maven dependency:
>  hive.version from {{1.2.1.spark2}} to {{2.3.2}} and change 
> {{hive.classifier}} to {{core}}
>  calcite.version from {{1.2.0-incubating}} to {{1.10.0}}
>  datanucleus-core.version from {{3.2.10}} to {{4.1.17}}
>  remove {{orc.classifier}}, which means ORC uses {{hive.storage.api}}; see 
> ORC-174
>  add new dependencies {{avatica}} and {{hive.storage.api}}
>  * ORC compatibility changes:
>  OrcColumnVector.java, OrcColumnarBatchReader.java, OrcDeserializer.scala, 
> OrcFilters.scala, OrcSerializer.scala, OrcFilterSuite.scala
>  * hive-thriftserver java file update:
>  update {{sql/hive-thriftserver/if/TCLIService.thrift}} to hive 2.3.2
>  update {{sql/hive-thriftserver/src/main/java/org/apache/hive/service/*}} to 
> hive 2.3.2
>  * TestSuite should update:
> ||TestSuite||Reason||
> |StatisticsSuite|HIVE-16098|
> |SessionCatalogSuite|Similar to [VersionsSuite.scala#L427|#L427]|
> |CliSuite, HiveThriftServer2Suites, HiveSparkSubmitSuite, HiveQuerySuite, 
> SQLQuerySuite|Update hive-hcatalog-core-0.13.1.jar to 
> hive-hcatalog-core-2.3.2.jar|
> |SparkExecuteStatementOperationSuite|Interface changed from 
> org.apache.hive.service.cli.Type.NULL_TYPE to 
> org.apache.hadoop.hive.serde2.thrift.Type.NULL_TYPE|
> |ClasspathDependenciesSuite|org.apache.hive.com.esotericsoftware.kryo.Kryo 
> change to com.esotericsoftware.kryo.Kryo|
> |HiveMetastoreCatalogSuite|Result format changed from Seq("1.1\t1", "2.1\t2") 
> to Seq("1.100\t1", "2.100\t2")|
> |HiveOrcFilterSuite|Result format changed|
> |HiveDDLSuite|Remove $ (This change needs to be reconsidered)|
> |HiveExternalCatalogVersionsSuite| java.lang.ClassCastException: 
> org.datanucleus.identity.DatastoreIdImpl cannot be cast to 
> org.datanucleus.identity.OID|
>  * Other changes:
> Disable Hive schema verification:  
> [HiveClientImpl.scala#L251|https://github.com/wangyum/spark/blob/75e4cc9e80f85517889e87a35da117bc361f2ff3/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L251]
>  and 
> [HiveExternalCatalog.scala#L58|https://github.com/wangyum/spark/blob/75e4cc9e80f85517889e87a35da117bc361f2ff3/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L58]
> Update 
> [IsolatedClientLoader.scala#L189-L192|https://github.com/wangyum/spark/blob/75e4cc9e80f85517889e87a35da117bc361f2ff3/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L189-L192]
> Because Hive 2.3.2's {{org.apache.hadoop.hive.ql.metadata.Hive}} can't 
> connect to a Hive 1.x metastore, we should use 
> {{HiveMetaStoreClient.getDelegationToken}} instead of 
> {{Hive.getDelegationToken}} and update {{HiveClientImpl.toHiveTable}}
> All changes can be found at 
> [PR-20659|https://github.com/apache/spark/pull/20659].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23729) Glob resolution breaks remote naming of files/archives

2018-03-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23729:


Assignee: (was: Apache Spark)

> Glob resolution breaks remote naming of files/archives
> --
>
> Key: SPARK-23729
> URL: https://issues.apache.org/jira/browse/SPARK-23729
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Mihaly Toth
>Priority: Major
>
> When {{spark-submit}} is used with either the {{\-\-archives}} or the 
> {{\-\-files}} parameter and the file name contains glob patterns, the rename 
> part ({{...#nameAs}}) of the filename is eventually ignored.
> Thinking through the resolution cases: if the resolution yields multiple 
> files, it does not make sense to send all of them under the same remote name, 
> so this should result in an error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23729) Glob resolution breaks remote naming of files/archives

2018-03-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23729:


Assignee: Apache Spark

> Glob resolution breaks remote naming of files/archives
> --
>
> Key: SPARK-23729
> URL: https://issues.apache.org/jira/browse/SPARK-23729
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Mihaly Toth
>Assignee: Apache Spark
>Priority: Major
>
> When {{spark-submit}} is used with either the {{\-\-archives}} or the 
> {{\-\-files}} parameter and the file name contains glob patterns, the rename 
> part ({{...#nameAs}}) of the filename is eventually ignored.
> Thinking through the resolution cases: if the resolution yields multiple 
> files, it does not make sense to send all of them under the same remote name, 
> so this should result in an error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23729) Glob resolution breaks remote naming of files/archives

2018-03-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403879#comment-16403879
 ] 

Apache Spark commented on SPARK-23729:
--

User 'misutoth' has created a pull request for this issue:
https://github.com/apache/spark/pull/20853

> Glob resolution breaks remote naming of files/archives
> --
>
> Key: SPARK-23729
> URL: https://issues.apache.org/jira/browse/SPARK-23729
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Mihaly Toth
>Priority: Major
>
> When {{spark-submit}} is used with either the {{\-\-archives}} or the 
> {{\-\-files}} parameter and the file name contains glob patterns, the rename 
> part ({{...#nameAs}}) of the filename is eventually ignored.
> Thinking through the resolution cases: if the resolution yields multiple 
> files, it does not make sense to send all of them under the same remote name, 
> so this should result in an error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23729) Glob resolution breaks remote naming of files/archives

2018-03-18 Thread Mihaly Toth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403876#comment-16403876
 ] 

Mihaly Toth commented on SPARK-23729:
-

Already working on this. Will submit a PR shortly.

> Glob resolution breaks remote naming of files/archives
> --
>
> Key: SPARK-23729
> URL: https://issues.apache.org/jira/browse/SPARK-23729
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Mihaly Toth
>Priority: Major
>
> When {{spark-submit}} is used with either the {{\-\-archives}} or the 
> {{\-\-files}} parameter and the file name contains glob patterns, the rename 
> part ({{...#nameAs}}) of the filename is eventually ignored.
> Thinking through the resolution cases: if the resolution yields multiple 
> files, it does not make sense to send all of them under the same remote name, 
> so this should result in an error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23729) Glob resolution breaks remote naming of files/archives

2018-03-18 Thread Mihaly Toth (JIRA)
Mihaly Toth created SPARK-23729:
---

 Summary: Glob resolution breaks remote naming of files/archives
 Key: SPARK-23729
 URL: https://issues.apache.org/jira/browse/SPARK-23729
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.3.0
Reporter: Mihaly Toth


When {{spark-submit}} is used with either the {{\-\-archives}} or the 
{{\-\-files}} parameter and the file name contains glob patterns, the rename 
part ({{...#nameAs}}) of the filename is eventually ignored.

Thinking through the resolution cases: if the resolution yields multiple files, 
it does not make sense to send all of them under the same remote name, so this 
should result in an error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org