[jira] [Commented] (SPARK-15665) spark-submit --kill and --status are not working

2017-03-25 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942141#comment-15942141
 ] 

Devaraj K commented on SPARK-15665:
---

[~samuel-soubeyran], this issue has been resolved. Please create another JIRA 
if you see any other problems.

> spark-submit --kill and --status are not working 
> -
>
> Key: SPARK-15665
> URL: https://issues.apache.org/jira/browse/SPARK-15665
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Devaraj K
>Assignee: Devaraj K
> Fix For: 2.0.0
>
>
> {code:xml}
> [devaraj@server2 spark-master]$ ./bin/spark-submit --kill 
> driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
> Exception in thread "main" java.lang.IllegalArgumentException: Missing 
> application resource.
> at 
> org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
> at org.apache.spark.launcher.Main.main(Main.java:86)
> {code}
> {code:xml}
> [devaraj@server2 spark-master]$ ./bin/spark-submit --status 
> driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
> Exception in thread "main" java.lang.IllegalArgumentException: Missing 
> application resource.
> at 
> org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
> at org.apache.spark.launcher.Main.main(Main.java:86)
> {code}






[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values

2017-03-25 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-20043:
--
Affects Version/s: 2.2.0

> Decision Tree loader does not handle uppercase impurity param values
> 
>
> Key: SPARK-20043
> URL: https://issues.apache.org/jira/browse/SPARK-20043
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Zied Sellami
>  Labels: starter
>
> I saved a CrossValidatorModel with a decision tree and a random forest. I use 
> a ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not 
> able to load the saved model when the impurity is not written in lowercase. I 
> obtain an error from Spark: "impurity Gini (Entropy) not recognized".
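
A minimal sketch (training DataFrame and save path are assumed, not taken from 
the report) of how the casing mismatch can be hit through the tuning API:

{code}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, ParamGridBuilder}

val dt = new DecisionTreeClassifier()
val grid = new ParamGridBuilder()
  .addGrid(dt.impurity, Array("gini", "entropy"))   // lowercase values are accepted
  .build()
val cv = new CrossValidator()
  .setEstimator(new Pipeline().setStages(Array(dt)))
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)

// val model = cv.fit(training)               // `training` DataFrame assumed
// model.save("/tmp/cv-model")
// CrossValidatorModel.load("/tmp/cv-model")  // fails if the stored impurity is "Gini"/"Entropy"
{code}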






[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values

2017-03-25 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-20043:
--
Target Version/s: 2.1.1, 2.2.0

> Decision Tree loader does not handle uppercase impurity param values
> 
>
> Key: SPARK-20043
> URL: https://issues.apache.org/jira/browse/SPARK-20043
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Zied Sellami
>  Labels: starter
>
> I saved a CrossValidatorModel with a decision tree and a random forest. I use 
> a ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not 
> able to load the saved model when the impurity is not written in lowercase. I 
> obtain an error from Spark: "impurity Gini (Entropy) not recognized".






[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values

2017-03-25 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-20043:
--
Shepherd: Joseph K. Bradley

> Decision Tree loader does not handle uppercase impurity param values
> 
>
> Key: SPARK-20043
> URL: https://issues.apache.org/jira/browse/SPARK-20043
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Zied Sellami
>  Labels: starter
>
> I saved a CrossValidatorModel with a decision tree and a random forest. I use 
> a ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not 
> able to load the saved model when the impurity is not written in lowercase. I 
> obtain an error from Spark: "impurity Gini (Entropy) not recognized".






[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values

2017-03-25 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-20043:
--
Summary: Decision Tree loader does not handle uppercase impurity param 
values  (was: Decision Tree loader does not recognize impurity "Gini" and 
"Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower 
case) are accepted)

> Decision Tree loader does not handle uppercase impurity param values
> 
>
> Key: SPARK-20043
> URL: https://issues.apache.org/jira/browse/SPARK-20043
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Zied Sellami
>  Labels: starter
>
> I saved a CrossValidatorModel with a decision tree and a random forest. I use 
> a ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not 
> able to load the saved model when the impurity is not written in lowercase. I 
> obtain an error from Spark: "impurity Gini (Entropy) not recognized".






[jira] [Updated] (SPARK-20043) Decision Tree loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-25 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-20043:
--
Summary: Decision Tree loader does not recognize impurity "Gini" and 
"Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower 
case) are accepted  (was: CrossValidatorModel loader does not recognize 
impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and 
"entropy" (in lower case) are accepted)

> Decision Tree loader does not recognize impurity "Gini" and "Entropy" on ML 
> random forest and decision. Only "gini" and "entropy" (in lower case) are 
> accepted
> --
>
> Key: SPARK-20043
> URL: https://issues.apache.org/jira/browse/SPARK-20043
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Zied Sellami
>  Labels: starter
>
> I saved a CrossValidatorModel with a decision tree and a random forest. I use 
> a ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not 
> able to load the saved model when the impurity is not written in lowercase. I 
> obtain an error from Spark: "impurity Gini (Entropy) not recognized".






[jira] [Commented] (SPARK-20099) Add transformSchema to pyspark.ml

2017-03-25 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942082#comment-15942082
 ] 

Joseph K. Bradley commented on SPARK-20099:
---

Linking [SPARK-15574] since it brought up a need for transformSchema in 
pyspark.ml as well

> Add transformSchema to pyspark.ml
> -
>
> Key: SPARK-20099
> URL: https://issues.apache.org/jira/browse/SPARK-20099
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.1.0
>Reporter: Joseph K. Bradley
>
> Python's ML API currently lacks the PipelineStage abstraction.  This 
> abstraction's main purpose is to provide transformSchema() for checking for 
> early failures in a Pipeline.
> As mentioned in https://github.com/apache/spark/pull/17218, it would also be 
> useful in Python for checking Params in Python wrappers for Scala 
> implementations; in these, transformSchema would involve passing Params in 
> Python to Scala, which would then be able to validate the Param values.  This 
> could prevent late failures from bad Param settings in Pipeline execution, 
> while still allowing us to check Param values on only the Scala side.
> This issue is for adding transformSchema() to pyspark.ml.  If it's 
> reasonable, we could create a PipelineStage abstraction.  But it'd probably 
> be fine to add transformSchema() directly to Transformer and Estimator, 
> rather than creating PipelineStage.






[jira] [Created] (SPARK-20099) Add transformSchema to pyspark.ml

2017-03-25 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-20099:
-

 Summary: Add transformSchema to pyspark.ml
 Key: SPARK-20099
 URL: https://issues.apache.org/jira/browse/SPARK-20099
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 2.1.0
Reporter: Joseph K. Bradley


Python's ML API currently lacks the PipelineStage abstraction.  This 
abstraction's main purpose is to provide transformSchema() for checking for 
early failures in a Pipeline.

As mentioned in https://github.com/apache/spark/pull/17218, it would also be 
useful in Python for checking Params in Python wrappers for Scala 
implementations; in these, transformSchema would involve passing Params in 
Python to Scala, which would then be able to validate the Param values.  This 
could prevent late failures from bad Param settings in Pipeline execution, 
while still allowing us to check Param values on only the Scala side.

This issue is for adding transformSchema() to pyspark.ml.  If it's reasonable, 
we could create a PipelineStage abstraction.  But it'd probably be fine to add 
transformSchema() directly to Transformer and Estimator, rather than creating 
PipelineStage.
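
A sketch of the Scala-side behavior this would mirror: transformSchema() 
validates a stage against an input schema without touching any data (column 
names here are made up for illustration):

{code}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

val schema = StructType(Seq(StructField("a", DoubleType), StructField("b", DoubleType)))
val assembler = new VectorAssembler()
  .setInputCols(Array("a", "missingCol"))   // "missingCol" is not in the schema
  .setOutputCol("features")

// Fails fast, before any data is processed, because "missingCol" is absent:
// assembler.transformSchema(schema)
{code}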






[jira] [Commented] (SPARK-15665) spark-submit --kill and --status are not working

2017-03-25 Thread Samuel Soubeyran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941869#comment-15941869
 ] 

Samuel Soubeyran commented on SPARK-15665:
--

When trying to do the same thing using the SparkLauncher class, it doesn't work:

new SparkLauncher(sparkEnvMap).setSparkHome(sparkHome).addSparkArg("--kill", 
submissionId).launch()

This is because SparkLauncher calls the empty constructor of 
SparkSubmitCommandBuilder, bypassing all the logic that handles the kill and 
status commands in the constructor SparkSubmitCommandBuilder(List<String> args).

I also want to make the case that this logic shouldn't be in the constructor in 
the first place, since this is a builder pattern. Instead it should be in 
buildSparkSubmitCommand. The default should be the empty constructor; the full 
constructor is just a shortcut. Otherwise, what is the point of having a builder 
pattern in the first place?
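
A hypothetical sketch of that argument (illustrative only, not Spark's actual 
classes): the kill/status special-casing moves into the build step, so the 
no-arg constructor plus setters behaves the same as a constructor that takes 
the full argument list.

{code}
// Hypothetical builder; names and structure are for illustration only.
class SubmitCommandBuilder {
  private val sparkArgs = scala.collection.mutable.ListBuffer[String]()

  def addSparkArg(name: String, value: String): SubmitCommandBuilder = {
    sparkArgs += name
    sparkArgs += value
    this
  }

  def buildCommand(): Seq[String] = {
    // Decide at build time whether this is a kill/status request, instead of
    // requiring callers to go through a dedicated constructor.
    val isKillOrStatus = sparkArgs.contains("--kill") || sparkArgs.contains("--status")
    if (isKillOrStatus) "spark-submit" +: sparkArgs.toSeq
    else ("spark-submit" +: sparkArgs.toSeq) :+ "app.jar"  // app resource required otherwise
  }
}
{code}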

I'd be happy to send a PR to solve this.

Also, there might be an easier way to kill/get the status of a job (cluster 
mode), but I couldn't figure it out.

Thanks,
Sam

 

> spark-submit --kill and --status are not working 
> -
>
> Key: SPARK-15665
> URL: https://issues.apache.org/jira/browse/SPARK-15665
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Devaraj K
>Assignee: Devaraj K
> Fix For: 2.0.0
>
>
> {code:xml}
> [devaraj@server2 spark-master]$ ./bin/spark-submit --kill 
> driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
> Exception in thread "main" java.lang.IllegalArgumentException: Missing 
> application resource.
> at 
> org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
> at org.apache.spark.launcher.Main.main(Main.java:86)
> {code}
> {code:xml}
> [devaraj@server2 spark-master]$ ./bin/spark-submit --status 
> driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
> Exception in thread "main" java.lang.IllegalArgumentException: Missing 
> application resource.
> at 
> org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
> at org.apache.spark.launcher.Main.main(Main.java:86)
> {code}






[jira] [Commented] (SPARK-17670) Spark DataFrame/Dataset no longer supports Option[Map] in case classes

2017-03-25 Thread SaschaC (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941820#comment-15941820
 ] 

SaschaC commented on SPARK-17670:
-

I can confirm that this bug is real and it is a major issue.

Depending on how I varied my case classes, I sometimes did NOT get a mismatch on 
Maps but instead on Lists. I used avrohugger to produce case classes from Avro 
schemas. I have case classes that extend SpecificRecordBase and varied whether 
the case classes would use Array or immutable.List, as well as Map, 
collections.Map, or collections.immutable.Map. The collection types, both lists 
and maps, are optional, and sometimes Spark complains about a mismatch on lists 
but accepts the maps, or vice versa; I could never get Spark to accept my 
schema, regardless of which case class variant I tried.
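
A minimal sketch (local SparkSession assumed) of the kind of case class that 
triggers the reported encoder failure:

{code}
import org.apache.spark.sql.SparkSession

case class WithOptionalMap(id: String, input: Option[Map[String, String]])

val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
import spark.implicits._

// On the affected versions this throws an AnalysisException
// ("... due to data type mismatch ..."):
// Seq(WithOptionalMap("a", Some(Map("k" -> "v")))).toDS()
{code}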

> Spark DataFrame/Dataset no longer supports Option[Map] in case classes
> --
>
> Key: SPARK-17670
> URL: https://issues.apache.org/jira/browse/SPARK-17670
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Daniel Williams
>
> Upon upgrading to Spark 2.0 I discovered that previously supported case 
> classes containing members of the type Option[Map] of any key/value binding, 
> mutable or immutable, were no longer supported and produced an exception 
> similar to the following.  Upon further testing I also noticed that Option 
> was still supported for Seq, case classes, and primitives.  Validating unit 
> tests, included below, use spark-testing-base.
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'wrapoption(staticinvoke(class 
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData$, ObjectType(interface 
> scala.collection.Map), toScalaMap, mapobjects(MapObjects_loopValue32, 
> MapObjects_loopIsNull33, StringType, lambdavariable(MapObjects_loopValue32, 
> MapObjects_loopIsNull33, StringType).toString, 
> cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, 
> StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), 
> StructField(sourceSystem,StringType,true), 
> StructField(input,MapType(StringType,StringType,true),true)).input as 
> map).keyArray).array, mapobjects(MapObjects_loopValue34, 
> MapObjects_loopIsNull35, StringType, lambdavariable(MapObjects_loopValue34, 
> MapObjects_loopIsNull35, StringType).toString, 
> cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, 
> StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), 
> StructField(sourceSystem,StringType,true), 
> StructField(input,MapType(StringType,StringType,true),true)).input as 
> map).valueArray).array, true), ObjectType(interface 
> scala.collection.immutable.Map))' due to data type mismatch: argument 1 
> requires scala.collection.immutable.Map type, however, 'staticinvoke(class 
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData$, ObjectType(interface 
> scala.collection.Map), toScalaMap, mapobjects(MapObjects_loopValue32, 
> MapObjects_loopIsNull33, StringType, lambdavariable(MapObjects_loopValue32, 
> MapObjects_loopIsNull33, StringType).toString, 
> cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, 
> StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), 
> StructField(sourceSystem,StringType,true), 
> StructField(input,MapType(StringType,StringType,true),true)).input as 
> map).keyArray).array, mapobjects(MapObjects_loopValue34, 
> MapObjects_loopIsNull35, StringType, lambdavariable(MapObjects_loopValue34, 
> MapObjects_loopIsNull35, StringType).toString, 
> cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, 
> StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), 
> StructField(sourceSystem,StringType,true), 
> StructField(input,MapType(StringType,StringType,true),true)).input as 
> map).valueArray).array, true)' is of scala.collection.Map 
> type.;
> at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:82)
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
> {code}
> Unit tests:
> {code}
> import com.holdenkarau.spark.testing.{DataFrameSuiteBase, SharedSparkContext}
> import org.scalatest.{Matchers, FunSuite}
> import org.slf4j.LoggerFactory
> import scala.util.{Failure, Try, Success}
> case class ImmutableMapTest(data: Map[String, String])
> case class MapTest(data: 

[jira] [Resolved] (SPARK-17137) Add compressed support for multinomial logistic regression coefficients

2017-03-25 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai resolved SPARK-17137.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17426
[https://github.com/apache/spark/pull/17426]

> Add compressed support for multinomial logistic regression coefficients
> ---
>
> Key: SPARK-17137
> URL: https://issues.apache.org/jira/browse/SPARK-17137
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Seth Hendrickson
>Assignee: Seth Hendrickson
>Priority: Minor
> Fix For: 2.2.0
>
>
> For sparse coefficients in MLOR, such as with high L1 regularization, it may 
> be more efficient to store the coefficients in a compressed format. We can add 
> this option to MLOR and perhaps do some performance tests to verify the 
> improvement.
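
A small sketch of the storage idea itself (illustrative values; not the MLOR 
code): a mostly-zero coefficient vector is cheaper to keep in sparse form, and 
`compressed` picks whichever representation is smaller.

{code}
import org.apache.spark.ml.linalg.Vectors

val coefficients = Vectors.dense(0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 1.1, 0.0)
val sparse = coefficients.toSparse     // keeps only the two non-zero entries
val compact = coefficients.compressed  // dense or sparse, whichever is smaller
println(s"size=${coefficients.size}, non-zeros=${coefficients.numNonzeros}")
{code}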






[jira] [Updated] (SPARK-20060) Support Standalone visiting secured HDFS

2017-03-25 Thread Kent Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-20060:
-
Description: 
h1. Brief design

h2. Introductions
The basic issue for Standalone mode visiting Kerberos-secured HDFS or other 
kerberized services is how to gather the delegation tokens on the driver side 
and deliver them to the executor side. 

When we run Spark on YARN, we set the tokens on the container launch context so 
they are delivered automatically; the long-running issue caused by token 
expiration was fixed by SPARK-14743, which writes the tokens to HDFS and keeps 
updating and renewing the credential file.  

When running Spark on Standalone, we currently have no implementation like 
YARN's to obtain and deliver those tokens.

h2. Implementations

Firstly, we move the implementation of SPARK-14743, which is currently 
YARN-only, to the core module. We use it to gather the credentials we need, and 
also to update and renew the credential files on HDFS.

Secondly, credential files on secured HDFS are not reachable by executors before 
they get the tokens. Here we add a sequence configuration 
`spark.deploy.credential.entities`, which the driver populates with 
`token.encodeToUrlString()` values before launching the executors; the executors 
fetch these credential strings while fetching the driver-side Spark properties 
and decode them back into tokens.  Before setting up the 
`CoarseGrainedExecutorBackend`, we add the credentials to the current 
executor-side UGI. 
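
A sketch of the token round trip this relies on (the configuration key is part 
of the proposal; only the Hadoop token APIs below are existing):

{code}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

// Driver side: encode each gathered token to a URL-safe string
// suitable for shipping as a Spark property value.
def encode(token: Token[_ <: TokenIdentifier]): String = token.encodeToUrlString()

// Executor side: decode the strings and add the tokens to the current UGI
// before CoarseGrainedExecutorBackend is set up.
def addToUgi(encoded: Seq[String]): Unit = {
  val ugi = UserGroupInformation.getCurrentUser
  encoded.foreach { s =>
    val token = new Token[TokenIdentifier]()
    token.decodeFromUrlString(s)
    ugi.addToken(token)
  }
}
{code}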



  was:For **Spark on non-YARN** mode on a kerberized HDFS, we don't obtain 
credentials from the Hive metastore, HDFS, etc., and just use the locally 
kinit-ed user to connect to them. But if we specify the --proxy-user argument in 
non-YARN mode, such as local or standalone, after we simply use 
`UGI.createProxyUser` to get a proxy UGI as the effective user and wrap the code 
in doAs, the proxy UGI fails to talk to the Hive metastore because it has no 
credentials. Thus, we need to obtain credentials via the real user and add them 
to the proxy UGI.

Component/s: (was: Spark Submit)
 Spark Core
 Issue Type: New Feature  (was: Bug)
Summary: Support Standalone visiting secured HDFS   (was: Spark On 
Non-Yarn Mode with Kerberized HDFS ProxyUser Fails Talking to Hive MetaStore )

> Support Standalone visiting secured HDFS 
> -
>
> Key: SPARK-20060
> URL: https://issues.apache.org/jira/browse/SPARK-20060
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, Spark Core
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>
> h1. Brief design
> h2. Introductions
> The basic issue for Standalone mode visiting Kerberos-secured HDFS or other 
> kerberized services is how to gather the delegation tokens on the driver side 
> and deliver them to the executor side. 
> When we run Spark on YARN, we set the tokens on the container launch context 
> so they are delivered automatically; the long-running issue caused by token 
> expiration was fixed by SPARK-14743, which writes the tokens to HDFS and keeps 
> updating and renewing the credential file.  
> When running Spark on Standalone, we currently have no implementation like 
> YARN's to obtain and deliver those tokens.
> h2. Implementations
> Firstly, we move the implementation of SPARK-14743, which is currently 
> YARN-only, to the core module. We use it to gather the credentials we need, 
> and also to update and renew the credential files on HDFS.
> Secondly, credential files on secured HDFS are not reachable by executors 
> before they get the tokens. Here we add a sequence configuration 
> `spark.deploy.credential.entities`, which the driver populates with 
> `token.encodeToUrlString()` values before launching the executors; the 
> executors fetch these credential strings while fetching the driver-side Spark 
> properties and decode them back into tokens.  Before setting up the 
> `CoarseGrainedExecutorBackend`, we add the credentials to the current 
> executor-side UGI. 






[jira] [Created] (SPARK-20098) DataType's typeName method returns with 'StructF' in case of StructField

2017-03-25 Thread Peter Szalai (JIRA)
Peter Szalai created SPARK-20098:


 Summary: DataType's typeName method returns with 'StructF' in case 
of StructField
 Key: SPARK-20098
 URL: https://issues.apache.org/jira/browse/SPARK-20098
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.1.0
Reporter: Peter Szalai


Currently, if you want to get the name of a DataType and the DataType is a 
`StructField`, you get `StructF`. 

http://spark.apache.org/docs/2.1.0/api/python/_modules/pyspark/sql/types.html 






[jira] [Updated] (SPARK-19674) Ignore driver accumulator updates don't belong to the execution when merging all accumulator updates

2017-03-25 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-19674:

Fix Version/s: 2.1.1

> Ignore driver accumulator updates don't belong to the execution when merging 
> all accumulator updates
> 
>
> Key: SPARK-19674
> URL: https://issues.apache.org/jira/browse/SPARK-19674
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Carson Wang
>Assignee: Carson Wang
>Priority: Minor
> Fix For: 2.1.1, 2.2.0
>
>
> In SQLListener.getExecutionMetrics, driver accumulator updates that don't 
> belong to the execution should be ignored when merging all accumulator 
> updates, to prevent a NoSuchElementException.






[jira] [Commented] (SPARK-20096) Expose the real queue name not null while using --verbose

2017-03-25 Thread Kent Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941712#comment-15941712
 ] 

Kent Yao commented on SPARK-20096:
--

[~sowen] example added

> Expose the real queue name not null while using --verbose
> -
>
> Key: SPARK-20096
> URL: https://issues.apache.org/jira/browse/SPARK-20096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>Priority: Minor
>
> While submitting apps with -v or --verbose, we print the right queue name, 
> but if we set a queue name with `spark.yarn.queue` via --conf or in 
> spark-defaults.conf, we just get `null` for the queue in "Parsed arguments".
> {code}
> bin/spark-shell -v --conf spark.yarn.queue=thequeue
> Using properties file: 
> /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf
> 
> Adding default property: spark.yarn.queue=default
> Parsed arguments:
>   master  yarn
>   deployMode  client
>   ...
>   queue   null
>   
>   verbose true
> Spark properties used, including those specified through
>  --conf and those from the properties file 
> /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf:
>   spark.yarn.queue -> thequeue
>   
> {code}






[jira] [Updated] (SPARK-20096) Expose the real queue name not null while using --verbose

2017-03-25 Thread Kent Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-20096:
-
Description: 
While submitting apps with -v or --verbose, we print the right queue name, but 
if we set a queue name with `spark.yarn.queue` via --conf or in 
spark-defaults.conf, we just get `null` for the queue in "Parsed arguments".
{code}
bin/spark-shell -v --conf spark.yarn.queue=thequeue
Using properties file: 
/home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf

Adding default property: spark.yarn.queue=default
Parsed arguments:
  master  yarn
  deployMode  client
  ...
  queue   null
  
  verbose true
Spark properties used, including those specified through
 --conf and those from the properties file 
/home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf:
  spark.yarn.queue -> thequeue
  
{code}

  was:While submitting apps with -v or --verbose, we print the right queue 
name, but if we set a queue name with `spark.yarn.queue` via --conf or in 
spark-defaults.conf, we just get `null`


> Expose the real queue name not null while using --verbose
> -
>
> Key: SPARK-20096
> URL: https://issues.apache.org/jira/browse/SPARK-20096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>Priority: Minor
>
> While submitting apps with -v or --verbose, we print the right queue name, 
> but if we set a queue name with `spark.yarn.queue` via --conf or in 
> spark-defaults.conf, we just get `null` for the queue in "Parsed arguments".
> {code}
> bin/spark-shell -v --conf spark.yarn.queue=thequeue
> Using properties file: 
> /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf
> 
> Adding default property: spark.yarn.queue=default
> Parsed arguments:
>   master  yarn
>   deployMode  client
>   ...
>   queue   null
>   
>   verbose true
> Spark properties used, including those specified through
>  --conf and those from the properties file 
> /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf:
>   spark.yarn.queue -> thequeue
>   
> {code}






[jira] [Assigned] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20097:


Assignee: Apache Spark

> Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and 
> GLR
> ---
>
> Key: SPARK-20097
> URL: https://issues.apache.org/jira/browse/SPARK-20097
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Benjamin Fradet
>Assignee: Apache Spark
>Priority: Trivial
>
> - numInstances is public in lr and regression private in glr
> - degreesOfFreedom is private in lr and public in glr






[jira] [Assigned] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20097:


Assignee: (was: Apache Spark)

> Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and 
> GLR
> ---
>
> Key: SPARK-20097
> URL: https://issues.apache.org/jira/browse/SPARK-20097
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Benjamin Fradet
>Priority: Trivial
>
> - numInstances is public in lr and regression private in glr
> - degreesOfFreedom is private in lr and public in glr






[jira] [Commented] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR

2017-03-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941705#comment-15941705
 ] 

Apache Spark commented on SPARK-20097:
--

User 'BenFradet' has created a pull request for this issue:
https://github.com/apache/spark/pull/17431

> Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and 
> GLR
> ---
>
> Key: SPARK-20097
> URL: https://issues.apache.org/jira/browse/SPARK-20097
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Benjamin Fradet
>Priority: Trivial
>
> - numInstances is public in lr and regression private in glr
> - degreesOfFreedom is private in lr and public in glr






[jira] [Created] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR

2017-03-25 Thread Benjamin Fradet (JIRA)
Benjamin Fradet created SPARK-20097:
---

 Summary: Fix visibility discrepancy with numInstances and 
degreesOfFreedom in LR and GLR
 Key: SPARK-20097
 URL: https://issues.apache.org/jira/browse/SPARK-20097
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 2.1.0
Reporter: Benjamin Fradet
Priority: Trivial


- numInstances is public in lr and regression private in glr
- degreesOfFreedom is private in lr and public in glr






[jira] [Assigned] (SPARK-20096) Expose the real queue name not null while using --verbose

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20096:


Assignee: Apache Spark

> Expose the real queue name not null while using --verbose
> -
>
> Key: SPARK-20096
> URL: https://issues.apache.org/jira/browse/SPARK-20096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Minor
>
> While submitting apps with -v or --verbose, we print the right queue name, 
> but if we set a queue name with `spark.yarn.queue` via --conf or in 
> spark-defaults.conf, we just get `null`






[jira] [Commented] (SPARK-20096) Expose the real queue name not null while using --verbose

2017-03-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941701#comment-15941701
 ] 

Apache Spark commented on SPARK-20096:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/17430

> Expose the real queue name not null while using --verbose
> -
>
> Key: SPARK-20096
> URL: https://issues.apache.org/jira/browse/SPARK-20096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>Priority: Minor
>
> While submitting apps with -v or --verbose, we print the right queue name, 
> but if we set a queue name with `spark.yarn.queue` via --conf or in 
> spark-defaults.conf, we just get `null`






[jira] [Assigned] (SPARK-20096) Expose the real queue name not null while using --verbose

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20096:


Assignee: (was: Apache Spark)

> Expose the real queue name not null while using --verbose
> -
>
> Key: SPARK-20096
> URL: https://issues.apache.org/jira/browse/SPARK-20096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>Priority: Minor
>
> While submitting apps with -v or --verbose, we print the right queue name, 
> but if we set a queue name with `spark.yarn.queue` via --conf or in 
> spark-defaults.conf, we just get `null`






[jira] [Commented] (SPARK-20096) Expose the real queue name not null while using --verbose

2017-03-25 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941699#comment-15941699
 ] 

Sean Owen commented on SPARK-20096:
---

Got null where? This needs more detail.

> Expose the real queue name not null while using --verbose
> -
>
> Key: SPARK-20096
> URL: https://issues.apache.org/jira/browse/SPARK-20096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>Priority: Minor
>
> While submitting apps with -v or --verbose, we print the right queue name, 
> but if we set a queue name with `spark.yarn.queue` via --conf or in 
> spark-defaults.conf, we just get `null`






[jira] [Updated] (SPARK-20096) Expose the real queue name not null while using --verbose

2017-03-25 Thread Kent Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-20096:
-
Summary: Expose the real queue name not null while using --verbose  (was: 
Expose the real queue name not null using --verbose)

> Expose the real queue name not null while using --verbose
> -
>
> Key: SPARK-20096
> URL: https://issues.apache.org/jira/browse/SPARK-20096
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>Priority: Minor
>
> While submitting apps with -v or --verbose, we print the right queue name, 
> but if we set a queue name with `spark.yarn.queue` via --conf or in 
> spark-defaults.conf, we just get `null`






[jira] [Created] (SPARK-20096) Expose the real queue name not null using --verbose

2017-03-25 Thread Kent Yao (JIRA)
Kent Yao created SPARK-20096:


 Summary: Expose the real queue name not null using --verbose
 Key: SPARK-20096
 URL: https://issues.apache.org/jira/browse/SPARK-20096
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.2.0
Reporter: Kent Yao
Priority: Minor


While submitting apps with -v or --verbose, we print the right queue name, but 
if we set a queue name with `spark.yarn.queue` via --conf or in 
spark-defaults.conf, we just get `null`






[jira] [Assigned] (SPARK-20078) Mesos executor configurability for task name and labels

2017-03-25 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-20078:
-

Assignee: Kalvin Chau

> Mesos executor configurability for task name and labels
> ---
>
> Key: SPARK-20078
> URL: https://issues.apache.org/jira/browse/SPARK-20078
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Kalvin Chau
>Assignee: Kalvin Chau
>Priority: Minor
> Fix For: 2.2.0
>
>
> Add the ability to configure the Mesos task name as well as to add labels to 
> the Mesos ExecutorInfo protobuf message.
> Currently all executors that are spun up are named "Task X" (where X is the 
> executor number). 
> For centralized logging it would be nice to be able to use a configurable 
> name such as "SparkJob1 X" instead, as well as allowing users to add any 
> labels they want.
> In this PR I chose "k1:v1,k2:v2" as the format, with colons separating each 
> key-value pair and commas listing more than one.
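
An illustrative parser for that "k1:v1,k2:v2" format (the helper name is ours, 
not the Mesos integration code itself):

{code}
// Splits "k1:v1,k2:v2" into key-value pairs; values may themselves contain ':'.
def parseLabels(spec: String): Seq[(String, String)] =
  spec.split(",").toSeq.filter(_.nonEmpty).map { kv =>
    val Array(k, v) = kv.split(":", 2)
    (k, v)
  }

// parseLabels("env:prod,team:data") == Seq(("env", "prod"), ("team", "data"))
{code}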






[jira] [Resolved] (SPARK-20078) Mesos executor configurability for task name and labels

2017-03-25 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20078.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17404
[https://github.com/apache/spark/pull/17404]

> Mesos executor configurability for task name and labels
> ---
>
> Key: SPARK-20078
> URL: https://issues.apache.org/jira/browse/SPARK-20078
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Kalvin Chau
>Priority: Minor
> Fix For: 2.2.0
>
>
> Add the ability to configure the Mesos task name as well as to add labels to 
> the Mesos ExecutorInfo protobuf message.
> Currently all executors that are spun up are named "Task X" (where X is the 
> executor number). 
> For centralized logging it would be nice to be able to use a configurable 
> name such as "SparkJob1 X" instead, as well as allowing users to add any 
> labels they want.
> In this PR I chose "k1:v1,k2:v2" as the format, with colons separating each 
> key-value pair and commas listing more than one.






[jira] [Created] (SPARK-20095) A code bug in CodegenContext.withSubExprEliminationExprs

2017-03-25 Thread song fengfei (JIRA)
song fengfei created SPARK-20095:


 Summary: A code bug in CodegenContext.withSubExprEliminationExprs
 Key: SPARK-20095
 URL: https://issues.apache.org/jira/browse/SPARK-20095
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: song fengfei
Priority: Minor


In the function CodegenContext.withSubExprEliminationExprs:

  val oldsubExprEliminationExprs = subExprEliminationExprs
  subExprEliminationExprs.clear
  ...
  // Restore previous subExprEliminationExprs
  subExprEliminationExprs.clear
  oldsubExprEliminationExprs.foreach(subExprEliminationExprs += _)

It seems that oldsubExprEliminationExprs and subExprEliminationExprs are the 
same instance, so after the second subExprEliminationExprs.clear the saved 
oldsubExprEliminationExprs is also cleared, and the previous 
subExprEliminationExprs in CodegenContext will not be restored.
Is it a bug?
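
A minimal illustration of the aliasing concern (plain Scala collections, not 
the codegen classes themselves):

{code}
import scala.collection.mutable

val exprs = mutable.HashMap("a" -> 1)
val saved = exprs            // same instance, not a snapshot
exprs.clear()
println(saved.isEmpty)       // true: the "saved" state is gone as well

// Taking a copy (e.g. toMap or clone()) survives the clear():
val exprs2 = mutable.HashMap("a" -> 1)
val snapshot = exprs2.toMap
exprs2.clear()
println(snapshot.size)       // 1
{code}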






[jira] [Resolved] (SPARK-19999) Test failures in Spark Core due to java.nio.Bits.unaligned()

2017-03-25 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1.
---
Resolution: Won't Fix

I'm open to reopening this to special-case PPC if anyone cares enough to open a 
PR to explore whether that's all there is to it in Spark.

> Test failures in Spark Core due to java.nio.Bits.unaligned()
> 
>
> Key: SPARK-1
> URL: https://issues.apache.org/jira/browse/SPARK-1
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
> Environment: Ubuntu 14.04 ppc64le 
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>  Labels: ppc64le
>
> There are multiple test failures seen in the Spark Core project with the 
> following error message:
> {code:borderStyle=solid}
> java.lang.IllegalArgumentException: requirement failed: No support for 
> unaligned Unsafe. Set spark.memory.offHeap.enabled to false.
> {code}
> These errors occur due to java.nio.Bits.unaligned(), which does not return 
> true for the ppc64le arch.
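
A sketch of the kind of check behind that message: java.nio.Bits.unaligned() is 
not public, so it is typically consulted via reflection to decide whether 
unaligned Unsafe access is supported on the current architecture.

{code}
// Reflectively query java.nio.Bits.unaligned(); per the report, this is false on ppc64le.
val bitsClass = Class.forName("java.nio.Bits")
val unalignedMethod = bitsClass.getDeclaredMethod("unaligned")
unalignedMethod.setAccessible(true)
val unaligned = unalignedMethod.invoke(null).asInstanceOf[Boolean]
println(s"unaligned access supported: $unaligned")
{code}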






[jira] [Updated] (SPARK-20094) Putting predicate with IN subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr

2017-03-25 Thread Zhenhua Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenhua Wang updated SPARK-20094:
-
Summary: Putting predicate with IN subquery into join condition in 
ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr  (was: 
Putting predicate with subquery into join condition in ReorderJoin fails 
RewritePredicateSubquery.rewriteExistentialExpr)

> Putting predicate with IN subquery into join condition in ReorderJoin fails 
> RewritePredicateSubquery.rewriteExistentialExpr
> ---
>
> Key: SPARK-20094
> URL: https://issues.apache.org/jira/browse/SPARK-20094
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Zhenhua Wang
>
> ReorderJoin collects all predicates and tries to put them into the join 
> condition when creating an ordered join. If a predicate with a subquery is in 
> a join condition instead of a filter condition, 
> `RewritePredicateSubquery.rewriteExistentialExpr` fails to convert the 
> subquery to an ExistenceJoin, and thus results in an error.
> For example, tpcds q45 fails due to the above reason:
> {noformat}
> spark-sql> explain codegen
>  > SELECT
>  >   ca_zip,
>  >   ca_city,
>  >   sum(ws_sales_price)
>  > FROM web_sales, customer, customer_address, date_dim, item
>  > WHERE ws_bill_customer_sk = c_customer_sk
>  >   AND c_current_addr_sk = ca_address_sk
>  >   AND ws_item_sk = i_item_sk
>  >   AND (substr(ca_zip, 1, 5) IN
>  >   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', 
> '80348', '81792')
>  >   OR
>  >   i_item_id IN (SELECT i_item_id
>  >   FROM item
>  >   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>  >   )
>  > )
>  >   AND ws_sold_date_sk = d_date_sk
>  >   AND d_qoy = 2 AND d_year = 2001
>  > GROUP BY ca_zip, ca_city
>  > ORDER BY ca_zip, ca_city
>  > LIMIT 100;
> 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen
>   
> SELECT
>   ca_zip,
>   ca_city,
>   sum(ws_sales_price)
> FROM web_sales, customer, customer_address, date_dim, item
> WHERE ws_bill_customer_sk = c_customer_sk
>   AND c_current_addr_sk = ca_address_sk
>   AND ws_item_sk = i_item_sk
>   AND (substr(ca_zip, 1, 5) IN
>   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', 
> '81792')
>   OR
>   i_item_id IN (SELECT i_item_id
>   FROM item
>   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>   )
> )
>   AND ws_sold_date_sk = d_date_sk
>   AND d_qoy = 2 AND d_year = 2001
> GROUP BY ca_zip, ca_city
> ORDER BY ca_zip, ca_city
> LIMIT 100]
> java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 []
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> 

[jira] [Updated] (SPARK-20094) Putting predicate with IN subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr

2017-03-25 Thread Zhenhua Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenhua Wang updated SPARK-20094:
-
Description: 
ReorderJoin collects all predicates and tries to put them into the join 
condition when creating an ordered join. If a predicate with an IN subquery is 
in a join condition instead of a filter condition, 
`RewritePredicateSubquery.rewriteExistentialExpr` fails to convert the subquery 
to an ExistenceJoin, and thus results in an error.

For example, tpcds q45 fails due to the above reason:
{noformat}
spark-sql> explain codegen
 > SELECT
 >   ca_zip,
 >   ca_city,
 >   sum(ws_sales_price)
 > FROM web_sales, customer, customer_address, date_dim, item
 > WHERE ws_bill_customer_sk = c_customer_sk
 >   AND c_current_addr_sk = ca_address_sk
 >   AND ws_item_sk = i_item_sk
 >   AND (substr(ca_zip, 1, 5) IN
 >   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', 
'80348', '81792')
 >   OR
 >   i_item_id IN (SELECT i_item_id
 >   FROM item
 >   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
 >   )
 > )
 >   AND ws_sold_date_sk = d_date_sk
 >   AND d_qoy = 2 AND d_year = 2001
 > GROUP BY ca_zip, ca_city
 > ORDER BY ca_zip, ca_city
 > LIMIT 100;
17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen  
SELECT
  ca_zip,
  ca_city,
  sum(ws_sales_price)
FROM web_sales, customer, customer_address, date_dim, item
WHERE ws_bill_customer_sk = c_customer_sk
  AND c_current_addr_sk = ca_address_sk
  AND ws_item_sk = i_item_sk
  AND (substr(ca_zip, 1, 5) IN
  ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', 
'81792')
  OR
  i_item_id IN (SELECT i_item_id
  FROM item
  WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
  )
)
  AND ws_sold_date_sk = d_date_sk
  AND d_qoy = 2 AND d_year = 2001
GROUP BY ca_zip, ca_city
ORDER BY ca_zip, ca_city
LIMIT 100]
java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 []
at 
org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224)
at 
org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
at 
org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
at 
org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
at 
org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.getJoinCondition(BroadcastHashJoinExec.scala:174)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:199)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:36)
at 
org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:68)
at 

[jira] [Assigned] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20094:


Assignee: Apache Spark

> Putting predicate with subquery into join condition in ReorderJoin fails 
> RewritePredicateSubquery.rewriteExistentialExpr
> 
>
> Key: SPARK-20094
> URL: https://issues.apache.org/jira/browse/SPARK-20094
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Zhenhua Wang
>Assignee: Apache Spark
>
> ReorderJoin collects all predicates and tries to put them into the join 
> condition when creating an ordered join. If a predicate with a subquery is in 
> a join condition instead of a filter condition, 
> `RewritePredicateSubquery.rewriteExistentialExpr` fails to convert the 
> subquery to an ExistenceJoin, and thus results in an error.
> For example, tpcds q45 fails due to the above reason:
> {noformat}
> spark-sql> explain codegen
>  > SELECT
>  >   ca_zip,
>  >   ca_city,
>  >   sum(ws_sales_price)
>  > FROM web_sales, customer, customer_address, date_dim, item
>  > WHERE ws_bill_customer_sk = c_customer_sk
>  >   AND c_current_addr_sk = ca_address_sk
>  >   AND ws_item_sk = i_item_sk
>  >   AND (substr(ca_zip, 1, 5) IN
>  >   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', 
> '80348', '81792')
>  >   OR
>  >   i_item_id IN (SELECT i_item_id
>  >   FROM item
>  >   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>  >   )
>  > )
>  >   AND ws_sold_date_sk = d_date_sk
>  >   AND d_qoy = 2 AND d_year = 2001
>  > GROUP BY ca_zip, ca_city
>  > ORDER BY ca_zip, ca_city
>  > LIMIT 100;
> 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen
>   
> SELECT
>   ca_zip,
>   ca_city,
>   sum(ws_sales_price)
> FROM web_sales, customer, customer_address, date_dim, item
> WHERE ws_bill_customer_sk = c_customer_sk
>   AND c_current_addr_sk = ca_address_sk
>   AND ws_item_sk = i_item_sk
>   AND (substr(ca_zip, 1, 5) IN
>   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', 
> '81792')
>   OR
>   i_item_id IN (SELECT i_item_id
>   FROM item
>   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>   )
> )
>   AND ws_sold_date_sk = d_date_sk
>   AND d_qoy = 2 AND d_year = 2001
> GROUP BY ca_zip, ca_city
> ORDER BY ca_zip, ca_city
> LIMIT 100]
> java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 []
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> 

[jira] [Commented] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr

2017-03-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941625#comment-15941625
 ] 

Apache Spark commented on SPARK-20094:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/17428

> Putting predicate with subquery into join condition in ReorderJoin fails 
> RewritePredicateSubquery.rewriteExistentialExpr
> 
>
> Key: SPARK-20094
> URL: https://issues.apache.org/jira/browse/SPARK-20094
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Zhenhua Wang
>
> ReorderJoin collects all predicates and tries to put them into the join 
> condition when creating an ordered join. If a predicate with a subquery ends 
> up in a join condition instead of a filter condition, 
> `RewritePredicateSubquery.rewriteExistentialExpr` fails to convert the 
> subquery to an ExistenceJoin, and thus results in an error.
> For example, TPC-DS q45 fails for this reason:
> {noformat}
> spark-sql> explain codegen
>  > SELECT
>  >   ca_zip,
>  >   ca_city,
>  >   sum(ws_sales_price)
>  > FROM web_sales, customer, customer_address, date_dim, item
>  > WHERE ws_bill_customer_sk = c_customer_sk
>  >   AND c_current_addr_sk = ca_address_sk
>  >   AND ws_item_sk = i_item_sk
>  >   AND (substr(ca_zip, 1, 5) IN
>  >   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', 
> '80348', '81792')
>  >   OR
>  >   i_item_id IN (SELECT i_item_id
>  >   FROM item
>  >   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>  >   )
>  > )
>  >   AND ws_sold_date_sk = d_date_sk
>  >   AND d_qoy = 2 AND d_year = 2001
>  > GROUP BY ca_zip, ca_city
>  > ORDER BY ca_zip, ca_city
>  > LIMIT 100;
> 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen
>   
> SELECT
>   ca_zip,
>   ca_city,
>   sum(ws_sales_price)
> FROM web_sales, customer, customer_address, date_dim, item
> WHERE ws_bill_customer_sk = c_customer_sk
>   AND c_current_addr_sk = ca_address_sk
>   AND ws_item_sk = i_item_sk
>   AND (substr(ca_zip, 1, 5) IN
>   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', 
> '81792')
>   OR
>   i_item_id IN (SELECT i_item_id
>   FROM item
>   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>   )
> )
>   AND ws_sold_date_sk = d_date_sk
>   AND d_qoy = 2 AND d_year = 2001
> GROUP BY ca_zip, ca_city
> ORDER BY ca_zip, ca_city
> LIMIT 100]
> java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 []
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> 

[jira] [Assigned] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20094:


Assignee: (was: Apache Spark)

> Putting predicate with subquery into join condition in ReorderJoin fails 
> RewritePredicateSubquery.rewriteExistentialExpr
> 
>
> Key: SPARK-20094
> URL: https://issues.apache.org/jira/browse/SPARK-20094
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Zhenhua Wang
>
> ReorderJoin collects all predicates and tries to put them into the join 
> condition when creating an ordered join. If a predicate with a subquery ends 
> up in a join condition instead of a filter condition, 
> `RewritePredicateSubquery.rewriteExistentialExpr` fails to convert the 
> subquery to an ExistenceJoin, and thus results in an error.
> For example, TPC-DS q45 fails for this reason:
> {noformat}
> spark-sql> explain codegen
>  > SELECT
>  >   ca_zip,
>  >   ca_city,
>  >   sum(ws_sales_price)
>  > FROM web_sales, customer, customer_address, date_dim, item
>  > WHERE ws_bill_customer_sk = c_customer_sk
>  >   AND c_current_addr_sk = ca_address_sk
>  >   AND ws_item_sk = i_item_sk
>  >   AND (substr(ca_zip, 1, 5) IN
>  >   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', 
> '80348', '81792')
>  >   OR
>  >   i_item_id IN (SELECT i_item_id
>  >   FROM item
>  >   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>  >   )
>  > )
>  >   AND ws_sold_date_sk = d_date_sk
>  >   AND d_qoy = 2 AND d_year = 2001
>  > GROUP BY ca_zip, ca_city
>  > ORDER BY ca_zip, ca_city
>  > LIMIT 100;
> 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen
>   
> SELECT
>   ca_zip,
>   ca_city,
>   sum(ws_sales_price)
> FROM web_sales, customer, customer_address, date_dim, item
> WHERE ws_bill_customer_sk = c_customer_sk
>   AND c_current_addr_sk = ca_address_sk
>   AND ws_item_sk = i_item_sk
>   AND (substr(ca_zip, 1, 5) IN
>   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', 
> '81792')
>   OR
>   i_item_id IN (SELECT i_item_id
>   FROM item
>   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
>   )
> )
>   AND ws_sold_date_sk = d_date_sk
>   AND d_qoy = 2 AND d_year = 2001
> GROUP BY ca_zip, ca_city
> ORDER BY ca_zip, ca_city
> LIMIT 100]
> java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 []
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> 

[jira] [Created] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr

2017-03-25 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-20094:


 Summary: Putting predicate with subquery into join condition in 
ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr
 Key: SPARK-20094
 URL: https://issues.apache.org/jira/browse/SPARK-20094
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Zhenhua Wang


ReorderJoin collects all predicates and tries to put them into the join 
condition when creating an ordered join. If a predicate with a subquery ends 
up in a join condition instead of a filter condition, 
`RewritePredicateSubquery.rewriteExistentialExpr` fails to convert the 
subquery to an ExistenceJoin, and thus results in an error.
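
As a minimal, untested sketch of the shape that triggers it (tables t1 
through t4 are hypothetical, not taken from this report): a multi-way inner 
join whose WHERE clause OR-combines an ordinary predicate with an IN 
subquery, so the whole disjunction gets collected into a join condition:
{noformat}
-- Hypothetical minimal shape (untested sketch); t1, t2, t3, t4 are illustrative.
SELECT t1.a
FROM t1, t2, t3
WHERE t1.a = t2.a
  AND t2.b = t3.b
  AND (t1.c > 0 OR t1.d IN (SELECT d FROM t4));
{noformat}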

For example, TPC-DS q45 fails for this reason:
{noformat}
spark-sql> explain codegen
 > SELECT
 >   ca_zip,
 >   ca_city,
 >   sum(ws_sales_price)
 > FROM web_sales, customer, customer_address, date_dim, item
 > WHERE ws_bill_customer_sk = c_customer_sk
 >   AND c_current_addr_sk = ca_address_sk
 >   AND ws_item_sk = i_item_sk
 >   AND (substr(ca_zip, 1, 5) IN
 >   ('85669', '86197', '88274', '83405', '86475', '85392', '85460', 
'80348', '81792')
 >   OR
 >   i_item_id IN (SELECT i_item_id
 >   FROM item
 >   WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
 >   )
 > )
 >   AND ws_sold_date_sk = d_date_sk
 >   AND d_qoy = 2 AND d_year = 2001
 > GROUP BY ca_zip, ca_city
 > ORDER BY ca_zip, ca_city
 > LIMIT 100;
17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen  
SELECT
  ca_zip,
  ca_city,
  sum(ws_sales_price)
FROM web_sales, customer, customer_address, date_dim, item
WHERE ws_bill_customer_sk = c_customer_sk
  AND c_current_addr_sk = ca_address_sk
  AND ws_item_sk = i_item_sk
  AND (substr(ca_zip, 1, 5) IN
  ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', 
'81792')
  OR
  i_item_id IN (SELECT i_item_id
  FROM item
  WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
  )
)
  AND ws_sold_date_sk = d_date_sk
  AND d_qoy = 2 AND d_year = 2001
GROUP BY ca_zip, ca_city
ORDER BY ca_zip, ca_city
LIMIT 100]
java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 []
at 
org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224)
at 
org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
at 
org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
at 
org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
at 
org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.getJoinCondition(BroadcastHashJoinExec.scala:174)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:199)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 

[jira] [Created] (SPARK-20093) Exception when Joining dataframe with another dataframe generated by applying groupBy transformation on original one

2017-03-25 Thread Hosur Narahari (JIRA)
Hosur Narahari created SPARK-20093:
--

 Summary: Exception when Joining dataframe with another dataframe 
generated by applying groupBy transformation on original one
 Key: SPARK-20093
 URL: https://issues.apache.org/jira/browse/SPARK-20093
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0, 2.2.0
Reporter: Hosur Narahari


When we generate a DataFrame by grouping and then join it back to the original 
DataFrame on the aggregated column, we get an AnalysisException. Below I've 
attached a piece of code and the resulting exception to reproduce the problem.

Code:

import org.apache.spark.sql.SparkSession

object App {

  lazy val spark =
    SparkSession.builder.appName("Test").master("local").getOrCreate

  def main(args: Array[String]): Unit = {
    test1()
  }

  private def test1(): Unit = {
    import org.apache.spark.sql.functions._
    val df = spark.createDataFrame(Seq(
      ("M", 172, 60), ("M", 170, 60), ("F", 155, 56),
      ("M", 160, 55), ("F", 150, 53))).toDF("gender", "height", "weight")
    // The aggregate keeps the original column name "height".
    val groupDF = df.groupBy("gender").agg(min("height").as("height"))
    groupDF.show()
    // Joining back to the original DataFrame on that column throws the
    // AnalysisException shown below.
    val out = groupDF.join(df, groupDF("height") <=> df("height"))
      .select(df("gender"), df("height"), df("weight"))
    out.show()
  }
}

When I ran the above code, I got the exception below:

Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved 
attribute(s) height#8 missing from 
height#19,height#30,gender#29,weight#31,gender#7 in operator !Join Inner, 
(height#19 <=> height#8);;
!Join Inner, (height#19 <=> height#8)
:- Aggregate [gender#7], [gender#7, min(height#8) AS height#19]
:  +- Project [_1#0 AS gender#7, _2#1 AS height#8, _3#2 AS weight#9]
: +- LocalRelation [_1#0, _2#1, _3#2]
+- Project [_1#0 AS gender#29, _2#1 AS height#30, _3#2 AS weight#31]
   +- LocalRelation [_1#0, _2#1, _3#2]

at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:90)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:342)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:90)
at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:53)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2831)
at org.apache.spark.sql.Dataset.join(Dataset.scala:843)
at org.apache.spark.sql.Dataset.join(Dataset.scala:807)
at App$.test1(App.scala:17)
at App$.main(App.scala:9)
at App.main(App.scala)
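
A possible workaround, offered only as a hedged sketch (it does not address 
the underlying bug): alias both DataFrames and resolve the join columns by 
qualified name, so the condition does not capture column references taken 
from {{df}} before the self-join rewrites them:
{noformat}
// Hedged workaround sketch, assuming Spark 2.x alias-based column resolution;
// not verified against this exact report.
import org.apache.spark.sql.functions.{col, min}

val orig = df.alias("orig")
val agg  = df.groupBy("gender").agg(min("height").as("height")).alias("agg")
val out  = agg.join(orig, col("agg.height") <=> col("orig.height"))
  .select(col("orig.gender"), col("orig.height"), col("orig.weight"))
out.show()
{noformat}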

Could someone please look into this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19518) IGNORE NULLS in first_value / last_value should be supported in SQL statements

2017-03-25 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941605#comment-15941605
 ] 

Hyukjin Kwon commented on SPARK-19518:
--

[~sabhyankar], do you mind if I ask whether you are currently working on this? 
I am willing to take it over.

> IGNORE NULLS in first_value / last_value should be supported in SQL statements
> --
>
> Key: SPARK-19518
> URL: https://issues.apache.org/jira/browse/SPARK-19518
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Ferenc Erdelyi
>
> https://issues.apache.org/jira/browse/SPARK-13049 was implemented in Spark 2; 
> however, it does not work in SQL statements, as it is not implemented in Hive 
> yet: https://issues.apache.org/jira/browse/HIVE-11189
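
For reference, a hedged sketch of the gap (the DataFrame {{df}} and its 
columns grp, ts, value are illustrative, not from the ticket): the DataFrame 
API from SPARK-13049 takes an ignoreNulls flag, while the equivalent SQL 
syntax is, per the description, not accepted yet.
{noformat}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first}

// Works today: DataFrame API with the ignoreNulls flag from SPARK-13049.
val w = Window.partitionBy("grp").orderBy("ts")
val withFirst =
  df.withColumn("first_non_null", first(col("value"), ignoreNulls = true).over(w))

// The SQL form this ticket asks for (illustrative; currently rejected):
//   SELECT first_value(value) IGNORE NULLS
//          OVER (PARTITION BY grp ORDER BY ts) FROM t
{noformat}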



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20092:


Assignee: Apache Spark

> Trigger AppVeyor R tests for changes in Scala code related with R API
> -
>
> Key: SPARK-20092
> URL: https://issues.apache.org/jira/browse/SPARK-20092
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> We currently detect changes in the {{./R}} directory and then trigger 
> AppVeyor tests.
> It seems we also need to run the tests when there are changes in 
> {{./core/src/main/scala/org/apache/spark/r}} and 
> {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API

2017-03-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941594#comment-15941594
 ] 

Apache Spark commented on SPARK-20092:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/17427

> Trigger AppVeyor R tests for changes in Scala code related with R API
> -
>
> Key: SPARK-20092
> URL: https://issues.apache.org/jira/browse/SPARK-20092
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> We currently detect changes in the {{./R}} directory and then trigger 
> AppVeyor tests.
> It seems we also need to run the tests when there are changes in 
> {{./core/src/main/scala/org/apache/spark/r}} and 
> {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API

2017-03-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20092:


Assignee: (was: Apache Spark)

> Trigger AppVeyor R tests for changes in Scala code related with R API
> -
>
> Key: SPARK-20092
> URL: https://issues.apache.org/jira/browse/SPARK-20092
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> We currently detect changes in the {{./R}} directory and then trigger 
> AppVeyor tests.
> It seems we also need to run the tests when there are changes in 
> {{./core/src/main/scala/org/apache/spark/r}} and 
> {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API

2017-03-25 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20092:
-
Component/s: Project Infra

> Trigger AppVeyor R tests for changes in Scala code related with R API
> -
>
> Key: SPARK-20092
> URL: https://issues.apache.org/jira/browse/SPARK-20092
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> We currently detect changes in the {{./R}} directory and then trigger 
> AppVeyor tests.
> It seems we also need to run the tests when there are changes in 
> {{./core/src/main/scala/org/apache/spark/r}} and 
> {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API

2017-03-25 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-20092:


 Summary: Trigger AppVeyor R tests for changes in Scala code 
related with R API
 Key: SPARK-20092
 URL: https://issues.apache.org/jira/browse/SPARK-20092
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 2.2.0
Reporter: Hyukjin Kwon
Priority: Minor


We currently detect changes in the {{./R}} directory and then trigger 
AppVeyor tests.

It seems we also need to run the tests when there are changes in 
{{./core/src/main/scala/org/apache/spark/r}} and 
{{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.
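
A hedged sketch of what the change could look like, assuming the path filter 
lives in the {{only_commits}} section of {{appveyor.yml}} (an assumption 
about the current setup, not a confirmed detail; the two extra paths are 
copied from the description above):
{noformat}
# appveyor.yml (sketch)
only_commits:
  files:
    - R/
    - core/src/main/scala/org/apache/spark/r/
    - sql/core/src/main/scala/org/apache/spark/sql/api/r/
{noformat}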




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org