[jira] [Commented] (SPARK-15665) spark-submit --kill and --status are not working
[ https://issues.apache.org/jira/browse/SPARK-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942141#comment-15942141 ] Devaraj K commented on SPARK-15665: --- [~samuel-soubeyran], This issue has been resolved. Please create another JIRA if you see any other problems. > spark-submit --kill and --status are not working > - > > Key: SPARK-15665 > URL: https://issues.apache.org/jira/browse/SPARK-15665 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Devaraj K >Assignee: Devaraj K > Fix For: 2.0.0 > > > {code:xml} > [devaraj@server2 spark-master]$ ./bin/spark-submit --kill > driver-20160531171222- --master spark://xx.xx.xx.xx:6066 > Exception in thread "main" java.lang.IllegalArgumentException: Missing > application resource. > at > org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) > at org.apache.spark.launcher.Main.main(Main.java:86) > {code} > {code:xml} > [devaraj@server2 spark-master]$ ./bin/spark-submit --status > driver-20160531171222- --master spark://xx.xx.xx.xx:6066 > Exception in thread "main" java.lang.IllegalArgumentException: Missing > application resource. 
> at > org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) > at org.apache.spark.launcher.Main.main(Main.java:86) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20043: -- Affects Version/s: 2.2.0 > Decision Tree loader does not handle uppercase impurity param values > > > Key: SPARK-20043 > URL: https://issues.apache.org/jira/browse/SPARK-20043 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.1.0, 2.2.0 >Reporter: Zied Sellami > Labels: starter > > I saved a CrossValidatorModel with a decision tree and a random forest. I used > ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not > able to load the saved model when the impurity is not written in lowercase. I > obtain an error from Spark: "impurity Gini (Entropy) not recognized".
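The intent of the fix can be sketched with a hypothetical normalizer (illustrative Python, not Spark's actual loader code): compare impurity values case-insensitively instead of matching the raw saved string.

```python
SUPPORTED_IMPURITIES = {"gini", "entropy"}

def normalize_impurity(value):
    """Hypothetical helper: accept 'Gini', 'ENTROPY', etc. by lowercasing first."""
    normalized = value.lower()
    if normalized not in SUPPORTED_IMPURITIES:
        raise ValueError("impurity %s not recognized" % value)
    return normalized
```

With a check like this in the loader, a model saved with impurity "Gini" would load as "gini" instead of failing.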
[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20043: -- Target Version/s: 2.1.1, 2.2.0 > Decision Tree loader does not handle uppercase impurity param values > > > Key: SPARK-20043 > URL: https://issues.apache.org/jira/browse/SPARK-20043 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.1.0, 2.2.0 >Reporter: Zied Sellami > Labels: starter > > I saved a CrossValidatorModel with a decision tree and a random forest. I used > ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not > able to load the saved model when the impurity is not written in lowercase. I > obtain an error from Spark: "impurity Gini (Entropy) not recognized".
[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20043: -- Shepherd: Joseph K. Bradley > Decision Tree loader does not handle uppercase impurity param values > > > Key: SPARK-20043 > URL: https://issues.apache.org/jira/browse/SPARK-20043 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.1.0, 2.2.0 >Reporter: Zied Sellami > Labels: starter > > I saved a CrossValidatorModel with a decision tree and a random forest. I used > ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not > able to load the saved model when the impurity is not written in lowercase. I > obtain an error from Spark: "impurity Gini (Entropy) not recognized".
[jira] [Updated] (SPARK-20043) Decision Tree loader does not handle uppercase impurity param values
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20043: -- Summary: Decision Tree loader does not handle uppercase impurity param values (was: Decision Tree loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted) > Decision Tree loader does not handle uppercase impurity param values > > > Key: SPARK-20043 > URL: https://issues.apache.org/jira/browse/SPARK-20043 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.1.0, 2.2.0 >Reporter: Zied Sellami > Labels: starter > > I saved a CrossValidatorModel with a decision tree and a random forest. I used > ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not > able to load the saved model when the impurity is not written in lowercase. I > obtain an error from Spark: "impurity Gini (Entropy) not recognized".
[jira] [Updated] (SPARK-20043) Decision Tree loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20043: -- Summary: Decision Tree loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted (was: CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted) > Decision Tree loader does not recognize impurity "Gini" and "Entropy" on ML > random forest and decision. Only "gini" and "entropy" (in lower case) are > accepted > -- > > Key: SPARK-20043 > URL: https://issues.apache.org/jira/browse/SPARK-20043 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.1.0, 2.2.0 >Reporter: Zied Sellami > Labels: starter > > I saved a CrossValidatorModel with a decision tree and a random forest. I used > ParamGrid to test "gini" and "entropy" impurity. CrossValidatorModel is not > able to load the saved model when the impurity is not written in lowercase. I > obtain an error from Spark: "impurity Gini (Entropy) not recognized".
[jira] [Commented] (SPARK-20099) Add transformSchema to pyspark.ml
[ https://issues.apache.org/jira/browse/SPARK-20099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942082#comment-15942082 ] Joseph K. Bradley commented on SPARK-20099: --- Linking [SPARK-15574] since it brought up a need for transformSchema in pyspark.ml as well > Add transformSchema to pyspark.ml > - > > Key: SPARK-20099 > URL: https://issues.apache.org/jira/browse/SPARK-20099 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 2.1.0 >Reporter: Joseph K. Bradley > > Python's ML API currently lacks the PipelineStage abstraction. This > abstraction's main purpose is to provide transformSchema() for checking for > early failures in a Pipeline. > As mentioned in https://github.com/apache/spark/pull/17218 it would also be > useful in Python for checking Params in Python wrappers for Scala > implementations; in these, transformSchema would involve passing Params in > Python to Scala, which would then be able to validate the Param values. This > could prevent late failures from bad Param settings in Pipeline execution, > while still allowing us to check Param values on only the Scala side. > This issue is for adding transformSchema() to pyspark.ml. If it's > reasonable, we could create a PipelineStage abstraction. But it'd probably > be fine to add transformSchema() directly to Transformer and Estimator, > rather than creating PipelineStage.
[jira] [Created] (SPARK-20099) Add transformSchema to pyspark.ml
Joseph K. Bradley created SPARK-20099: - Summary: Add transformSchema to pyspark.ml Key: SPARK-20099 URL: https://issues.apache.org/jira/browse/SPARK-20099 Project: Spark Issue Type: Improvement Components: ML, PySpark Affects Versions: 2.1.0 Reporter: Joseph K. Bradley Python's ML API currently lacks the PipelineStage abstraction. This abstraction's main purpose is to provide transformSchema() for checking for early failures in a Pipeline. As mentioned in https://github.com/apache/spark/pull/17218 it would also be useful in Python for checking Params in Python wrappers for Scala implementations; in these, transformSchema would involve passing Params in Python to Scala, which would then be able to validate the Param values. This could prevent late failures from bad Param settings in Pipeline execution, while still allowing us to check Param values on only the Scala side. This issue is for adding transformSchema() to pyspark.ml. If it's reasonable, we could create a PipelineStage abstraction. But it'd probably be fine to add transformSchema() directly to Transformer and Estimator, rather than creating PipelineStage.
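To illustrate what an up-front schema check buys, here is a toy pipeline sketch in plain Python. The class and method names are hypothetical stand-ins, not the pyspark.ml API: the point is only that running every stage's schema transformation before touching data turns a late runtime failure into an early one.

```python
class ToyStage:
    """Hypothetical pipeline stage with an early schema check."""

    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def transform_schema(self, schema):
        # Fail fast, before any data is processed, if the input column is missing.
        if self.input_col not in schema:
            raise ValueError("column %r not found in schema %r" % (self.input_col, schema))
        return schema + [self.output_col]


def check_pipeline(stages, schema):
    """Run every stage's schema check up front so a bad stage fails early."""
    for stage in stages:
        schema = stage.transform_schema(schema)
    return schema
```

A pipeline of `ToyStage("text", "tokens")` followed by `ToyStage("tokens", "features")` passes the check on a `["text"]` schema, while a schema missing "text" fails before any fitting happens.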
[jira] [Commented] (SPARK-15665) spark-submit --kill and --status are not working
[ https://issues.apache.org/jira/browse/SPARK-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941869#comment-15941869 ] Samuel Soubeyran commented on SPARK-15665: -- When trying to do the same thing using the SparkLauncher class, it doesn't work: new SparkLauncher(sparkEnvMap).setSparkHome(sparkHome).addSparkArg("--kill", submissionId).launch(). This is because SparkLauncher calls the empty constructor in SparkSubmitCommandBuilder, bypassing all the logic to handle the kill and status commands present in the constructor SparkSubmitCommandBuilder(List args). I also want to make the case that this logic shouldn't be present in the constructor in the first place, since it's a builder pattern. Instead it should be in buildSparkSubmitCommand. The default should be the empty constructor; the full constructor is just a shortcut. Otherwise, what's the point of having a builder pattern in the first place? I'd be happy to send a PR to solve this. Also, there might be an easier way to kill/get the status of a job (cluster mode), but I couldn't figure it out. Thanks, Sam > spark-submit --kill and --status are not working > - > > Key: SPARK-15665 > URL: https://issues.apache.org/jira/browse/SPARK-15665 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Devaraj K >Assignee: Devaraj K > Fix For: 2.0.0 > > > {code:xml} > [devaraj@server2 spark-master]$ ./bin/spark-submit --kill > driver-20160531171222- --master spark://xx.xx.xx.xx:6066 > Exception in thread "main" java.lang.IllegalArgumentException: Missing > application resource. 
> at > org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) > at org.apache.spark.launcher.Main.main(Main.java:86) > {code} > {code:xml} > [devaraj@server2 spark-master]$ ./bin/spark-submit --status > driver-20160531171222- --master spark://xx.xx.xx.xx:6066 > Exception in thread "main" java.lang.IllegalArgumentException: Missing > application resource. > at > org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) > at org.apache.spark.launcher.Main.main(Main.java:86) > {code}
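Samuel's builder-pattern point can be sketched as follows. This is a hypothetical Python stand-in for SparkSubmitCommandBuilder, not the real launcher API: the idea is that argument validation belongs in the build step, where all arguments are known, so the empty constructor plus incrementally added args behaves the same as the argument-list constructor.

```python
class CommandBuilder:
    """Hypothetical builder: special-arg handling lives in build(), not __init__."""

    def __init__(self, args=None):
        self.args = list(args or [])

    def add_arg(self, name, value=None):
        self.args.append(name)
        if value is not None:
            self.args.append(value)
        return self

    def build(self):
        # --kill and --status need no application resource, so the check runs
        # here, regardless of how the args arrived (constructor or add_arg).
        if "--kill" in self.args or "--status" in self.args:
            return ["spark-submit"] + self.args
        if not any(a.endswith((".jar", ".py")) for a in self.args):
            raise ValueError("Missing application resource.")
        return ["spark-submit"] + self.args
```

Both `CommandBuilder(["--kill", "driver-1"]).build()` and `CommandBuilder().add_arg("--kill", "driver-1").build()` now produce the same command, which is exactly what the empty-constructor path in the reported bug failed to do.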
[jira] [Commented] (SPARK-17670) Spark DataFrame/Dataset no longer supports Option[Map] in case classes
[ https://issues.apache.org/jira/browse/SPARK-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941820#comment-15941820 ] SaschaC commented on SPARK-17670: - I can confirm that this bug is for real and it is a major issue. Depending on how I varied my case classes, I sometimes did NOT get a mismatch on Maps but instead on Lists. I used avrohugger to produce case classes from avro schemas. I have case classes that extend from SpecificRecordBase and varied whether the case classes would use Array or immutable.List, as well as Map, collections.Map or collections.immutable.Map. The collection types, both lists and maps, are optional, and sometimes Spark complains about a mismatch on lists but accepts the maps or vice versa; I could never get Spark to accept my schema, regardless of which case class variant I tried. > Spark DataFrame/Dataset no longer supports Option[Map] in case classes > -- > > Key: SPARK-17670 > URL: https://issues.apache.org/jira/browse/SPARK-17670 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Daniel Williams > > Upon upgrading to Spark 2.0 I discovered that previously supported case > classes containing members of the type Option[Map] of any key/value binding, > mutable or immutable, were no longer supported and produced an exception > similar to the following. Upon further testing I also noticed that Option > was supported for Seq, case classes, and primitives. Validating unit tests > included using spark-testing-base. 
> {code} > org.apache.spark.sql.AnalysisException: cannot resolve > 'wrapoption(staticinvoke(class > org.apache.spark.sql.catalyst.util.ArrayBasedMapData$, ObjectType(interface > scala.collection.Map), toScalaMap, mapobjects(MapObjects_loopValue32, > MapObjects_loopIsNull33, StringType, lambdavariable(MapObjects_loopValue32, > MapObjects_loopIsNull33, StringType).toString, > cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, > StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), > StructField(sourceSystem,StringType,true), > StructField(input,MapType(StringType,StringType,true),true)).input as > map).keyArray).array, mapobjects(MapObjects_loopValue34, > MapObjects_loopIsNull35, StringType, lambdavariable(MapObjects_loopValue34, > MapObjects_loopIsNull35, StringType).toString, > cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, > StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), > StructField(sourceSystem,StringType,true), > StructField(input,MapType(StringType,StringType,true),true)).input as > map ).valueArray).array, true), ObjectType(interface > scala.collection.immutable.Map))' due to data type mismatch: argument 1 > requires scala.collection.immutable.Map type, however, 'staticinvoke(class > org.apache.spark.sql.catalyst.util.ArrayBasedMapData$, ObjectType(interface > scala.collection.Map), toScalaMap, mapobjects(MapObjects_loopValue32, > MapObjects_loopIsNull33, StringType, lambdavariable(MapObjects_loopValue32, > MapObjects_loopIsNull33, StringType).toString, > cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, > StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), > StructField(sourceSystem,StringType,true), > StructField(input,MapType(StringType,StringType,true),true)).input as > map ).keyArray).array, mapobjects(MapObjects_loopValue34, > MapObjects_loopIsNull35, StringType, lambdavariable(MapObjects_loopValue34, > 
MapObjects_loopIsNull35, StringType).toString, > cast(lambdavariable(MapObjects_loopValue30, MapObjects_loopIsNull31, > StructField(uuid,StringType,true), StructField(timestamp,TimestampType,true), > StructField(sourceSystem,StringType,true), > StructField(input,MapType(StringType,StringType,true),true)).input as > map ).valueArray).array, true)' is of scala.collection.Map > type.; > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:82) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) > {code} > Unit tests: > {code} > import com.holdenkarau.spark.testing.{DataFrameSuiteBase, SharedSparkContext} > import org.scalatest.{Matchers, FunSuite} > import org.slf4j.LoggerFactory > import scala.util.{Failure, Try, Success} > case class ImmutableMapTest(data: Map[String, String]) > case class MapTest(data:
[jira] [Resolved] (SPARK-17137) Add compressed support for multinomial logistic regression coefficients
[ https://issues.apache.org/jira/browse/SPARK-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-17137. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17426 [https://github.com/apache/spark/pull/17426] > Add compressed support for multinomial logistic regression coefficients > --- > > Key: SPARK-17137 > URL: https://issues.apache.org/jira/browse/SPARK-17137 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Seth Hendrickson >Assignee: Seth Hendrickson >Priority: Minor > Fix For: 2.2.0 > > > For sparse coefficients in MLOR, such as when high L1 regularization is used, it may > be more efficient to store coefficients in compressed format. We can add this > option to MLOR and perhaps do some performance tests to verify > improvements.
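The trade-off behind "compressed" storage can be sketched with a simple heuristic. This is an illustrative Python sketch under the assumption that a sparse entry costs roughly 1.5x a dense entry (it stores an index alongside the value), which resembles the kind of rule a `compressed` representation chooser uses; it is not Spark's actual implementation.

```python
def use_sparse(nnz, size):
    """Choose sparse storage when its estimated cost beats dense storage.

    Assumption: each sparse entry costs ~1.5x a dense entry (index + value),
    so sparse wins when 1.5 * (nnz + 1) < size.
    """
    return 1.5 * (nnz + 1.0) < size
```

With heavy L1 regularization most coefficients are exactly zero, so `nnz` is small relative to `size` and the sparse representation wins.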
[jira] [Updated] (SPARK-20060) Support Standalone visiting secured HDFS
[ https://issues.apache.org/jira/browse/SPARK-20060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-20060: - Description: h1. Brief design h2. Introductions The basic issue for Standalone mode visiting Kerberos-secured HDFS or other kerberized services is how to gather the delegation tokens on the driver side and deliver them to the executor side. When we run Spark on Yarn, we set the tokens on the container launch context to deliver them automatically, and for the long-running issue caused by token expiration, we have it fixed with SPARK-14743 by writing the tokens to HDFS, updating the credential file, and renewing them over and over. When running Spark on Standalone, we currently have no implementation like Yarn's to get and deliver those tokens. h2. Implementations Firstly, we simply move the implementation of SPARK-14743, which is Yarn-only, to the core module. We use it to gather the credentials we need, and also to update and renew the credential files on HDFS. Secondly, credential files on secured HDFS are not reachable for executors before they get the tokens. Here we add a sequence configuration `spark.deploy.credential.entities`, which is used by the driver to put `token.encodeToUrlString()` before launching the executors, and used by the executors to fetch the credentials as a string sequence while fetching the driver-side Spark properties, and then decode them to tokens. Before setting up the `CoarseGrainedExecutorBackend` we set the credentials on the current executor-side UGI. was:For **Spark on non-Yarn** mode on a kerberized hdfs, we don't obtain credentials from hive metastore, hdfs, etc and just use the local kinited user to connecting them. 
But if we specify the --proxy-user argument on non-yarn mode, such as local, standalone, after we simply use `UGI.createProxyUser` to get a proxy ugi as the effective user and wrap the code in doAs, the proxy ugi fails to talk to hive metastore caused by missing credentials. Thus, we need to obtain credentials via the real user and add them to the proxy ugi. Component/s: (was: Spark Submit) Spark Core Issue Type: New Feature (was: Bug) Summary: Support Standalone visiting secured HDFS (was: Spark On Non-Yarn Mode with Kerberized HDFS ProxyUser Fails Talking to Hive MetaStore ) > Support Standalone visiting secured HDFS > - > > Key: SPARK-20060 > URL: https://issues.apache.org/jira/browse/SPARK-20060 > Project: Spark > Issue Type: New Feature > Components: Deploy, Spark Core >Affects Versions: 2.2.0 >Reporter: Kent Yao > > h1. Brief design > h2. Introductions > The basic issue for Standalone mode visiting Kerberos-secured HDFS or other > kerberized services is how to gather the delegation tokens on the driver side > and deliver them to the executor side. > When we run Spark on Yarn, we set the tokens on the container launch context > to deliver them automatically, and for the long-running issue caused by token > expiration, we have it fixed with SPARK-14743 by writing the tokens to HDFS, > updating the credential file, and renewing them over and over. > When running Spark on Standalone, we currently have no implementation like Yarn's > to get and deliver those tokens. > h2. Implementations > Firstly, we simply move the implementation of SPARK-14743, which is Yarn-only, > to the core module. We use it to gather the credentials we need, and > also to update and renew the credential files on HDFS. > Secondly, credential files on secured HDFS are not reachable for executors before > they get the tokens. Here we add a sequence configuration > `spark.deploy.credential.entities`, which is used by the driver to put > `token.encodeToUrlString()` before launching the executors, and used by the > executors to fetch the credentials as a string sequence while fetching the > driver-side Spark properties, and then decode them to tokens. Before setting > up the `CoarseGrainedExecutorBackend` we set the credentials on the current > executor-side UGI.
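The delivery mechanism described above amounts to a string round-trip of opaque token bytes through a config property. A minimal sketch of that idea (illustrative only: Hadoop's `Token.encodeToUrlString()` has its own encoding, and the config name is taken from the proposal, not an existing Spark setting):

```python
import base64

def encode_token(raw_token: bytes) -> str:
    """Driver side: serialize opaque token bytes into a URL-safe string
    suitable for a string-valued configuration entry."""
    return base64.urlsafe_b64encode(raw_token).decode("ascii")

def decode_token(encoded: str) -> bytes:
    """Executor side: recover the original token bytes from the config value."""
    return base64.urlsafe_b64decode(encoded.encode("ascii"))
```

The driver would encode each gathered token and place the strings in the config sequence before launching executors; each executor decodes them and adds the tokens to its UGI before the backend starts.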
[jira] [Created] (SPARK-20098) DataType's typeName method returns with 'StructF' in case of StructField
Peter Szalai created SPARK-20098: Summary: DataType's typeName method returns with 'StructF' in case of StructField Key: SPARK-20098 URL: https://issues.apache.org/jira/browse/SPARK-20098 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.1.0 Reporter: Peter Szalai Currently, if you want to get the name of a DataType and the DataType is a `StructField`, you get `StructF`. http://spark.apache.org/docs/2.1.0/api/python/_modules/pyspark/sql/types.html
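The bug is easy to reproduce with a suffix-stripping derivation of the type name from the class name. This sketch assumes the name is computed by dropping the last four characters (a trailing "Type") and lowercasing, which matches the reported symptom; it is a simplified model of the linked source, not a copy of it.

```python
def type_name(class_name):
    # Assumed derivation: strip a trailing "Type" and lowercase the rest.
    # This silently misbehaves for class names that do not end in "Type".
    return class_name[:-4].lower()
```

For "DateType" this yields "date" as intended, but for "StructField" it chops off "ield" and yields the garbled name reported in the issue.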
[jira] [Updated] (SPARK-19674) Ignore driver accumulator updates don't belong to the execution when merging all accumulator updates
[ https://issues.apache.org/jira/browse/SPARK-19674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-19674: Fix Version/s: 2.1.1 > Ignore driver accumulator updates don't belong to the execution when merging > all accumulator updates > > > Key: SPARK-19674 > URL: https://issues.apache.org/jira/browse/SPARK-19674 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Carson Wang >Assignee: Carson Wang >Priority: Minor > Fix For: 2.1.1, 2.2.0 > > > In SQLListener.getExecutionMetrics, driver accumulator updates that don't belong > to the execution should be ignored when merging all accumulator updates to > prevent NoSuchElementException.
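The intent of the fix can be sketched like so (hypothetical Python, not the SQLListener code): skip accumulator updates whose ids were never registered for the execution, rather than failing the lookup.

```python
def merge_accumulator_updates(registered_ids, updates):
    """Merge (acc_id, value) updates, ignoring ids not registered to the execution."""
    merged = {}
    for acc_id, value in updates:
        if acc_id not in registered_ids:
            continue  # a driver-side update for another execution: ignore it
        merged[acc_id] = merged.get(acc_id, 0) + value
    return merged
```

A naive version that looked up each id unconditionally would raise on the stray id, which is the Scala-side `NoSuchElementException` the fix prevents.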
[jira] [Commented] (SPARK-20096) Expose the real queue name not null while using --verbose
[ https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941712#comment-15941712 ] Kent Yao commented on SPARK-20096: -- [~sowen] example added > Expose the real queue name not null while using --verbose > - > > Key: SPARK-20096 > URL: https://issues.apache.org/jira/browse/SPARK-20096 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.2.0 >Reporter: Kent Yao >Priority: Minor > > While submitting apps with -v or --verbose, we can print the right queue name, > but if we set a queue name with `spark.yarn.queue` via --conf or in > spark-defaults.conf, we just get `null` for the queue in the parsed arguments. > {code} > bin/spark-shell -v --conf spark.yarn.queue=thequeue > Using properties file: > /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf > > Adding default property: spark.yarn.queue=default > Parsed arguments: > master yarn > deployMode client > ... > queue null > > verbose true > Spark properties used, including those specified through > --conf and those from the properties file > /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf: > spark.yarn.queue -> thequeue > > {code}
[jira] [Updated] (SPARK-20096) Expose the real queue name not null while using --verbose
[ https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-20096: - Description: While submitting apps with -v or --verbose, we can print the right queue name, but if we set a queue name with `spark.yarn.queue` via --conf or in spark-defaults.conf, we just get `null` for the queue in the parsed arguments. {code} bin/spark-shell -v --conf spark.yarn.queue=thequeue Using properties file: /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf Adding default property: spark.yarn.queue=default Parsed arguments: master yarn deployMode client ... queue null verbose true Spark properties used, including those specified through --conf and those from the properties file /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf: spark.yarn.queue -> thequeue {code} was:while submit apps with -v or --verbose, we can print the right queue name, but if we set a queue name with `spark.yarn.queue` by --conf or in the spark-default.conf, we just got `null` > Expose the real queue name not null while using --verbose > - > > Key: SPARK-20096 > URL: https://issues.apache.org/jira/browse/SPARK-20096 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.2.0 >Reporter: Kent Yao >Priority: Minor > > While submitting apps with -v or --verbose, we can print the right queue name, > but if we set a queue name with `spark.yarn.queue` via --conf or in > spark-defaults.conf, we just get `null` for the queue in the parsed arguments. > {code} > bin/spark-shell -v --conf spark.yarn.queue=thequeue > Using properties file: > /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf > > Adding default property: spark.yarn.queue=default > Parsed arguments: > master yarn > deployMode client > ... > queue null > > verbose true > Spark properties used, including those specified through > --conf and those from the properties file > /home/hadoop/spark-2.1.0-bin-apache-hdp2.7.3/conf/spark-defaults.conf: > spark.yarn.queue -> thequeue > > {code}
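The expected resolution can be sketched as a fallback lookup (hypothetical helper; in Spark the change would live in the submit-argument parsing, and `effective_queue` is a made-up name): when --queue was not passed on the command line, fill the parsed value from `spark.yarn.queue` in the effective Spark properties instead of printing null.

```python
def effective_queue(cli_queue, spark_properties):
    """Prefer the explicit --queue value; otherwise fall back to spark.yarn.queue."""
    if cli_queue is not None:
        return cli_queue
    return spark_properties.get("spark.yarn.queue")
```

With this, the verbose dump for the example above would show "thequeue" rather than null, since the property map already carries the --conf value.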
[jira] [Assigned] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR
[ https://issues.apache.org/jira/browse/SPARK-20097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20097: Assignee: Apache Spark > Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and > GLR > --- > > Key: SPARK-20097 > URL: https://issues.apache.org/jira/browse/SPARK-20097 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Benjamin Fradet >Assignee: Apache Spark >Priority: Trivial > > - numInstances is public in lr and regression private in glr > - degreesOfFreedom is private in lr and public in glr
[jira] [Assigned] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR
[ https://issues.apache.org/jira/browse/SPARK-20097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20097: Assignee: (was: Apache Spark) > Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and > GLR > --- > > Key: SPARK-20097 > URL: https://issues.apache.org/jira/browse/SPARK-20097 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Benjamin Fradet >Priority: Trivial > > - numInstances is public in lr and regression private in glr > - degreesOfFreedom is private in lr and public in glr
[jira] [Commented] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR
[ https://issues.apache.org/jira/browse/SPARK-20097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941705#comment-15941705 ] Apache Spark commented on SPARK-20097: -- User 'BenFradet' has created a pull request for this issue: https://github.com/apache/spark/pull/17431 > Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and > GLR > --- > > Key: SPARK-20097 > URL: https://issues.apache.org/jira/browse/SPARK-20097 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Benjamin Fradet >Priority: Trivial > > - numInstances is public in lr and regression private in glr > - degreesOfFreedom is private in lr and public in glr -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR
Benjamin Fradet created SPARK-20097: --- Summary: Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR Key: SPARK-20097 URL: https://issues.apache.org/jira/browse/SPARK-20097 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.1.0 Reporter: Benjamin Fradet Priority: Trivial - numInstances is public in lr and private in glr - degreesOfFreedom is private in lr and public in glr -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
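The discrepancy above can be sketched as follows. This is a minimal hedged illustration of the fix direction (class and method names are hypothetical, not Spark's actual `LinearRegressionTrainingSummary`/GLR API): both summary classes should expose `numInstances` and `degreesOfFreedom` with the same, public, visibility.

```java
// Hypothetical sketch of the consistency fix: expose both members publicly
// in both summary classes, rather than public in one and private in the other.
public class SummarySketch {
    static class LRSummary {
        private final long numInstances;
        private final long degreesOfFreedom;
        LRSummary(long n, long dof) { numInstances = n; degreesOfFreedom = dof; }
        // both accessors public, matching GLRSummary below
        public long numInstances() { return numInstances; }
        public long degreesOfFreedom() { return degreesOfFreedom; }
    }
    static class GLRSummary {
        private final long numInstances;
        private final long degreesOfFreedom;
        GLRSummary(long n, long dof) { numInstances = n; degreesOfFreedom = dof; }
        public long numInstances() { return numInstances; }
        public long degreesOfFreedom() { return degreesOfFreedom; }
    }
    public static void main(String[] args) {
        LRSummary lr = new LRSummary(100, 97);
        GLRSummary glr = new GLRSummary(100, 97);
        System.out.println(lr.numInstances() == glr.numInstances());
        System.out.println(lr.degreesOfFreedom() == glr.degreesOfFreedom());
    }
}
```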
[jira] [Assigned] (SPARK-20096) Expose the real queue name not null while using --verbose
[ https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20096: Assignee: Apache Spark > Expose the real queue name not null while using --verbose > - > > Key: SPARK-20096 > URL: https://issues.apache.org/jira/browse/SPARK-20096 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.2.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > while submit apps with -v or --verbose, we can print the right queue name, > but if we set a queue name with `spark.yarn.queue` by --conf or in the > spark-default.conf, we just got `null` -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20096) Expose the real queue name not null while using --verbose
[ https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941701#comment-15941701 ] Apache Spark commented on SPARK-20096: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/17430 > Expose the real queue name not null while using --verbose > - > > Key: SPARK-20096 > URL: https://issues.apache.org/jira/browse/SPARK-20096 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.2.0 >Reporter: Kent Yao >Priority: Minor > > while submit apps with -v or --verbose, we can print the right queue name, > but if we set a queue name with `spark.yarn.queue` by --conf or in the > spark-default.conf, we just got `null` -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20096) Expose the real queue name not null while using --verbose
[ https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20096: Assignee: (was: Apache Spark) > Expose the real queue name not null while using --verbose > - > > Key: SPARK-20096 > URL: https://issues.apache.org/jira/browse/SPARK-20096 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.2.0 >Reporter: Kent Yao >Priority: Minor > > while submit apps with -v or --verbose, we can print the right queue name, > but if we set a queue name with `spark.yarn.queue` by --conf or in the > spark-default.conf, we just got `null` -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20096) Expose the real queue name not null while using --verbose
[ https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941699#comment-15941699 ] Sean Owen commented on SPARK-20096: --- got null where? This needs more detail. > Expose the real queue name not null while using --verbose > - > > Key: SPARK-20096 > URL: https://issues.apache.org/jira/browse/SPARK-20096 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.2.0 >Reporter: Kent Yao >Priority: Minor > > while submit apps with -v or --verbose, we can print the right queue name, > but if we set a queue name with `spark.yarn.queue` by --conf or in the > spark-default.conf, we just got `null` -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20096) Expose the real queue name not null while using --verbose
[ https://issues.apache.org/jira/browse/SPARK-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-20096: - Summary: Expose the real queue name not null while using --verbose (was: Expose the real queue name not null using --verbose) > Expose the real queue name not null while using --verbose > - > > Key: SPARK-20096 > URL: https://issues.apache.org/jira/browse/SPARK-20096 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.2.0 >Reporter: Kent Yao >Priority: Minor > > while submit apps with -v or --verbose, we can print the right queue name, > but if we set a queue name with `spark.yarn.queue` by --conf or in the > spark-default.conf, we just got `null` -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20096) Expose the real queue name not null using --verbose
Kent Yao created SPARK-20096: Summary: Expose the real queue name not null using --verbose Key: SPARK-20096 URL: https://issues.apache.org/jira/browse/SPARK-20096 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 2.2.0 Reporter: Kent Yao Priority: Minor When submitting apps with -v or --verbose, spark-submit can print the right queue name; but if the queue name is set with `spark.yarn.queue` via --conf or in spark-defaults.conf, it just prints `null` -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
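The behavior above can be sketched as follows. This is a hedged illustration only (the method and field names are ours, not spark-submit's real internals): if the verbose printout reads the queue parsed from the command line and never falls back to the merged Spark properties, a queue configured only in spark-defaults.conf shows up as `null`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix direction: when no --queue was given on the
// command line, fall back to spark.yarn.queue from the merged properties
// (command-line --conf plus spark-defaults.conf) before printing.
public class QueueSketch {
    static String effectiveQueue(String cliQueue, Map<String, String> sparkProps) {
        if (cliQueue != null) return cliQueue;           // explicit CLI value wins
        return sparkProps.get("spark.yarn.queue");       // otherwise use merged conf
    }
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("spark.yarn.queue", "thequeue");
        // prints "thequeue" rather than null for a queue set only in the conf
        System.out.println(effectiveQueue(null, props));
    }
}
```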
[jira] [Assigned] (SPARK-20078) Mesos executor configurability for task name and labels
[ https://issues.apache.org/jira/browse/SPARK-20078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20078: - Assignee: Kalvin Chau > Mesos executor configurability for task name and labels > --- > > Key: SPARK-20078 > URL: https://issues.apache.org/jira/browse/SPARK-20078 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Kalvin Chau >Assignee: Kalvin Chau >Priority: Minor > Fix For: 2.2.0 > > > Add in the ability to configure the mesos task name as well as add labels to > the Mesos ExecutorInfo protobuf message. > Currently all executors that are spun up are named Task X (where X is the > executor number). > For centralized logging it would be nice to be able to have SparkJob1 X then > Name, as well as allowing users to add any labels they would want. > In this PR I chose "k1:v1,k2:v2" as the format, colons separating key-value > and commas to list out more than one. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20078) Mesos executor configurability for task name and labels
[ https://issues.apache.org/jira/browse/SPARK-20078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20078. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17404 [https://github.com/apache/spark/pull/17404] > Mesos executor configurability for task name and labels > --- > > Key: SPARK-20078 > URL: https://issues.apache.org/jira/browse/SPARK-20078 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Kalvin Chau >Priority: Minor > Fix For: 2.2.0 > > > Add in the ability to configure the mesos task name as well as add labels to > the Mesos ExecutorInfo protobuf message. > Currently all executors that are spun up are named Task X (where X is the > executor number). > For centralized logging it would be nice to be able to have SparkJob1 X then > Name, as well as allowing users to add any labels they would want. > In this PR I chose "k1:v1,k2:v2" as the format, colons separating key-value > and commas to list out more than one. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
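The "k1:v1,k2:v2" label format chosen in the PR above can be parsed as sketched below. This is a hedged illustration, not the actual Spark/Mesos scheduler code: commas separate entries and a colon separates each key from its value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "k1:v1,k2:v2" label format: split entries on
// commas, then split each entry on its first colon into key and value.
public class LabelParser {
    static Map<String, String> parseLabels(String spec) {
        Map<String, String> labels = new LinkedHashMap<>();
        if (spec == null || spec.isEmpty()) return labels;
        for (String entry : spec.split(",")) {
            String[] kv = entry.split(":", 2);  // limit 2: values may contain colons
            if (kv.length == 2) labels.put(kv[0], kv[1]);
        }
        return labels;
    }
    public static void main(String[] args) {
        System.out.println(parseLabels("k1:v1,k2:v2")); // {k1=v1, k2=v2}
    }
}
```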
[jira] [Created] (SPARK-20095) A code bug in CodegenContext.withSubExprEliminationExprs
song fengfei created SPARK-20095: Summary: A code bug in CodegenContext.withSubExprEliminationExprs Key: SPARK-20095 URL: https://issues.apache.org/jira/browse/SPARK-20095 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: song fengfei Priority: Minor In the function CodegenContext.withSubExprEliminationExprs:
{code}
val oldsubExprEliminationExprs = subExprEliminationExprs
subExprEliminationExprs.clear
...
// Restore previous subExprEliminationExprs
subExprEliminationExprs.clear
oldsubExprEliminationExprs.foreach(subExprEliminationExprs += _)
{code}
It seems that oldsubExprEliminationExprs and subExprEliminationExprs are the same instance: after the second subExprEliminationExprs.clear, oldsubExprEliminationExprs is also cleared, so the previous subExprEliminationExprs in CodegenContext will not be restored. Is it a bug? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
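The aliasing problem described above can be reproduced in miniature. This is a hedged Java analogue (names are ours, not Spark's Scala code): saving only a *reference* to a mutable map means `clear()` wipes the "saved" state too, so the later restore is a no-op; the fix is to take a defensive copy before clearing.

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates the save/clear/restore aliasing bug and its fix.
public class AliasingDemo {
    static Map<String, String> state = new HashMap<>();

    static void buggySaveAndRestore() {
        Map<String, String> old = state;   // alias, NOT a copy
        state.clear();                     // also empties `old`
        state.putAll(old);                 // restores nothing
    }

    static void fixedSaveAndRestore() {
        Map<String, String> old = new HashMap<>(state); // defensive copy
        state.clear();
        state.putAll(old);                 // genuinely restores the old entries
    }

    public static void main(String[] args) {
        state.put("expr", "code");
        buggySaveAndRestore();
        System.out.println(state.isEmpty()); // true: the state was lost
        state.put("expr", "code");
        fixedSaveAndRestore();
        System.out.println(state.size());    // 1: the state survived
    }
}
```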
[jira] [Resolved] (SPARK-19999) Test failures in Spark Core due to java.nio.Bits.unaligned()
[ https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1. --- Resolution: Won't Fix I'm open to reopening this to special-case PPC if anyone cares enough to open a PR to explore whether that's all there is to it in Spark. > Test failures in Spark Core due to java.nio.Bits.unaligned() > > > Key: SPARK-1 > URL: https://issues.apache.org/jira/browse/SPARK-1 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 > Environment: Ubuntu 14.04 ppc64le > $ java -version > openjdk version "1.8.0_111" > OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14) > OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) >Reporter: Sonia Garudi > Labels: ppc64le > > There are multiple test failures seen in Spark Core project with the > following error message : > {code:borderStyle=solid} > java.lang.IllegalArgumentException: requirement failed: No support for > unaligned Unsafe. Set spark.memory.offHeap.enabled to false. > {code} > These errors occur due to java.nio.Bits.unaligned(), which does not return > true for the ppc64le arch. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
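The special-case mentioned in the resolution above could look roughly like this. This is a hedged sketch of our own (it is not Spark's actual Platform class): since java.nio.Bits.unaligned() reports false on ppc64le even though the hardware handles unaligned access, an architecture allow-list can override the JDK's answer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical arch-based fallback for the unaligned-access check: trust the
// JDK when it says unaligned access is supported, otherwise consult a list of
// architectures known to handle it (including ppc64le).
public class UnalignedCheck {
    private static final List<String> KNOWN_UNALIGNED_ARCHS =
        Arrays.asList("i386", "x86", "amd64", "x86_64", "ppc64", "ppc64le");

    static boolean supportsUnaligned(String osArch, boolean jdkSaysUnaligned) {
        return jdkSaysUnaligned
            || KNOWN_UNALIGNED_ARCHS.contains(osArch.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(supportsUnaligned("ppc64le", false)); // true
        System.out.println(supportsUnaligned("sparc", false));   // false
    }
}
```

With such a check in place, spark.memory.offHeap support would no longer be rejected on ppc64le solely because of the JDK's conservative answer.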
[jira] [Updated] (SPARK-20094) Putting predicate with IN subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr
[ https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-20094: - Summary: Putting predicate with IN subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr (was: Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr) > Putting predicate with IN subquery into join condition in ReorderJoin fails > RewritePredicateSubquery.rewriteExistentialExpr > --- > > Key: SPARK-20094 > URL: https://issues.apache.org/jira/browse/SPARK-20094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Zhenhua Wang > > ReorderJoin collects all predicates and try to put them into join condition > when creating ordered join. If a predicate with a subquery is in a join > condition instead of a filter condition, > `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the > subquery to an ExistenceJoin, and thus result in error. 
> For example, tpcds q45 fails due to the above reason: > {noformat} > spark-sql> explain codegen > > SELECT > > ca_zip, > > ca_city, > > sum(ws_sales_price) > > FROM web_sales, customer, customer_address, date_dim, item > > WHERE ws_bill_customer_sk = c_customer_sk > > AND c_current_addr_sk = ca_address_sk > > AND ws_item_sk = i_item_sk > > AND (substr(ca_zip, 1, 5) IN > > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', > '80348', '81792') > > OR > > i_item_id IN (SELECT i_item_id > > FROM item > > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > > ) > > ) > > AND ws_sold_date_sk = d_date_sk > > AND d_qoy = 2 AND d_year = 2001 > > GROUP BY ca_zip, ca_city > > ORDER BY ca_zip, ca_city > > LIMIT 100; > 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen > > SELECT > ca_zip, > ca_city, > sum(ws_sales_price) > FROM web_sales, customer, customer_address, date_dim, item > WHERE ws_bill_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND ws_item_sk = i_item_sk > AND (substr(ca_zip, 1, 5) IN > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', > '81792') > OR > i_item_id IN (SELECT i_item_id > FROM item > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > ) > ) > AND ws_sold_date_sk = d_date_sk > AND d_qoy = 2 AND d_year = 2001 > GROUP BY ca_zip, ca_city > ORDER BY ca_zip, ca_city > LIMIT 100] > java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 [] > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224) > at > org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at >
[jira] [Updated] (SPARK-20094) Putting predicate with IN subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr
[ https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-20094: - Description: ReorderJoin collects all predicates and try to put them into join condition when creating ordered join. If a predicate with an IN subquery is in a join condition instead of a filter condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the subquery to an ExistenceJoin, and thus result in error. For example, tpcds q45 fails due to the above reason: {noformat} spark-sql> explain codegen > SELECT > ca_zip, > ca_city, > sum(ws_sales_price) > FROM web_sales, customer, customer_address, date_dim, item > WHERE ws_bill_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND ws_item_sk = i_item_sk > AND (substr(ca_zip, 1, 5) IN > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', '81792') > OR > i_item_id IN (SELECT i_item_id > FROM item > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > ) > ) > AND ws_sold_date_sk = d_date_sk > AND d_qoy = 2 AND d_year = 2001 > GROUP BY ca_zip, ca_city > ORDER BY ca_zip, ca_city > LIMIT 100; 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen SELECT ca_zip, ca_city, sum(ws_sales_price) FROM web_sales, customer, customer_address, date_dim, item WHERE ws_bill_customer_sk = c_customer_sk AND c_current_addr_sk = ca_address_sk AND ws_item_sk = i_item_sk AND (substr(ca_zip, 1, 5) IN ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', '81792') OR i_item_id IN (SELECT i_item_id FROM item WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) ) ) AND ws_sold_date_sk = d_date_sk AND d_qoy = 2 AND d_year = 2001 GROUP BY ca_zip, ca_city ORDER BY ca_zip, ca_city LIMIT 100] java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 [] at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224) at 
org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) at org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) at org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) at org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.getJoinCondition(BroadcastHashJoinExec.scala:174) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:199) 
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:36) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:68) at
[jira] [Assigned] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr
[ https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20094: Assignee: Apache Spark > Putting predicate with subquery into join condition in ReorderJoin fails > RewritePredicateSubquery.rewriteExistentialExpr > > > Key: SPARK-20094 > URL: https://issues.apache.org/jira/browse/SPARK-20094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Zhenhua Wang >Assignee: Apache Spark > > ReorderJoin collects all predicates and try to put them into join condition > when creating ordered join. If a predicate with a subquery is in a join > condition instead of a filter condition, > `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the > subquery to an ExistenceJoin, and thus result in error. > For example, tpcds q45 fails due to the above reason: > {noformat} > spark-sql> explain codegen > > SELECT > > ca_zip, > > ca_city, > > sum(ws_sales_price) > > FROM web_sales, customer, customer_address, date_dim, item > > WHERE ws_bill_customer_sk = c_customer_sk > > AND c_current_addr_sk = ca_address_sk > > AND ws_item_sk = i_item_sk > > AND (substr(ca_zip, 1, 5) IN > > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', > '80348', '81792') > > OR > > i_item_id IN (SELECT i_item_id > > FROM item > > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > > ) > > ) > > AND ws_sold_date_sk = d_date_sk > > AND d_qoy = 2 AND d_year = 2001 > > GROUP BY ca_zip, ca_city > > ORDER BY ca_zip, ca_city > > LIMIT 100; > 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen > > SELECT > ca_zip, > ca_city, > sum(ws_sales_price) > FROM web_sales, customer, customer_address, date_dim, item > WHERE ws_bill_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND ws_item_sk = i_item_sk > AND (substr(ca_zip, 1, 5) IN > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', > '81792') > OR > 
i_item_id IN (SELECT i_item_id > FROM item > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > ) > ) > AND ws_sold_date_sk = d_date_sk > AND d_qoy = 2 AND d_year = 2001 > GROUP BY ca_zip, ca_city > ORDER BY ca_zip, ca_city > LIMIT 100] > java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 [] > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224) > at > org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379) > at > 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at >
[jira] [Commented] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr
[ https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941625#comment-15941625 ] Apache Spark commented on SPARK-20094: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/17428 > Putting predicate with subquery into join condition in ReorderJoin fails > RewritePredicateSubquery.rewriteExistentialExpr > > > Key: SPARK-20094 > URL: https://issues.apache.org/jira/browse/SPARK-20094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Zhenhua Wang > > ReorderJoin collects all predicates and try to put them into join condition > when creating ordered join. If a predicate with a subquery is in a join > condition instead of a filter condition, > `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the > subquery to an ExistenceJoin, and thus result in error. > For example, tpcds q45 fails due to the above reason: > {noformat} > spark-sql> explain codegen > > SELECT > > ca_zip, > > ca_city, > > sum(ws_sales_price) > > FROM web_sales, customer, customer_address, date_dim, item > > WHERE ws_bill_customer_sk = c_customer_sk > > AND c_current_addr_sk = ca_address_sk > > AND ws_item_sk = i_item_sk > > AND (substr(ca_zip, 1, 5) IN > > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', > '80348', '81792') > > OR > > i_item_id IN (SELECT i_item_id > > FROM item > > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > > ) > > ) > > AND ws_sold_date_sk = d_date_sk > > AND d_qoy = 2 AND d_year = 2001 > > GROUP BY ca_zip, ca_city > > ORDER BY ca_zip, ca_city > > LIMIT 100; > 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen > > SELECT > ca_zip, > ca_city, > sum(ws_sales_price) > FROM web_sales, customer, customer_address, date_dim, item > WHERE ws_bill_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND ws_item_sk = i_item_sk > AND (substr(ca_zip, 1, 5) IN > 
('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', > '81792') > OR > i_item_id IN (SELECT i_item_id > FROM item > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > ) > ) > AND ws_sold_date_sk = d_date_sk > AND d_qoy = 2 AND d_year = 2001 > GROUP BY ca_zip, ca_city > ORDER BY ca_zip, ca_city > LIMIT 100] > java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 [] > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224) > at > org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > 
org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at >
[jira] [Assigned] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr
[ https://issues.apache.org/jira/browse/SPARK-20094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20094: Assignee: (was: Apache Spark) > Putting predicate with subquery into join condition in ReorderJoin fails > RewritePredicateSubquery.rewriteExistentialExpr > > > Key: SPARK-20094 > URL: https://issues.apache.org/jira/browse/SPARK-20094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Zhenhua Wang > > ReorderJoin collects all predicates and try to put them into join condition > when creating ordered join. If a predicate with a subquery is in a join > condition instead of a filter condition, > `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the > subquery to an ExistenceJoin, and thus result in error. > For example, tpcds q45 fails due to the above reason: > {noformat} > spark-sql> explain codegen > > SELECT > > ca_zip, > > ca_city, > > sum(ws_sales_price) > > FROM web_sales, customer, customer_address, date_dim, item > > WHERE ws_bill_customer_sk = c_customer_sk > > AND c_current_addr_sk = ca_address_sk > > AND ws_item_sk = i_item_sk > > AND (substr(ca_zip, 1, 5) IN > > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', > '80348', '81792') > > OR > > i_item_id IN (SELECT i_item_id > > FROM item > > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > > ) > > ) > > AND ws_sold_date_sk = d_date_sk > > AND d_qoy = 2 AND d_year = 2001 > > GROUP BY ca_zip, ca_city > > ORDER BY ca_zip, ca_city > > LIMIT 100; > 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen > > SELECT > ca_zip, > ca_city, > sum(ws_sales_price) > FROM web_sales, customer, customer_address, date_dim, item > WHERE ws_bill_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND ws_item_sk = i_item_sk > AND (substr(ca_zip, 1, 5) IN > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', > '81792') > OR > i_item_id IN 
(SELECT i_item_id > FROM item > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > ) > ) > AND ws_sold_date_sk = d_date_sk > AND d_qoy = 2 AND d_year = 2001 > GROUP BY ca_zip, ca_city > ORDER BY ca_zip, ca_city > LIMIT 100] > java.lang.UnsupportedOperationException: Cannot evaluate expression: list#1 [] > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224) > at > org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379) > at > 
org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at >
[jira] [Created] (SPARK-20094) Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr
Zhenhua Wang created SPARK-20094: Summary: Putting predicate with subquery into join condition in ReorderJoin fails RewritePredicateSubquery.rewriteExistentialExpr Key: SPARK-20094 URL: https://issues.apache.org/jira/browse/SPARK-20094 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0 Reporter: Zhenhua Wang ReorderJoin collects all predicates and tries to put them into the join condition when creating an ordered join. If a predicate with a subquery is in a join condition instead of a filter condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the subquery to an ExistenceJoin, and thus result in an error. For example, tpcds q45 fails due to the above reason: {noformat} spark-sql> explain codegen > SELECT > ca_zip, > ca_city, > sum(ws_sales_price) > FROM web_sales, customer, customer_address, date_dim, item > WHERE ws_bill_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND ws_item_sk = i_item_sk > AND (substr(ca_zip, 1, 5) IN > ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', '81792') > OR > i_item_id IN (SELECT i_item_id > FROM item > WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) > ) > ) > AND ws_sold_date_sk = d_date_sk > AND d_qoy = 2 AND d_year = 2001 > GROUP BY ca_zip, ca_city > ORDER BY ca_zip, ca_city > LIMIT 100; 17/03/25 15:27:02 ERROR SparkSQLDriver: Failed in [explain codegen SELECT ca_zip, ca_city, sum(ws_sales_price) FROM web_sales, customer, customer_address, date_dim, item WHERE ws_bill_customer_sk = c_customer_sk AND c_current_addr_sk = ca_address_sk AND ws_item_sk = i_item_sk AND (substr(ca_zip, 1, 5) IN ('85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', '81792') OR i_item_id IN (SELECT i_item_id FROM item WHERE i_item_sk IN (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) ) ) AND ws_sold_date_sk = d_date_sk AND d_qoy = 2 AND d_year = 2001 GROUP BY ca_zip, ca_city ORDER BY ca_zip, ca_city LIMIT 100] java.lang.UnsupportedOperationException: Cannot
evaluate expression: list#1 [] at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:224) at org.apache.spark.sql.catalyst.expressions.ListQuery.doGenCode(subquery.scala:262) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) at org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) at org.apache.spark.sql.catalyst.expressions.In$$anonfun$3.apply(predicates.scala:199) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.expressions.In.doGenCode(predicates.scala:199) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) at org.apache.spark.sql.catalyst.expressions.Or.doGenCode(predicates.scala:379) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.getJoinCondition(BroadcastHashJoinExec.scala:174) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:199) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at
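The failure above hinges on one predicate shape: an IN-subquery buried inside an OR, which ReorderJoin then hoists into a join condition. As a point of reference, the sketch below reduces q45 to that minimal shape and runs it through Python's built-in SQLite as a neutral stand-in for Spark SQL (the tables, columns, and data here are invented for illustration), showing the answer a correct plan should produce.

```python
# Minimal shape of the q45 predicate: zip-prefix match OR membership in an
# IN-subquery, evaluated after a join. SQLite handles this directly; the bug
# reported above is specific to Spark's optimizer rewrite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (i_item_sk INT, i_item_id TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D")])
conn.execute("CREATE TABLE sales (ws_item_sk INT, ca_zip TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, "85669"), (2, "00000"), (4, "00000")])

rows = conn.execute("""
    SELECT s.ws_item_sk
    FROM sales s JOIN item i ON s.ws_item_sk = i.i_item_sk
    WHERE substr(s.ca_zip, 1, 5) IN ('85669')
       OR i.i_item_id IN (SELECT i_item_id FROM item WHERE i_item_sk IN (2, 3))
    ORDER BY s.ws_item_sk
""").fetchall()
print(rows)  # [(1,), (2,)] -- one row via the zip branch, one via the subquery
```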
[jira] [Created] (SPARK-20093) Exception when Joining dataframe with another dataframe generated by applying groupBy transformation on original one
Hosur Narahari created SPARK-20093: -- Summary: Exception when Joining dataframe with another dataframe generated by applying groupBy transformation on original one Key: SPARK-20093 URL: https://issues.apache.org/jira/browse/SPARK-20093 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0, 2.2.0 Reporter: Hosur Narahari

When we generate a dataframe by grouping and then join the original dataframe with it on the aggregated column, we get an AnalysisException. Below is a piece of code and the resulting exception to reproduce it.

Code:

{code}
import org.apache.spark.sql.SparkSession

object App {

  lazy val spark = SparkSession.builder.appName("Test").master("local").getOrCreate

  def main(args: Array[String]): Unit = {
    test1
  }

  private def test1 {
    import org.apache.spark.sql.functions._
    val df = spark.createDataFrame(Seq(("M", 172, 60), ("M", 170, 60), ("F", 155, 56), ("M", 160, 55), ("F", 150, 53))).toDF("gender", "height", "weight")
    val groupDF = df.groupBy("gender").agg(min("height").as("height"))
    groupDF.show()
    val out = groupDF.join(df, groupDF("height") <=> df("height")).select(df("gender"), df("height"), df("weight"))
    out.show
  }
}
{code}

When I ran the above code, I got the following exception:

{noformat}
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) height#8 missing from height#19,height#30,gender#29,weight#31,gender#7 in operator !Join Inner, (height#19 <=> height#8);;
!Join Inner, (height#19 <=> height#8)
:- Aggregate [gender#7], [gender#7, min(height#8) AS height#19]
:  +- Project [_1#0 AS gender#7, _2#1 AS height#8, _3#2 AS weight#9]
:     +- LocalRelation [_1#0, _2#1, _3#2]
+- Project [_1#0 AS gender#29, _2#1 AS height#30, _3#2 AS weight#31]
   +- LocalRelation [_1#0, _2#1, _3#2]
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:90)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:342)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:90)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:53)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2831)
	at org.apache.spark.sql.Dataset.join(Dataset.scala:843)
	at org.apache.spark.sql.Dataset.join(Dataset.scala:807)
	at App$.test1(App.scala:17)
	at App$.main(App.scala:9)
	at App.main(App.scala)
{noformat}

Could someone please look into this?

-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
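To make the expected behaviour of the repro concrete, here is the same data and join expressed against Python's built-in SQLite, standing in for Spark SQL (SQLite does not exhibit the Spark-specific analysis bug, so it shows the result the query should produce):

```python
# Same data and query shape as the Scala repro above: group to find the
# minimum height per gender, then join back to the original rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (gender TEXT, height INT, weight INT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [("M", 172, 60), ("M", 170, 60), ("F", 155, 56),
                  ("M", 160, 55), ("F", 150, 53)])

# groupDF.join(df, groupDF("height") <=> df("height")) in SQL form:
rows = conn.execute("""
    SELECT p.gender, p.height, p.weight
    FROM (SELECT gender, MIN(height) AS height
          FROM people GROUP BY gender) g
    JOIN people p ON g.height = p.height
    ORDER BY p.gender
""").fetchall()
print(rows)  # [('F', 150, 53), ('M', 160, 55)]
```

In Spark itself, a commonly suggested way around this kind of ambiguous self-join reference is to rename the aggregated column before joining, e.g. `.agg(min("height").as("min_height"))`, so the join condition no longer collides with the original `height` attribute; treat that as a hypothetical workaround, not a fix confirmed in this report.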
[jira] [Commented] (SPARK-19518) IGNORE NULLS in first_value / last_value should be supported in SQL statements
[ https://issues.apache.org/jira/browse/SPARK-19518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941605#comment-15941605 ] Hyukjin Kwon commented on SPARK-19518: -- [~sabhyankar], Do you mind if I ask whether you are currently working on this? I am willing to take it over. > IGNORE NULLS in first_value / last_value should be supported in SQL statements > -- > > Key: SPARK-19518 > URL: https://issues.apache.org/jira/browse/SPARK-19518 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Ferenc Erdelyi > > https://issues.apache.org/jira/browse/SPARK-13049 was implemented in Spark2, > however it does not work in SQL statements as it is not implemented in Hive > yet: https://issues.apache.org/jira/browse/HIVE-11189
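For readers unfamiliar with the requested feature, the IGNORE NULLS semantics of first_value / last_value can be modelled in a few lines of plain Python (the function names below are illustrative stand-ins, not Spark or Hive APIs; None plays the role of SQL NULL):

```python
def first_value(values, ignore_nulls=False):
    """Return the first element of the window frame, or the first
    non-None element when ignore_nulls is set; None if nothing qualifies."""
    for v in values:
        if v is not None or not ignore_nulls:
            return v
    return None

def last_value(values, ignore_nulls=False):
    # last_value is first_value applied to the reversed window frame.
    return first_value(list(reversed(values)), ignore_nulls)

print(first_value([None, 3, 5]))                     # None
print(first_value([None, 3, 5], ignore_nulls=True))  # 3
print(last_value([1, None], ignore_nulls=True))      # 1
```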
[jira] [Assigned] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API
[ https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20092: Assignee: Apache Spark > Trigger AppVeyor R tests for changes in Scala code related with R API > - > > Key: SPARK-20092 > URL: https://issues.apache.org/jira/browse/SPARK-20092 > Project: Spark > Issue Type: Improvement > Components: Project Infra, SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > We currently detect changes in the {{./R}} directory and then trigger > AppVeyor tests. > It seems we should also run the tests when there are changes in > {{./core/src/main/scala/org/apache/spark/r}} and > {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.
[jira] [Commented] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API
[ https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941594#comment-15941594 ] Apache Spark commented on SPARK-20092: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/17427 > Trigger AppVeyor R tests for changes in Scala code related with R API > - > > Key: SPARK-20092 > URL: https://issues.apache.org/jira/browse/SPARK-20092 > Project: Spark > Issue Type: Improvement > Components: Project Infra, SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > We currently detect changes in the {{./R}} directory and then trigger > AppVeyor tests. > It seems we should also run the tests when there are changes in > {{./core/src/main/scala/org/apache/spark/r}} and > {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.
[jira] [Assigned] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API
[ https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20092: Assignee: (was: Apache Spark) > Trigger AppVeyor R tests for changes in Scala code related with R API > - > > Key: SPARK-20092 > URL: https://issues.apache.org/jira/browse/SPARK-20092 > Project: Spark > Issue Type: Improvement > Components: Project Infra, SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > We currently detect changes in the {{./R}} directory and then trigger > AppVeyor tests. > It seems we should also run the tests when there are changes in > {{./core/src/main/scala/org/apache/spark/r}} and > {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.
[jira] [Updated] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API
[ https://issues.apache.org/jira/browse/SPARK-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20092: - Component/s: Project Infra > Trigger AppVeyor R tests for changes in Scala code related with R API > - > > Key: SPARK-20092 > URL: https://issues.apache.org/jira/browse/SPARK-20092 > Project: Spark > Issue Type: Improvement > Components: Project Infra, SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > We currently detect changes in the {{./R}} directory and then trigger > AppVeyor tests. > It seems we should also run the tests when there are changes in > {{./core/src/main/scala/org/apache/spark/r}} and > {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.
[jira] [Created] (SPARK-20092) Trigger AppVeyor R tests for changes in Scala code related with R API
Hyukjin Kwon created SPARK-20092: Summary: Trigger AppVeyor R tests for changes in Scala code related with R API Key: SPARK-20092 URL: https://issues.apache.org/jira/browse/SPARK-20092 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 2.2.0 Reporter: Hyukjin Kwon Priority: Minor We currently detect changes in the {{./R}} directory and then trigger AppVeyor tests. It seems we should also run the tests when there are changes in {{./core/src/main/scala/org/apache/spark/r}} and {{./sql/core/src/main/scala/org/apache/spark/sql/api/r}}.
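The change proposed above amounts to a prefix check over the set of changed file paths. A minimal sketch follows; the function name and the exact matching rule are assumptions for illustration only — Spark's real detection logic lives in its CI scripts, not in this snippet.

```python
# Directory prefixes taken from the issue description; whether the real CI
# check uses exact prefixes or a regex is an assumption.
R_RELATED_PREFIXES = (
    "R/",
    "core/src/main/scala/org/apache/spark/r/",
    "sql/core/src/main/scala/org/apache/spark/sql/api/r/",
)

def should_trigger_appveyor(changed_files):
    """True when any changed file falls under an R-related directory."""
    return any(path.startswith(R_RELATED_PREFIXES) for path in changed_files)

print(should_trigger_appveyor(
    ["core/src/main/scala/org/apache/spark/r/RBackend.scala"]))  # True
print(should_trigger_appveyor(["docs/index.md"]))                # False
```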