[jira] [Commented] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))
[ https://issues.apache.org/jira/browse/SPARK-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182214#comment-15182214 ]

Santiago M. Mola commented on SPARK-13701:
------------------------------------------

Installed gfortran. Now it fails on NNLSSuite, while ALSSuite succeeds.

{code}
[info] NNLSSuite:
[info] Exception encountered when attempting to run a suite with class name: org.apache.spark.mllib.optimization.NNLSSuite *** ABORTED *** (68 milliseconds)
[info]   java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
[info]   at org.jblas.NativeBlas.dgemm(Native Method)
[info]   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
[info]   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
[info]   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
[info]   at org.apache.spark.mllib.optimization.NNLSSuite.genOnesData(NNLSSuite.scala:33)
[info]   at org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(NNLSSuite.scala:56)
[info]   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
[info]   at org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply$mcV$sp(NNLSSuite.scala:55)
[info]   at org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply(NNLSSuite.scala:45)
[info]   at org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply(NNLSSuite.scala:45)
{code}

> MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13701
>                 URL: https://issues.apache.org/jira/browse/SPARK-13701
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>         Environment: Ubuntu 14.04 on aarch64
>            Reporter: Santiago M. Mola
>            Priority: Minor
>              Labels: arm64, porting
>
> jblas fails on arm64.
> {code}
> ALSSuite:
> Exception encountered when attempting to run a suite with class name: org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 milliseconds)
>   java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
>   at org.jblas.NativeBlas.dgemm(Native Method)
>   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
>   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
>   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
>   at org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
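The {{UnsatisfiedLinkError}} above surfaces deep inside a test suite because jblas only attempts to link its native BLAS (which on Linux depends on libgfortran) lazily, on first use. A hypothetical helper like the following (not part of Spark or jblas; all names are illustrative) could probe native linkage up front so the failure is reported clearly:

```scala
// Hypothetical helper: run an initializer (e.g. one that forces a
// native-backed class like org.jblas.NativeBlas to load) and report
// whether native linkage succeeded, instead of letting an
// UnsatisfiedLinkError abort a whole suite.
object NativeLinkCheck {
  def loads(init: () => Unit): Boolean =
    try { init(); true } catch {
      case _: UnsatisfiedLinkError | _: NoClassDefFoundError |
           _: ClassNotFoundException => false
    }
}
```

For example, `NativeLinkCheck.loads(() => Class.forName("org.jblas.NativeBlas"))` would return false on a machine missing libgfortran, which a suite could turn into a skip-with-message rather than an abort.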
[jira] [Commented] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))
[ https://issues.apache.org/jira/browse/SPARK-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181921#comment-15181921 ]

Santiago M. Mola commented on SPARK-13701:
------------------------------------------

This is probably just gfortran not being installed? I'll test as soon as possible.
[jira] [Created] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))
Santiago M. Mola created SPARK-13701:
-------------------------------------

             Summary: MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))
                 Key: SPARK-13701
                 URL: https://issues.apache.org/jira/browse/SPARK-13701
             Project: Spark
          Issue Type: Bug
          Components: MLlib
         Environment: Ubuntu 14.04 on aarch64
            Reporter: Santiago M. Mola
            Priority: Minor

jblas fails on arm64.

{code}
ALSSuite:
Exception encountered when attempting to run a suite with class name: org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 milliseconds)
  java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
  at org.jblas.NativeBlas.dgemm(Native Method)
  at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
  at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
  at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
  at org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
{code}
[jira] [Commented] (SPARK-13690) UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)
[ https://issues.apache.org/jira/browse/SPARK-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181362#comment-15181362 ]

Santiago M. Mola commented on SPARK-13690:
------------------------------------------

snappy-java does not have any fallback, but snappy itself seems to work correctly on arm64. I submitted a PR for snappy-java, so a future version should have support. This issue will have to wait until such a version is out.

I don't expect active support for arm64, but given the latest developments on arm64 servers, I'm interested in experimenting with it. It seems I'm not the first one to think about it: http://www.sparkonarm.com/ ;-)

> UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-13690
>                 URL: https://issues.apache.org/jira/browse/SPARK-13690
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>         Environment: $ java -version
> java version "1.8.0_73"
> Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
> $ uname -a
> Linux spark-on-arm 4.2.0-55598-g45f70e3 #5 SMP Tue Feb 2 10:14:08 CET 2016 aarch64 aarch64 aarch64 GNU/Linux
>            Reporter: Santiago M. Mola
>            Priority: Minor
>              Labels: arm64, porting
>
> UnsafeShuffleWriterSuite fails because of missing Snappy native library on arm64.
[jira] [Created] (SPARK-13690) UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)
Santiago M. Mola created SPARK-13690:
-------------------------------------

             Summary: UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)
                 Key: SPARK-13690
                 URL: https://issues.apache.org/jira/browse/SPARK-13690
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
         Environment: $ java -version
java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
$ uname -a
Linux spark-on-arm 4.2.0-55598-g45f70e3 #5 SMP Tue Feb 2 10:14:08 CET 2016 aarch64 aarch64 aarch64 GNU/Linux
            Reporter: Santiago M. Mola
            Priority: Minor

UnsafeShuffleWriterSuite fails because of missing Snappy native library on arm64.

{code}
Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 6.437 sec <<< FAILURE! - in org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite
mergeSpillsWithFileStreamAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)  Time elapsed: 0.072 sec  <<< ERROR!
java.lang.reflect.InvocationTargetException
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
Caused by: java.lang.IllegalArgumentException: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux and os.arch=aarch64
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux and os.arch=aarch64
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
mergeSpillsWithTransferToAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)  Time elapsed: 0.041 sec  <<< ERROR!
java.lang.reflect.InvocationTargetException
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
Caused by: java.lang.IllegalArgumentException: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)

Running org.apache.spark.JavaAPISuite
Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.526 sec - in org.apache.spark.JavaAPISuite
Running org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.761 sec - in org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
Running org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.967 sec - in org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
Running org.apache.spark.api.java.OptionalSuite
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec - in org.apache.spark.api.java.OptionalSuite

Results :

Tests in error:
  UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy:389->testMergingSpills:337 » InvocationTarget
  UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy:384->testMergingSpills:337 » InvocationTarget
{code}
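The error message explains the failure mode: snappy-java ships a native library per platform and selects it at load time from the os.name and os.arch system properties, so a JVM reporting os.arch=aarch64 finds nothing until a release bundles an aarch64 build. A small self-contained sketch of that selection logic (the path layout below is an assumption for illustration, not snappy-java's verbatim scheme):

```scala
// Sketch of snappy-java-style native library resolution: the classpath
// resource to load is derived from os.name and os.arch, which is why an
// aarch64 JVM fails with FAILED_TO_LOAD_NATIVE_LIBRARY until a build
// ships a library for that architecture. Illustrative path layout only.
object NativeLibPath {
  def resolve(osName: String, osArch: String): String = {
    // Normalize names like "Mac OS X" into a single path segment.
    val os = osName.replaceAll("\\s", "")
    s"/org/xerial/snappy/native/$os/$osArch/libsnappyjava.so"
  }
}
```

With this scheme, `resolve("Linux", "aarch64")` points at a resource that simply does not exist in a jar built without aarch64 support, and the loader has no pure-Java fallback to fall back on.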
[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093533#comment-15093533 ]

Santiago M. Mola commented on SPARK-12449:
------------------------------------------

Implementing this interface or an equivalent one would help standardize a lot of advanced features that data sources have been doing for some time. It would also prevent them from creating their own SQLContext variants or patching the running SQLContext at runtime (using extraStrategies).

Here's a list of data sources that are currently using this approach. It would be good to take them into account for this JIRA: the proposed interface and strategy should probably support all of these use cases. Some of them also use their own catalog implementation, but that should be a separate JIRA.

*spark-sql-on-hbase*
Already mentioned by [~yzhou2001]. They are using HBaseContext with extraStrategies that inject HBaseStrategies doing aggregation push down:
https://github.com/Huawei-Spark/Spark-SQL-on-HBase/blob/master/src/main/scala/org/apache/spark/sql/hbase/execution/HBaseStrategies.scala

*memsql-spark-connector*
They offer both their own SQLContext and runtime injection of their MemSQL-specific push down strategy. They match Catalyst's LogicalPlan in the same way we're proposing, to push down filters, projects, aggregates, limits, sorts and joins:
https://github.com/memsql/memsql-spark-connector/blob/master/connectorLib/src/main/scala/com/memsql/spark/pushdown/MemSQLPushdownStrategy.scala

*spark-iqmulus*
Strategy injected to push down counts and some aggregates:
https://github.com/IGNF/spark-iqmulus/blob/master/src/main/scala/fr/ign/spark/iqmulus/ExtraStrategies.scala

*druid-olap*
They use the SparkPlanner, Strategy and LogicalPlan APIs to do extensive push down. Their API usage could be limited to LogicalPlan only if this JIRA is implemented:
https://github.com/SparklineData/spark-druid-olap/blob/master/src/main/scala/org/apache/spark/sql/sources/druid/

*magellan* _(probably out of scope)_
Does its own BroadcastJoin, although it seems to me that this usage would be out of scope for us:
https://github.com/harsha2010/magellan/blob/master/src/main/scala/magellan/execution/MagellanStrategies.scala

> Pushing down arbitrary logical plans to data sources
> ----------------------------------------------------
>
>                 Key: SPARK-12449
>                 URL: https://issues.apache.org/jira/browse/SPARK-12449
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Stephan Kessler
>         Attachments: pushingDownLogicalPlans.pdf
>
> With the help of the DataSource API we can pull data from external sources for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows to push down filters and projects, pruning unnecessary fields and rows directly in the data source.
> However, data sources such as SQL engines are capable of doing even more preprocessing, e.g., evaluating aggregates. This is beneficial because it would reduce the amount of data transferred from the source to Spark. The existing interfaces do not allow this kind of processing in the source.
> We propose to add a new interface {{CatalystSource}} that allows deferring the processing of arbitrary logical plans to the data source. We have already shown the details at Spark Summit Europe 2015: https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/
> I will add a design document explaining the details.
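All of the connectors listed above follow the same shape: they register a planner strategy that recognizes a subtree of the logical plan and replaces it with a source-specific node. A minimal, self-contained model of that extraStrategies pattern (all names here are illustrative stand-ins, not Spark's actual Catalyst classes):

```scala
// Minimal model of the extraStrategies pattern: the planner consults
// user-supplied strategies first and falls back to the default plan.
object PushdownModel {
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Aggregate(child: Plan) extends Plan
  case class PushedDown(sql: String) extends Plan // stand-in for a physical node

  type Strategy = Plan => Option[Plan]

  // A source-specific strategy that claims whole Aggregate-over-Scan
  // subtrees, the way the HBase and MemSQL connectors push aggregation
  // into the source.
  val pushDownAggregate: Strategy = {
    case Aggregate(Scan(t)) => Some(PushedDown(s"SELECT agg(*) FROM $t"))
    case _                  => None
  }

  // First strategy that accepts the plan wins; otherwise keep it as-is.
  def plan(p: Plan, extraStrategies: Seq[Strategy]): Plan =
    extraStrategies.flatMap(s => s(p)).headOption.getOrElse(p)
}
```

The point of the proposed interface is that each connector would no longer need to reimplement this dispatch privately: Spark would own the strategy, and the source would only declare what it supports.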
[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070100#comment-15070100 ]

Santiago M. Mola commented on SPARK-12449:
------------------------------------------

Well, at least with the implementation presented at the Spark Summit, only the logical plan is required. The physical plan is handled only by the planner strategy, which would be internal to Spark. The strategy has all the logic required to split partial operations and push down only one part.
[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070062#comment-15070062 ]

Santiago M. Mola commented on SPARK-12449:
------------------------------------------

The physical plan would not be consumed by data sources, only the logical plan.

An alternative approach would be to use a different representation to pass the logical plan to the data source. If the relational algebra from Apache Calcite is stable enough, it could be used as the logical plan representation for this interface.
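The contract under discussion can be sketched in a few lines: the source declares which logical plans it can evaluate and executes the ones it accepts, leaving everything else to Spark. The types below are simplified stand-ins with assumed names, not the actual proposed Spark API:

```scala
// Sketch of a CatalystSource-style contract. LogicalPlanLike stands in
// for Catalyst's LogicalPlan; rows are plain Seq[Int] for brevity.
object CatalystSourceSketch {
  sealed trait LogicalPlanLike
  case class Scan(rows: Seq[Int]) extends LogicalPlanLike
  case class Limit(n: Int, child: LogicalPlanLike) extends LogicalPlanLike

  trait CatalystSourceLike {
    // Spark's planner would call this to decide whether to push the plan down.
    def supportsLogicalPlan(plan: LogicalPlanLike): Boolean
    // Only called for plans the source accepted.
    def execute(plan: LogicalPlanLike): Seq[Int]
  }

  // A toy source that can evaluate scans and limits over scans.
  object LimitPushdownSource extends CatalystSourceLike {
    def supportsLogicalPlan(plan: LogicalPlanLike): Boolean = plan match {
      case Scan(_) | Limit(_, Scan(_)) => true
      case _                           => false
    }
    def execute(plan: LogicalPlanLike): Seq[Int] = plan match {
      case Scan(rows)           => rows
      case Limit(n, Scan(rows)) => rows.take(n)
      case other                => sys.error(s"unsupported plan: $other")
    }
  }
}
```

A `supportsLogicalPlan` predicate keeps the interface honest: the strategy can probe arbitrarily deep subtrees and push down exactly the largest one the source accepts.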
[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068396#comment-15068396 ]

Santiago M. Mola commented on SPARK-11855:
------------------------------------------

I will not have time to finish this before the 1.6 release. Feel free to close the issue, since it won't apply after the release.

> Catalyst breaks backwards compatibility in branch-1.6
> -----------------------------------------------------
>
>                 Key: SPARK-11855
>                 URL: https://issues.apache.org/jira/browse/SPARK-11855
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Santiago M. Mola
>            Priority: Critical
>
> There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed for backwards compatibility.
> {code}
>  case class UnresolvedRelation(
> -    tableIdentifier: Seq[String],
> +    tableIdentifier: TableIdentifier,
>      alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable {
> {code}
> *Catalog* also got a lot of signatures changed (because of TableIdentifier). Providing the older methods as deprecated also seems viable here.
> Spark 1.5 already broke backwards compatibility of part of the catalyst API with respect to 1.4. I understand there are good reasons for some cases, but we should try to minimize backwards compatibility breakages for 1.x, especially now that 2.x is on the horizon and there will be a near opportunity to remove deprecated stuff.
[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066655#comment-15066655 ]

Santiago M. Mola commented on SPARK-12449:
------------------------------------------

At Stratio we are interested in this kind of interface too, both for SQL and NoSQL data sources (e.g. MongoDB).
[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014193#comment-15014193 ]

Santiago M. Mola commented on SPARK-11855:
------------------------------------------

Thanks Michael. Sounds reasonable. I'll prepare a PR reducing the incompatibilities where it can be done in a non-invasive way.
[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola updated SPARK-11855:
-------------------------------------
    Description: 
There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed for backwards compatibility.

{code}
 case class UnresolvedRelation(
-    tableIdentifier: Seq[String],
+    tableIdentifier: TableIdentifier,
     alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable {
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable {
{code}

*Catalog* also got a lot of signatures changed (because of TableIdentifier). Providing the older methods as deprecated also seems viable here.

Spark 1.5 already broke backwards compatibility of part of the catalyst API with respect to 1.4. I understand there are good reasons for some cases, but we should try to minimize backwards compatibility breakages for 1.x, especially now that 2.x is on the horizon and there will be a near opportunity to remove deprecated stuff.

  was:
There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed for backwards compatibility.

{code}
 case class UnresolvedRelation(
-    tableIdentifier: Seq[String],
+    tableIdentifier: TableIdentifier,
     alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable {
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable {
{code}

Spark 1.5 already broke backwards compatibility of part of the catalyst API with respect to 1.4. I understand there are good reasons for some cases, but we should try to minimize backwards compatibility breakages for 1.x, especially now that 2.x is on the horizon and there will be a near opportunity to remove deprecated stuff.
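The backwards-compatible shim the description asks for is straightforward in Scala: keep the new TableIdentifier-based primary constructor and add a deprecated overload taking Seq[String]. The classes below are simplified stand-ins for illustration, not Spark's actual Catalyst definitions:

```scala
// Sketch of a deprecated compatibility constructor. TableIdentifier and
// UnresolvedRelation are minimal stand-ins (no LeafNode, no analysis logic).
object CompatShim {
  case class TableIdentifier(table: String, database: Option[String] = None)

  case class UnresolvedRelation(
      tableIdentifier: TableIdentifier,
      alias: Option[String] = None) {

    // 1.5-style constructor taking e.g. Seq("db", "table"); callers get a
    // deprecation warning instead of a compile error.
    @deprecated("use TableIdentifier instead of Seq[String]", "1.6.0")
    def this(parts: Seq[String], alias: Option[String]) =
      this(TableIdentifier(parts.last, parts.dropRight(1).lastOption), alias)
  }
}
```

Old call sites keep compiling via `new UnresolvedRelation(Seq("db", "t"), None)`, and the deprecation message points them at the 2.x-era replacement, which matches the issue's suggestion of deprecating rather than removing.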
[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013914#comment-15013914 ] Santiago M. Mola commented on SPARK-11855: -- They have public visibility and no @DeveloperApi or @Experimental annotations, so I always assumed they are. I have been working with catalyst on a day-to-day basis for almost 1 year now. I understand that catalyst might not offer the same kind of backwards compatibility as spark-core, but it would be good to avoid breaking backwards compatibility, specially in cases where it is easy to do (which are most of the cases I encounter). I think part of the solution is also marking some parts as @Experimental. For example, UnsafeArrayData interface changed wildly, and it's probably not viable to maintain backwards compatibility, but it should be marked as @Experimental if more breakage is expected before 2.0. > Catalyst breaks backwards compatibility in branch-1.6 > - > > Key: SPARK-11855 > URL: https://issues.apache.org/jira/browse/SPARK-11855 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Santiago M. Mola >Priority: Critical > > There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most > cases: > *UnresolvedRelation*'s constructor has been changed from taking a Seq to a > TableIdentifier. A deprecated constructor taking Seq would be needed to be > backwards compatible. > {code} > case class UnresolvedRelation( > -tableIdentifier: Seq[String], > +tableIdentifier: TableIdentifier, > alias: Option[String] = None) extends LeafNode { > {code} > It is similar with *UnresolvedStar*: > {code} > -case class UnresolvedStar(table: Option[String]) extends Star with > Unevaluable { > +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with > Unevaluable { > {code} > Spark 1.5 already broke backwards compatibility of part of catalyst API with > respect to 1.4. 
I understand there are good reasons for some cases, but we > should try to minimize backwards compatibility breakages for 1.x, especially > now that 2.x is on the horizon and there will be a near-term opportunity to remove > deprecated APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
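[Editorial note] The deprecated constructor the reporter asks for could look roughly like this. This is a sketch only: the class and field names come from the diff quoted above, but the TableIdentifier construction is assumed and the LeafNode/Star parents are omitted for brevity. A matching apply overload on the companion object would also be needed for call sites that do not use new.

{code}
// Hypothetical compatibility shim, not actual Spark code.
case class TableIdentifier(table: String, database: Option[String] = None)

case class UnresolvedRelation(
    tableIdentifier: TableIdentifier,
    alias: Option[String] = None) {

  // Deprecated secondary constructor keeping the old Seq[String] signature,
  // e.g. Seq("db", "table") or Seq("table").
  @deprecated("Use TableIdentifier instead of Seq[String]", "1.6.0")
  def this(parts: Seq[String], alias: Option[String]) =
    this(TableIdentifier(parts.last, parts.dropRight(1).lastOption), alias)
}
{code}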
[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-11855: - Priority: Critical (was: Major) Description: There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most cases: *UnresolvedRelation*'s constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed to be backwards compatible. {code} case class UnresolvedRelation( -tableIdentifier: Seq[String], +tableIdentifier: TableIdentifier, alias: Option[String] = None) extends LeafNode { {code} It is similar with *UnresolvedStar*: {code} -case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable { +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable { {code} Spark 1.5 already broke backwards compatibility of part of catalyst API with respect to 1.4. I understand there are good reasons for some cases, but we should try to minimize backwards compatibility breakages for 1.x. Specially now that 2.x is on the horizon and there will be a near opportunity to remove deprecated stuff. was: UnresolvedRelation's constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed to be backwards compatible. {code} case class UnresolvedRelation( -tableIdentifier: Seq[String], +tableIdentifier: TableIdentifier, alias: Option[String] = None) extends LeafNode { {code} It is similar with UnresolvedStar: {code} -case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable { +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable { {code} > Catalyst breaks backwards compatibility in branch-1.6 > - > > Key: SPARK-11855 > URL: https://issues.apache.org/jira/browse/SPARK-11855 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Santiago M. 
Mola >Priority: Critical > > There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most > cases: > *UnresolvedRelation*'s constructor has been changed from taking a Seq to a > TableIdentifier. A deprecated constructor taking Seq would be needed to be > backwards compatible. > {code} > case class UnresolvedRelation( > -tableIdentifier: Seq[String], > +tableIdentifier: TableIdentifier, > alias: Option[String] = None) extends LeafNode { > {code} > It is similar with *UnresolvedStar*: > {code} > -case class UnresolvedStar(table: Option[String]) extends Star with > Unevaluable { > +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with > Unevaluable { > {code} > Spark 1.5 already broke backwards compatibility of part of catalyst API with > respect to 1.4. I understand there are good reasons for some cases, but we > should try to minimize backwards compatibility breakages for 1.x. Specially > now that 2.x is on the horizon and there will be a near opportunity to remove > deprecated stuff. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-11855: - Summary: Catalyst breaks backwards compatibility in branch-1.6 (was: UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6) > Catalyst breaks backwards compatibility in branch-1.6 > - > > Key: SPARK-11855 > URL: https://issues.apache.org/jira/browse/SPARK-11855 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Santiago M. Mola > > UnresolvedRelation's constructor has been changed from taking a Seq to a > TableIdentifier. A deprecated constructor taking Seq would be needed to be > backwards compatible. > {code} > case class UnresolvedRelation( > -tableIdentifier: Seq[String], > +tableIdentifier: TableIdentifier, > alias: Option[String] = None) extends LeafNode { > {code} > It is similar with UnresolvedStar: > {code} > -case class UnresolvedStar(table: Option[String]) extends Star with > Unevaluable { > +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with > Unevaluable { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11855) UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-11855: - Summary: UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6 (was: UnresolvedRelation constructor is not backwards compatible in branch-1.6) > UnresolvedRelation/UnresolvedStar constructors are not backwards compatible > in branch-1.6 > - > > Key: SPARK-11855 > URL: https://issues.apache.org/jira/browse/SPARK-11855 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Santiago M. Mola > > UnresolvedRelation's constructor has been changed from taking a Seq to a > TableIdentifier. A deprecated constructor taking Seq would be needed to be > backwards compatible. > {code} > case class UnresolvedRelation( > -tableIdentifier: Seq[String], > +tableIdentifier: TableIdentifier, > alias: Option[String] = None) extends LeafNode { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11855) UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6
[ https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-11855: - Description: UnresolvedRelation's constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed to be backwards compatible. {code} case class UnresolvedRelation( -tableIdentifier: Seq[String], +tableIdentifier: TableIdentifier, alias: Option[String] = None) extends LeafNode { {code} It is similar with UnresolvedStar: {code} -case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable { +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable { {code} was: UnresolvedRelation's constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed to be backwards compatible. {code} case class UnresolvedRelation( -tableIdentifier: Seq[String], +tableIdentifier: TableIdentifier, alias: Option[String] = None) extends LeafNode { {code} > UnresolvedRelation/UnresolvedStar constructors are not backwards compatible > in branch-1.6 > - > > Key: SPARK-11855 > URL: https://issues.apache.org/jira/browse/SPARK-11855 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Santiago M. Mola > > UnresolvedRelation's constructor has been changed from taking a Seq to a > TableIdentifier. A deprecated constructor taking Seq would be needed to be > backwards compatible. 
> {code} > case class UnresolvedRelation( > -tableIdentifier: Seq[String], > +tableIdentifier: TableIdentifier, > alias: Option[String] = None) extends LeafNode { > {code} > It is similar with UnresolvedStar: > {code} > -case class UnresolvedStar(table: Option[String]) extends Star with > Unevaluable { > +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with > Unevaluable { > {code}
[jira] [Created] (SPARK-11855) UnresolvedRelation constructor is not backwards compatible in branch-1.6
Santiago M. Mola created SPARK-11855: Summary: UnresolvedRelation constructor is not backwards compatible in branch-1.6 Key: SPARK-11855 URL: https://issues.apache.org/jira/browse/SPARK-11855 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: Santiago M. Mola UnresolvedRelation's constructor has been changed from taking a Seq to a TableIdentifier. A deprecated constructor taking Seq would be needed to be backwards compatible. {code} case class UnresolvedRelation( -tableIdentifier: Seq[String], +tableIdentifier: TableIdentifier, alias: Option[String] = None) extends LeafNode { {code}
[jira] [Created] (SPARK-11780) Provide type aliases in org.apache.spark.sql.types for backwards compatibility
Santiago M. Mola created SPARK-11780: Summary: Provide type aliases in org.apache.spark.sql.types for backwards compatibility Key: SPARK-11780 URL: https://issues.apache.org/jira/browse/SPARK-11780 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0 Reporter: Santiago M. Mola With SPARK-11273, ArrayData, MapData and others were moved from org.apache.spark.sql.types to org.apache.spark.sql.catalyst.util. Since this is a backward incompatible change, it would be good to provide type aliases from the old package (deprecated) to the new one. For example: {code} package object types { @deprecated type ArrayData = org.apache.spark.sql.catalyst.util.ArrayData } {code}
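[Editorial note] Fleshed out slightly, the proposed package object might look like this. The deprecation messages and the MapData alias are illustrative additions, not taken from the issue.

{code}
package org.apache.spark.sql

package object types {
  @deprecated("Moved to org.apache.spark.sql.catalyst.util", "1.6.0")
  type ArrayData = org.apache.spark.sql.catalyst.util.ArrayData

  @deprecated("Moved to org.apache.spark.sql.catalyst.util", "1.6.0")
  type MapData = org.apache.spark.sql.catalyst.util.MapData
}
{code}

Note that a type alias preserves source compatibility only for type references; code that referenced companion objects would additionally need val aliases.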
[jira] [Updated] (SPARK-11186) Caseness inconsistency between SQLContext and HiveContext
[ https://issues.apache.org/jira/browse/SPARK-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-11186: - Description: Default catalog behaviour for caseness is different in {{SQLContext}} and {{HiveContext}}. {code} test("Catalog caseness (SQL)") { val sqlc = new SQLContext(sc) val relationName = "MyTable" sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new BaseRelation { override def sqlContext: SQLContext = sqlc override def schema: StructType = StructType(Nil) })) val tables = sqlc.tableNames() assert(tables.contains(relationName)) } test("Catalog caseness (Hive)") { val sqlc = new HiveContext(sc) val relationName = "MyTable" sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new BaseRelation { override def sqlContext: SQLContext = sqlc override def schema: StructType = StructType(Nil) })) val tables = sqlc.tableNames() assert(tables.contains(relationName)) } {code} Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. But the reason that this is needed seems undocumented (both in the manual or in the source code comments). was: Default catalog behaviour for caseness is different in {{SQLContext}} and {{HiveContext}}. 
{code} test("Catalog caseness (SQL)") { val sqlc = new SQLContext(sc) val relationName = "MyTable" sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new BaseRelation { override def sqlContext: SQLContext = sqlc override def schema: StructType = StructType(Nil) })) val tables = sqlc.tableNames() assert(tables.contains(relationName)) } test("Catalog caseness (Hive)") { val sqlc = new HiveContext(sc) val relationName = "MyTable" sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new BaseRelation { override def sqlContext: SQLContext = sqlc override def schema: StructType = StructType(Nil) })) val tables = sqlc.tableNames() assert(tables.contains(relationName)) } {/code} Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. But the reason that this is needed seems undocumented (both in the manual or in the source code comments). > Caseness inconsistency between SQLContext and HiveContext > - > > Key: SPARK-11186 > URL: https://issues.apache.org/jira/browse/SPARK-11186 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Santiago M. Mola >Priority: Minor > > Default catalog behaviour for caseness is different in {{SQLContext}} and > {{HiveContext}}. 
> {code} > test("Catalog caseness (SQL)") { > val sqlc = new SQLContext(sc) > val relationName = "MyTable" > sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new > BaseRelation { > override def sqlContext: SQLContext = sqlc > override def schema: StructType = StructType(Nil) > })) > val tables = sqlc.tableNames() > assert(tables.contains(relationName)) > } > test("Catalog caseness (Hive)") { > val sqlc = new HiveContext(sc) > val relationName = "MyTable" > sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new > BaseRelation { > override def sqlContext: SQLContext = sqlc > override def schema: StructType = StructType(Nil) > })) > val tables = sqlc.tableNames() > assert(tables.contains(relationName)) > } > {code} > Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. > But the reason that this is needed seems undocumented (both in the manual or > in the source code comments). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11186) Caseness inconsistency between SQLContext and HiveContext
Santiago M. Mola created SPARK-11186: Summary: Caseness inconsistency between SQLContext and HiveContext Key: SPARK-11186 URL: https://issues.apache.org/jira/browse/SPARK-11186 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.1 Reporter: Santiago M. Mola Priority: Minor Default catalog behaviour for caseness is different in {{SQLContext}} and {{HiveContext}}. {code} test("Catalog caseness (SQL)") { val sqlc = new SQLContext(sc) val relationName = "MyTable" sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new BaseRelation { override def sqlContext: SQLContext = sqlc override def schema: StructType = StructType(Nil) })) val tables = sqlc.tableNames() assert(tables.contains(relationName)) } test("Catalog caseness (Hive)") { val sqlc = new HiveContext(sc) val relationName = "MyTable" sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new BaseRelation { override def sqlContext: SQLContext = sqlc override def schema: StructType = StructType(Nil) })) val tables = sqlc.tableNames() assert(tables.contains(relationName)) } {/code} Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. But the reason that this is needed seems undocumented (both in the manual or in the source code comments). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7275) Make LogicalRelation public
[ https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939660#comment-14939660 ] Santiago M. Mola commented on SPARK-7275: - LogicalRelation was moved to execution.datasources in Spark 1.5, but it's still private[sql]. Can we make it public now? > Make LogicalRelation public > --- > > Key: SPARK-7275 > URL: https://issues.apache.org/jira/browse/SPARK-7275 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Santiago M. Mola >Priority: Minor > > It seems LogicalRelation is the only part of the LogicalPlan that is not > public. This makes it harder to work with full logical plans from third party > packages.
[jira] [Commented] (SPARK-8377) Identifiers caseness information should be available at any time
[ https://issues.apache.org/jira/browse/SPARK-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710855#comment-14710855 ] Santiago M. Mola commented on SPARK-8377: - Right. However, there is no distinction between an identifier that was quoted by the user and one that was not. So the user intent is lost. If we see "a", we don't know if the user wanted strictly "a" or case insensitive "a". So if we have a column "a" and a column "A", which one should we match? > Identifiers caseness information should be available at any time > > > Key: SPARK-8377 > URL: https://issues.apache.org/jira/browse/SPARK-8377 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Santiago M. Mola > > Currently, we have the option of having a case sensitive catalog or not. A > case insensitive catalog just lowercases all identifiers. However, when > pushing down to a data source, we lose the information about if an identifier > should be case insensitive or strictly lowercase. > Ideally, we would be able to distinguish a case insensitive identifier from a > case sensitive one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
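[Editorial note] The intent the comment describes could be preserved with a representation along these lines. The types are hypothetical, not Spark code: the idea is simply to carry a quoted flag with each identifier and resolve accordingly.

{code}
// Hypothetical: remember whether the user quoted the identifier.
case class Identifier(name: String, quoted: Boolean) {
  def matches(other: Identifier): Boolean =
    if (quoted || other.quoted) name == other.name // exact, case-sensitive
    else name.equalsIgnoreCase(other.name)         // case-insensitive intent
}
{code}

With this, a quoted "a" matches only column a, while an unquoted a could match either a or A (with ambiguity reported when both exist).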
[jira] [Created] (SPARK-9307) Logging: Make it either stable or private[spark]
Santiago M. Mola created SPARK-9307: --- Summary: Logging: Make it either stable or private[spark] Key: SPARK-9307 URL: https://issues.apache.org/jira/browse/SPARK-9307 Project: Spark Issue Type: Improvement Reporter: Santiago M. Mola Priority: Minor org.apache.spark.Logging is a public class that is quite easy to include from any IDE, assuming it's safe to use because it's part of the public API. However, its Javadoc states: {code} NOTE: DO NOT USE this class outside of Spark. It is intended as an internal utility. This will likely be changed or removed in future releases. {/code} It would be safer to either make a commitment for the backwards-compatibility of this class, or make it private[spark].
[jira] [Updated] (SPARK-9307) Logging: Make it either stable or private[spark]
[ https://issues.apache.org/jira/browse/SPARK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-9307: Description: org.apache.spark.Logging is a public class that is quite easy to include from any IDE, assuming it's safe to use because it's part of the public API. However, its Javadoc states: {code} NOTE: DO NOT USE this class outside of Spark. It is intended as an internal utility. This will likely be changed or removed in future releases. {code} It would be safer to either make a commitment for the backwards-compatibility of this class, or make it private[spark]. was: org.apache.spark.Logging is a public class that is quite easy to include from any IDE, assuming it's safe to use because it's part of the public API. However, its Javadoc states: {code} NOTE: DO NOT USE this class outside of Spark. It is intended as an internal utility. This will likely be changed or removed in future releases. {/code} It would be safer to either make a commitment for the backwards-compatibility of this class, or make it private[spark]. > Logging: Make it either stable or private[spark] > > > Key: SPARK-9307 > URL: https://issues.apache.org/jira/browse/SPARK-9307 > Project: Spark > Issue Type: Improvement >Reporter: Santiago M. Mola >Priority: Minor > > org.apache.spark.Logging is a public class that is quite easy to include from > any IDE, assuming it's safe to use because it's part of the public API. > However, its Javadoc states: > {code} > NOTE: DO NOT USE this class outside of Spark. It is intended as an internal > utility. > This will likely be changed or removed in future releases. > {code} > It would be safer to either make a commitment for the backwards-compatibility > of this class, or make it private[spark]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6981) [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext
[ https://issues.apache.org/jira/browse/SPARK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614645#comment-14614645 ] Santiago M. Mola commented on SPARK-6981: - Any progress on this? > [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext > > > Key: SPARK-6981 > URL: https://issues.apache.org/jira/browse/SPARK-6981 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.3.0, 1.4.0 >Reporter: Edoardo Vacchi >Priority: Minor > > In order to simplify extensibility with new strategies from third-parties, it > should be better to factor SparkPlanner and QueryExecution in their own > classes. Dependent types add additional, unnecessary complexity; besides, > HiveContext would benefit from this change as well.
[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling
[ https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614627#comment-14614627 ] Santiago M. Mola commented on SPARK-8636: - [~davies] NULL values are grouped together when using a GROUP BY clause. See https://en.wikipedia.org/wiki/Null_%28SQL%29#When_two_nulls_are_equal:_grouping.2C_sorting.2C_and_some_set_operations {quote} Because SQL:2003 defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct". This definition of not distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other keywords that perform grouping) are used. {quote} > CaseKeyWhen has incorrect NULL handling > --- > > Key: SPARK-8636 > URL: https://issues.apache.org/jira/browse/SPARK-8636 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Santiago M. Mola > Labels: starter > > CaseKeyWhen implementation in Spark uses the following equals implementation: > {code} > private def equalNullSafe(l: Any, r: Any) = { > if (l == null && r == null) { > true > } else if (l == null || r == null) { > false > } else { > l == r > } > } > {code} > Which is not correct, since in SQL, NULL is never equal to NULL (actually, it > is not unequal either). In this case, a NULL value in a CASE WHEN expression > should never match. > For example, you can execute this in MySQL: > {code} > SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END > FROM DUAL; > {code} > And the result will be "NULL DOES NOT MATCH". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling
[ https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605557#comment-14605557 ] Santiago M. Mola commented on SPARK-8636: - [~davies], [~animeshbaranawal] In SQL, NULL is never equal to NULL. Any comparison to NULL is UNKNOWN. Most SQL implementations represent UNKNOWN as NULL, too. > CaseKeyWhen has incorrect NULL handling > --- > > Key: SPARK-8636 > URL: https://issues.apache.org/jira/browse/SPARK-8636 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Santiago M. Mola > Labels: starter > > CaseKeyWhen implementation in Spark uses the following equals implementation: > {code} > private def equalNullSafe(l: Any, r: Any) = { > if (l == null && r == null) { > true > } else if (l == null || r == null) { > false > } else { > l == r > } > } > {code} > Which is not correct, since in SQL, NULL is never equal to NULL (actually, it > is not unequal either). In this case, a NULL value in a CASE WHEN expression > should never match. > For example, you can execute this in MySQL: > {code} > SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END > FROM DUAL; > {code} > And the result will be "NULL DOES NOT MATCH". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6064) Checking data types when resolving types
[ https://issues.apache.org/jira/browse/SPARK-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602755#comment-14602755 ] Santiago M. Mola commented on SPARK-6064: - This issue might have been superseded by https://issues.apache.org/jira/browse/SPARK-7562 > Checking data types when resolving types > > > Key: SPARK-6064 > URL: https://issues.apache.org/jira/browse/SPARK-6064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.3.0 >Reporter: Kai Zeng > > In catalyst/expressions/arithmetic.scala and > catalyst/expressions/predicates.scala, many arithmetic/predicate requires the > operands to be of certain numeric type. > These type checking codes should be done when we are resolving the > expressions. > See this PR: > https://github.com/apache/spark/pull/4685
[jira] [Created] (SPARK-8654) Analysis exception when using "NULL IN (...)": invalid cast
Santiago M. Mola created SPARK-8654: --- Summary: Analysis exception when using "NULL IN (...)": invalid cast Key: SPARK-8654 URL: https://issues.apache.org/jira/browse/SPARK-8654 Project: Spark Issue Type: Bug Components: SQL Reporter: Santiago M. Mola Priority: Minor The following query throws an analysis exception: {code} SELECT * FROM t WHERE NULL NOT IN (1, 2, 3); {code} The exception is: {code} org.apache.spark.sql.AnalysisException: invalid cast from int to null; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:66) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52) {code} Here is a test that can be added to AnalysisSuite to check the issue: {code} test("SPARK- regression test") { val plan = Project(Alias(In(Literal(null), Seq(Literal(1), Literal(2))), "a")() :: Nil, LocalRelation() ) caseInsensitiveAnalyze(plan) } {code} Note that this kind of query is a corner case, but it is still valid SQL. An expression such as "NULL IN (...)" or "NULL NOT IN (...)" always gives NULL as a result, even if the list contains NULL. So it is safe to translate these expressions to Literal(null) during analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
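[Editorial note] The translation suggested at the end of the report could be expressed as an analysis-time rewrite along these lines. This is a sketch using catalyst-style names (Rule, transformAllExpressions, In, Not, Literal); the rule name and exact patterns are illustrative, not an actual Spark patch.

{code}
// Illustrative rewrite: NULL [NOT] IN (...) always evaluates to NULL,
// so the whole expression can be folded to a null literal.
object FoldNullIn extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case In(Literal(null, _), _)      => Literal(null)
    case Not(In(Literal(null, _), _)) => Literal(null)
  }
}
{code}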
[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling
[ https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602520#comment-14602520 ] Santiago M. Mola commented on SPARK-8636: - [~animeshbaranawal] Yes, I think so. > CaseKeyWhen has incorrect NULL handling > --- > > Key: SPARK-8636 > URL: https://issues.apache.org/jira/browse/SPARK-8636 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Santiago M. Mola > Labels: starter > > CaseKeyWhen implementation in Spark uses the following equals implementation: > {code} > private def equalNullSafe(l: Any, r: Any) = { > if (l == null && r == null) { > true > } else if (l == null || r == null) { > false > } else { > l == r > } > } > {code} > Which is not correct, since in SQL, NULL is never equal to NULL (actually, it > is not unequal either). In this case, a NULL value in a CASE WHEN expression > should never match. > For example, you can execute this in MySQL: > {code} > SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END > FROM DUAL; > {code} > And the result will be "NULL DOES NOT MATCH". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8636) CaseKeyWhen has incorrect NULL handling
Santiago M. Mola created SPARK-8636: --- Summary: CaseKeyWhen has incorrect NULL handling Key: SPARK-8636 URL: https://issues.apache.org/jira/browse/SPARK-8636 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Santiago M. Mola CaseKeyWhen implementation in Spark uses the following equals implementation: {code} private def equalNullSafe(l: Any, r: Any) = { if (l == null && r == null) { true } else if (l == null || r == null) { false } else { l == r } } {code} Which is not correct, since in SQL, NULL is never equal to NULL (actually, it is not unequal either). In this case, a NULL value in a CASE WHEN expression should never match. For example, you can execute this in MySQL: {code} SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END FROM DUAL; {code} And the result will be "NULL DOES NOT MATCH". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
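[Editorial note] The fix the report implies is to make the key comparison follow SQL semantics, where any comparison involving NULL does not match. A minimal sketch of the corrected helper (not the actual patch):

{code}
// SQL-style matching for CASE <key> WHEN ...: a NULL key or NULL branch
// value never matches, so two NULLs no longer compare equal.
private def sqlEquals(l: Any, r: Any): Boolean =
  if (l == null || r == null) false
  else l == r
{code}

With this semantics, CASE NULL WHEN NULL THEN ... falls through to the ELSE branch, matching the MySQL behavior shown above.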
[jira] [Updated] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse
[ https://issues.apache.org/jira/browse/SPARK-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola updated SPARK-8628:
------------------------------------
    Description: 
SPARK-5009 introduced the following code in AbstractSparkSQLParser:

{code}
def parse(input: String): LogicalPlan = {
  // Initialize the Keywords.
  lexical.initialize(reservedWords)
  phrase(start)(new lexical.Scanner(input)) match {
    case Success(plan, _) => plan
    case failureOrError => sys.error(failureOrError.toString)
  }
}
{code}

The corresponding initialize method in SqlLexical is not thread-safe:

{code}
/* This is a work around to support the lazy setting */
def initialize(keywords: Seq[String]): Unit = {
  reserved.clear()
  reserved ++= keywords
}
{code}

I'm hitting this when parsing multiple SQL queries concurrently. When parsing of one query starts, it empties the reserved keyword list; then a race condition occurs and other queries fail to parse because they recognize keywords as identifiers.

  was: (the same description, without the {code} markup)

> Race condition in AbstractSparkSQLParser.parse
> ----------------------------------------------
>
>                 Key: SPARK-8628
>                 URL: https://issues.apache.org/jira/browse/SPARK-8628
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.4.0
>            Reporter: Santiago M. Mola
>            Priority: Critical
>              Labels: regression
[jira] [Commented] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse
[ https://issues.apache.org/jira/browse/SPARK-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601012#comment-14601012 ]

Santiago M. Mola commented on SPARK-8628:
-----------------------------------------
Here is an example of a failure with Spark 1.4.0:

{code}
[1.152] failure: ``union'' expected but identifier OR found

SELECT CASE a+1 WHEN b THEN 111 WHEN c THEN 222 WHEN d THEN 333 WHEN e THEN 444 ELSE 555 END, a-b, a FROM t1 WHERE e+d BETWEEN a+b-10 AND c+130 OR a>b OR d>e
^
java.lang.RuntimeException: [1.152] failure: ``union'' expected but identifier OR found

SELECT CASE a+1 WHEN b THEN 111 WHEN c THEN 222 WHEN d THEN 333 WHEN e THEN 444 ELSE 555 END, a-b, a FROM t1 WHERE e+d BETWEEN a+b-10 AND c+130 OR a>b OR d>e
^
        at scala.sys.package$.error(package.scala:27)
{code}

> Race condition in AbstractSparkSQLParser.parse
> ----------------------------------------------
>
>                 Key: SPARK-8628
>                 URL: https://issues.apache.org/jira/browse/SPARK-8628
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.4.0
>            Reporter: Santiago M. Mola
>            Priority: Critical
>              Labels: regression
[jira] [Created] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse
Santiago M. Mola created SPARK-8628:
---------------------------------------

             Summary: Race condition in AbstractSparkSQLParser.parse
                 Key: SPARK-8628
                 URL: https://issues.apache.org/jira/browse/SPARK-8628
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0, 1.3.1, 1.3.0
            Reporter: Santiago M. Mola
            Priority: Critical


SPARK-5009 introduced the following code:

{code}
def parse(input: String): LogicalPlan = {
  // Initialize the Keywords.
  lexical.initialize(reservedWords)
  phrase(start)(new lexical.Scanner(input)) match {
    case Success(plan, _) => plan
    case failureOrError => sys.error(failureOrError.toString)
  }
}
{code}

The corresponding initialize method in SqlLexical is not thread-safe:

{code}
/* This is a work around to support the lazy setting */
def initialize(keywords: Seq[String]): Unit = {
  reserved.clear()
  reserved ++= keywords
}
{code}

I'm hitting this when parsing multiple SQL queries concurrently. When parsing of one query starts, it empties the reserved keyword list; then a race condition occurs and other queries fail to parse because they recognize keywords as identifiers.
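A minimal, self-contained sketch of the hazard and one possible mitigation. `Lexical` and `Parser` below are toy stand-ins for `SqlLexical` and `AbstractSparkSQLParser`, not Spark's actual classes; the point is only that `initialize` clears a shared mutable set, so a concurrent parse can observe an empty keyword set unless the clear-and-parse sequence is serialized:

```scala
import scala.collection.mutable

// Toy stand-in for SqlLexical: a shared, mutable reserved-word set.
class Lexical {
  val reserved = mutable.HashSet.empty[String]
  def initialize(keywords: Seq[String]): Unit = {
    reserved.clear() // a concurrent parse now sees an empty keyword set
    reserved ++= keywords
  }
}

// Toy stand-in for AbstractSparkSQLParser. Serializing the whole
// initialize-and-parse sequence behind one lock closes the race window.
class Parser(keywords: Seq[String]) {
  private val lexical = new Lexical
  def parse(input: String): Boolean = lexical.synchronized {
    lexical.initialize(keywords)
    input.split("\\s+").exists(lexical.reserved.contains)
  }
}
```

Serializing `parse` behind a lock is the smallest change that closes the window; moving keyword initialization out of `parse` entirely, so it runs exactly once, would avoid the lock altogether.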
[jira] [Commented] (SPARK-6666) org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
[ https://issues.apache.org/jira/browse/SPARK-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585883#comment-14585883 ]

Santiago M. Mola commented on SPARK-6666:
-----------------------------------------
I opened SPARK-8377 to track the general case, since I have this problem with other data sources, not just JDBC.

> org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
> ---------------------------------------------------------------------
>
>                 Key: SPARK-6666
>                 URL: https://issues.apache.org/jira/browse/SPARK-6666
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>         Environment: 
>            Reporter: John Ferguson
>            Priority: Critical
>
> Is there a way to have JDBC DataFrames use quoted/escaped column names? Right now, it looks like it "sees" the names correctly in the schema created but does not escape them in the SQL it creates when they are not compliant:
>
> org.apache.spark.sql.jdbc.JDBCRDD
>
> {code}
> private val columnList: String = {
>   val sb = new StringBuilder()
>   columns.foreach(x => sb.append(",").append(x))
>   if (sb.length == 0) "1" else sb.substring(1)
> }
> {code}
>
> If you see value in this, I would take a shot at adding the quoting (escaping) of column names here. If you don't do it, some drivers... like postgresql's will simply lowercase all names when parsing the query. As you can see in the TL;DR below, that means they won't match the schema I am given.
>
> TL;DR:
>
> I am able to connect to a Postgres database in the shell (with driver referenced):
>
> {code}
> val jdbcDf = sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
> {code}
>
> In fact when I run:
>
> {code}
> jdbcDf.registerTempTable("sp500")
> val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI FROM sp500")
> {code}
>
> and
>
> {code}
> val avgEPSProg = jsonDf.agg(avg(jsonDf.col("Earnings/Share")))
> {code}
>
> the values come back as expected. However, if I try:
>
> {code}
> jdbcDf.show
> {code}
>
> or if I try:
>
> {code}
> val all = sqlContext.sql("SELECT * FROM sp500")
> all.show
> {code}
>
> I get errors about column names not being found. In fact the error includes a mention of column names all lower cased. For now I will change my schema to be more restrictive. Right now it is, per a Stack Overflow poster, not ANSI compliant by doing things that are allowed by ""'s in pgsql, MySQL and SQLServer. BTW, our users are giving us tables like this... because various tools they already use support non-compliant names. In fact, this is mild compared to what we've had to support.
>
> Currently the schema in question uses mixed case, quoted names with special characters and spaces:
>
> {code}
> CREATE TABLE sp500
> (
>   "Symbol" text,
>   "Name" text,
>   "Sector" text,
>   "Price" double precision,
>   "Dividend Yield" double precision,
>   "Price/Earnings" double precision,
>   "Earnings/Share" double precision,
>   "Book Value" double precision,
>   "52 week low" double precision,
>   "52 week high" double precision,
>   "Market Cap" double precision,
>   "EBITDA" double precision,
>   "Price/Sales" double precision,
>   "Price/Book" double precision,
>   "SEC Filings" text
> )
> {code}
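As a hedged sketch of the quoting the reporter proposes (not the patch that eventually landed in Spark): quote each identifier and double any embedded quotes, per the SQL standard. The delimiter is driver-specific in practice, so real code would take it from the JDBC dialect; PostgreSQL-style double quotes are assumed here.

```scala
// Sketch: quote identifiers so mixed-case and special-character column
// names survive into the generated SQL. Embedded double quotes are
// doubled, per the SQL standard.
def quoteIdentifier(name: String): String =
  "\"" + name.replace("\"", "\"\"") + "\""

// Quoted equivalent of JDBCRDD's columnList: "1" when no columns are
// selected, otherwise the comma-separated quoted names.
def columnList(columns: Seq[String]): String =
  if (columns.isEmpty) "1" else columns.map(quoteIdentifier).mkString(",")
```

With this, `columnList(Seq("Earnings/Share", "52 week low"))` yields `"Earnings/Share","52 week low"`, which PostgreSQL will match case-sensitively against the schema instead of lowercasing.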
[jira] [Created] (SPARK-8377) Identifiers caseness information should be available at any time
Santiago M. Mola created SPARK-8377:
---------------------------------------

             Summary: Identifiers caseness information should be available at any time
                 Key: SPARK-8377
                 URL: https://issues.apache.org/jira/browse/SPARK-8377
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Santiago M. Mola


Currently, we have the option of having a case sensitive catalog or not. A case insensitive catalog just lowercases all identifiers. However, when pushing down to a data source, we lose the information about whether an identifier should be case insensitive or strictly lowercase. Ideally, we would be able to distinguish a case insensitive identifier from a case sensitive one.
[jira] [Updated] (SPARK-8370) Add API for data sources to register databases
[ https://issues.apache.org/jira/browse/SPARK-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola updated SPARK-8370:
------------------------------------
    Component/s: SQL

> Add API for data sources to register databases
> ----------------------------------------------
>
>                 Key: SPARK-8370
>                 URL: https://issues.apache.org/jira/browse/SPARK-8370
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Santiago M. Mola
>
> This API would allow registering a database with a data source instead of just a table. Registering a data source database would register all its tables and keep the catalog updated. The catalog could delegate table lookups for a database registered with this API to the data source.
[jira] [Created] (SPARK-8370) Add API for data sources to register databases
Santiago M. Mola created SPARK-8370:
---------------------------------------

             Summary: Add API for data sources to register databases
                 Key: SPARK-8370
                 URL: https://issues.apache.org/jira/browse/SPARK-8370
             Project: Spark
          Issue Type: New Feature
            Reporter: Santiago M. Mola


This API would allow registering a database with a data source instead of just a table. Registering a data source database would register all its tables and keep the catalog updated. The catalog could delegate table lookups for a database registered with this API to the data source.
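Purely as a hypothetical sketch of the delegation the issue proposes (none of these names exist in Spark; `DatabaseProvider` and `DelegatingCatalog` are invented for illustration), a data source registering a whole database might expose a table-lookup hook that the catalog consults:

```scala
// Hypothetical shapes only: sketches a catalog that delegates lookups
// to a per-database provider instead of registering each table.
trait Table { def name: String }

trait DatabaseProvider {
  def databaseName: String
  def listTables(): Seq[String]
  def lookupTable(table: String): Option[Table]
}

// The catalog stays up to date because every lookup goes through the
// provider; no per-table registration is needed.
class DelegatingCatalog {
  private var providers = Map.empty[String, DatabaseProvider]
  def registerDatabase(p: DatabaseProvider): Unit =
    providers += (p.databaseName -> p)
  def lookup(db: String, table: String): Option[Table] =
    providers.get(db).flatMap(_.lookupTable(table))
}
```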
[jira] [Commented] (SPARK-4867) UDF clean up
[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559423#comment-14559423 ]

Santiago M. Mola commented on SPARK-4867:
-----------------------------------------
Maybe this issue can be split into smaller tasks? A lot of built-in functions can be removed from the parser quite easily by registering them in the FunctionRegistry. I am doing this with a lot of fixed-arity functions. I'm using some helper functions to create FunctionBuilders for Expressions for use with the FunctionRegistry. The main helper looks like this:

{code}
def expression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): ExpressionBuilder = {
  val argTypes = (1 to arity).map(x => classOf[Expression])
  val constructor = tag.runtimeClass.getDeclaredConstructor(argTypes: _*)
  (expressions: Seq[Expression]) => {
    if (expressions.size != arity) {
      throw new IllegalArgumentException(
        s"Invalid number of arguments: ${expressions.size} (must be equal to $arity)"
      )
    }
    constructor.newInstance(expressions: _*).asInstanceOf[Expression]
  }
}
{code}

and can be used like this:

{code}
functionRegistry.registerFunction("MY_FUNCTION", expression[MyFunction])
{code}

If this approach looks like what is needed, I can extend it to use expressions with a variable number of parameters. Also, with some syntactic sugar we can provide a function that works this way:

{code}
// Registers the builder produced by expression[MyFunction] under the name
// "MY_FUNCTION", using a camelcase -> underscore-separated conversion.
functionRegistry.registerFunction[MyFunction]
{code}

How does this sound?

> UDF clean up
> ------------
>
>                 Key: SPARK-4867
>                 URL: https://issues.apache.org/jira/browse/SPARK-4867
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Blocker
>
> Right now our support and internal implementation of many functions has a few issues. Specifically:
> - UDFs don't know their input types and thus don't do type coercion.
> - We hard code a bunch of built-in functions into the parser. This is bad because in SQL it creates new reserved words for things that aren't actually keywords. Also it means that for each function we need to add support to both SQLContext and HiveContext separately.
>
> For this JIRA I propose we do the following:
> - Change the interfaces for registerFunction and ScalaUdf to include types for the input arguments as well as the output type.
> - Add a rule to analysis that does type coercion for UDFs.
> - Add a parse rule for functions to SQLParser.
> - Rewrite all the UDFs that are currently hacked into the various parsers using this new functionality.
>
> Depending on how big this refactoring becomes we could split parts 1&2 from part 3 above.
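The variable-arity extension mentioned in the comment above could look roughly like this; a sketch with toy `Expression` types standing in for Catalyst's (the registry itself is omitted, since its exact signature isn't shown here). Instead of resolving a fixed-arity constructor, it resolves the constructor taking a single `Seq[Expression]`:

```scala
import scala.reflect.ClassTag

trait Expression

// Sketch of a variadic counterpart to the fixed-arity `expression`
// helper: reflectively look up the (Seq[Expression]) constructor and
// pass the whole argument list through, with no arity check.
def varargExpression[T <: Expression](implicit tag: ClassTag[T]): Seq[Expression] => Expression = {
  val constructor = tag.runtimeClass.getDeclaredConstructor(classOf[Seq[Expression]])
  (expressions: Seq[Expression]) =>
    constructor.newInstance(expressions).asInstanceOf[Expression]
}

// Toy expressions to exercise the builder.
case class Lit(s: String) extends Expression
case class Concat(children: Seq[Expression]) extends Expression

val concatBuilder = varargExpression[Concat]
```

The reflection trick is the same as in the fixed-arity helper; whether the registry should accept unchecked variadic builders alongside arity-checked ones is the open design question in the comment.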
[jira] [Commented] (SPARK-4678) A SQL query with subquery fails with TreeNodeException
[ https://issues.apache.org/jira/browse/SPARK-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559335#comment-14559335 ] Santiago M. Mola commented on SPARK-4678: - [~ozawa] Does this happen in more recent versions? > A SQL query with subquery fails with TreeNodeException > -- > > Key: SPARK-4678 > URL: https://issues.apache.org/jira/browse/SPARK-4678 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.1 >Reporter: Tsuyoshi Ozawa > > {code} > spark-sql> create external table if NOT EXISTS randomText100GB(text string) > location 'hdfs:///user/ozawa/randomText100GB'; > spark-sql> CREATE TABLE wordcount AS > > SELECT word, count(1) AS count > > FROM (SELECT > EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' ')) > > AS word FROM randomText100GB) words > > GROUP BY word; > org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in > stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 > (TID 25, hadoop-slave2.c.gcp-s > amples.internal): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: word#5 > > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) > > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43) > > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42) > > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165) > > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156) > > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42) > > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52) > > 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52) > > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) > scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > scala.collection.AbstractTraversable.map(Traversable.scala:105) > > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.(Projection.scala:52) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106) > > org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43) > > org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42) > org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) > org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > org.apache.spark.scheduler.Task.run(Task.scala:54) > 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
> {code}
[jira] [Commented] (SPARK-3815) LPAD function does not work in where predicate
[ https://issues.apache.org/jira/browse/SPARK-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559331#comment-14559331 ] Santiago M. Mola commented on SPARK-3815: - [~yanakad] Is this still present in more recent versions? If yes, could you provide a minimal test case (query + data)? > LPAD function does not work in where predicate > -- > > Key: SPARK-3815 > URL: https://issues.apache.org/jira/browse/SPARK-3815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Yana Kadiyska >Priority: Minor > > select customer_id from mytable where > pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2 > produces: > 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing > query: > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) > at > org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) > at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597) > at > org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360) > at > org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175) > at > org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207) > at > 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58) > at > org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526) > at > org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at scala.collection.immutable.$colon$colon.writeObject(List.scala:379) > at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
[jira] [Commented] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser
[ https://issues.apache.org/jira/browse/SPARK-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559097#comment-14559097 ]

Santiago M. Mola commented on SPARK-7012:
-----------------------------------------
[~6133d] SQLContext parses DDL statements (such as CREATE TEMPORARY TABLE) with an independent parser called DDLParser:
https://github.com/apache/spark/blob/f38e619c41d242143c916373f2a44ec674679f19/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L87

The parsing of the columns for the schema is done in DDLParser.column:
https://github.com/apache/spark/blob/f38e619c41d242143c916373f2a44ec674679f19/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L176

> Add support for NOT NULL modifier for column definitions on DDLParser
> ----------------------------------------------------------------------
>
>                 Key: SPARK-7012
>                 URL: https://issues.apache.org/jira/browse/SPARK-7012
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Santiago M. Mola
>            Priority: Minor
>              Labels: easyfix
>
> Add support for NOT NULL modifier for column definitions on DDLParser. This would add support for the following syntax:
>
> {code}
> CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...
> {code}
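To illustrate the intended semantics with a toy recognizer (Spark's DDLParser is a parser-combinator grammar, where the change would amount to an optional NOT NULL clause in `DDLParser.column`; the regex-based `parseColumn` below is only a stand-in):

```scala
case class Column(name: String, dataType: String, nullable: Boolean)

// Toy recognizer for `name TYPE [NOT NULL]`. This only demonstrates the
// desired mapping from the modifier to the column's nullability; the
// real grammar handles many more data types and modifiers.
def parseColumn(definition: String): Column = {
  val NotNull = """(?i)\s*(\w+)\s+(\w+)\s+NOT\s+NULL\s*""".r
  val Plain   = """\s*(\w+)\s+(\w+)\s*""".r
  definition match {
    case NotNull(name, tpe) => Column(name, tpe, nullable = false)
    case Plain(name, tpe)   => Column(name, tpe, nullable = true)
    case other => throw new IllegalArgumentException(s"cannot parse column: $other")
  }
}
```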
[jira] [Updated] (SPARK-3846) [SQL] Serialization exception (Kryo) on joins when enabling codegen
[ https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-3846: Summary: [SQL] Serialization exception (Kryo) on joins when enabling codegen (was: [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen ) > [SQL] Serialization exception (Kryo) on joins when enabling codegen > > > Key: SPARK-3846 > URL: https://issues.apache.org/jira/browse/SPARK-3846 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0, 1.2.0 >Reporter: Jianshi Huang >Priority: Blocker > > The error is reproducible when I join two tables manually. The error message > is like follows. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 > in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage > 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException: > Unable to find class: > __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1 > > com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) > > com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) > com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610) > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721) > com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42) > com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34) > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) > > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133) > > org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133) > org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) > scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) > > 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > > org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101) > scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > > org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198) > > org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165) > org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599) > org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599) > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > org.apache.spark.scheduler.Task.run(Task.scala:56) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > java.lang.Thread.run(Thread.java:724) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5707) Enabling spark.sql.codegen throws ClassNotFound exception
[ https://issues.apache.org/jira/browse/SPARK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558823#comment-14558823 ] Santiago M. Mola commented on SPARK-5707: - This is probably a duplicate of SPARK-3846. > Enabling spark.sql.codegen throws ClassNotFound exception > - > > Key: SPARK-5707 > URL: https://issues.apache.org/jira/browse/SPARK-5707 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0, 1.3.1 > Environment: yarn-client mode, spark.sql.codegen=true >Reporter: Yi Yao >Assignee: Ram Sriharsha >Priority: Blocker > > Exception thrown: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in > stage 133.0 failed 4 times, most recent failure: Lost task 13.3 in stage > 133.0 (TID 3066, cdh52-node2): java.io.IOException: > com.esotericsoftware.kryo.KryoException: Unable to find class: > __wrapper$1$81257352e1c844aebf09cb84fe9e7459.__wrapper$1$81257352e1c844aebf09cb84fe9e7459$SpecificRow$1 > Serialization trace: > hashTable (org.apache.spark.sql.execution.joins.UniqueKeyHashedRelation) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) > at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$3.apply(BroadcastHashJoin.scala:62) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$3.apply(BroadcastHashJoin.scala:61) > at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601) > at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601) > at > 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:75) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:75) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at > 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.r
[jira] [Updated] (SPARK-3846) [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen
[ https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola updated SPARK-3846:
    Summary: [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen  (was: KryoException when doing joins in SparkSQL )

> [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen
> -
>
> Key: SPARK-3846
> URL: https://issues.apache.org/jira/browse/SPARK-3846
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Jianshi Huang
> Priority: Blocker
>
> The error is reproducible when I join two tables manually. The error message
> is like follows.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 645
> in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage
> 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
> Unable to find class:
> __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:724)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3846) KryoException when doing joins in SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola updated SPARK-3846:
    Priority: Blocker  (was: Major)

> KryoException when doing joins in SparkSQL
> ---
>
> Key: SPARK-3846
> URL: https://issues.apache.org/jira/browse/SPARK-3846
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Jianshi Huang
> Priority: Blocker
>
> The error is reproducible when I join two tables manually. The error message
> is like follows.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 645
> in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage
> 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
> Unable to find class:
> __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:724)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3846) KryoException when doing joins in SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558808#comment-14558808 ]

Santiago M. Mola commented on SPARK-3846:
-

[~huangjs] Would you mind adding a test case here (an example of data and exact code used to produce the exception)?

> KryoException when doing joins in SparkSQL
> ---
>
> Key: SPARK-3846
> URL: https://issues.apache.org/jira/browse/SPARK-3846
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Jianshi Huang
>
> The error is reproducible when I join two tables manually. The error message
> is like follows.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 645
> in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage
> 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
> Unable to find class:
> __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:724)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
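The failure mode matches the later retitle of this issue ("... when enabling codegen"): a `SpecificRow` class generated at runtime exists only in the JVM that compiled it, so a serializer that records classes by name cannot resolve them in another JVM. A minimal, hypothetical illustration using plain `Class.forName` (Kryo's `DefaultClassResolver.readName` ultimately performs a by-name lookup of this kind); the class name is taken from the stack trace above:

```scala
// Sketch only: demonstrates why a by-name lookup of a runtime-generated
// class fails on a JVM that never generated it. The name below is the one
// reported in the KryoException; it is not on this classpath.
val generatedName =
  "__wrapper$1$18e31777385a452ba0bc030e899bf5d1." +
  "__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1"

val resolvable: Boolean =
  try { Class.forName(generatedName); true }
  catch { case _: ClassNotFoundException => false }
```

On an executor that never ran the code generator, the lookup fails exactly like this, which is what surfaces as the `KryoException: Unable to find class` in the report.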
[jira] [Resolved] (SPARK-7823) [SQL] Batch, FixedPoint, Strategy should not be inner classes of class RuleExecutor
[ https://issues.apache.org/jira/browse/SPARK-7823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola resolved SPARK-7823.
-
    Resolution: Duplicate

This is a duplicate of https://issues.apache.org/jira/browse/SPARK-7727

> [SQL] Batch, FixedPoint, Strategy should not be inner classes of class
> RuleExecutor
> ---
>
> Key: SPARK-7823
> URL: https://issues.apache.org/jira/browse/SPARK-7823
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1, 1.4.0
> Reporter: Edoardo Vacchi
> Priority: Minor
>
> Batch, FixedPoint, Strategy, Once, are defined within the class
> RuleExecutor[TreeType]. This makes it unnecessarily complicated to reuse
> batches of rules within custom optimizers. E.g.:
> {code:java}
> object DefaultOptimizer extends Optimizer {
>   override val batches = /* batches defined here */
> }
> object MyCustomOptimizer extends Optimizer {
>   override val batches =
>     Batch("my custom batch" ...) ::
>     DefaultOptimizer.batches
> }
> {code}
> MyCustomOptimizer won't compile, because DefaultOptimizer.batches has type
> "Seq[DefaultOptimizer.this.Batch]".
> Solution: Batch, FixedPoint, etc. should be moved *outside* the
> RuleExecutor[T] class body, either in a companion object or right in the
> `rules` package.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7727) Avoid inner classes in RuleExecutor
[ https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola updated SPARK-7727:
    Comment: was deleted

(was: [~evacchi] I'm sorry I opened this duplicate for: https://issues.apache.org/jira/browse/SPARK-7823 Not sure which one to mark as duplicate since both have pull requests.)

> Avoid inner classes in RuleExecutor
> ---
>
> Key: SPARK-7727
> URL: https://issues.apache.org/jira/browse/SPARK-7727
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Santiago M. Mola
> Labels: easyfix, starter
>
> In RuleExecutor, the following classes and objects are defined as inner
> classes or objects: Strategy, Once, FixedPoint, Batch.
> This does not seem to accomplish anything in this case, but makes
> extensibility harder. For example, if I want to define a new Optimizer that
> uses all batches from the DefaultOptimizer plus some more, I would do
> something like:
> {code}
> new Optimizer {
>   override protected val batches: Seq[Batch] =
>     DefaultOptimizer.batches ++ myBatches
> }
> {code}
> But this will give a typing error because batches in DefaultOptimizer are of
> type DefaultOptimizer#Batch while myBatches are this#Batch.
> Workarounds include either copying the list of batches from DefaultOptimizer
> or using a method like this:
> {code}
> private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
>   val strategy = b.strategy.maxIterations match {
>     case 1 => Once
>     case n => FixedPoint(n)
>   }
>   Batch(b.name, strategy, b.rules)
> }
> {code}
> However, making these classes outer would solve the problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7727) Avoid inner classes in RuleExecutor
[ https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558777#comment-14558777 ]

Santiago M. Mola commented on SPARK-7727:
-

[~evacchi] I'm sorry I opened this duplicate for: https://issues.apache.org/jira/browse/SPARK-7823 Not sure which one to mark as duplicate since both have pull requests.

> Avoid inner classes in RuleExecutor
> ---
>
> Key: SPARK-7727
> URL: https://issues.apache.org/jira/browse/SPARK-7727
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Santiago M. Mola
> Labels: easyfix, starter
>
> In RuleExecutor, the following classes and objects are defined as inner
> classes or objects: Strategy, Once, FixedPoint, Batch.
> This does not seem to accomplish anything in this case, but makes
> extensibility harder. For example, if I want to define a new Optimizer that
> uses all batches from the DefaultOptimizer plus some more, I would do
> something like:
> {code}
> new Optimizer {
>   override protected val batches: Seq[Batch] =
>     DefaultOptimizer.batches ++ myBatches
> }
> {code}
> But this will give a typing error because batches in DefaultOptimizer are of
> type DefaultOptimizer#Batch while myBatches are this#Batch.
> Workarounds include either copying the list of batches from DefaultOptimizer
> or using a method like this:
> {code}
> private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
>   val strategy = b.strategy.maxIterations match {
>     case 1 => Once
>     case n => FixedPoint(n)
>   }
>   Batch(b.name, strategy, b.rules)
> }
> {code}
> However, making these classes outer would solve the problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7727) Avoid inner classes in RuleExecutor
[ https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558758#comment-14558758 ]

Santiago M. Mola commented on SPARK-7727:
-

[~chenghao] I think that is a good idea. Analyzer could be converted into a trait, moving current Analyzer to DefaultAnalyzer. It is probably a good idea to use a separate JIRA and pull request for that though.

> Avoid inner classes in RuleExecutor
> ---
>
> Key: SPARK-7727
> URL: https://issues.apache.org/jira/browse/SPARK-7727
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Santiago M. Mola
> Labels: easyfix, starter
>
> In RuleExecutor, the following classes and objects are defined as inner
> classes or objects: Strategy, Once, FixedPoint, Batch.
> This does not seem to accomplish anything in this case, but makes
> extensibility harder. For example, if I want to define a new Optimizer that
> uses all batches from the DefaultOptimizer plus some more, I would do
> something like:
> {code}
> new Optimizer {
>   override protected val batches: Seq[Batch] =
>     DefaultOptimizer.batches ++ myBatches
> }
> {code}
> But this will give a typing error because batches in DefaultOptimizer are of
> type DefaultOptimizer#Batch while myBatches are this#Batch.
> Workarounds include either copying the list of batches from DefaultOptimizer
> or using a method like this:
> {code}
> private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
>   val strategy = b.strategy.maxIterations match {
>     case 1 => Once
>     case n => FixedPoint(n)
>   }
>   Batch(b.name, strategy, b.rules)
> }
> {code}
> However, making these classes outer would solve the problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5755) remove unnecessary Add for unary plus sign
[ https://issues.apache.org/jira/browse/SPARK-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola resolved SPARK-5755.
-
    Resolution: Fixed
    Fix Version/s: 1.3.0

> remove unnecessary Add for unary plus sign
> ---
>
> Key: SPARK-5755
> URL: https://issues.apache.org/jira/browse/SPARK-5755
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Adrian Wang
> Priority: Minor
> Fix For: 1.3.0
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5755) remove unnecessary Add for unary plus sign (HiveQL)
[ https://issues.apache.org/jira/browse/SPARK-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola updated SPARK-5755:
    Summary: remove unnecessary Add for unary plus sign (HiveQL)  (was: remove unnecessary Add for unary plus sign )

> remove unnecessary Add for unary plus sign (HiveQL)
> ---
>
> Key: SPARK-5755
> URL: https://issues.apache.org/jira/browse/SPARK-5755
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Adrian Wang
> Priority: Minor
> Fix For: 1.3.0
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5305) Using a field in a WHERE clause that is not in the schema does not throw an exception.
[ https://issues.apache.org/jira/browse/SPARK-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555070#comment-14555070 ]

Santiago M. Mola commented on SPARK-5305:
-

[~sonixbp] What version were you using? Do you still experience this problem? It does not seem possible with recent versions.

> Using a field in a WHERE clause that is not in the schema does not throw an
> exception.
> --
>
> Key: SPARK-5305
> URL: https://issues.apache.org/jira/browse/SPARK-5305
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Corey J. Nolet
>
> Given a schema:
> key1 = String
> key2 = Integer
> The following sql statement doesn't seem to throw an exception:
> SELECT * FROM myTable WHERE doesntExist = 'val1'

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7754) [SQL] Use PartialFunction literals instead of objects in Catalyst
[ https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554935#comment-14554935 ]

Santiago M. Mola commented on SPARK-7754:
-

Not all rules use transform. Some use transformUp and others use transformAllExpressions. Maybe this rule API could be extended to cover these cases.

> [SQL] Use PartialFunction literals instead of objects in Catalyst
> -
>
> Key: SPARK-7754
> URL: https://issues.apache.org/jira/browse/SPARK-7754
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Edoardo Vacchi
> Priority: Minor
>
> Catalyst rules extend two distinct "rule" types: {{Rule[LogicalPlan]}} and
> {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
> The distinction is fairly subtle: in the end, both rule types are supposed to
> define a method {{apply(plan: LogicalPlan)}} (where LogicalPlan is either
> Logical- or Spark-) which returns a transformed plan (or a sequence thereof,
> in the case of Strategy).
> Ceremonies aside, the body of such a method is always of the kind:
> {code:java}
> def apply(plan: PlanType) = plan match pf
> {code}
> where `pf` would be some `PartialFunction` of the PlanType:
> {code:java}
> val pf = {
>   case ... => ...
> }
> {code}
> This JIRA is a proposal to introduce utility methods to
> a) reduce the boilerplate to define rewrite rules
> b) turn them back into what they essentially represent: function types.
> These changes would be backwards compatible, and would greatly help in
> understanding what the code does. Current use of objects is redundant and
> possibly confusing.
> *{{Rule[LogicalPlan]}}*
> a) Introduce the utility object
> {code:java}
> object rule {
>   def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
>     new Rule[LogicalPlan] {
>       def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
>     }
>   def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
>     new Rule[LogicalPlan] {
>       override val ruleName = name
>       def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
>     }
> }
> {code}
> b) progressively replace the boilerplate-y object definitions; e.g.
> {code:java}
> object MyRewriteRule extends Rule[LogicalPlan] {
>   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>     case ... => ...
>   }
> }
> {code}
> with
> {code:java}
> // define a Rule[LogicalPlan]
> val MyRewriteRule = rule {
>   case ... => ...
> }
> {code}
> and/or:
> {code:java}
> // define a named Rule[LogicalPlan]
> val MyRewriteRule = rule.named("My rewrite rule") {
>   case ... => ...
> }
> {code}
> *Strategies*
> A similar solution could be applied to shorten the code for Strategies,
> which are total functions only because they are all supposed to manage the
> default case, possibly returning `Nil`. In this case we might introduce the
> following utility:
> {code:java}
> object strategy {
>   /**
>    * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
>    * The partial function must therefore return *one single* SparkPlan for each case.
>    * The method will automatically wrap them in a [[Seq]].
>    * Unhandled cases will automatically return Seq.empty
>    */
>   def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
>     new Strategy {
>       def apply(plan: LogicalPlan): Seq[SparkPlan] =
>         if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
>     }
>   /**
>    * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
>    * The partial function must therefore return a Seq[SparkPlan] for each case.
>    * Unhandled cases will automatically return Seq.empty
>    */
>   def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
>     new Strategy {
>       def apply(plan: LogicalPlan): Seq[SparkPlan] =
>         if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
>     }
> }
> {code}
> Usage:
> {code:java}
> val mystrategy = strategy { case ... => ... }
> val seqstrategy = strategy.seq { case ... => ... }
> {code}
> *Further possible improvements:*
> Making the utility methods `implicit`, thereby further reducing the rewrite
> rules to:
> {code:java}
> // define a PartialFunction[LogicalPlan, LogicalPlan]
> // the implicit would convert it into a Rule[LogicalPlan] at the use sites
> val MyRewriteRule = {
>   case ... => ...
> }
> {code}
> *Caveats*
> Because of the way objects are initialized vs. vals, it might be necessary
> to reorder instructions so that vals are actually initialized before they
> are used.
> E.g.:
> {code:java}
> class MyOptim
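The comment above notes that not all rules go through `transform`; some use `transformUp`. A self-contained sketch of what a bottom-up variant of the proposed `rule` utility might look like, written against a toy plan type (the names `Plan`, `Leaf`, `Node`, and `rule.up` are illustrative stand-ins, not Spark's actual `LogicalPlan`/`TreeNode` API):

```scala
// Toy plan tree standing in for Catalyst's LogicalPlan (illustrative only).
sealed trait Plan {
  def children: Seq[Plan]
  def withChildren(cs: Seq[Plan]): Plan
  // Bottom-up rewrite, analogous in spirit to TreeNode.transformUp:
  // children are rewritten first, then the partial function is tried on
  // the updated node itself.
  def transformUp(pf: PartialFunction[Plan, Plan]): Plan = {
    val updated = withChildren(children.map(_.transformUp(pf)))
    if (pf.isDefinedAt(updated)) pf(updated) else updated
  }
}
case class Leaf(value: Int) extends Plan {
  def children: Seq[Plan] = Nil
  def withChildren(cs: Seq[Plan]): Plan = this // a leaf has no children to replace
}
case class Node(children: Seq[Plan]) extends Plan {
  def withChildren(cs: Seq[Plan]): Plan = Node(cs)
}

trait Rule { def apply(plan: Plan): Plan }

object rule {
  // Hypothetical bottom-up counterpart of the proposed rule(...) utility.
  def up(pf: PartialFunction[Plan, Plan]): Rule = new Rule {
    def apply(plan: Plan): Plan = plan.transformUp(pf)
  }
}

// A rewrite expressed as a partial-function literal, as the proposal suggests.
val incrementLeaves = rule.up { case Leaf(v) => Leaf(v + 1) }
```

With this shape, `transform`, `transformUp`, and `transformAllExpressions` variants could each get their own small constructor (`rule`, `rule.up`, `rule.allExpressions`) without changing how the rewrite cases themselves are written.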
[jira] [Reopened] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL
[ https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santiago M. Mola reopened SPARK-7724:
-

Thanks. Here's a PR.

> Add support for Intersect and Except in Catalyst DSL
> 
>
> Key: SPARK-7724
> URL: https://issues.apache.org/jira/browse/SPARK-7724
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Santiago M. Mola
> Priority: Trivial
> Labels: easyfix, starter
>
> Catalyst DSL to create logical plans supports most of the current plan, but
> it is missing Except and Intersect. See LogicalPlanFunctions:
> https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL
[ https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553059#comment-14553059 ]

Santiago M. Mola edited comment on SPARK-7724 at 5/20/15 8:36 PM:
--

DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively for writing test cases, so I thought that a trivial patch to complete the API would make sense. I can continue using Intersect and Except classes directly though.

was (Author: smolav):
DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively for writing test cases, so I thought that a trivial patch to complete the API would make sense. I can continue using Intersect and Except clases directly though.

> Add support for Intersect and Except in Catalyst DSL
> 
>
> Key: SPARK-7724
> URL: https://issues.apache.org/jira/browse/SPARK-7724
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Santiago M. Mola
> Priority: Trivial
> Labels: easyfix, starter
>
> Catalyst DSL to create logical plans supports most of the current plan, but
> it is missing Except and Intersect. See LogicalPlanFunctions:
> https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL
[ https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553059#comment-14553059 ]

Santiago M. Mola commented on SPARK-7724:
-

DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively for writing test cases, so I thought that a trivial patch to complete the API would make sense. I can continue using Intersect and Except classes directly though.

> Add support for Intersect and Except in Catalyst DSL
> 
>
> Key: SPARK-7724
> URL: https://issues.apache.org/jira/browse/SPARK-7724
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Santiago M. Mola
> Priority: Trivial
> Labels: easyfix, starter
>
> Catalyst DSL to create logical plans supports most of the current plan, but
> it is missing Except and Intersect. See LogicalPlanFunctions:
> https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7727) Avoid inner classes in RuleExecutor
Santiago M. Mola created SPARK-7727:
---

Summary: Avoid inner classes in RuleExecutor
Key: SPARK-7727
URL: https://issues.apache.org/jira/browse/SPARK-7727
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola

In RuleExecutor, the following classes and objects are defined as inner classes or objects: Strategy, Once, FixedPoint, Batch.

This does not seem to accomplish anything in this case, but makes extensibility harder. For example, if I want to define a new Optimizer that uses all batches from the DefaultOptimizer plus some more, I would do something like:

{code}
new Optimizer {
  override protected val batches: Seq[Batch] =
    DefaultOptimizer.batches ++ myBatches
}
{code}

But this will give a typing error because batches in DefaultOptimizer are of type DefaultOptimizer#Batch while myBatches are this#Batch. Workarounds include either copying the list of batches from DefaultOptimizer or using a method like this:

{code}
private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
  val strategy = b.strategy.maxIterations match {
    case 1 => Once
    case n => FixedPoint(n)
  }
  Batch(b.name, strategy, b.rules)
}
{code}

However, making these classes outer would solve the problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
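The typing error described in this issue is a consequence of Scala's path-dependent types: a class nested inside another class yields a distinct type per enclosing instance. A self-contained sketch of both the problem and the proposed fix, using toy classes rather than Spark's real RuleExecutor (all names here are illustrative):

```scala
// Inner-class version: each instance gets its OWN Batch type, so batches
// from different executor instances only unify under the type projection
// RuleExecutorInner#Batch, which is what makes reuse awkward.
class RuleExecutorInner {
  case class Batch(name: String)
  val batches: Seq[Batch] = Seq(Batch("default"))
}

// Proposed fix: move Batch outside the class body (here, into a companion
// object), so every executor shares one and the same Batch type.
object RuleExecutorOuter {
  case class Batch(name: String)
}
class RuleExecutorOuter {
  import RuleExecutorOuter.Batch
  val batches: Seq[Batch] = Seq(Batch("default"))
}

object Demo {
  // Compiles without workarounds: there is a single shared Batch type,
  // so batches from one executor concatenate freely with extra batches.
  def combined(extra: Seq[RuleExecutorOuter.Batch]): Seq[RuleExecutorOuter.Batch] =
    new RuleExecutorOuter().batches ++ extra
}
```

With the inner-class version, the equivalent of `Demo.combined` needs either an explicit `RuleExecutorInner#Batch` projection or the `transformBatchType`-style copying shown in the issue description.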
[jira] [Created] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL
Santiago M. Mola created SPARK-7724:
---

Summary: Add support for Intersect and Except in Catalyst DSL
Key: SPARK-7724
URL: https://issues.apache.org/jira/browse/SPARK-7724
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Trivial

Catalyst DSL to create logical plans supports most of the current plan, but it is missing Except and Intersect. See LogicalPlanFunctions:
https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
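A sketch of the kind of DSL extension the issue asks for, modeled on toy plan nodes rather than Spark's actual `Intersect`/`Except` operators and `LogicalPlanFunctions` (all class names below are stand-ins; the real patch would add analogous methods to the implicit plan-function class in `catalyst/dsl/package.scala`):

```scala
// Toy stand-ins for logical plan nodes (illustrative, not Spark's classes).
sealed trait Plan
case class Table(name: String) extends Plan
case class Intersect(left: Plan, right: Plan) extends Plan
case class Except(left: Plan, right: Plan) extends Plan

// DSL enrichment in the style of Catalyst's implicit plan functions:
// any Plan value gains infix `intersect` and `except` constructors.
implicit class PlanDsl(plan: Plan) {
  def intersect(other: Plan): Plan = Intersect(plan, other)
  def except(other: Plan): Plan = Except(plan, other)
}
```

Usage mirrors the existing DSL combinators, e.g. `Table("t1") intersect Table("t2")` builds the corresponding plan node directly, which is convenient when writing test cases against logical plans.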
[jira] [Commented] (SPARK-7275) Make LogicalRelation public
[ https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547362#comment-14547362 ] Santiago M. Mola commented on SPARK-7275: - [~rxin] What are your thoughts on this? > Make LogicalRelation public > --- > > Key: SPARK-7275 > URL: https://issues.apache.org/jira/browse/SPARK-7275 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Santiago M. Mola >Priority: Minor > > It seems LogicalRelation is the only part of the LogicalPlan that is not > public. This makes it harder to work with full logical plans from third party > packages.
[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results
[ https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543602#comment-14543602 ] Santiago M. Mola commented on SPARK-6743: - This problem only happens for cached relations. Here is the root of the problem: {code} /* Fails. Got: Array(Row("A1"), Row("A2")) */ assertResult(Array(Row(), Row()))( InMemoryColumnarTableScan(Nil, Nil, sqlc.table("tab0").queryExecution.sparkPlan.asInstanceOf[InMemoryColumnarTableScan].relation) .execute().collect() ) {code} InMemoryColumnarTableScan returns the narrowest column when no attributes are requested: {code} // Find the ordinals and data types of the requested columns. If none are requested, use the // narrowest (the field with minimum default element size). val (requestedColumnIndices, requestedColumnDataTypes) = if (attributes.isEmpty) { val (narrowestOrdinal, narrowestDataType) = relation.output.zipWithIndex.map { case (a, ordinal) => ordinal -> a.dataType } minBy { case (_, dataType) => ColumnType(dataType).defaultSize } Seq(narrowestOrdinal) -> Seq(narrowestDataType) } else { attributes.map { a => relation.output.indexWhere(_.exprId == a.exprId) -> a.dataType }.unzip } {code} It seems this is what leads to incorrect results. > Join with empty projection on one side produces invalid results > --- > > Key: SPARK-6743 > URL: https://issues.apache.org/jira/browse/SPARK-6743 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 >Reporter: Santiago M.
Mola >Priority: Critical > > {code:java} > val sqlContext = new SQLContext(sc) > val tab0 = sc.parallelize(Seq( > (83,0,38), > (26,0,79), > (43,81,24) > )) > sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), > "tab0") > sqlContext.cacheTable("tab0") > val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP > BY tab0._2, cor0._2") > val result1 = df1.collect() > val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY > cor0._2") > val result2 = df2.collect() > val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2") > val result3 = df3.collect() > {code} > Given the previous code, result2 equals to Row(43), Row(83), Row(26), which > is wrong. These results correspond to cor0._1, instead of cor0._2. Correct > results would be Row(0), Row(81), which are ok for the third query. The first > query also produces valid results, and the only difference is that the left > side of the join is not empty.
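The root cause above can be modeled without Spark. This toy sketch (hypothetical `scan` helpers, not Spark's InMemoryColumnarTableScan) shows how substituting the narrowest column for an empty projection leaks values where zero-width rows are expected:

```scala
// Toy model of the defect: a scan that, given an empty list of requested
// column ordinals, falls back to a real column instead of emitting empty rows.
case class Row(values: Any*)

def scanBuggy(data: Seq[Seq[Any]], requested: Seq[Int]): Seq[Row] =
  if (requested.isEmpty)
    data.map(r => Row(r.head)) // buggy fallback: leaks column 0 values
  else
    data.map(r => Row(requested.map(r): _*))

def scanFixed(data: Seq[Seq[Any]], requested: Seq[Int]): Seq[Row] =
  data.map(r => Row(requested.map(r): _*)) // empty projection yields Row()
```

With `data = Seq(Seq("A1", 1), Seq("A2", 2))`, `scanBuggy(data, Nil)` produces `Row("A1"), Row("A2")` (the observed wrong results), while `scanFixed(data, Nil)` produces the expected `Row(), Row()`.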
[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results
[ https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543466#comment-14543466 ] Santiago M. Mola commented on SPARK-6743: - Note that the bug is not related to GROUP BY; that is just a quick way to produce a Project logical plan with an empty projection list from SQL. Building upon my previous test case, here are some further instances of the bug using logical plans and DataFrames: {code} import org.apache.spark.sql.catalyst.dsl.plans._ import org.apache.spark.sql.catalyst.dsl.expressions._ val plan0 = sqlc.table("tab0").logicalPlan.subquery('tab0) val plan1 = sqlc.table("tab1").logicalPlan.subquery('tab1) /* Succeeds */ val planA = plan0.select('_1 as "c0") .join(plan1.select('_1 as "c1")) .select('c0, 'c1) .orderBy('c0.asc, 'c1.asc) assertResult(Array(Row("A1", "B1"), Row("A1", "B2"), Row("A2", "B1"), Row("A2", "B2")))(DataFrame(sqlc, planA).collect()) /* Fails. Got: Array([A1], [A1], [A2], [A2]) */ val planB = plan0.select('_1 as "c0") .join(plan1.select('_1 as "c1")) .select('c1) .orderBy('c1.asc) assertResult(Array(Row("B1"), Row("B1"), Row("B2"), Row("B2")))(DataFrame(sqlc, planB).collect()) /* Fails. Got: Array([A1], [A1], [A2], [A2]) */ val planC = plan0.select() .join(plan1.select('_1 as "c1")) .select('c1) .orderBy('c1.asc) assertResult(Array(Row("B1"), Row("B1"), Row("B2"), Row("B2")))(DataFrame(sqlc, planC).collect()) {code}
[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results
[ https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543435#comment-14543435 ] Santiago M. Mola commented on SPARK-6743: - Sorry, my first example was not very clear. Here is a more precise one: {code} val sqlc = new SQLContext(sc) val tab0 = sc.parallelize(Seq( Tuple1("A1"), Tuple1("A2") )) sqlc.registerDataFrameAsTable(sqlc.createDataFrame(tab0), "tab0") sqlc.cacheTable("tab0") val tab1 = sc.parallelize(Seq( Tuple1("B1"), Tuple1("B2") )) sqlc.registerDataFrameAsTable(sqlc.createDataFrame(tab1), "tab1") sqlc.cacheTable("tab1") /* Succeeds */ val result1 = sqlc.sql("SELECT tab0._1,tab1._1 FROM tab0, tab1 GROUP BY tab0._1,tab1._1 ORDER BY tab0._1, tab1._1").collect() assertResult(Array(Row("A1", "B1"), Row("A1", "B2"), Row("A2", "B1"), Row("A2", "B2")))(result1) /* Fails. Got: Array([A1], [A2]) */ val result2 = sqlc.sql("SELECT tab1._1 FROM tab0, tab1 GROUP BY tab1._1 ORDER BY tab1._1").collect() assertResult(Array(Row("B1"), Row("B2")))(result2) {code}
[jira] [Commented] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser
[ https://issues.apache.org/jira/browse/SPARK-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543403#comment-14543403 ] Santiago M. Mola commented on SPARK-7012: - In Spark SQL, every expression can be nullable or not (i.e. values can be null or not). All Spark SQL and Catalyst internals support specifying this. See, for example, StructField, which is the relevant class for schemas: https://github.com/apache/spark/blob/2d6612cc8b98f767d73c4d15e4065bf3d6c12ea7/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala#L31 Or AttributeReference: https://github.com/apache/spark/blob/c1080b6fddb22d84694da2453e46a03fbc041576/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L166 However, when creating a temporary table through a SQL statement (CREATE TEMPORARY TABLE), there is no way of specifying whether a column is nullable or not (it will always be nullable by default). Standard SQL supports a constraint called "NOT NULL" to specify that a column is not nullable. See: http://www.w3schools.com/sql/sql_notnull.asp In order to implement this, the parser for "CREATE TEMPORARY TABLE", that is, DDLParser, should be modified to allow "NOT NULL" and set nullable = false accordingly in StructField. See: https://github.com/apache/spark/blob/0595b6de8f1da04baceda082553c2aa1aa2cb006/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L176 > Add support for NOT NULL modifier for column definitions on DDLParser > - > > Key: SPARK-7012 > URL: https://issues.apache.org/jira/browse/SPARK-7012 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.3.0 >Reporter: Santiago M. Mola >Priority: Minor > Labels: easyfix > > Add support for NOT NULL modifier for column definitions on DDLParser. This > would add support for the following syntax: > CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...
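A hedged sketch of the proposed behavior (a toy parser with hypothetical names, not Spark's actual DDLParser): a trailing NOT NULL in a column definition should map to `nullable = false` on the resulting field, mirroring `StructField(name, dataType, nullable)`:

```scala
// Illustrative only: a minimal column-definition parser, not Spark's DDLParser.
// Field stands in for StructField(name, dataType, nullable).
case class Field(name: String, dataType: String, nullable: Boolean)

def parseColumn(definition: String): Field =
  definition.trim.split("\\s+").toList match {
    case name :: tpe :: rest =>
      // A "NOT NULL" suffix flips the default nullable = true to false.
      val notNull = rest.map(_.toUpperCase) == List("NOT", "NULL")
      Field(name, tpe, nullable = !notNull)
    case _ =>
      sys.error(s"cannot parse column definition: $definition")
  }
```

For example, `parseColumn("field INTEGER NOT NULL")` yields a non-nullable field, while `parseColumn("id INTEGER")` keeps the default `nullable = true`.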
[jira] [Commented] (SPARK-4758) Make metastore_db in-memory for HiveContext
[ https://issues.apache.org/jira/browse/SPARK-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541865#comment-14541865 ] Santiago M. Mola commented on SPARK-4758: - This could also make testing more convenient. Is there any progress on this? > Make metastore_db in-memory for HiveContext > --- > > Key: SPARK-4758 > URL: https://issues.apache.org/jira/browse/SPARK-4758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.2.0, 1.3.0 >Reporter: Jianshi Huang >Priority: Minor > > HiveContext by default will create a local folder metastore_db. > This is not very user friendly as the metastore_db will be locked by > HiveContext and thus will block multiple Spark processes from starting from the same > directory. > I would propose adding a default hive-site.xml in conf/ with the following > content. > {code} > <configuration> > <property> > <name>javax.jdo.option.ConnectionURL</name> > <value>jdbc:derby:memory:databaseName=metastore_db;create=true</value> > </property> > <property> > <name>javax.jdo.option.ConnectionDriverName</name> > <value>org.apache.derby.jdbc.EmbeddedDriver</value> > </property> > <property> > <name>hive.metastore.warehouse.dir</name> > <value>file://${user.dir}/hive/warehouse</value> > </property> > </configuration> > {code} > The URL jdbc:derby:memory:databaseName=metastore_db;create=true will make sure the > embedded derby database is created in-memory. > Jianshi
[jira] [Created] (SPARK-7566) HiveContext.analyzer cannot be overridden
Santiago M. Mola created SPARK-7566: --- Summary: HiveContext.analyzer cannot be overridden Key: SPARK-7566 URL: https://issues.apache.org/jira/browse/SPARK-7566 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Santiago M. Mola Trying to override HiveContext.analyzer will give the following compilation error: {code} Error:(51, 36) overriding lazy value analyzer in class HiveContext of type org.apache.spark.sql.catalyst.analysis.Analyzer{val extendedResolutionRules: List[org.apache.spark.sql.catalyst.rules.Rule[org.apache.spark.sql.catalyst.plans.logical.LogicalPlan]]}; lazy value analyzer has incompatible type override protected[sql] lazy val analyzer: Analyzer = { ^ {code} That is because the type was inadvertently narrowed when the explicit type declaration for the val was omitted.
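The compilation error above can be reproduced without Spark. In this sketch (hypothetical `BaseContext`/`MyContext` names), omitting the declared type on the lazy val would make the compiler infer an anonymous refinement type, which subclasses then cannot override with the plain base type:

```scala
class Analyzer {
  def rules: List[String] = Nil
}

class BaseContext {
  // The explicit ": Analyzer" annotation matters: without it, the inferred
  // type is the refinement Analyzer{val extra: List[String]}, and the
  // override in MyContext fails with "lazy value analyzer has incompatible type".
  lazy val analyzer: Analyzer = new Analyzer {
    val extra: List[String] = List("extendedResolutionRules")
  }
}

class MyContext extends BaseContext {
  override lazy val analyzer: Analyzer = new Analyzer {
    override def rules: List[String] = List("custom")
  }
}
```

The fix suggested by the issue is exactly this: declare the return type explicitly so the public type stays `Analyzer` rather than an anonymous refinement.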
[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results
[ https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538737#comment-14538737 ] Santiago M. Mola commented on SPARK-6743: - Any thoughts on this?
[jira] [Commented] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
[ https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538734#comment-14538734 ] Santiago M. Mola commented on SPARK-7088: - Any thoughts on this? > [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans > - > > Key: SPARK-7088 > URL: https://issues.apache.org/jira/browse/SPARK-7088 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1 >Reporter: Santiago M. Mola >Priority: Critical > Labels: regression > > We're using some custom logical plans. We are now migrating from Spark 1.3.0 > to 1.3.1 and found a few incompatible API changes. All of them seem to be in > internal code, so we understand that. But now the ResolveReferences rule, > that used to work with third-party logical plans just does not work, without > any possible workaround that I'm aware other than just copying > ResolveReferences rule and using it with our own fix. > The change in question is this section of code: > {code} > }.headOption.getOrElse { // Only handle first case, others will be > fixed on the next pass. 
> sys.error( > s""" > |Failure when resolving conflicting references in Join: > |$plan > | > |Conflicting attributes: ${conflictingAttributes.mkString(",")} > """.stripMargin) > } > {code} > Which causes the following error on analysis: > {code} > Failure when resolving conflicting references in Join: > 'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS > c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39] > 'Join Inner, None > Subquery l >Subquery h > Project [name#12,node#36] > CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 > Subquery v >Subquery h_src > LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at > mapPartitions at ExistingRDD.scala:37 > Subquery r >Subquery h > Project [name#40,node#36] > CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 > Subquery v >Subquery h_src > LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at > mapPartitions at ExistingRDD.scala:37 > {code}
[jira] [Commented] (SPARK-7275) Make LogicalRelation public
[ https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532159#comment-14532159 ] Santiago M. Mola commented on SPARK-7275: - [~gweidner] I work on a project that extends Spark SQL with a richer data sources API. One such extension is the ability to push down a subtree of the logical plan in full to a data source. Data sources implementing this API must be able to inspect the LogicalPlan they're given, and that includes matching LogicalRelation. If a data source is in its own Java package (i.e. not org.apache.spark.sql), which is the usual case, it will not be able to match a LogicalRelation out of the box. Currently, I implemented a workaround by adding a public extractor IsLogicalRelation in the org.apache.spark.sql package that proxies LogicalRelation to outsider packages... which is, of course, an ugly hack. Note that LogicalRelation is the only element of the logical plan which is not public.
[jira] [Created] (SPARK-7275) Make LogicalRelation public
Santiago M. Mola created SPARK-7275: --- Summary: Make LogicalRelation public Key: SPARK-7275 URL: https://issues.apache.org/jira/browse/SPARK-7275 Project: Spark Issue Type: Improvement Components: SQL Reporter: Santiago M. Mola Priority: Minor It seems LogicalRelation is the only part of the LogicalPlan that is not public. This makes it harder to work with full logical plans from third party packages.
[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
[ https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-7088: Labels: regression (was: )
[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
[ https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-7088: Description: We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 1.3.1 and found a few incompatible API changes. All of them seem to be in internal code, so we understand that. But now the ResolveReferences rule, that used to work with third-party logical plans just does not work, without any possible workaround that I'm aware other than just copying ResolveReferences rule and using it with our own fix. The change in question is this section of code: {code} }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass. sys.error( s""" |Failure when resolving conflicting references in Join: |$plan | |Conflicting attributes: ${conflictingAttributes.mkString(",")} """.stripMargin) } {code} Which causes the following error on analysis: {code} Failure when resolving conflicting references in Join: 'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39] 'Join Inner, None Subquery l Subquery h Project [name#12,node#36] CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 Subquery v Subquery h_src LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 Subquery r Subquery h Project [name#40,node#36] CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 Subquery v Subquery h_src LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 {code} was: We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 1.3.1 and found a few incompatible API changes. All of them seem to be in internal code, so we understand that. 
But now the ResolveReferences rule, that used to work with third-party logical plans just does not work, without any possible workaround that I'm aware other than just copying ResolveReferences rule and using it with our own fix. The change in question is this section of code: {code} }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass. sys.error( s""" |Failure when resolving conflicting references in Join: |$plan | |Conflicting attributes: ${conflictingAttributes.mkString(",")} """.stripMargin) } {code} Which causes the following error on analysis: {code} Failure when resolving conflicting references in Join: 'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS c3#38,'IS_PARENT('r.node,'l.node) AS c4#39] 'Join Inner, None Subquery l Subquery h Project [name#12,node#36] CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 Subquery v Subquery h_src LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 Subquery r Subquery h Project [name#40,node#36] CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 Subquery v Subquery h_src LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 {code} > [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans > - > > Key: SPARK-7088 > URL: https://issues.apache.org/jira/browse/SPARK-7088 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1 >Reporter: Santiago M. Mola >Priority: Critical > > We're using some custom logical plans. We are now migrating from Spark 1.3.0 > to 1.3.1 and found a few incompatible API changes. All of them seem to be in > internal code, so we understand that. 
[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
[ https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-7088: Description: We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 1.3.1 and found a few incompatible API changes. All of them seem to be in internal code, so we understand that. But now the ResolveReferences rule, that used to work with third-party logical plans just does not work, without any possible workaround that I'm aware other than just copying ResolveReferences rule and using it with our own fix. The change in question is this section of code: {code} }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass. sys.error( s""" |Failure when resolving conflicting references in Join: |$plan | |Conflicting attributes: ${conflictingAttributes.mkString(",")} """.stripMargin) } {code} Which causes the following error on analysis: {code} Failure when resolving conflicting references in Join: 'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS c3#38,'IS_PARENT('r.node,'l.node) AS c4#39] 'Join Inner, None Subquery l Subquery h Project [name#12,node#36] CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 Subquery v Subquery h_src LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 Subquery r Subquery h Project [name#40,node#36] CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 Subquery v Subquery h_src LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 {code} was: We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 1.3.1 and found a few incompatible API changes. All of them seem to be in internal code, so we understand that. 
But now the ResolveReferences rule, that used to work with third-party logical plans just does not work, without any possible workaround that I'm aware other than just copying ResolveReferences rule and using it with our own fix. The change in question is this section of code: }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass. sys.error( s""" |Failure when resolving conflicting references in Join: |$plan | |Conflicting attributes: ${conflictingAttributes.mkString(",")} """.stripMargin) } Which causes the following error on analysis: Failure when resolving conflicting references in Join: 'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS c3#38,'IS_PARENT('r.node,'l.node) AS c4#39] 'Join Inner, None Subquery l Subquery h Project [name#12,node#36] CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 Subquery v Subquery h_src LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 Subquery r Subquery h Project [name#40,node#36] CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 Subquery v Subquery h_src LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 > [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans > - > > Key: SPARK-7088 > URL: https://issues.apache.org/jira/browse/SPARK-7088 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1 >Reporter: Santiago M. Mola >Priority: Critical > > We're using some custom logical plans. We are now migrating from Spark 1.3.0 > to 1.3.1 and found a few incompatible API changes. All of them seem to be in > internal code, so we understand that. 
[jira] [Created] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
Santiago M. Mola created SPARK-7088: --- Summary: [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans Key: SPARK-7088 URL: https://issues.apache.org/jira/browse/SPARK-7088 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Santiago M. Mola Priority: Critical We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 1.3.1 and found a few incompatible API changes. All of them seem to be in internal code, so we understand that. But now the ResolveReferences rule, that used to work with third-party logical plans just does not work, without any possible workaround that I'm aware other than just copying ResolveReferences rule and using it with our own fix. The change in question is this section of code: }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass. sys.error( s""" |Failure when resolving conflicting references in Join: |$plan | |Conflicting attributes: ${conflictingAttributes.mkString(",")} """.stripMargin) } Which causes the following error on analysis: Failure when resolving conflicting references in Join: 'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS c3#38,'IS_PARENT('r.node,'l.node) AS c4#39] 'Join Inner, None Subquery l Subquery h Project [name#12,node#36] CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 Subquery v Subquery h_src LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 Subquery r Subquery h Project [name#40,node#36] CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 Subquery v Subquery h_src LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37
[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
[ https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-7088: Description: We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 1.3.1 and found a few incompatible API changes. All of them seem to be in internal code, so we understand that. But now the ResolveReferences rule, that used to work with third-party logical plans just does not work, without any possible workaround that I'm aware other than just copying ResolveReferences rule and using it with our own fix. The change in question is this section of code: }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass. sys.error( s""" |Failure when resolving conflicting references in Join: |$plan | |Conflicting attributes: ${conflictingAttributes.mkString(",")} """.stripMargin) } Which causes the following error on analysis: Failure when resolving conflicting references in Join: 'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS c3#38,'IS_PARENT('r.node,'l.node) AS c4#39] 'Join Inner, None Subquery l Subquery h Project [name#12,node#36] CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 Subquery v Subquery h_src LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 Subquery r Subquery h Project [name#40,node#36] CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 Subquery v Subquery h_src LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 was: We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 1.3.1 and found a few incompatible API changes. All of them seem to be in internal code, so we understand at. 
But now the ResolveReferences rule, that used to work with third-party logical plans just does not work, without any possible workaround that I'm aware other than just copying ResolveReferences rule and using it with our own fix. The change in question is this section of code: }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass. sys.error( s""" |Failure when resolving conflicting references in Join: |$plan | |Conflicting attributes: ${conflictingAttributes.mkString(",")} """.stripMargin) } Which causes the following error on analysis: Failure when resolving conflicting references in Join: 'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS c3#38,'IS_PARENT('r.node,'l.node) AS c4#39] 'Join Inner, None Subquery l Subquery h Project [name#12,node#36] CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36 Subquery v Subquery h_src LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 Subquery r Subquery h Project [name#40,node#36] CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36 Subquery v Subquery h_src LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:37 > [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans > - > > Key: SPARK-7088 > URL: https://issues.apache.org/jira/browse/SPARK-7088 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1 >Reporter: Santiago M. Mola >Priority: Critical > > We're using some custom logical plans. We are now migrating from Spark 1.3.0 > to 1.3.1 and found a few incompatible API changes. All of them seem to be in > internal code, so we understand that. 
But now the ResolveReferences rule, > that used to work with third-party logical plans just does not work, without > any possible workaround that I'm aware other than just copying > ResolveReferences rule and using it with our own fix. > The change in question is this section of code: > }.headOption.getOrElse { // Only handle first case, others will be > fixed on the next pass. > sys.error( > s""" > |Failure when resolving conflicting references in Join: > |$plan > | > |Conflicting attributes: ${conflictingAttributes.mkString(",")} > """.stripMargin) > } > Which causes the following error on analysis: > Failure when resolving conflic
[jira] [Created] (SPARK-7034) Support escaped double quotes on data source options
Santiago M. Mola created SPARK-7034: --- Summary: Support escaped double quotes on data source options Key: SPARK-7034 URL: https://issues.apache.org/jira/browse/SPARK-7034 Project: Spark Issue Type: Improvement Components: SQL Reporter: Santiago M. Mola Priority: Minor Currently, this is not supported: CREATE TEMPORARY TABLE t USING my.data.source OPTIONS ( myFancyOption "with \"escaped\" double quotes" ); It produces a parsing error. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
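The missing behavior can be illustrated with a small sketch: a quoted-string scanner that accepts backslash-escaped quotes (\" and \\) inside a double-quoted option value. This is a hypothetical, self-contained illustration of the requested parsing rule, not Spark's actual options parser; the name `QuotedValue` is invented here.

```scala
// Hypothetical sketch of the requested behavior: scan a double-quoted value
// that may contain backslash-escaped characters. Not Spark's actual parser.
object QuotedValue {
  // Returns the unescaped content and the index just past the closing quote,
  // or None if the input at `start` is not a well-formed quoted string.
  def parse(input: String, start: Int = 0): Option[(String, Int)] = {
    if (start >= input.length || input.charAt(start) != '"') return None
    val sb = new StringBuilder
    var i = start + 1
    while (i < input.length) {
      input.charAt(i) match {
        case '\\' if i + 1 < input.length =>
          sb.append(input.charAt(i + 1)) // escape: keep the next char verbatim
          i += 2
        case '"' =>
          return Some((sb.toString, i + 1)) // closing quote found
        case c =>
          sb.append(c); i += 1
      }
    }
    None // unterminated string literal
  }
}
```

With this rule, the `myFancyOption "with \"escaped\" double quotes"` example from the description would scan cleanly instead of failing at the first inner quote.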
[jira] [Created] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser
Santiago M. Mola created SPARK-7012: --- Summary: Add support for NOT NULL modifier for column definitions on DDLParser Key: SPARK-7012 URL: https://issues.apache.org/jira/browse/SPARK-7012 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Santiago M. Mola Priority: Minor Add support for NOT NULL modifier for column definitions on DDLParser. This would add support for the following syntax: CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
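The proposed syntax can be sketched with a minimal column-definition recognizer that treats NOT NULL as an optional trailing modifier. This is an illustrative stand-in, not the DDLParser API; `ColumnDef` and `ColumnDefParser` are invented names.

```scala
// Hypothetical sketch: recognize an optional NOT NULL modifier after the
// column type in a column definition such as "field INTEGER NOT NULL".
case class ColumnDef(name: String, dataType: String, nullable: Boolean)

object ColumnDefParser {
  def parse(s: String): Option[ColumnDef] = {
    val tokens = s.trim.split("\\s+").toList
    tokens match {
      case name :: tpe :: rest =>
        rest.map(_.toUpperCase) match {
          case Nil                 => Some(ColumnDef(name, tpe, nullable = true))  // default: nullable
          case List("NOT", "NULL") => Some(ColumnDef(name, tpe, nullable = false)) // NOT NULL modifier
          case _                   => None // anything else is a syntax error
        }
      case _ => None
    }
  }
}
```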
[jira] [Created] (SPARK-6874) Add support for SQL:2003 array type declaration syntax
Santiago M. Mola created SPARK-6874: --- Summary: Add support for SQL:2003 array type declaration syntax Key: SPARK-6874 URL: https://issues.apache.org/jira/browse/SPARK-6874 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Santiago M. Mola Priority: Minor As of SQL:2003, arrays are standard SQL types. However, the declaration syntax differs from Spark's CQL-like syntax. Examples of standard syntax: BIGINT ARRAY BIGINT ARRAY[100] BIGINT ARRAY[100] ARRAY[200] It would be great to support the standard syntax here. Some additional details that this addition should have, IMO: - Forbid mixed syntax such as ARRAY ARRAY[100] - Ignore the maximum capacity (ARRAY[N]) but allow it to be specified. This seems to be what others (e.g., PostgreSQL) do. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
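The proposed rules (each trailing ARRAY[...] wraps the type once; the capacity is accepted but ignored) can be sketched in a few lines. This is a hypothetical illustration, not Spark SQL parser code; the `SqlType`/`ArrayTypeParser` names are invented here.

```scala
// Hypothetical sketch of the proposal: parse SQL:2003 declarations such as
// "BIGINT ARRAY[100] ARRAY[200]" into a nested array type, accepting but
// ignoring the capacity, as PostgreSQL does. Not Spark's actual parser.
sealed trait SqlType
case class Primitive(name: String) extends SqlType
case class SqlArray(element: SqlType) extends SqlType

object ArrayTypeParser {
  def parse(decl: String): Option[SqlType] = {
    val tokens = decl.trim.split("\\s+").toList
    tokens match {
      // Base type followed by zero or more ARRAY or ARRAY[N] suffixes.
      case base :: suffixes if suffixes.forall(_.matches("""(?i)ARRAY(\[\d+\])?""")) =>
        // Each suffix wraps the element type one more level; capacity is dropped.
        Some(suffixes.foldLeft(Primitive(base): SqlType)((t, _) => SqlArray(t)))
      case _ => None
    }
  }
}
```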
[jira] [Created] (SPARK-6863) Formatted list broken on Hive compatibility section of SQL programming guide
Santiago M. Mola created SPARK-6863: --- Summary: Formatted list broken on Hive compatibility section of SQL programming guide Key: SPARK-6863 URL: https://issues.apache.org/jira/browse/SPARK-6863 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.3.0 Reporter: Santiago M. Mola Priority: Trivial Formatted list broken on Hive compatibility section of SQL programming guide. It does not appear as a list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6744) Add support for CROSS JOIN syntax
Santiago M. Mola created SPARK-6744: --- Summary: Add support for CROSS JOIN syntax Key: SPARK-6744 URL: https://issues.apache.org/jira/browse/SPARK-6744 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Environment: Add support for the standard CROSS JOIN syntax. Reporter: Santiago M. Mola Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6743) Join with empty projection on one side produces invalid results
Santiago M. Mola created SPARK-6743: --- Summary: Join with empty projection on one side produces invalid results Key: SPARK-6743 URL: https://issues.apache.org/jira/browse/SPARK-6743 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Santiago M. Mola {code:java} val sqlContext = new SQLContext(sc) val tab0 = sc.parallelize(Seq( (83,0,38), (26,0,79), (43,81,24) )) sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), "tab0") sqlContext.cacheTable("tab0") val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2") val result1 = df1.collect() val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2") val result2 = df2.collect() val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2") val result3 = df3.collect() {code} Given the previous code, result2 equals Row(43), Row(83), Row(26), which is wrong. These results correspond to cor0._1 instead of cor0._2. The correct results would be Row(0), Row(81), which is what the third query returns. The first query also produces valid results; the only difference is that the left side of the join is not empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6743) Join with empty projection on one side produces invalid results
[ https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6743: Priority: Critical (was: Major) > Join with empty projection on one side produces invalid results > --- > > Key: SPARK-6743 > URL: https://issues.apache.org/jira/browse/SPARK-6743 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 >Reporter: Santiago M. Mola >Priority: Critical > > {code:java} > val sqlContext = new SQLContext(sc) > val tab0 = sc.parallelize(Seq( > (83,0,38), > (26,0,79), > (43,81,24) > )) > sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), > "tab0") > sqlContext.cacheTable("tab0") > val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP > BY tab0._2, cor0._2") > val result1 = df1.collect() > val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY > cor0._2") > val result2 = df2.collect() > val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2") > val result3 = df3.collect() > {code} > Given the previous code, result2 equals to Row(43), Row(83), Row(26), which > is wrong. These results correspond to cor0._1, instead of cor0._2. Correct > results would be Row(0), Row(81), which are ok for the third query. The first > query also produces valid results, and the only difference is that the left > side of the join is not empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6741) Add support for SELECT ALL syntax
Santiago M. Mola created SPARK-6741: --- Summary: Add support for SELECT ALL syntax Key: SPARK-6741 URL: https://issues.apache.org/jira/browse/SPARK-6741 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Santiago M. Mola Priority: Minor Support SELECT ALL syntax (equivalent to SELECT, without DISTINCT). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6740) SQL operator and condition precedence is not honoured
Santiago M. Mola created SPARK-6740: --- Summary: SQL operator and condition precedence is not honoured Key: SPARK-6740 URL: https://issues.apache.org/jira/browse/SPARK-6740 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Santiago M. Mola The following query from the SQL Logic Test suite fails to parse: SELECT DISTINCT * FROM t1 AS cor0 WHERE NOT ( - _2 + - 39 ) IS NULL while the following (equivalent) does parse correctly: SELECT DISTINCT * FROM t1 AS cor0 WHERE NOT (( - _2 + - 39 ) IS NULL) SQLite, MySQL and Oracle (and probably most SQL implementations) define IS with higher precedence than NOT, so the first query is valid and well-defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
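The precedence the report asks for (IS binding tighter than NOT, so `NOT expr IS NULL` means `NOT (expr IS NULL)`) can be sketched as a tiny recursive-descent parser with one grammar level per precedence tier. This is an illustrative toy, not Spark's SqlParser; all names here are invented.

```scala
// Hypothetical sketch: NOT sits at a looser grammar level than the postfix
// IS NULL, so "NOT x IS NULL" parses as NOT(IS_NULL(x)). Not Spark code.
sealed trait Expr
case class Col(name: String) extends Expr
case class Not(e: Expr) extends Expr
case class IsNull(e: Expr) extends Expr

object PrecedenceParser {
  // Each function returns the parsed expression plus the remaining tokens.
  def parse(tokens: List[String]): (Expr, List[String]) = notExpr(tokens)

  // Loosest level: NOT applies to a whole IS-expression beneath it.
  private def notExpr(ts: List[String]): (Expr, List[String]) = ts match {
    case "NOT" :: rest =>
      val (e, rem) = notExpr(rest)
      (Not(e), rem)
    case _ => isExpr(ts)
  }

  // Tighter level: optional postfix IS NULL on a primary expression.
  private def isExpr(ts: List[String]): (Expr, List[String]) = {
    val (e, rem) = primary(ts)
    rem match {
      case "IS" :: "NULL" :: rest => (IsNull(e), rest)
      case _                      => (e, rem)
    }
  }

  private def primary(ts: List[String]): (Expr, List[String]) = ts match {
    case name :: rest => (Col(name), rest)
    case Nil          => sys.error("unexpected end of input")
  }
}
```

With this grammar the first query's WHERE clause parses without the explicit extra parentheses.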
[jira] [Created] (SPARK-6611) Add support for INTEGER as synonym of INT to DDLParser
Santiago M. Mola created SPARK-6611: --- Summary: Add support for INTEGER as synonym of INT to DDLParser Key: SPARK-6611 URL: https://issues.apache.org/jira/browse/SPARK-6611 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Santiago M. Mola Priority: Minor Add support for INTEGER as synonym of INT to DDLParser. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola resolved SPARK-6410. - Resolution: Not a Problem This has something to do with the incremental compiler. I got it working after running "sbt clean". > Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > Attachments: output.log > > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] > $ uname -a > CYGWIN_NT-6.3 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin > $ java -version > java version "1.7.0_75" > Java(TM) SE Runtime Environment (build 1.7.0_75-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) > $ scala -version > Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL > $ build/zinc-0.3.5.3/bin/zinc -version > zinc (scala incremental compiler) 0.3.5.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6410: Description: $ bash build/sbt -Phadoop-2.3 assembly [...] [error] C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: polymorphic expression cannot be instantiated to expected type; [error] found : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] [error] required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)] [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) [error] ^ [...] $ uname -a CYGWIN_NT-6.3 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin $ java -version java version "1.7.0_75" Java(TM) SE Runtime Environment (build 1.7.0_75-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) $ scala -version Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL $ build/zinc-0.3.5.3/bin/zinc -version zinc (scala incremental compiler) 0.3.5.3 was: $ bash build/sbt -Phadoop-2.3 assembly [...] [error] C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: polymorphic expression cannot be instantiated to expected type; [error] found : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] [error] required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)] [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) [error] ^ [...] 
$ uname -a CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin $ java -version java version "1.7.0_75" Java(TM) SE Runtime Environment (build 1.7.0_75-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) $ scala -version Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL $ build/zinc-0.3.5.3/bin/zinc -version zinc (scala incremental compiler) 0.3.5.3 > Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > Attachments: output.log > > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] > $ uname -a > CYGWIN_NT-6.3 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin > $ java -version > java version "1.7.0_75" > Java(TM) SE Runtime Environment (build 1.7.0_75-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) > $ scala -version > Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL > $ build/zinc-0.3.5.3/bin/zinc -version > zinc (scala incremental compiler) 0.3.5.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6410: Attachment: output.log Full error log. > Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > Attachments: output.log > > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] > $ uname -a > CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin > $ java -version > java version "1.7.0_75" > Java(TM) SE Runtime Environment (build 1.7.0_75-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) > $ scala -version > Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL > $ build/zinc-0.3.5.3/bin/zinc -version > zinc (scala incremental compiler) 0.3.5.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6410: Comment: was deleted (was: Full error log.) > Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > Attachments: output.log > > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] > $ uname -a > CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin > $ java -version > java version "1.7.0_75" > Java(TM) SE Runtime Environment (build 1.7.0_75-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) > $ scala -version > Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL > $ build/zinc-0.3.5.3/bin/zinc -version > zinc (scala incremental compiler) 0.3.5.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6410: Environment: was: $ uname -a CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin $ java -version java version "1.7.0_75" Java(TM) SE Runtime Environment (build 1.7.0_75-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) $ scala -version Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL $ build/zinc-0.3.5.3/bin/zinc -version zinc (scala incremental compiler) 0.3.5.3 > Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6410: Labels: build-failure (was: ) > Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] > $ uname -a > CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin > $ java -version > java version "1.7.0_75" > Java(TM) SE Runtime Environment (build 1.7.0_75-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) > $ scala -version > Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL > $ build/zinc-0.3.5.3/bin/zinc -version > zinc (scala incremental compiler) 0.3.5.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6410: Component/s: SQL > Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] > $ uname -a > CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin > $ java -version > java version "1.7.0_75" > Java(TM) SE Runtime Environment (build 1.7.0_75-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) > $ scala -version > Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL > $ build/zinc-0.3.5.3/bin/zinc -version > zinc (scala incremental compiler) 0.3.5.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
[ https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santiago M. Mola updated SPARK-6410: Description: $ bash build/sbt -Phadoop-2.3 assembly [...] [error] C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: polymorphic expression cannot be instantiated to expected type; [error] found : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] [error] required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)] [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) [error] ^ [...] $ uname -a CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin $ java -version java version "1.7.0_75" Java(TM) SE Runtime Environment (build 1.7.0_75-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) $ scala -version Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL $ build/zinc-0.3.5.3/bin/zinc -version zinc (scala incremental compiler) 0.3.5.3 was: $ bash build/sbt -Phadoop-2.3 assembly [...] [error] C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: polymorphic expression cannot be instantiated to expected type; [error] found : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] [error] required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)] [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) [error] ^ [...] 
> Build error on Windows: polymorphic expression cannot be instantiated to > expected type > -- > > Key: SPARK-6410 > URL: https://issues.apache.org/jira/browse/SPARK-6410 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: >Reporter: Santiago M. Mola > Labels: build-failure > > $ bash build/sbt -Phadoop-2.3 assembly > [...] > [error] > C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: > polymorphic expression cannot be instantiated to expected type; > [error] found : [T(in method > apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)] > [error] required: > org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method > functionToUdfBuilder)] > [error] implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, > T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func) > [error] > ^ > [...] > $ uname -a > CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin > $ java -version > java version "1.7.0_75" > Java(TM) SE Runtime Environment (build 1.7.0_75-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode) > $ scala -version > Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL > $ build/zinc-0.3.5.3/bin/zinc -version > zinc (scala incremental compiler) 0.3.5.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type
Santiago M. Mola created SPARK-6410:
---------------------------------------

             Summary: Build error on Windows: polymorphic expression cannot be instantiated to expected type
                 Key: SPARK-6410
                 URL: https://issues.apache.org/jira/browse/SPARK-6410
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.4.0
         Environment: $ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3
            Reporter: Santiago M. Mola

$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314: polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] ^
[...]
[jira] [Commented] (SPARK-6320) Adding new query plan strategy to SQLContext
[ https://issues.apache.org/jira/browse/SPARK-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368683#comment-14368683 ] Santiago M. Mola commented on SPARK-6320:
-----------------------------------------

[~marmbrus] We could change strategies so that they take a SparkPlanner in their constructor. This should provide enough flexibility for [~H.Youssef]'s use case and might improve code organization of the core strategies in the future.

> Adding new query plan strategy to SQLContext
> --------------------------------------------
>
>                 Key: SPARK-6320
>                 URL: https://issues.apache.org/jira/browse/SPARK-6320
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Youssef Hatem
>            Priority: Minor
>
> Hi,
> I would like to add a new strategy to {{SQLContext}}. To do this I created a new class which extends {{Strategy}}. In my new class I need to call the {{planLater}} function. However, this method is defined in {{SparkPlanner}} (which itself inherits the method from {{QueryPlanner}}).
> To my knowledge, the only way to make {{planLater}} visible to my new strategy is to define my strategy inside another class that extends {{SparkPlanner}} and inherits {{planLater}}. As a result, I will have to extend {{SQLContext}} so that I can override the {{planner}} field with the new {{Planner}} class I created.
> It seems that this is a design problem, because adding a new strategy seems to require extending {{SQLContext}} (unless I am doing it wrong and there is a better way to do it).
[jira] [Comment Edited] (SPARK-6320) Adding new query plan strategy to SQLContext
[ https://issues.apache.org/jira/browse/SPARK-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368683#comment-14368683 ] Santiago M. Mola edited comment on SPARK-6320 at 3/19/15 8:12 AM:
------------------------------------------------------------------

[~marmbrus] We could change strategies so that they take a SparkPlanner in their constructor. This should provide enough flexibility for [~H.Youssef]'s use case and might improve code organization of the core strategies in the future.

was (Author: smolav):
[~marmbrus] We could change strategies so that they take a SparkPlanner in their constructor. This should provide enough flexibility for [~H.Youssef]]'s use case and might improve code organization of the core strategies in the future.

> Adding new query plan strategy to SQLContext
> --------------------------------------------
>
>                 Key: SPARK-6320
>                 URL: https://issues.apache.org/jira/browse/SPARK-6320
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Youssef Hatem
>            Priority: Minor
>
> Hi,
> I would like to add a new strategy to {{SQLContext}}. To do this I created a new class which extends {{Strategy}}. In my new class I need to call the {{planLater}} function. However, this method is defined in {{SparkPlanner}} (which itself inherits the method from {{QueryPlanner}}).
> To my knowledge, the only way to make {{planLater}} visible to my new strategy is to define my strategy inside another class that extends {{SparkPlanner}} and inherits {{planLater}}. As a result, I will have to extend {{SQLContext}} so that I can override the {{planner}} field with the new {{Planner}} class I created.
> It seems that this is a design problem, because adding a new strategy seems to require extending {{SQLContext}} (unless I am doing it wrong and there is a better way to do it).
> Thanks a lot,
> Youssef
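The constructor-injection idea discussed in the comments above can be sketched without Spark at all. The snippet below is a simplified, Spark-free model of the proposal: strategies receive the planner in their constructor, so a `planLater`-style callback is reachable without subclassing `SparkPlanner` or overriding `SQLContext.planner`. All names here (`Plan`, `Planner`, `Strategy`, `FilterStrategy`) are illustrative stand-ins, not Spark's actual API.

```scala
// Stand-in for LogicalPlan / SparkPlan nodes.
case class Plan(name: String, children: Seq[Plan] = Nil)

// Minimal QueryPlanner analogue: planLater recursively plans a subtree.
// Strategies are registered after construction to break the circular
// planner <-> strategy dependency.
class Planner {
  private var strategies: Seq[Strategy] = Nil
  def register(s: Strategy): Unit = strategies = strategies :+ s

  def planLater(p: Plan): Plan = plan(p)

  // Try each strategy; fall back to planning the children one by one.
  def plan(p: Plan): Plan =
    strategies.flatMap(s => s(p)).headOption
      .getOrElse(Plan(p.name, p.children.map(plan)))
}

// A Strategy is constructed with its planner, as the comment suggests.
abstract class Strategy(protected val planner: Planner) {
  def apply(p: Plan): Option[Plan]
}

// A user-defined strategy can now call planner.planLater freely,
// with no need to extend SparkPlanner.
class FilterStrategy(planner: Planner) extends Strategy(planner) {
  def apply(p: Plan): Option[Plan] = p match {
    case Plan("Filter", Seq(child)) =>
      Some(Plan("PhysicalFilter", Seq(planner.planLater(child))))
    case _ => None
  }
}
```

Wiring it up is then `val planner = new Planner; planner.register(new FilterStrategy(planner))`, after which `planner.plan(...)` dispatches through the custom strategy.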
[jira] [Commented] (SPARK-6397) Check the missingInput simply
[ https://issues.apache.org/jira/browse/SPARK-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367437#comment-14367437 ] Santiago M. Mola commented on SPARK-6397:
-----------------------------------------

I think a proper title would be: "Override QueryPlan.missingInput when necessary and rely on it in CheckAnalysis".

And description: Currently, some LogicalPlans do not override missingInput even though they should. The lack of proper missingInput implementations then leaks into CheckAnalysis. (I'm about to create a pull request that fixes this problem in some more places.)

> Check the missingInput simply
> -----------------------------
>
>                 Key: SPARK-6397
>                 URL: https://issues.apache.org/jira/browse/SPARK-6397
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Yadong Qi
>            Priority: Minor
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
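The missingInput contract discussed above can be illustrated with a plain-Scala sketch using `Set[String]` in place of Spark's `AttributeSet`. The idea: an operator's missing input is whatever it references that neither its children supply nor the operator itself produces; operators that generate their own attributes must account for that, otherwise a CheckAnalysis-style rule reports false positives. `Node` and `checkAnalysis` below are simplified models, not Spark's implementation.

```scala
object MissingInputDemo {
  case class Node(
      references: Set[String],             // attributes the operator reads
      inputSet: Set[String],               // attributes supplied by children
      produced: Set[String] = Set.empty) { // attributes the operator generates itself

    // Default missingInput: referenced minus supplied minus self-produced.
    // A Spark operator overrides the equivalent when the default would
    // wrongly flag attributes it generates on its own.
    def missingInput: Set[String] = references -- inputSet -- produced
  }

  // CheckAnalysis-style rule: a resolved operator must have no missing input.
  def checkAnalysis(n: Node): Unit =
    require(n.missingInput.isEmpty,
      s"resolved attributes missing from input: ${n.missingInput.mkString(", ")}")
}
```

With a proper `missingInput`, the check stays trivial: an operator like a generator that produces `pos` itself passes, while a node referencing an attribute no child provides fails.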
[jira] [Created] (SPARK-4799) Spark should not rely on local host being resolvable on every node
Santiago M. Mola created SPARK-4799:
---------------------------------------

             Summary: Spark should not rely on local host being resolvable on every node
                 Key: SPARK-4799
                 URL: https://issues.apache.org/jira/browse/SPARK-4799
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.1.0
         Environment: Tested a Spark+Mesos cluster on top of Docker to reproduce the issue.
            Reporter: Santiago M. Mola

Spark fails when a node's hostname is not resolvable by other nodes. See an example trace:

{code}
14/12/09 17:02:41 ERROR SendingConnection: Error connecting to 27e434cf36ac:35093
java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:127)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
	at org.apache.spark.network.SendingConnection.connect(Connection.scala:299)
	at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:278)
	at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
{code}

The relevant code is here:
https://github.com/apache/spark/blob/bcb5cdad614d4fce43725dfec3ce88172d2f8c11/core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala#L170

{code}
val id = new ConnectionManagerId(Utils.localHostName, serverChannel.socket.getLocalPort)
{code}

This piece of code should use the host IP with Utils.localIpAddress, or a method that acknowledges user settings (e.g. SPARK_LOCAL_IP). Since I cannot think of a use case for using the hostname here, I'm creating a PR with the former solution, but if you think the latter is better, I'm willing to create a new PR with a more elaborate fix.
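The distinction the report draws, advertising a hostname that peers may not be able to resolve (e.g. a bare Docker container ID like 27e434cf36ac) versus advertising an IP address with an explicit user setting taking precedence, can be sketched with `java.net.InetAddress`. `Utils.localHostName` and `Utils.localIpAddress` are Spark's helpers; the versions below are simplified approximations for illustration, not Spark's implementation.

```scala
import java.net.InetAddress

object AddressDemo {
  // What the problematic line effectively advertises: the local hostname,
  // which only resolves if every peer's DNS/hosts file knows it.
  def localHostName: String = InetAddress.getLocalHost.getHostName

  // The proposed direction: honor an explicit user setting (SPARK_LOCAL_IP
  // in Spark) when present, otherwise fall back to the resolved local address.
  def localIpAddress: String =
    sys.env.getOrElse("SPARK_LOCAL_IP",
      InetAddress.getLocalHost.getHostAddress)
}
```

The key property is that the IP form never requires the remote side to perform a name lookup, which is exactly what fails in the `UnresolvedAddressException` trace above.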