[jira] [Closed] (SPARK-32551) Ambiguous self join error in non self join with window
[ https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kanika dhuria closed SPARK-32551.
---------------------------------
Closing as duplicate.

> Ambiguous self join error in non self join with window
> ------------------------------------------------------
>
>                 Key: SPARK-32551
>                 URL: https://issues.apache.org/jira/browse/SPARK-32551
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: kanika dhuria
>            Priority: Major
>             Fix For: 3.0.1
>
> The following code fails the ambiguous-self-join analysis even though it contains no self join:
>
> {noformat}
> val v1 = spark.range(3).toDF("m")
> val v2 = spark.range(3).toDF("d")
> val v3 = v1.join(v2, v1("m") === v2("d"))
> val v4 = v3("d")
> val w1 = Window.partitionBy(v4)
> val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
> {noformat}
>
> org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.;

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
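Until the fix, the aliasing workaround that the exception message itself recommends applies here as well. A minimal sketch under the same `v1`/`v2` setup as the repro (the alias names `l` and `r` are illustrative, not from the ticket):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Give each side of the join an explicit alias, then refer to the join
// output's columns only through qualified names.
val joined = v1.as("l").join(v2.as("r"), col("l.m") === col("r.d"))
val d = col("r.d")
val w = Window.partitionBy(d)
val out = joined.select(d.as("a"), sum(d).over(w).as("b"))
```

Because every column reference is qualified by a distinct alias, the analyzer no longer has to guess which parent Dataset a column came from.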
[jira] [Resolved] (SPARK-32551) Ambiguous self join error in non self join with window
[ https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kanika dhuria resolved SPARK-32551.
-----------------------------------
    Fix Version/s: 3.0.1
       Resolution: Fixed
[jira] [Commented] (SPARK-32551) Ambiguous self join error in non self join with window
[ https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172586#comment-17172586 ]

kanika dhuria commented on SPARK-32551:
---------------------------------------
Thanks [~cloud_fan], it is fixed in the latest 3.0 branch. Fixed as part of https://issues.apache.org/jira/browse/SPARK-31956.
[jira] [Created] (SPARK-32551) Ambiguous self join error in non self join with window
kanika dhuria created SPARK-32551:
----------------------------------

             Summary: Ambiguous self join error in non self join with window
                 Key: SPARK-32551
                 URL: https://issues.apache.org/jira/browse/SPARK-32551
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: kanika dhuria

The following code hits the ambiguous-self-join error even though it contains no self join:

{noformat}
val v1 = spark.range(3).toDF("m")
val v2 = spark.range(3).toDF("d")
val v3 = v1.join(v2, v1("m") === v2("d"))
val v4 = v3("d")
val w1 = Window.partitionBy(v4)
val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
{noformat}

org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.;
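The exception text also names a session-wide escape hatch. A sketch of turning the check off (a blunt workaround, since it disables the safeguard for genuine self joins too; the config key comes verbatim from the AnalysisException message):

```scala
// Disable the ambiguous-self-join analyzer check for this SparkSession.
// Use only as a temporary workaround while the underlying bug is open.
spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")
```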
[jira] [Updated] (SPARK-32551) Ambiguous self join error in non self join with window
[ https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kanika dhuria updated SPARK-32551:
----------------------------------
[jira] [Reopened] (SPARK-22207) High memory usage when converting relational data to Hierarchical data
[ https://issues.apache.org/jira/browse/SPARK-22207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kanika dhuria reopened SPARK-22207:
-----------------------------------
The same issue is seen in Spark 2.4.

> High memory usage when converting relational data to Hierarchical data
> ----------------------------------------------------------------------
>
>                 Key: SPARK-22207
>                 URL: https://issues.apache.org/jira/browse/SPARK-22207
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: kanika dhuria
>            Priority: Major
>              Labels: bulk-closed
>
> There are 4 tables:
>   lineitems ~ 1.4 GB
>   orders ~ 330 MB
>   customer ~ 47 MB
>   nations ~ 2.2 KB
>
> The tables are related as follows: there are multiple lineitems per order (pk, fk: orderkey), multiple orders per customer (pk, fk: cust_key), and multiple customers per nation (pk, fk: nation key). Data is almost evenly distributed.
>
> Building the hierarchy to 3 levels, i.e. joining lineitems, orders, and customers, works fine with 4 GB / 2 cores of executor memory. Adding nations requires 8 GB / 2 cores or 4 GB / 1 core.
>
> {noformat}
> val sqlContext = SparkSession.builder()
>   .enableHiveSupport()
>   .config("spark.sql.retainGroupColumns", false)
>   .config("spark.sql.crossJoin.enabled", true)
>   .getOrCreate()
>
> val orders = sqlContext.sql("select * from orders")
> val lineItem = sqlContext.sql("select * from lineitems")
> val customer = sqlContext.sql("select * from customers")
> val nation = sqlContext.sql("select * from nations")
>
> val lineitemOrders = lineItem.groupBy(col("l_orderkey"))
>   .agg(col("l_orderkey"),
>        collect_list(struct(col("l_partkey"), col("l_suppkey"), col("l_linenumber"),
>          col("l_quantity"), col("l_extendedprice"), col("l_discount"), col("l_tax"),
>          col("l_returnflag"), col("l_linestatus"), col("l_shipdate"), col("l_commitdate"),
>          col("l_receiptdate"), col("l_shipinstruct"), col("l_shipmode"))).as("lineitem"))
>   .join(orders, orders("O_ORDERKEY") === lineItem("l_orderkey"))
>   .select(col("O_ORDERKEY"), col("O_CUSTKEY"), col("O_ORDERSTATUS"), col("O_TOTALPRICE"),
>     col("O_ORDERDATE"), col("O_ORDERPRIORITY"), col("O_CLERK"), col("O_SHIPPRIORITY"),
>     col("O_COMMENT"), col("lineitem"))
>
> val customerList = lineitemOrders.groupBy(col("o_custkey"))
>   .agg(col("o_custkey"),
>        collect_list(struct(col("O_ORDERKEY"), col("O_CUSTKEY"), col("O_ORDERSTATUS"),
>          col("O_TOTALPRICE"), col("O_ORDERDATE"), col("O_ORDERPRIORITY"), col("O_CLERK"),
>          col("O_SHIPPRIORITY"), col("O_COMMENT"), col("lineitem"))).as("items"))
>   .join(customer, customer("c_custkey") === lineitemOrders("o_custkey"))
>   .select(col("c_custkey"), col("c_name"), col("c_nationkey"), col("items"))
>
> val nationList = customerList.groupBy(col("c_nationkey"))
>   .agg(col("c_nationkey"),
>        collect_list(struct(col("c_custkey"), col("c_name"), col("c_nationkey"),
>          col("items"))).as("custList"))
>   .join(nation, nation("n_nationkey") === customerList("c_nationkey"))
>   .select(col("n_nationkey"), col("n_name"), col("custList"))
>
> nationList.write.mode("overwrite").json("filePath")
> {noformat}
>
> If the customerList is saved to a file and the last agg/join is then run separately, it runs fine with 4 GB / 2 cores. I can provide the data if needed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
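The reporter's observation, that saving `customerList` and running the last agg/join separately fits in 4 GB / 2 cores, can be sketched as a two-stage pipeline. The intermediate path and the choice of Parquet are assumptions for illustration, not from the ticket:

```scala
import org.apache.spark.sql.functions._

// Stage 1: materialize the intermediate hierarchy to storage so the
// join/aggregation state built so far is released from executor memory.
customerList.write.mode("overwrite").parquet("/tmp/customerList")

// Stage 2: read it back and build the final nation level in a fresh pass.
val persisted = spark.read.parquet("/tmp/customerList")
val nationList = persisted.groupBy(col("c_nationkey"))
  .agg(collect_list(struct(col("c_custkey"), col("c_name"), col("c_nationkey"),
    col("items"))).as("custList"))
  .join(nation, nation("n_nationkey") === persisted("c_nationkey"))
  .select(col("n_nationkey"), col("n_name"), col("custList"))
```

Splitting the job this way trades extra I/O for a smaller peak memory footprint per stage, which matches the 4 GB / 2 core result reported above.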
[jira] [Created] (SPARK-22207) High memory usage when converting relational data to Hierarchical data
kanika dhuria created SPARK-22207:
----------------------------------

             Summary: High memory usage when converting relational data to Hierarchical data
                 Key: SPARK-22207
                 URL: https://issues.apache.org/jira/browse/SPARK-22207
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: kanika dhuria

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (SPARK-22205) Incorrect result with user defined agg function followed by a non deterministic function
kanika dhuria created SPARK-22205:
----------------------------------

             Summary: Incorrect result with user defined agg function followed by a non deterministic function
                 Key: SPARK-22205
                 URL: https://issues.apache.org/jira/browse/SPARK-22205
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: kanika dhuria

Repro: create a user-defined aggregate function like

{noformat}
class AnyUdaf(dtype: DataType) extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("v", dtype) :: Nil)
  def bufferSchema: StructType = StructType(StructField("v", dtype) :: Nil)
  def dataType: DataType = dtype
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = null }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (buffer(0) == null) buffer(0) = input(0)
  }
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    if (buffer1(0) == null) buffer1(0) = buffer2(0)
  }
  def evaluate(buffer: Row): Any = { buffer(0) }
}
{noformat}

Use this in an agg and follow it with a non-deterministic function like monotonically_increasing_id:

{noformat}
Seq(0, 1).toDF("c1")
  .select(col("c1"), lit(10)).toDF("c1", "c2")
  .select(col("c1"), col("c2")).toDF("c1", "c2")
  .groupBy(col("c1")).agg(new AnyUdaf(IntegerType)(col("c2"))).toDF("c1", "c2")
  .select(lit(5), col("c2"), monotonically_increasing_id).show

+---+---+-----------------------------+
|  5| c2|monotonically_increasing_id()|
+---+---+-----------------------------+
|  5| 10|                            0|
|  5| 10|                            0|
+---+---+-----------------------------+
{noformat}

The two rows should receive distinct monotonically increasing ids, but both get 0.
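One pattern that may stabilize output like this (an assumption on my part, not a fix from the ticket) is to materialize the aggregation before attaching the non-deterministic column, so `monotonically_increasing_id()` runs over an already-evaluated input rather than a re-executed plan:

```scala
import org.apache.spark.sql.functions._

// Hypothetical workaround sketch: cache and force-evaluate the aggregate,
// then add the non-deterministic id in a separate step.
val aggd = Seq(0, 1).toDF("c1")
  .select(col("c1"), lit(10).as("c2"))
  .groupBy(col("c1")).agg(new AnyUdaf(IntegerType)(col("c2")).as("c2"))
  .cache()
aggd.count() // force evaluation so later actions reuse the cached rows
aggd.select(lit(5), col("c2"), monotonically_increasing_id()).show()
```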
[jira] [Commented] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cata
[ https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982700#comment-15982700 ]

kanika dhuria commented on SPARK-17922:
---------------------------------------
Hi, I have attached the repro case for this issue. The zip has a ReadMe with details of the configuration steps that are required. Can somebody please use that and review the change requested?

> ClassCastException java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator
> cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-17922
>                 URL: https://issues.apache.org/jira/browse/SPARK-17922
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: kanika dhuria
>         Attachments: spark_17922.tar.gz
>
> I am using Spark 2.0 and seeing a class-loading issue, because whole-stage code generation produces multiple classes with the same name, "org.apache.spark.sql.catalyst.expressions.GeneratedClass".
> I am using a DataFrame transform, and within the transform I use OSGi. OSGi replaces the thread context class loader with ContextFinder, which looks at all the class loaders in the stack to find the newly generated class. It finds the GeneratedClass with inner class GeneratedIterator in another byte class loader (instead of falling back to the byte class loader created by the Janino compiler). Since the class name is the same, that byte class loader loads the class and returns GeneratedClass$GeneratedIterator instead of the expected GeneratedClass$UnsafeProjection.
> Can we generate different classes with different names, or is it expected to generate one class only?
>
> This is roughly what I am trying to do:
>
> {noformat}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import com.databricks.spark.avro._
>
> def exePart(out: StructType): ((Iterator[Row]) => Iterator[Row]) = {
>   // Initialize osgi
>   (rows: Iterator[Row]) => {
>     var outi = Iterator[Row]()
>     while (rows.hasNext) {
>       val r = rows.next
>       outi = outi.++(Iterator(Row(r.get(0))))
>     }
>     // val ors = Row("abc")
>     // outi = outi.++(Iterator(ors))
>     outi
>   }
> }
>
> def transform1(outType: StructType): ((DataFrame) => DataFrame) = {
>   (d: DataFrame) => {
>     val inType = d.schema
>     val rdd = d.rdd.mapPartitions(exePart(outType))
>     d.sqlContext.createDataFrame(rdd, outType)
>   }
> }
>
> val df = spark.read.avro("file:///data/builds/a1.avro")
> val df1 = df.select($"id2").filter(false)
> val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, true) :: Nil)))
>   .createOrReplaceTempView("tbl0")
> spark.sql("insert overwrite table testtable select p1 from tbl0")
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
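Since the report traces the cast failure to whole-stage codegen emitting classes that all share one name, one mitigation worth noting (not suggested in this thread, and it carries a performance cost) is to turn whole-stage code generation off:

```scala
// spark.sql.codegen.wholeStage is a standard Spark SQL flag; with it off,
// the plan falls back to iterator-based execution, avoiding the colliding
// whole-stage GeneratedClass instances described above.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
```

This only sidesteps the class-loader lookup problem for whole-stage classes; expression codegen still runs, so whether it helps depends on which generated class the OSGi ContextFinder resolves incorrectly.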
[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly
[ https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kanika dhuria updated SPARK-17922:
----------------------------------
    Attachment: spark_17922.tar.gz

Repro case
[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly
[ https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kanika dhuria updated SPARK-17922:
----------------------------------
[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly
[ https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kanika dhuria updated SPARK-17922:
----------------------------------
[jira] [Created] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection
kanika dhuria created SPARK-17922:
--
Summary: ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection
Key: SPARK-17922
URL: https://issues.apache.org/jira/browse/SPARK-17922
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: kanika dhuria

I am using Spark 2.0 and am seeing a class-loading issue: whole-stage code generation emits multiple classes that all share the name "org.apache.spark.sql.catalyst.expressions.GeneratedClass".

I am using a DataFrame transform, and within the transform I use OSGi. OSGi replaces the thread-context class loader with ContextFinder, which looks at all the class loaders on the stack to find the newly generated class. It finds the GeneratedClass (with inner class GeneratedIterator) through that byte class loader instead of falling back to the byte class loader created by the Janino compiler. Because the class names are identical, that byte class loader loads the class and returns GeneratedClass$GeneratedIterator instead of the expected GeneratedClass$UnsafeProjection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
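Whole-stage code generation can also be switched off entirely, which avoids generating the identically named GeneratedClass in the first place. A minimal sketch of that workaround, assuming the internal spark.sql.codegen.wholeStage flag (present in Spark 2.0 but not a public API) — this is an assumption about a mitigation, not a confirmed fix for the OSGi interaction:

```scala
import org.apache.spark.sql.SparkSession

// Workaround sketch (assumption): with whole-stage codegen disabled, Spark
// falls back to non-fused execution paths, so the OSGi ContextFinder never
// has to resolve a freshly generated GeneratedClass by name.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("codegen-classloader-workaround")
  .config("spark.sql.codegen.wholeStage", "false") // internal flag in 2.0
  .getOrCreate()
```

This trades codegen performance for predictable class loading; it does not address the underlying class-naming scheme the report asks about.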
[jira] [Commented] (SPARK-17753) Simple case in spark sql throws ParseException
[ https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539376#comment-15539376 ] kanika dhuria commented on SPARK-17753:
---
Yeah, thanks!

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: kanika dhuria
>
> Simple CASE in SQL throws a parser exception in Spark 2.0. The following query, as well as similar queries, fails:
> {noformat}
> scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE (1 = (CASE ('ab' = alias.p_text) OR (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END))")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'FROM' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)
> == SQL ==
> SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE (1 = (CASE ('ab' = alias.p_text) OR (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END))
> ^^^
> at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
> at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
> ... 48 elided
> {noformat}
[jira] [Commented] (SPARK-17753) Simple case in spark sql throws ParseException
[ https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539358#comment-15539358 ] kanika dhuria commented on SPARK-17753:
---
Hi Herman, I have tried the symbolic operators as well; it doesn't work. Even this query fails with the same error:

SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE (1 = (CASE ('ab' = alias.p_text) WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END))

No boolean condition after CASE works.

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: kanika dhuria
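The failing queries all use the simple-CASE form with a boolean operand, CASE (cond) WHEN TRUE THEN ... The same logic can be written as a searched CASE, which puts the condition in the WHEN branch. A hedged sketch of that rewrite (table and column names are taken from the report; whether it fits the original tooling that generates these queries is an assumption):

```scala
// Searched CASE instead of a simple CASE over a boolean operand.
// Branch-by-branch it matches the original: cond TRUE -> 1, cond FALSE -> 0,
// cond NULL -> NOT(cond) is also NULL, so it falls through to ELSE NULL.
val rewritten = spark.sql("""
  SELECT alias.p_double AS a0, alias.p_text AS a1, NULL AS a2
  FROM hadoop_tbl_all alias
  WHERE 1 = (CASE
               WHEN ('ab' = alias.p_text) OR (8 <= LENGTH(alias.p_text)) THEN 1
               WHEN NOT (('ab' = alias.p_text) OR (8 <= LENGTH(alias.p_text))) THEN 0
               ELSE CAST(NULL AS INT)
             END)
""")
```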
[jira] [Updated] (SPARK-17753) Simple case in spark sql throws ParseException
[ https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanika dhuria updated SPARK-17753:
--
Description:
Simple CASE in SQL throws a parser exception in Spark 2.0. The following query, as well as similar queries, fails:

scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE (1 = (CASE ('ab' = alias.p_text) OR (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END))")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'FROM' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)
== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE (1 = (CASE ('ab' = alias.p_text) OR (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END))
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
... 48 elided

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: kanika dhuria
[jira] [Updated] (SPARK-17753) Simple case in spark sql throws ParseException
[ https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanika dhuria updated SPARK-17753:
--
Description:
Simple CASE in SQL throws a parser exception in Spark 2.0. The following query fails:

scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE CASE 'ab' = alias.p_text WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'EQ' expecting {<EOF>, '.', '[', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 111)
== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE CASE 'ab' EQ alias.p_text WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END
---^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
... 48 elided

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: kanika dhuria
[jira] [Updated] (SPARK-17753) Simple case in spark sql throws ParseException
[ https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanika dhuria updated SPARK-17753:
--
Summary: Simple case in spark sql throws ParseException (was: Simple case in spark sql throws ParserException)

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: kanika dhuria
[jira] [Created] (SPARK-17753) Simple case in spark sql throws ParserException
kanika dhuria created SPARK-17753:
--
Summary: Simple case in spark sql throws ParserException
Key: SPARK-17753
URL: https://issues.apache.org/jira/browse/SPARK-17753
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: kanika dhuria

Simple CASE in SQL throws a parser exception in Spark 2.0. The following query fails:

spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE (1 = (CASE ('ab' = alias.p_text) OR (8 <= LENGTH(alias.p_text)) WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END))")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'EQ' expecting {<EOF>, '.', '[', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 111)
== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all alias WHERE CASE 'ab' EQ alias.p_text WHEN TRUE=TRUE THEN 1 WHEN TRUE=FALSE THEN 0 ELSE CAST(NULL AS INT) END
---^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
... 48 elided
[jira] [Commented] (SPARK-14854) Left outer join produces incorrect output when the join condition does not have left table key
[ https://issues.apache.org/jira/browse/SPARK-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272865#comment-15272865 ] kanika dhuria commented on SPARK-14854:
---
Why do you think they are the same issue? I was expecting all of the left table's data when the join condition is false. Even with a condition like $"num1".===(lit(10)), the result is empty.

> Left outer join produces incorrect output when the join condition does not
> have left table key
> --
>
> Key: SPARK-14854
> URL: https://issues.apache.org/jira/browse/SPARK-14854
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.1
> Reporter: kanika dhuria
>
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions._
> import sqlContext.implicits._
> val s = StructType(StructField("num", StringType, true) :: Nil)
> val s1 = StructType(StructField("num1", StringType, true) :: Nil)
> val m = sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p => Row(p(0)))
> val d = sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p => Row(p(0)))
> val m1 = sqlContext.createDataFrame(m, s1)
> val d1 = sqlContext.createDataFrame(d, s)
> val j1 = d1.join(m1, $"num1".===(lit(null)), "left_outer")
> j1.take(1)
> Returns an empty data set. The left table has data.
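The expected left-outer semantics the comment describes can be seen with an always-false literal condition on inline data: every left row should come back padded with nulls on the right. A sketch, assuming a spark-shell style session with implicits in scope (not code from the original report):

```scala
import org.apache.spark.sql.functions.lit
import spark.implicits._

val d1 = Seq("1", "2", "3").toDF("num")  // left side
val m1 = Seq("10", "20").toDF("num1")    // right side

// The condition never references the left key and is never true. Standard
// left-outer semantics: all three left rows survive, each with num1 = null;
// an empty result (as in the report) would be incorrect.
val j1 = d1.join(m1, lit(false), "left_outer")
j1.show()
```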
[jira] [Updated] (SPARK-14854) Left outer join produces incorrect output when the join condition does not have left table key
[ https://issues.apache.org/jira/browse/SPARK-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanika dhuria updated SPARK-14854:
--
Description:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import sqlContext.implicits._
val s = StructType(StructField("num", StringType, true) :: Nil)
val s1 = StructType(StructField("num1", StringType, true) :: Nil)
val m = sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p => Row(p(0)))
val d = sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p => Row(p(0)))
val m1 = sqlContext.createDataFrame(m, s1)
val d1 = sqlContext.createDataFrame(d, s)
val j1 = d1.join(m1, $"num1".===(lit(null)), "left_outer")
j1.take(1)

Returns an empty data set. The left table has data.

> Left outer join produces incorrect output when the join condition does not
> have left table key
> --
>
> Key: SPARK-14854
> URL: https://issues.apache.org/jira/browse/SPARK-14854
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.1
> Reporter: kanika dhuria
[jira] [Created] (SPARK-14854) Left outer join produces incorrect output when the join condition does not have left table key
kanika dhuria created SPARK-14854:
--
Summary: Left outer join produces incorrect output when the join condition does not have left table key
Key: SPARK-14854
URL: https://issues.apache.org/jira/browse/SPARK-14854
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.5.1
Reporter: kanika dhuria

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import sqlContext.implicits._
val s = StructType(StructField("num", StringType, true) :: Nil)
val s1 = StructType(StructField("num1", StringType, true) :: Nil)
val m = sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p => Row(p(0)))
val d = sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p => Row(p(0)))
val m1 = sqlContext.createDataFrame(m, s1)
val d1 = sqlContext.createDataFrame(d, s)
val j1 = d1.join(m1, $"num1".===(lit(null)), "left_outer")
j1.take(1)

Returns an empty data set.
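The repro above depends on /tmp text files; it can be restated with inline rows so the symptom is checkable anywhere. A sketch in the report's Spark 1.5-era sqlContext API (the sample rows are hypothetical, not from the original files):

```scala
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.lit
import sqlContext.implicits._

val s  = StructType(StructField("num",  StringType, true) :: Nil)
val s1 = StructType(StructField("num1", StringType, true) :: Nil)

// Inline rows standing in for detail.txt / master.txt (made-up data).
val d1 = sqlContext.createDataFrame(sc.parallelize(Seq(Row("1"), Row("3"))), s)
val m1 = sqlContext.createDataFrame(sc.parallelize(Seq(Row("1"), Row("2"))), s1)

// A null literal condition is never true, so a left outer join is expected
// to keep every row of d1 with num1 = null; the report says it comes back empty.
val j1 = d1.join(m1, $"num1".===(lit(null)), "left_outer")
j1.collect()
```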