[jira] [Closed] (SPARK-32551) Ambiguous self join error in non self join with window

2020-08-06 Thread kanika dhuria (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria closed SPARK-32551.
-

Closing as duplicate.

> Ambiguous self join error in non self join with window
> --
>
> Key: SPARK-32551
> URL: https://issues.apache.org/jira/browse/SPARK-32551
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: kanika dhuria
>Priority: Major
> Fix For: 3.0.1
>
>
> The following code fails the ambiguous self-join analysis even though it does 
> not contain a self join:
> val v1 = spark.range(3).toDF("m")
> val v2 = spark.range(3).toDF("d")
> val v3 = v1.join(v2, v1("m") === v2("d"))
> val v4 = v3("d")
> val w1 = Window.partitionBy(v4)
> val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
> org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
> probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.;
>  






[jira] [Resolved] (SPARK-32551) Ambiguous self join error in non self join with window

2020-08-06 Thread kanika dhuria (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria resolved SPARK-32551.
---
Fix Version/s: 3.0.1
   Resolution: Fixed

> Ambiguous self join error in non self join with window
> --
>
> Key: SPARK-32551
> URL: https://issues.apache.org/jira/browse/SPARK-32551
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: kanika dhuria
>Priority: Major
> Fix For: 3.0.1
>
>
> The following code fails the ambiguous self-join analysis even though it does 
> not contain a self join:
> val v1 = spark.range(3).toDF("m")
> val v2 = spark.range(3).toDF("d")
> val v3 = v1.join(v2, v1("m") === v2("d"))
> val v4 = v3("d")
> val w1 = Window.partitionBy(v4)
> val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
> org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
> probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.;
>  






[jira] [Commented] (SPARK-32551) Ambiguous self join error in non self join with window

2020-08-06 Thread kanika dhuria (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172586#comment-17172586
 ] 

kanika dhuria commented on SPARK-32551:
---

Thanks [~cloud_fan], it is fixed in the latest 3.0 branch.

Fixed as part of https://issues.apache.org/jira/browse/SPARK-31956.

> Ambiguous self join error in non self join with window
> --
>
> Key: SPARK-32551
> URL: https://issues.apache.org/jira/browse/SPARK-32551
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: kanika dhuria
>Priority: Major
>
> The following code fails the ambiguous self-join analysis even though it does 
> not contain a self join:
> val v1 = spark.range(3).toDF("m")
> val v2 = spark.range(3).toDF("d")
> val v3 = v1.join(v2, v1("m") === v2("d"))
> val v4 = v3("d")
> val w1 = Window.partitionBy(v4)
> val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
> org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
> probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.;
>  






[jira] [Created] (SPARK-32551) Ambiguous self join error in non self join with window

2020-08-05 Thread kanika dhuria (Jira)
kanika dhuria created SPARK-32551:
-

 Summary: Ambiguous self join error in non self join with window
 Key: SPARK-32551
 URL: https://issues.apache.org/jira/browse/SPARK-32551
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: kanika dhuria


The following code hits the ambiguous self-join error even though it does not contain a self join:

val v1 = spark.range(3).toDF("m")
val v2 = spark.range(3).toDF("d")
val v3 = v1.join(v2, v1("m") === v2("d"))
val v4 = v3("d")
val w1 = Window.partitionBy(v4)
val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))

org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
probably because you joined several Datasets together, and some of these 
Datasets are the same. This column points to one of the Datasets but Spark is 
unable to figure out which one. Please alias the Datasets with different names 
via `Dataset.as` before joining them, and specify the column using qualified 
name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.;
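
A workaround, until this is fixed, is either of the two options the error message itself suggests. The sketch below assumes the vals from the repro above, that org.apache.spark.sql.functions._ is imported, and that referencing the column by name (instead of through the joined Dataset's Column object) avoids the Dataset-id tag that triggers the check:

// Option 1 (assumption): refer to the column by name rather than via v3("d")
val w2 = Window.partitionBy(col("d"))
val out1 = v3.select(col("d").as("a"), sum(col("d")).over(w2).as("b"))

// Option 2 (from the error message): disable the ambiguous self-join check
spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")
val out2 = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))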

 






[jira] [Updated] (SPARK-32551) Ambiguous self join error in non self join with window

2020-08-05 Thread kanika dhuria (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-32551:
--
Description: 
The following code fails the ambiguous self-join analysis even though it does not contain a self join:

val v1 = spark.range(3).toDF("m")
val v2 = spark.range(3).toDF("d")
val v3 = v1.join(v2, v1("m") === v2("d"))
val v4 = v3("d")
val w1 = Window.partitionBy(v4)
val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))

org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
probably because you joined several Datasets together, and some of these 
Datasets are the same. This column points to one of the Datasets but Spark is 
unable to figure out which one. Please alias the Datasets with different names 
via `Dataset.as` before joining them, and specify the column using qualified 
name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.;

 

  was:
The following code hits the ambiguous self-join error even though it does not contain a self join:

val v1 = spark.range(3).toDF("m")
val v2 = spark.range(3).toDF("d")
val v3 = v1.join(v2, v1("m") === v2("d"))
val v4 = v3("d")
val w1 = Window.partitionBy(v4)
val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))

org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
probably because you joined several Datasets together, and some of these 
Datasets are the same. This column points to one of the Datasets but Spark is 
unable to figure out which one. Please alias the Datasets with different names 
via `Dataset.as` before joining them, and specify the column using qualified 
name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.;

 


> Ambiguous self join error in non self join with window
> --
>
> Key: SPARK-32551
> URL: https://issues.apache.org/jira/browse/SPARK-32551
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: kanika dhuria
>Priority: Major
>
> The following code fails the ambiguous self-join analysis even though it does 
> not contain a self join:
> val v1 = spark.range(3).toDF("m")
> val v2 = spark.range(3).toDF("d")
> val v3 = v1.join(v2, v1("m") === v2("d"))
> val v4 = v3("d")
> val w1 = Window.partitionBy(v4)
> val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
> org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
> probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.;
>  






[jira] [Reopened] (SPARK-22207) High memory usage when converting relational data to Hierarchical data

2019-06-28 Thread kanika dhuria (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria reopened SPARK-22207:
---

The same issue is seen in Spark 2.4.

> High memory usage when converting relational data to Hierarchical data
> --
>
> Key: SPARK-22207
> URL: https://issues.apache.org/jira/browse/SPARK-22207
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: kanika dhuria
>Priority: Major
>  Labels: bulk-closed
>
> There are 4 tables:
> lineitems ~1.4 GB,
> orders ~330 MB,
> customer ~47 MB,
> nations ~2.2K.
> These tables are related as follows:
> There are multiple lineitems per order (pk, fk: orderkey).
> There are multiple orders per customer (pk, fk: cust_key).
> There are multiple customers per nation (pk, fk: nation key).
> Data is almost evenly distributed.
> Building the hierarchy up to 3 levels, i.e. joining lineitems, orders, and 
> customers, works fine with 4 GB / 2 cores of executor memory.
> Adding nations requires 8 GB / 2 cores or 4 GB / 1 core.
> ==
> {noformat}
> val sqlContext = SparkSession.builder() .enableHiveSupport() 
> .config("spark.sql.retainGroupColumns", false) 
> .config("spark.sql.crossJoin.enabled", true) .getOrCreate()
>  
>   val orders = sqlContext.sql("select * from orders")
>   val lineItem = sqlContext.sql("select * from lineitems")
>   
>   val customer = sqlContext.sql("select * from customers")
>   
>   val nation = sqlContext.sql("select * from nations")
>   
>   val lineitemOrders = 
> lineItem.groupBy(col("l_orderkey")).agg(col("l_orderkey"), 
> collect_list(struct(col("l_partkey"), 
> col("l_suppkey"),col("l_linenumber"),col("l_quantity"),col("l_extendedprice"),col("l_discount"),col("l_tax"),col("l_returnflag"),col("l_linestatus"),col("l_shipdate"),col("l_commitdate"),col("l_receiptdate"),col("l_shipinstruct"),col("l_shipmode"))).as("lineitem")).join(orders,
>  orders("O_ORDERKEY")=== lineItem("l_orderkey")).select(col("O_ORDERKEY"), 
> col("O_CUSTKEY"),  col("O_ORDERSTATUS"), col("O_TOTALPRICE"), 
> col("O_ORDERDATE"), col("O_ORDERPRIORITY"), col("O_CLERK"), 
> col("O_SHIPPRIORITY"), col("O_COMMENT"),  col("lineitem"))  
>   
>   val customerList = 
> lineitemOrders.groupBy(col("o_custkey")).agg(col("o_custkey"),collect_list(struct(col("O_ORDERKEY"),
>  col("O_CUSTKEY"),  col("O_ORDERSTATUS"), col("O_TOTALPRICE"), 
> col("O_ORDERDATE"), col("O_ORDERPRIORITY"), col("O_CLERK"), 
> col("O_SHIPPRIORITY"), 
> col("O_COMMENT"),col("lineitem"))).as("items")).join(customer,customer("c_custkey")===
>  
> lineitemOrders("o_custkey")).select(col("c_custkey"),col("c_name"),col("c_nationkey"),col("items"))
>  val nationList = 
> customerList.groupBy(col("c_nationkey")).agg(col("c_nationkey"),collect_list(struct(col("c_custkey"),col("c_name"),col("c_nationkey"),col("items"))).as("custList")).join(nation,nation("n_nationkey")===customerList("c_nationkey")).select(col("n_nationkey"),col("n_name"),col("custList"))
>  
>   nationList.write.mode("overwrite").json("filePath")
> {noformat}
> 
> If the customerList is saved to a file and the last agg/join is then run 
> separately, it runs fine with 4 GB / 2 cores.
> I can provide the data if needed.






[jira] [Created] (SPARK-22207) High memory usage when converting relational data to Hierarchical data

2017-10-04 Thread kanika dhuria (JIRA)
kanika dhuria created SPARK-22207:
-

 Summary: High memory usage when converting relational data to 
Hierarchical data
 Key: SPARK-22207
 URL: https://issues.apache.org/jira/browse/SPARK-22207
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: kanika dhuria


There are 4 tables:
lineitems ~1.4 GB,
orders ~330 MB,
customer ~47 MB,
nations ~2.2K.

These tables are related as follows:
There are multiple lineitems per order (pk, fk: orderkey).
There are multiple orders per customer (pk, fk: cust_key).
There are multiple customers per nation (pk, fk: nation key).

Data is almost evenly distributed.

Building the hierarchy up to 3 levels, i.e. joining lineitems, orders, and customers, works fine with 4 GB / 2 cores of executor memory.
Adding nations requires 8 GB / 2 cores or 4 GB / 1 core.

==

{noformat}
val sqlContext = SparkSession.builder() .enableHiveSupport() 
.config("spark.sql.retainGroupColumns", false) 
.config("spark.sql.crossJoin.enabled", true) .getOrCreate()
 
  val orders = sqlContext.sql("select * from orders")
  val lineItem = sqlContext.sql("select * from lineitems")
  
  val customer = sqlContext.sql("select * from customers")
  
  val nation = sqlContext.sql("select * from nations")
  
  val lineitemOrders = 
lineItem.groupBy(col("l_orderkey")).agg(col("l_orderkey"), 
collect_list(struct(col("l_partkey"), 
col("l_suppkey"),col("l_linenumber"),col("l_quantity"),col("l_extendedprice"),col("l_discount"),col("l_tax"),col("l_returnflag"),col("l_linestatus"),col("l_shipdate"),col("l_commitdate"),col("l_receiptdate"),col("l_shipinstruct"),col("l_shipmode"))).as("lineitem")).join(orders,
 orders("O_ORDERKEY")=== lineItem("l_orderkey")).select(col("O_ORDERKEY"), 
col("O_CUSTKEY"),  col("O_ORDERSTATUS"), col("O_TOTALPRICE"), 
col("O_ORDERDATE"), col("O_ORDERPRIORITY"), col("O_CLERK"), 
col("O_SHIPPRIORITY"), col("O_COMMENT"),  col("lineitem"))  
  
  val customerList = 
lineitemOrders.groupBy(col("o_custkey")).agg(col("o_custkey"),collect_list(struct(col("O_ORDERKEY"),
 col("O_CUSTKEY"),  col("O_ORDERSTATUS"), col("O_TOTALPRICE"), 
col("O_ORDERDATE"), col("O_ORDERPRIORITY"), col("O_CLERK"), 
col("O_SHIPPRIORITY"), 
col("O_COMMENT"),col("lineitem"))).as("items")).join(customer,customer("c_custkey")===
 
lineitemOrders("o_custkey")).select(col("c_custkey"),col("c_name"),col("c_nationkey"),col("items"))


 val nationList = 
customerList.groupBy(col("c_nationkey")).agg(col("c_nationkey"),collect_list(struct(col("c_custkey"),col("c_name"),col("c_nationkey"),col("items"))).as("custList")).join(nation,nation("n_nationkey")===customerList("c_nationkey")).select(col("n_nationkey"),col("n_name"),col("custList"))
 
  nationList.write.mode("overwrite").json("filePath")

{noformat}


If the customerList is saved to a file and the last agg/join is then run separately, it runs fine with 4 GB / 2 cores.

I can provide the data if needed.
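
Based on that observation, one possible mitigation (only a sketch; it assumes a checkpoint directory is available and it does not address the underlying memory usage) is to materialize customerList inside the job, mirroring the "save customerList to a file, then run the last agg/join separately" experiment:

{noformat}
// Hypothetical sketch: break the single large plan in two by checkpointing
// the intermediate result before the final aggregation and join.
sqlContext.sparkContext.setCheckpointDir("/tmp/checkpoints")  // assumed path

val customerListMat = customerList.checkpoint()  // materializes and truncates the lineage

val nationList = customerListMat
  .groupBy(col("c_nationkey"))
  .agg(col("c_nationkey"),
       collect_list(struct(col("c_custkey"), col("c_name"), col("c_nationkey"), col("items"))).as("custList"))
  .join(nation, nation("n_nationkey") === customerListMat("c_nationkey"))
  .select(col("n_nationkey"), col("n_name"), col("custList"))

nationList.write.mode("overwrite").json("filePath")
{noformat}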







[jira] [Created] (SPARK-22205) Incorrect result with user defined agg function followed by a non deterministic function

2017-10-04 Thread kanika dhuria (JIRA)
kanika dhuria created SPARK-22205:
-

 Summary: Incorrect result with  user defined agg function followed 
by a non deterministic function 
 Key: SPARK-22205
 URL: https://issues.apache.org/jira/browse/SPARK-22205
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: kanika dhuria


Repro:
Create a user-defined aggregate function like the following:

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class AnyUdaf(dtype: DataType) extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("v", dtype) :: Nil)

  def bufferSchema: StructType = StructType(StructField("v", dtype) :: Nil)

  def dataType: DataType = dtype

  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = null }

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (buffer(0) == null) buffer(0) = input(0)
  }

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    if (buffer1(0) == null) buffer1(0) = buffer2(0)
  }

  def evaluate(buffer: Row): Any = { buffer(0) }
}

Use it in an aggregation and follow it with a non-deterministic function such as
monotonically_increasing_id:

Seq(0, 1).toDF("c1").select(col("c1"), lit(10)).toDF("c1", "c2")
  .select(col("c1"), col("c2")).toDF("c1", "c2")
  .groupBy(col("c1")).agg(new AnyUdaf(IntegerType)(col("c2"))).toDF("c1", "c2")
  .select(lit(5), col("c2"), monotonically_increasing_id).show
+---+---+-----------------------------+
|  5| c2|monotonically_increasing_id()|
+---+---+-----------------------------+
|  5| 10|                            0|
|  5| 10|                            0|
+---+---+-----------------------------+
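
A possible way to sidestep this while the bug is investigated (only a sketch, and it assumes the wrong result comes from evaluating the aggregate and the non-deterministic projection in a single plan) is to materialize the aggregated result before adding the non-deterministic column:

// Hypothetical workaround sketch, not the fix for this ticket: force the
// aggregation to be evaluated first, then add the non-deterministic column.
val aggregated = Seq(0, 1).toDF("c1")
  .select(col("c1"), lit(10)).toDF("c1", "c2")
  .groupBy(col("c1")).agg(new AnyUdaf(IntegerType)(col("c2"))).toDF("c1", "c2")
  .cache()
aggregated.count()  // triggers evaluation of the cached aggregate

aggregated.select(lit(5), col("c2"), monotonically_increasing_id).show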









[jira] [Commented] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cata

2017-04-25 Thread kanika dhuria (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982700#comment-15982700
 ] 

kanika dhuria commented on SPARK-17922:
---

Hi, I have attached the repro case for this issue. The zip has a ReadMe with 
details of the configuration steps that are required.
Can somebody please use it and review the requested change?

> ClassCastException java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator 
> cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection 
> -
>
> Key: SPARK-17922
> URL: https://issues.apache.org/jira/browse/SPARK-17922
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
> Attachments: spark_17922.tar.gz
>
>
> I am using spark 2.0
> Seeing class loading issue because the whole stage code gen is generating 
> multiple classes with same name as 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass"
> I am using dataframe transform. and within transform i use Osgi.
> Osgi replaces the thread context class loader to ContextFinder which looks at 
> all the class loaders in the stack to find out the new generated class and 
> finds the GeneratedClass with inner class GeneratedIterator byteclass 
> loader(instead of falling back to the byte class loader created by janino 
> compiler), since the class name is same that byte class loader loads the 
> class and returns GeneratedClass$GeneratedIterator instead of expected 
> GeneratedClass$UnsafeProjection.
> Can we generate different classes with different names or is it expected to 
> generate one class only? 
> This is the somewhat I am trying to do 
> {noformat} 
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import com.databricks.spark.avro._
>   def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = {
> //Initialize osgi
>  (rows:Iterator[Row]) => {
>  var outi = Iterator[Row]() 
>  while(rows.hasNext) {
>  val r = rows.next 
>  outi = outi.++(Iterator(Row(r.get(0  
>  } 
>  //val ors = Row("abc")   
>  //outi =outi.++( Iterator(ors))  
>  outi
>  }
>   }
> def transform1( outType:StructType) :((DataFrame) => DataFrame) = {
>  (d:DataFrame) => {
>   val inType = d.schema
>   val rdd = d.rdd.mapPartitions(exePart(outType))
>   d.sqlContext.createDataFrame(rdd, outType)
> }
>
>   }
> val df = spark.read.avro("file:///data/builds/a1.avro")
> val df1 = df.select($"id2").filter(false)
> val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, 
> true)::Nil))).createOrReplaceTempView("tbl0")
> spark.sql("insert overwrite table testtable select p1 from tbl0")
> {noformat} 






[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly

2017-04-25 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17922:
--
Attachment: spark_17922.tar.gz

Repro case

> ClassCastException java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator 
> cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection 
> -
>
> Key: SPARK-17922
> URL: https://issues.apache.org/jira/browse/SPARK-17922
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
> Attachments: spark_17922.tar.gz
>
>
> I am using spark 2.0
> Seeing class loading issue because the whole stage code gen is generating 
> multiple classes with same name as 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass"
> I am using dataframe transform. and within transform i use Osgi.
> Osgi replaces the thread context class loader to ContextFinder which looks at 
> all the class loaders in the stack to find out the new generated class and 
> finds the GeneratedClass with inner class GeneratedIterator byteclass 
> loader(instead of falling back to the byte class loader created by janino 
> compiler), since the class name is same that byte class loader loads the 
> class and returns GeneratedClass$GeneratedIterator instead of expected 
> GeneratedClass$UnsafeProjection.
> Can we generate different classes with different names or is it expected to 
> generate one class only? 
> This is the somewhat I am trying to do 
> {noformat} 
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import com.databricks.spark.avro._
>   def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = {
> //Initialize osgi
>  (rows:Iterator[Row]) => {
>  var outi = Iterator[Row]() 
>  while(rows.hasNext) {
>  val r = rows.next 
>  outi = outi.++(Iterator(Row(r.get(0  
>  } 
>  //val ors = Row("abc")   
>  //outi =outi.++( Iterator(ors))  
>  outi
>  }
>   }
> def transform1( outType:StructType) :((DataFrame) => DataFrame) = {
>  (d:DataFrame) => {
>   val inType = d.schema
>   val rdd = d.rdd.mapPartitions(exePart(outType))
>   d.sqlContext.createDataFrame(rdd, outType)
> }
>
>   }
> val df = spark.read.avro("file:///data/builds/a1.avro")
> val df1 = df.select($"id2").filter(false)
> val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, 
> true)::Nil))).createOrReplaceTempView("tbl0")
> spark.sql("insert overwrite table testtable select p1 from tbl0")
> {noformat} 






[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly

2016-10-13 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17922:
--
Description: 
I am using Spark 2.0.
I am seeing a class loading issue because whole-stage code generation produces 
multiple classes with the same name, 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass".
I am using a DataFrame transform, and within the transform I use OSGi.
OSGi replaces the thread context class loader with ContextFinder, which searches 
all the class loaders on the stack for the newly generated class. It finds the 
byte class loader holding GeneratedClass with the inner class GeneratedIterator 
(instead of falling back to the byte class loader created by the Janino 
compiler). Because the class name is the same, that loader loads the class and 
returns GeneratedClass$GeneratedIterator instead of the expected 
GeneratedClass$UnsafeProjection.

Can we generate different classes with different names, or is it expected to 
generate only one class?
This is roughly what I am trying to do:
{noformat}
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import com.databricks.spark.avro._

def exePart(out: StructType): (Iterator[Row]) => Iterator[Row] = {
  // Initialize OSGi here
  (rows: Iterator[Row]) => {
    var outi = Iterator[Row]()
    while (rows.hasNext) {
      val r = rows.next
      outi = outi ++ Iterator(Row(r.get(0)))
    }
    // val ors = Row("abc")
    // outi = outi ++ Iterator(ors)
    outi
  }
}

def transform1(outType: StructType): (DataFrame) => DataFrame = {
  (d: DataFrame) => {
    val inType = d.schema
    val rdd = d.rdd.mapPartitions(exePart(outType))
    d.sqlContext.createDataFrame(rdd, outType)
  }
}

val df = spark.read.avro("file:///data/builds/a1.avro")
val df1 = df.select($"id2").filter(false)
val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, true) :: Nil)))
  .createOrReplaceTempView("tbl0")

spark.sql("insert overwrite table testtable select p1 from tbl0")
{noformat}

  was:
I am using spark 2.0
Seeing class loading issue because the whole stage code gen is generating 
multiple classes with same name as 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass"
I am using dataframe transform. and within transform i use Osgi.
Osgi replaces the thread context class loader to ContextFinder which looks at 
all the class loaders in the stack to find out the new generated class and 
finds the GeneratedClass with inner class GeneratedIterator byteclass 
loader(instead of falling back to the byte class loader created by janino 
compiler), since the class name is same that byte class loader loads the class 
and returns GeneratedClass$GeneratedIterator instead of expected 
GeneratedClass$UnsafeProjection.

Can we generate different classes with different names or is it expected to 
generate one class only? 
This is the somewhat I am trying to do 

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import com.databricks.spark.avro._

  def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = {
//Initialize osgi
 (rows:Iterator[Row]) => {
 var outi = Iterator[Row]() 
 while(rows.hasNext) {
 val r = rows.next 
 outi = outi.++(Iterator(Row(r.get(0  
 } 
 //val ors = Row("abc")   
 //outi =outi.++( Iterator(ors))  
 outi
 }
  }

def transform1( outType:StructType) :((DataFrame) => DataFrame) = {
 (d:DataFrame) => {
  val inType = d.schema
  val rdd = d.rdd.mapPartitions(exePart(outType))
  d.sqlContext.createDataFrame(rdd, outType)
}
   
  }

val df = spark.read.avro("file:///data/builds/a1.avro")
val df1 = df.select($"id2").filter(false)
val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, 
true)::Nil))).createOrReplaceTempView("tbl0")

spark.sql("insert overwrite table testtable select p1 from tbl0")



> ClassCastException java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator 
> cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection 
> -
>
> Key: SPARK-17922
> URL: https://issues.apache.org/jira/browse/SPARK-17922
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> I am using spark 2.0
> Seeing class loading issue because the whole stage code gen is generating 
> multiple classes with same name as 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass"
> I am usi

[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly

2016-10-13 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17922:
--
Description: 
I am using spark 2.0
Seeing class loading issue because the whole stage code gen is generating 
multiple classes with same name as 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass"
I am using dataframe transform. and within transform i use Osgi.
Osgi replaces the thread context class loader to ContextFinder which looks at 
all the class loaders in the stack to find out the new generated class and 
finds the GeneratedClass with inner class GeneratedIterator byteclass 
loader(instead of falling back to the byte class loader created by janino 
compiler), since the class name is same that byte class loader loads the class 
and returns GeneratedClass$GeneratedIterator instead of expected 
GeneratedClass$UnsafeProjection.

Can we generate different classes with different names or is it expected to 
generate one class only? 
This is the somewhat I am trying to do 

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import com.databricks.spark.avro._

  def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = {
//Initialize osgi
 (rows:Iterator[Row]) => {
 var outi = Iterator[Row]() 
 while(rows.hasNext) {
 val r = rows.next 
 outi = outi.++(Iterator(Row(r.get(0  
 } 
 //val ors = Row("abc")   
 //outi =outi.++( Iterator(ors))  
 outi
 }
  }

def transform1( outType:StructType) :((DataFrame) => DataFrame) = {
 (d:DataFrame) => {
  val inType = d.schema
  val rdd = d.rdd.mapPartitions(exePart(outType))
  d.sqlContext.createDataFrame(rdd, outType)
}
   
  }

val df = spark.read.avro("file:///data/builds/a1.avro")
val df1 = df.select($"id2").filter(false)
val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, 
true)::Nil))).createOrReplaceTempView("tbl0")

spark.sql("insert overwrite table testtable select p1 from tbl0")


  was:
I am using spark 2.0
Seeing class loading issue because the whole stage code gen is generating 
multiple classes with same name as 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass"
I am using dataframe transform. and within transform i use Osgi.
Osgi replaces the thread context class loader to ContextFinder which looks at 
all the class loaders in the stack to find out the new generated class and 
finds the GeneratedClass with inner class GeneratedIterator byteclass 
loader(instead of falling back to the byte class loader created by janino 
compiler), since the class name is same that byte class loader loads the class 
and returns GeneratedClass$GeneratedIterator instead of expected 
GeneratedClass$UnsafeProjection.

Can we generate different classes with different names or is it expected to 
generate one class only.
This is the rough repro

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import com.databricks.spark.avro._

  def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = {
//Initialize osgi
 (rows:Iterator[Row]) => {
 var outi = Iterator[Row]() 
 while(rows.hasNext) {
 val r = rows.next 
 outi = outi.++(Iterator(Row(r.get(0  
 } 
 //val ors = Row("abc")   
 //outi =outi.++( Iterator(ors))  
 outi
 }
  }

def transform1( outType:StructType) :((DataFrame) => DataFrame) = {
 (d:DataFrame) => {
  val inType = d.schema
  val rdd = d.rdd.mapPartitions(exePart(outType))
  d.sqlContext.createDataFrame(rdd, outType)
}
   
  }

val df = spark.read.avro("file:///data/builds/a1.avro")
val df1 = df.select($"id2").filter(false)
val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, 
true)::Nil))).createOrReplaceTempView("tbl0")

spark.sql("insert overwrite table testtable select p1 from tbl0")



> ClassCastException java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator 
> cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection 
> -
>
> Key: SPARK-17922
> URL: https://issues.apache.org/jira/browse/SPARK-17922
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> I am using spark 2.0
> Seeing class loading issue because the whole stage code gen is generating 
> multiple classes with same name as 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass"
> I am using dataframe transform. and within tran

[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly

2016-10-13 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17922:
--
Description: 
I am using spark 2.0
Seeing class loading issue because the whole stage code gen is generating 
multiple classes with same name as 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass"
I am using dataframe transform. and within transform i use Osgi.
Osgi replaces the thread context class loader to ContextFinder which looks at 
all the class loaders in the stack to find out the new generated class and 
finds the GeneratedClass with inner class GeneratedIterator byteclass 
loader(instead of falling back to the byte class loader created by janino 
compiler), since the class name is same that byte class loader loads the class 
and returns GeneratedClass$GeneratedIterator instead of expected 
GeneratedClass$UnsafeProjection.

Can we generate different classes with different names or is it expected to 
generate one class only.
This is the rough repro

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import com.databricks.spark.avro._

  def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = {
//Initialize osgi
 (rows:Iterator[Row]) => {
 var outi = Iterator[Row]() 
 while(rows.hasNext) {
 val r = rows.next 
 outi = outi.++(Iterator(Row(r.get(0  
 } 
 //val ors = Row("abc")   
 //outi =outi.++( Iterator(ors))  
 outi
 }
  }

def transform1( outType:StructType) :((DataFrame) => DataFrame) = {
 (d:DataFrame) => {
  val inType = d.schema
  val rdd = d.rdd.mapPartitions(exePart(outType))
  d.sqlContext.createDataFrame(rdd, outType)
}
   
  }

val df = spark.read.avro("file:///data/builds/a1.avro")
val df1 = df.select($"id2").filter(false)
val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, 
true)::Nil))).createOrReplaceTempView("tbl0")

spark.sql("insert overwrite table testtable select p1 from tbl0")


  was:
I am using spark 2.0
Seeing class loading issue because the whole stage code gen is generating 
multiple classes with same name as 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass"
I am using dataframe transform. and within transform i use Osgi.
Osgi replaces the thread context class loader to ContextFinder which looks at 
all the class loaders in the stack to find out the new generated class and 
finds the GeneratedClass with inner class GeneratedIterator byteclass 
loader(instead of falling back to the byte class loader created by janino 
compiler), since the class name is same that byte class loader loads the class 
and returns GeneratedClass$GeneratedIterator instead of expected 
GeneratedClass$UnsafeProjection.



> ClassCastException java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator 
> cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection 
> -
>
> Key: SPARK-17922
> URL: https://issues.apache.org/jira/browse/SPARK-17922
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> I am using spark 2.0
> Seeing class loading issue because the whole stage code gen is generating 
> multiple classes with same name as 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass"
> I am using dataframe transform. and within transform i use Osgi.
> Osgi replaces the thread context class loader to ContextFinder which looks at 
> all the class loaders in the stack to find out the new generated class and 
> finds the GeneratedClass with inner class GeneratedIterator byteclass 
> loader(instead of falling back to the byte class loader created by janino 
> compiler), since the class name is same that byte class loader loads the 
> class and returns GeneratedClass$GeneratedIterator instead of expected 
> GeneratedClass$UnsafeProjection.
> Can we generate different classes with different names or is it expected to 
> generate one class only.
> This is the rough repro
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import com.databricks.spark.avro._
>   def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = {
> //Initialize osgi
>  (rows:Iterator[Row]) => {
>  var outi = Iterator[Row]() 
>  while(rows.hasNext) {
>  val r = rows.next 
>  outi = outi.++(Iterator(Row(r.get(0  
>  } 
>  //val ors = Row("abc")   
>  //outi =outi.++( Iterator(ors))  
>  outi
>  }
>   

[jira] [Created] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly

2016-10-13 Thread kanika dhuria (JIRA)
kanika dhuria created SPARK-17922:
-

 Summary: ClassCastException java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator 
cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection 
 Key: SPARK-17922
 URL: https://issues.apache.org/jira/browse/SPARK-17922
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: kanika dhuria


I am using Spark 2.0.
I am seeing a class loading issue because whole-stage code generation produces 
multiple classes with the same name, 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass".
I am using a DataFrame transform, and within the transform I use OSGi.
OSGi replaces the thread context class loader with ContextFinder, which searches 
all the class loaders on the stack for the newly generated class. It finds the 
byte class loader holding GeneratedClass with the inner class GeneratedIterator 
(instead of falling back to the byte class loader created by the Janino 
compiler). Because the class name is the same, that loader loads the class and 
returns GeneratedClass$GeneratedIterator instead of the expected 
GeneratedClass$UnsafeProjection.
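
One mitigation sometimes used for this kind of OSGi/TCCL clash (only a sketch derived from the description above, not a confirmed fix for this ticket) is to swap the thread context class loader back to Spark's own loader around the action that triggers whole-stage codegen:

// Hypothetical sketch: keep OSGi's ContextFinder out of the lookup of the
// generated classes by pinning the context class loader for the duration.
def withSparkClassLoader[T](body: => T): T = {
  val thread = Thread.currentThread()
  val osgiLoader = thread.getContextClassLoader
  // Assumption: the loader that loaded Spark's classes can also see the generated code.
  thread.setContextClassLoader(classOf[org.apache.spark.sql.SparkSession].getClassLoader)
  try body
  finally thread.setContextClassLoader(osgiLoader)
}

withSparkClassLoader {
  spark.sql("insert overwrite table testtable select p1 from tbl0")
}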







[jira] [Commented] (SPARK-17753) Simple case in spark sql throws ParseException

2016-10-01 Thread kanika dhuria (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539376#comment-15539376
 ] 

kanika dhuria commented on SPARK-17753:
---

Yeah, Thanks!

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> Simple case in sql throws parser exception in spark 2.0.
> The following query as well as similar queries fail in spark 2.0 
> {noformat}
> scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
> FROM hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR 
> (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE 
> CAST(NULL AS INT) END))")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'FROM' expecting {, 'WHERE', 'GROUP', 'ORDER', 
> 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 
> 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)
> == SQL ==
> SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
> hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 
> LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL 
> AS INT) END))
> ^^^
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
>   ... 48 elided
> {noformat}






[jira] [Commented] (SPARK-17753) Simple case in spark sql throws ParseException

2016-10-01 Thread kanika dhuria (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539358#comment-15539358
 ] 

kanika dhuria commented on SPARK-17753:
---

Hi Herman,
I have tried the symbolic operator as well; it doesn't work. Even this query 
fails with the same error:
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  (1 = (CASE ('ab' = alias.p_text) WHEN TRUE THEN 1  WHEN 
FALSE THEN 0 ELSE CAST(NULL AS INT) END))

Any boolean condition after CASE doesn't work.
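
As a possible workaround (only a sketch; it assumes the parser rejects just the "CASE <boolean expression> WHEN ..." form and that a searched CASE is acceptable here), the same logic can be expressed as a searched CASE:

{noformat}
// Hypothetical rewrite of the failing predicate as a searched CASE expression.
spark.sql("""SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2
             FROM hadoop_tbl_all alias
             WHERE 1 = (CASE WHEN 'ab' = alias.p_text THEN 1
                             WHEN NOT ('ab' = alias.p_text) THEN 0
                             ELSE CAST(NULL AS INT) END)""")
{noformat}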

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> Simple case in sql throws parser exception in spark 2.0.
> The following query as well as similar queries fail in spark 2.0 
> {noformat}
> scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
> FROM hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR 
> (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE 
> CAST(NULL AS INT) END))")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'FROM' expecting {, 'WHERE', 'GROUP', 'ORDER', 
> 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 
> 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)
> == SQL ==
> SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
> hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 
> LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL 
> AS INT) END))
> ^^^
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
>   ... 48 elided
> {noformat}






[jira] [Updated] (SPARK-17753) Simple case in spark sql throws ParseException

2016-09-30 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17753:
--
Description: 
Simple case in sql throws parser exception in spark 2.0.
The following query as well as similar queries fail in spark 2.0 
scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
FROM hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 
LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'FROM' expecting {, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 
'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 
'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)

== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 LTE 
LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))
^^^

  at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided


  was:
Simple case in sql throws parser exception in spark 2.0.
The following query fails in spark 2.0 
scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
FROM hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 
LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'FROM' expecting {, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 
'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 
'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)

== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 LTE 
LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))
^^^

  at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided



> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> Simple case in sql throws parser exception in spark 2.0.
> The following query as well as similar queries fail in spark 2.0 
> scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
> FROM hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR 
> (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE 
> CAST(NULL AS INT) END))")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'FROM' expecting {, 'WHERE', 'GROUP', 'ORDER', 
> 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 
> 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)
> == SQL ==
> SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
> hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 
> LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL 
> AS INT) END))
> ^^^
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
>   ... 48 elided






[jira] [Updated] (SPARK-17753) Simple case in spark sql throws ParseException

2016-09-30 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17753:
--
Description: 
Simple case in sql throws parser exception in spark 2.0.
The following query fails in spark 2.0 
scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
FROM hadoop_tbl_all alias WHERE  CASE 'ab' = alias.p_text  WHEN TRUE 
THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'EQ' expecting {, '.', '[', 'GROUP', 'ORDER', 'HAVING', 
'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 
'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', 
'-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 
1, pos 111)

== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  CASE 'ab' EQ alias.p_text  WHEN TRUE THEN 1  WHEN FALSE 
THEN 0 ELSE CAST(NULL AS INT) END
---^^^

  at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided



  was:
Simple case in sql throws parser exception in spark 2.0.
The following query fails in spark 2.0 

spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 <= 
LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))")

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'EQ' expecting {, '.', '[', 'GROUP', 'ORDER', 'HAVING', 
'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 
'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', 
'-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 
1, pos 111)

== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  CASE 'ab' EQ alias.p_text  WHEN TRUE=TRUE THEN 1  WHEN 
TRUE=FALSE THEN 0 ELSE CAST(NULL AS INT) END
---^^^

  at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided



> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> Simple case in sql throws parser exception in spark 2.0.
> The following query fails in spark 2.0 
> scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
> FROM hadoop_tbl_all alias WHERE  CASE 'ab' = alias.p_text  WHEN TRUE 
> THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'EQ' expecting {, '.', '[', 'GROUP', 'ORDER', 'HAVING', 
> 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 
> 'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, 
> '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 
> 'DISTRIBUTE'}(line 1, pos 111)
> == SQL ==
> SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
> hadoop_tbl_all alias WHERE  CASE 'ab' EQ alias.p_text  WHEN TRUE THEN 
> 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END
> ---^^^
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.sc

[jira] [Updated] (SPARK-17753) Simple case in spark sql throws ParseException

2016-09-30 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17753:
--
Description: 
Simple case in sql throws parser exception in spark 2.0.
The following query fails in spark 2.0 
scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
FROM hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 
LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'FROM' expecting {, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 
'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 
'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)

== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 LTE 
LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))
^^^

  at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided


  was:
Simple case in sql throws parser exception in spark 2.0.
The following query fails in spark 2.0 
scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
FROM hadoop_tbl_all alias WHERE  CASE 'ab' = alias.p_text  WHEN TRUE 
THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'EQ' expecting {, '.', '[', 'GROUP', 'ORDER', 'HAVING', 
'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 
'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', 
'-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 
1, pos 111)

== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  CASE 'ab' EQ alias.p_text  WHEN TRUE THEN 1  WHEN FALSE 
THEN 0 ELSE CAST(NULL AS INT) END
---^^^

  at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided




> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> A simple CASE expression in Spark SQL throws a parser exception in Spark 2.0.
> The following query fails in Spark 2.0:
> scala> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 
> FROM hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR 
> (8 LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE 
> CAST(NULL AS INT) END))")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'FROM' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 
> 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 
> 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 60)
> == SQL ==
> SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
> hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 
> LTE LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL 
> AS INT) END))
> ^^^
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
>   ... 48 elided
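
A minimal sketch of one possible workaround, assuming the failure is specific to the simple-CASE form ("CASE <boolean expr> WHEN TRUE ... WHEN FALSE ...") and that the standard searched-CASE form parses on 2.0.0. The hadoop_tbl_all schema (p_double, p_text) and the sample rows are assumptions made only to keep the example self-contained; the rewrite keeps the original NULL handling, where an unknown operand falls through to the ELSE branch.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-17753-workaround").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for hadoop_tbl_all; only p_double and p_text are assumed.
Seq((1.0, "ab"), (2.5, "a_longer_text"), (3.0, null.asInstanceOf[String]))
  .toDF("p_double", "p_text")
  .createOrReplaceTempView("hadoop_tbl_all")

// Searched CASE instead of "CASE <boolean expr> WHEN TRUE ... WHEN FALSE ...":
// true -> 1, false -> 0, NULL operand -> ELSE NULL, as in the reported query.
spark.sql("""
  SELECT alias.p_double AS a0, alias.p_text AS a1, NULL AS a2
  FROM hadoop_tbl_all alias
  WHERE 1 = (CASE
               WHEN ('ab' = alias.p_text) OR (8 <= LENGTH(alias.p_text)) THEN 1
               WHEN NOT (('ab' = alias.p_text) OR (8 <= LENGTH(alias.p_text))) THEN 0
               ELSE CAST(NULL AS INT)
             END)
""").show()

With the sample rows above, the first two rows satisfy the predicate and the NULL p_text row is filtered out, which is what the simple-CASE form was written to do.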




[jira] [Updated] (SPARK-17753) Simple case in spark sql throws ParseException

2016-09-30 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-17753:
--
Summary: Simple case in spark sql throws ParseException  (was: Simple case 
in spark sql throws ParserException)

> Simple case in spark sql throws ParseException
> --
>
> Key: SPARK-17753
> URL: https://issues.apache.org/jira/browse/SPARK-17753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: kanika dhuria
>
> A simple CASE expression in Spark SQL throws a parser exception in Spark 2.0.
> The following query fails in Spark 2.0:
> spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
> hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 <= 
> LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
> INT) END))")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'EQ' expecting {<EOF>, '.', '[', 'GROUP', 'ORDER', 'HAVING', 
> 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 
> 'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, 
> '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 
> 'DISTRIBUTE'}(line 1, pos 111)
> == SQL ==
> SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
> hadoop_tbl_all alias WHERE  CASE 'ab' EQ alias.p_text  WHEN TRUE=TRUE 
> THEN 1  WHEN TRUE=FALSE THEN 0 ELSE CAST(NULL AS INT) END
> ---^^^
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
>   ... 48 elided






[jira] [Created] (SPARK-17753) Simple case in spark sql throws ParserException

2016-09-30 Thread kanika dhuria (JIRA)
kanika dhuria created SPARK-17753:
-

 Summary: Simple case in spark sql throws ParserException
 Key: SPARK-17753
 URL: https://issues.apache.org/jira/browse/SPARK-17753
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: kanika dhuria


A simple CASE expression in Spark SQL throws a parser exception in Spark 2.0.
The following query fails in Spark 2.0:

spark.sql("SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM 
hadoop_tbl_all alias WHERE  (1 = (CASE ('ab' = alias.p_text) OR (8 <= 
LENGTH(alias.p_text)) WHEN TRUE THEN 1  WHEN FALSE THEN 0 ELSE CAST(NULL AS 
INT) END))")

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'EQ' expecting {<EOF>, '.', '[', 'GROUP', 'ORDER', 'HAVING', 
'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'WINDOW', 
'UNION', 'EXCEPT', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', 
'-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 
1, pos 111)

== SQL ==
SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2 FROM hadoop_tbl_all 
alias WHERE  CASE 'ab' EQ alias.p_text  WHEN TRUE=TRUE THEN 1  WHEN 
TRUE=FALSE THEN 0 ELSE CAST(NULL AS INT) END
---^^^

  at 
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided
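
The ParseException can also be observed without hadoop_tbl_all existing at all, since spark.sql fails while parsing, before any table lookup. A small sketch, assuming a Spark 2.0.x spark-shell where spark is the usual session object:

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.parser.ParseException

val failingQuery =
  """SELECT alias.p_double as a0, alias.p_text as a1, NULL as a2
    |FROM hadoop_tbl_all alias
    |WHERE (1 = (CASE ('ab' = alias.p_text) OR (8 <= LENGTH(alias.p_text))
    |            WHEN TRUE THEN 1 WHEN FALSE THEN 0 ELSE CAST(NULL AS INT) END))""".stripMargin

try {
  // On the affected version this throws while parsing, before hadoop_tbl_all is ever resolved.
  spark.sql(failingQuery)
} catch {
  case e: ParseException    => println("ParseException:\n" + e.getMessage)
  case e: AnalysisException => println("Parses here; analysis only complains the table is missing:\n" + e.getMessage)
}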







[jira] [Commented] (SPARK-14854) Left outer join produces incorrect output when the join condition does not have left table key

2016-05-05 Thread kanika dhuria (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272865#comment-15272865
 ] 

kanika dhuria commented on SPARK-14854:
---

Why do you think they are the same issue? I was expecting all left-table rows 
when the join condition is false. Even with a condition like 
$"num1".===(lit(10)), the result is empty.

> Left outer join produces incorrect output when the join condition does not 
> have left table key
> --
>
> Key: SPARK-14854
> URL: https://issues.apache.org/jira/browse/SPARK-14854
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: kanika dhuria
>
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val s = StructType(StructField("num", StringType, true)::Nil)
> val s1 = StructType(StructField("num1", StringType, true)::Nil)
> val m = 
> sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p=>Row(p(0)))
> val d = 
> sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p=>Row(p(0)))
> val m1 = sqlContext.createDataFrame(m, s1)
> val d1 = sqlContext.createDataFrame(d, s)
> val j1 = d1.join(m1,$"num1".===(lit(null)),"left_outer");
> j1.take(1)
> Returns an empty data set, although the left table has data.
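
A small sketch of the expectation described above, with in-memory DataFrames standing in for the /tmp text files (an assumption made only to keep it self-contained; in the spark-shell the existing sc and sqlContext can be used directly): under standard left-outer-join semantics, a condition that is false for every pair should still return every left-side row, padded with nulls on the right.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.lit

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("SPARK-14854-expectation"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val d1 = Seq("1", "2", "3").toDF("num")   // left side, standing in for detail.txt
val m1 = Seq("1", "2").toDF("num1")       // right side, standing in for master.txt

// The condition only touches the right side and is false for every pair,
// mirroring the $"num1".===(lit(10)) case mentioned in the comment.
val j1 = d1.join(m1, m1("num1") === lit(10), "left_outer")

// Expected under standard left-outer-join semantics (the report says an
// empty result comes back instead on 1.5.1):
//   num | num1
//   ----+-----
//    1  | null
//    2  | null
//    3  | null
j1.show()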






[jira] [Updated] (SPARK-14854) Left outer join produces incorrect output when the join condition does not have left table key

2016-04-22 Thread kanika dhuria (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-14854:
--
Description: 
import org.apache.spark.sql._
import org.apache.spark.sql.types._


val s = StructType(StructField("num", StringType, true)::Nil)
val s1 = StructType(StructField("num1", StringType, true)::Nil)

val m = sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p=>Row(p(0)))
val d = sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p=>Row(p(0)))
val m1 = sqlContext.createDataFrame(m, s1)
val d1 = sqlContext.createDataFrame(d, s)
val j1 = d1.join(m1,$"num1".===(lit(null)),"left_outer");
j1.take(1)

Returns an empty data set, although the left table has data.

  was:
import org.apache.spark.sql._
import org.apache.spark.sql.types._


val s = StructType(StructField("num", StringType, true)::Nil)
val s1 = StructType(StructField("num1", StringType, true)::Nil)

val m = sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p=>Row(p(0)))
val d = sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p=>Row(p(0)))
val m1 = sqlContext.createDataFrame(m, s1)
val d1 = sqlContext.createDataFrame(d, s)
val j1 = d1.join(m1,$"num1".===(lit(null)),"left_outer");
j1.take(1)

Returns an empty data set.


> Left outer join produces incorrect output when the join condition does not 
> have left table key
> --
>
> Key: SPARK-14854
> URL: https://issues.apache.org/jira/browse/SPARK-14854
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: kanika dhuria
>
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val s = StructType(StructField("num", StringType, true)::Nil)
> val s1 = StructType(StructField("num1", StringType, true)::Nil)
> val m = 
> sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p=>Row(p(0)))
> val d = 
> sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p=>Row(p(0)))
> val m1 = sqlContext.createDataFrame(m, s1)
> val d1 = sqlContext.createDataFrame(d, s)
> val j1 = d1.join(m1,$"num1".===(lit(null)),"left_outer");
> j1.take(1)
> Returns an empty data set, although the left table has data.






[jira] [Created] (SPARK-14854) Left outer join produces incorrect output when the join condition does not have left table key

2016-04-22 Thread kanika dhuria (JIRA)
kanika dhuria created SPARK-14854:
-

 Summary: Left outer join produces incorrect output when the join 
condition does not have left table key
 Key: SPARK-14854
 URL: https://issues.apache.org/jira/browse/SPARK-14854
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.1
Reporter: kanika dhuria


import org.apache.spark.sql._
import org.apache.spark.sql.types._


val s = StructType(StructField("num", StringType, true)::Nil)
val s1 = StructType(StructField("num1", StringType, true)::Nil)

val m = sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p=>Row(p(0)))
val d = sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p=>Row(p(0)))
val m1 = sqlContext.createDataFrame(m, s1)
val d1 = sqlContext.createDataFrame(d, s)
val j1 = d1.join(m1,$"num1".===(lit(null)),"left_outer");
j1.take(1)

Returns an empty data set.
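
The same reproduction can be run without the /tmp text files; a sketch, assuming two tiny in-memory inputs with the same single-column string schemas (in the spark-shell the existing sc and sqlContext can be used instead of building new ones):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("SPARK-14854-repro"))
val sqlContext = new SQLContext(sc)

val s  = StructType(StructField("num",  StringType, true) :: Nil)
val s1 = StructType(StructField("num1", StringType, true) :: Nil)

// In-memory stand-ins for detail.txt (left) and master.txt (right).
val d1 = sqlContext.createDataFrame(sc.parallelize(Seq(Row("10"), Row("20"))), s)
val m1 = sqlContext.createDataFrame(sc.parallelize(Seq(Row("10"))), s1)

// The join condition never references the left key and never evaluates to true;
// a left outer join should still keep both rows of d1, with num1 = null.
val j1 = d1.join(m1, m1("num1") === lit(null), "left_outer")
j1.take(2)   // reported to come back empty on 1.5.1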


