[jira] [Created] (SPARK-26812) PushProjectionThroughUnion nullability issue

2019-02-01 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-26812:
---

 Summary: PushProjectionThroughUnion nullability issue
 Key: SPARK-26812
 URL: https://issues.apache.org/jira/browse/SPARK-26812
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Bogdan Raducanu


Union output data types are the output data types of the first child.
However, the other union children may have different value nullability.
This means that we can't always push a Project down onto the children.

To reproduce:
{code}
Seq(Map("foo" -> "bar")).toDF("a").write.saveAsTable("table1")
sql("SELECT 1 AS b").write.saveAsTable("table2")
sql("CREATE OR REPLACE VIEW test1 AS SELECT map() AS a FROM table2 UNION ALL 
SELECT a FROM table1")
 sql("select * from test1").show
{code}

This fails because the plan is no longer resolved.
The plan is broken by the PushProjectionThroughUnion rule, which pushes down a 
cast to a map type with value nullability=true onto a child whose map type has 
value nullability=false.
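
A minimal sketch (helper code assumed here, not part of the original report) to inspect the value nullability of each union child's map column after running the repro above:
{code}
// Compare the value nullability of the two union children's "a" columns.
// Assumes table1 and table2 from the repro above exist in the current session.
import org.apache.spark.sql.types.MapType

val first  = spark.sql("SELECT map() AS a FROM table2").schema("a").dataType.asInstanceOf[MapType]
val second = spark.table("table1").schema("a").dataType.asInstanceOf[MapType]
println(s"first child  valueContainsNull = ${first.valueContainsNull}")
println(s"second child valueContainsNull = ${second.valueContainsNull}")
// The two flags differ, which is the mismatch the pushed-down cast runs into.
{code}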






[jira] [Updated] (SPARK-25209) Optimization in Dataset.apply for DataFrames

2018-08-23 Thread Bogdan Raducanu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-25209:

Issue Type: Improvement  (was: Bug)

> Optimization in Dataset.apply for DataFrames
> 
>
> Key: SPARK-25209
> URL: https://issues.apache.org/jira/browse/SPARK-25209
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Bogdan Raducanu
>Priority: Major
>
> {{Dataset.apply}} calls {{dataset.deserializer}} (to provide an early error) 
> which ends up calling the full {{Analyzer}} on the deserializer. This can 
> take tens of milliseconds, depending on how big the plan is.
> Since {{Dataset.apply}} is called for many {{Dataset}} operations such as 
> {{Dataset.where}}, it can add significant overhead to short queries.
> In the following code, {{duration}} is *17 ms* in current Spark vs *1 ms* 
> if the {{dataset.deserializer}} line is removed.
> It seems the resulting {{deserializer}} is particularly big in the case of 
> nested schema, but the same overhead can be observed if we have a very wide 
> flat schema.
>  According to a comment in the PR that introduced this check, we can at least 
> remove this check for {{DataFrames}}: 
> [https://github.com/apache/spark/pull/20402#discussion_r164338267]
> {code}
> val col = "named_struct(" +
>   (0 until 100).map { i => s"'col$i', id"}.mkString(",") + ")"
> val df = spark.range(10).selectExpr(col)
> val TRUE = lit(true)
> val numIter = 1000
> val startTime = System.nanoTime()
> for (i <- 0 until numIter) {
>   df.where(TRUE)
> }
> val durationMs = (System.nanoTime() - startTime) / numIter / 1000000 // ns -> ms per call
> println(s"duration $durationMs")
>  {code}






[jira] [Created] (SPARK-25212) Support Filter in ConvertToLocalRelation

2018-08-23 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-25212:
---

 Summary: Support Filter in ConvertToLocalRelation
 Key: SPARK-25212
 URL: https://issues.apache.org/jira/browse/SPARK-25212
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.1
Reporter: Bogdan Raducanu


ConvertToLocalRelation can make short queries faster, but currently it only 
supports Project and Limit.
It can be extended with other operators such as Filter; a sketch of what that 
could look like follows.
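
Below is a hedged sketch (not the actual Spark rule) of the kind of case ConvertToLocalRelation could gain for Filter: evaluate a deterministic predicate eagerly against the local rows. The Catalyst names are real, but the exact API details (notably {{InterpretedPredicate.create}}) are assumptions made for illustration.
{code}
import org.apache.spark.sql.catalyst.expressions.InterpretedPredicate
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch only: fold a deterministic Filter over a LocalRelation into a smaller
// LocalRelation by evaluating the predicate at optimization time.
object ConvertFilterToLocalRelationSketch extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case Filter(condition, LocalRelation(output, data, isStreaming)) if condition.deterministic =>
      val predicate = InterpretedPredicate.create(condition, output) // assumed helper
      LocalRelation(output, data.filter(predicate.eval), isStreaming)
  }
}
{code}
A real rule would also need to guard against predicates that cannot be evaluated eagerly.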






[jira] [Created] (SPARK-25209) Optimization in Dataset.apply for DataFrames

2018-08-23 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-25209:
---

 Summary: Optimization in Dataset.apply for DataFrames
 Key: SPARK-25209
 URL: https://issues.apache.org/jira/browse/SPARK-25209
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: Bogdan Raducanu


{{Dataset.apply}} calls {{dataset.deserializer}} (to provide an early error) 
which ends up calling the full {{Analyzer}} on the deserializer. This can take 
tens of milliseconds, depending on how big the plan is.
Since {{Dataset.apply}} is called for many {{Dataset}} operations such as 
{{Dataset.where}}, it can add significant overhead to short queries.

In the following code, {{duration}} is *17 ms* in current Spark vs *1 ms* 
if the {{dataset.deserializer}} line is removed.
It seems the resulting {{deserializer}} is particularly big in the case of 
nested schema, but the same overhead can be observed if we have a very wide 
flat schema.
 According to a comment in the PR that introduced this check, we can at least 
remove this check for {{DataFrames}}: 
[https://github.com/apache/spark/pull/20402#discussion_r164338267]
{code}
val col = "named_struct(" +
  (0 until 100).map { i => s"'col$i', id"}.mkString(",") + ")"
val df = spark.range(10).selectExpr(col)
val TRUE = lit(true)
val numIter = 1000
val startTime = System.nanoTime()
for (i <- 0 until numIter) {
  df.where(TRUE)
}
val durationMs = (System.nanoTime() - startTime) / numIter / 1000000 // ns -> ms per call
println(s"duration $durationMs")
 {code}






[jira] [Created] (SPARK-24500) UnsupportedOperationException when trying to execute Union plan with Stream of children

2018-06-08 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-24500:
---

 Summary: UnsupportedOperationException when trying to execute 
Union plan with Stream of children
 Key: SPARK-24500
 URL: https://issues.apache.org/jira/browse/SPARK-24500
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Bogdan Raducanu


To reproduce:
{code}
import org.apache.spark.sql.catalyst.plans.logical._
def range(i: Int) = Range(1, i, 1, 1)
val union = Union(Stream(range(3), range(5), range(7)))
spark.sessionState.planner.plan(union).next().execute()
{code}

produces

{code}
java.lang.UnsupportedOperationException
  at 
org.apache.spark.sql.execution.PlanLater.doExecute(SparkStrategies.scala:55)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
{code}

The SparkPlan looks like this:

{code}
:- Range (1, 3, step=1, splits=1)
:- PlanLater Range (1, 5, step=1, splits=Some(1))
+- PlanLater Range (1, 7, step=1, splits=Some(1))
{code}

So not all of it was planned (some PlanLater nodes are still in there).
This appears to be a longstanding issue.
I traced it to the use of a mutable {{changed}} var in TreeNode.
For example, in mapChildren:
{code}
case args: Traversable[_] => args.map {
  case arg: TreeNode[_] if containsChild(arg) =>
    val newChild = f(arg.asInstanceOf[BaseType])
    if (!(newChild fastEquals arg)) {
      changed = true
{code}

If args is a Stream, the map is evaluated lazily, so changed is never set here 
for the deferred elements, ultimately causing the method to return the original 
plan (consistent with only the first Union child above being planned).
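
A standalone illustration of the pitfall in plain Scala 2.11/2.12 (not Spark code; the names below are made up for the example). The flag is only set by mappings that have actually been forced, so when the element that would change sits in the Stream's lazy tail, the original collection is returned:
{code}
def replaceTwos(xs: Traversable[Int]): Traversable[Int] = {
  var changed = false
  val mapped = xs.map { x =>
    val newX = if (x == 2) 99 else x
    if (newX != x) changed = true // same pattern as `changed = true` in mapChildren
    newX
  }
  if (changed) mapped else xs     // decided before the Stream's tail is ever forced
}

println(replaceTwos(List(1, 2, 3)).toList)   // List(1, 99, 3) -- eager map, flag was set
println(replaceTwos(Stream(1, 2, 3)).toList) // List(1, 2, 3)  -- original returned: the
                                             // element that changes is in the lazy tail
{code}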






[jira] [Created] (SPARK-24495) SortMergeJoin with duplicate keys wrong results

2018-06-08 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-24495:
---

 Summary: SortMergeJoin with duplicate keys wrong results
 Key: SPARK-24495
 URL: https://issues.apache.org/jira/browse/SPARK-24495
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Bogdan Raducanu


To reproduce:
{code:java}
// The bug is in SortMergeJoin; the Shuffles are correct. With the default of
// 200 shuffle partitions the data may be split into partitions so small that
// SortMergeJoin can no longer return wrong results.
spark.conf.set("spark.sql.shuffle.partitions", "1")
// disable this, otherwise it would filter results before join, hiding the bug
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")
sql("select id as a1 from range(1000)").createOrReplaceTempView("t1")
sql("select id * 2 as b1, -id as b2 from 
range(1000)").createOrReplaceTempView("t2")

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
sql("""select b1, a1, b2 FROM t1 INNER JOIN t2 ON b1 = a1 AND b2 = a1""").show
{code}
Given the join condition, all three columns are expected to be equal in every result row.

But the result is:
{code:java}
+---+---+---+
| b1| a1| b2|
+---+---+---+
|  0|  0|  0|
|  2|  2| -1|
|  4|  4| -2|
|  6|  6| -3|
|  8|  8| -4|

{code}
I traced it to {{EnsureRequirements.reorder}} which was introduced by 
[https://github.com/apache/spark/pull/16985] and 
[https://github.com/apache/spark/pull/20041]

It leads to an incorrect plan:
{code:java}
== Physical Plan ==
*(5) Project [b1#735672L, a1#735669L, b2#735673L]
+- *(5) SortMergeJoin [a1#735669L, a1#735669L], [b1#735672L, b1#735672L], Inner
   :- *(2) Sort [a1#735669L ASC NULLS FIRST, a1#735669L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(a1#735669L, a1#735669L, 1)
   :     +- *(1) Project [id#735670L AS a1#735669L]
   :        +- *(1) Range (0, 1000, step=1, splits=8)
   +- *(4) Sort [b1#735672L ASC NULLS FIRST, b2#735673L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(b1#735672L, b2#735673L, 1)
         +- *(3) Project [(id#735674L * 2) AS b1#735672L, -id#735674L AS b2#735673L]
            +- *(3) Range (0, 1000, step=1, splits=8)
{code}
The SortMergeJoin keys are wrong: the left keys are [a1, a1] and the right keys are [b1, b1], so key b2 is missing completely.
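
A quick programmatic check (a sketch using the temp views from the repro above, not part of the original report): every returned row should satisfy the join condition, so a non-zero count below confirms the wrong results.
{code}
val violating = sql("select b1, a1, b2 from t1 inner join t2 on b1 = a1 and b2 = a1")
  .where("b1 <> a1 or b2 <> a1") // rows that contradict the join condition itself
  .count()
println(s"rows violating the join condition: $violating") // > 0 with the bug, 0 when correct
{code}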






[jira] [Commented] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350196#comment-16350196
 ] 

Bogdan Raducanu commented on SPARK-23316:
-

I'll work on a fix

> AnalysisException after max iteration reached for IN query
> --
>
> Key: SPARK-23316
> URL: https://issues.apache.org/jira/browse/SPARK-23316
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Bogdan Raducanu
>Priority: Major
>
> Query to reproduce:
> {code:scala}
> spark.range(10).where("(id,id) in (select id, null from range(3))").show
> {code}
> {code}
> 18/02/02 11:32:31 WARN BaseSessionStateBuilder$$anon$1: Max iterations (100) 
> reached for batch Resolution
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('id', 
> `id`, 'id', `id`) IN (listquery()))' due to data type mismatch:
> The data type of one or more elements in the left hand side of an IN subquery
> is not compatible with the data type of the output of the subquery
> Mismatched columns:
> []
> Left side:
> [bigint, bigint].
> Right side:
> [bigint, bigint].;;
> {code}
> The error message includes the last plan which contains ~100 useless Projects.
> Does not happen in branch-2.2.
> It has something to do with TypeCoercion, which makes a futile attempt to 
> change nullability.






[jira] [Updated] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-23316:

Affects Version/s: 2.4.0

> AnalysisException after max iteration reached for IN query
> --
>
> Key: SPARK-23316
> URL: https://issues.apache.org/jira/browse/SPARK-23316
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Bogdan Raducanu
>Priority: Major
>
> Query to reproduce:
> {code:scala}
> spark.range(10).where("(id,id) in (select id, null from range(3))").show
> {code}
> {code}
> 18/02/02 11:32:31 WARN BaseSessionStateBuilder$$anon$1: Max iterations (100) 
> reached for batch Resolution
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('id', 
> `id`, 'id', `id`) IN (listquery()))' due to data type mismatch:
> The data type of one or more elements in the left hand side of an IN subquery
> is not compatible with the data type of the output of the subquery
> Mismatched columns:
> []
> Left side:
> [bigint, bigint].
> Right side:
> [bigint, bigint].;;
> {code}
> The error message includes the last plan which contains ~100 useless Projects.
> Does not happen in branch-2.2.
> It has something to do with TypeCoercion, which makes a futile attempt to 
> change nullability.






[jira] [Created] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-23316:
---

 Summary: AnalysisException after max iteration reached for IN query
 Key: SPARK-23316
 URL: https://issues.apache.org/jira/browse/SPARK-23316
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Bogdan Raducanu


Query to reproduce:
{code:scala}
spark.range(10).where("(id,id) in (select id, null from range(3))").show
{code}

{code}
18/02/02 11:32:31 WARN BaseSessionStateBuilder$$anon$1: Max iterations (100) 
reached for batch Resolution
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('id', 
`id`, 'id', `id`) IN (listquery()))' due to data type mismatch:
The data type of one or more elements in the left hand side of an IN subquery
is not compatible with the data type of the output of the subquery
Mismatched columns:
[]
Left side:
[bigint, bigint].
Right side:
[bigint, bigint].;;
{code}

The error message includes the last plan which contains ~100 useless Projects.
Does not happen in branch-2.2.
It has something to do with TypeCoercion, which makes a futile attempt to 
change nullability.






[jira] [Comment Edited] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350188#comment-16350188
 ] 

Bogdan Raducanu edited comment on SPARK-23316 at 2/2/18 11:29 AM:
--

I think it's related to {{In.checkInputTypes}}: in 2.2 it does not check 
nullability, while in 2.3 it does, by using {{DataType.equalsStructurally}}.


was (Author: bograd):
I think it's related to {{In.checkInputTypes}} in 2.2 it does not check 
nullability while in 2.3 it does , by using {{DataType.equalsStructurally}}

> AnalysisException after max iteration reached for IN query
> --
>
> Key: SPARK-23316
> URL: https://issues.apache.org/jira/browse/SPARK-23316
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>Priority: Major
>
> Query to reproduce:
> {code:scala}
> spark.range(10).where("(id,id) in (select id, null from range(3))").show
> {code}
> {code}
> 18/02/02 11:32:31 WARN BaseSessionStateBuilder$$anon$1: Max iterations (100) 
> reached for batch Resolution
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('id', 
> `id`, 'id', `id`) IN (listquery()))' due to data type mismatch:
> The data type of one or more elements in the left hand side of an IN subquery
> is not compatible with the data type of the output of the subquery
> Mismatched columns:
> []
> Left side:
> [bigint, bigint].
> Right side:
> [bigint, bigint].;;
> {code}
> The error message includes the last plan which contains ~100 useless Projects.
> Does not happen in branch-2.2.
> It has something to do with TypeCoercion, which makes a futile attempt to 
> change nullability.






[jira] [Commented] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350188#comment-16350188
 ] 

Bogdan Raducanu commented on SPARK-23316:
-

I think it's related to {{In.checkInputTypes}}: in 2.2 it does not check 
nullability, while in 2.3 it does, by using {{DataType.equalsStructurally}}.
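
A quick way to see the nullability difference (a sketch, not from the original comment): build the two sides of the IN from the repro query and compare their schemas.
{code}
// Left side of the IN: (id, id) from range(10); right side: (id, null) from range(3).
spark.range(10).selectExpr("named_struct('id', id, 'id', id) AS s").schema.foreach(println)
spark.sql("select id, null as n from range(3)").schema.foreach(println)
// The subquery side carries a nullable column, which a nullability-aware
// structural comparison treats as a mismatch.
{code}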

> AnalysisException after max iteration reached for IN query
> --
>
> Key: SPARK-23316
> URL: https://issues.apache.org/jira/browse/SPARK-23316
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>Priority: Major
>
> Query to reproduce:
> {code:scala}
> spark.range(10).where("(id,id) in (select id, null from range(3))").show
> {code}
> {code}
> 18/02/02 11:32:31 WARN BaseSessionStateBuilder$$anon$1: Max iterations (100) 
> reached for batch Resolution
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('id', 
> `id`, 'id', `id`) IN (listquery()))' due to data type mismatch:
> The data type of one or more elements in the left hand side of an IN subquery
> is not compatible with the data type of the output of the subquery
> Mismatched columns:
> []
> Left side:
> [bigint, bigint].
> Right side:
> [bigint, bigint].;;
> {code}
> The error message includes the last plan which contains ~100 useless Projects.
> Does not happen in branch-2.2.
> It has something to do with TypeCoercion, which makes a futile attempt to 
> change nullability.






[jira] [Comment Edited] (SPARK-23148) spark.read.csv with multiline=true gives FileNotFoundException if path contains spaces

2018-01-19 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332041#comment-16332041
 ] 

Bogdan Raducanu edited comment on SPARK-23148 at 1/19/18 10:18 AM:
---

I updated the description with manual escaping, if that is what you meant.


was (Author: bograd):
What do you mean?

> spark.read.csv with multiline=true gives FileNotFoundException if path 
> contains spaces
> --
>
> Key: SPARK-23148
> URL: https://issues.apache.org/jira/browse/SPARK-23148
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Bogdan Raducanu
>Priority: Major
>
> Repro code:
> {code:java}
> spark.range(10).write.csv("/tmp/a b c/a.csv")
> spark.read.option("multiLine", false).csv("/tmp/a b c/a.csv").count
> 10
> spark.read.option("multiLine", true).csv("/tmp/a b c/a.csv").count
> java.io.FileNotFoundException: File 
> file:/tmp/a%20b%20c/a.csv/part-0-cf84f9b2-5fe6-4f54-a130-a1737689db00-c000.csv
>  does not exist
> {code}
> Trying to manually escape fails in a different place:
> {code}
> spark.read.option("multiLine", true).csv("/tmp/a%20b%20c/a.csv").count
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/tmp/a%20b%20c/a.csv;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:683)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:387)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:387)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
> {code}






[jira] [Updated] (SPARK-23148) spark.read.csv with multiline=true gives FileNotFoundException if path contains spaces

2018-01-19 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-23148:

Description: 
Repro code:
{code:java}
spark.range(10).write.csv("/tmp/a b c/a.csv")
spark.read.option("multiLine", false).csv("/tmp/a b c/a.csv").count
10
spark.read.option("multiLine", true).csv("/tmp/a b c/a.csv").count
java.io.FileNotFoundException: File 
file:/tmp/a%20b%20c/a.csv/part-0-cf84f9b2-5fe6-4f54-a130-a1737689db00-c000.csv
 does not exist
{code}

Trying to manually escape fails in a different place:
{code}
spark.read.option("multiLine", true).csv("/tmp/a%20b%20c/a.csv").count
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/tmp/a%20b%20c/a.csv;
  at 
org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:683)
  at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:387)
  at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:387)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
{code}

  was:
Repro code:
{code:java}
spark.range(10).write.csv("/tmp/a b c/a.csv")
spark.read.option("multiLine", false).csv("/tmp/a b c/a.csv").count
10
spark.read.option("multiLine", true).csv("/tmp/a b c/a.csv").count
java.io.FileNotFoundException: File 
file:/tmp/a%20b%20c/a.csv/part-0-cf84f9b2-5fe6-4f54-a130-a1737689db00-c000.csv
 does not exist
{code}


> spark.read.csv with multiline=true gives FileNotFoundException if path 
> contains spaces
> --
>
> Key: SPARK-23148
> URL: https://issues.apache.org/jira/browse/SPARK-23148
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Bogdan Raducanu
>Priority: Major
>
> Repro code:
> {code:java}
> spark.range(10).write.csv("/tmp/a b c/a.csv")
> spark.read.option("multiLine", false).csv("/tmp/a b c/a.csv").count
> 10
> spark.read.option("multiLine", true).csv("/tmp/a b c/a.csv").count
> java.io.FileNotFoundException: File 
> file:/tmp/a%20b%20c/a.csv/part-0-cf84f9b2-5fe6-4f54-a130-a1737689db00-c000.csv
>  does not exist
> {code}
> Trying to manually escape fails in a different place:
> {code}
> spark.read.option("multiLine", true).csv("/tmp/a%20b%20c/a.csv").count
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/tmp/a%20b%20c/a.csv;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:683)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:387)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:387)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
> {code}






[jira] [Commented] (SPARK-23148) spark.read.csv with multiline=true gives FileNotFoundException if path contains spaces

2018-01-19 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332041#comment-16332041
 ] 

Bogdan Raducanu commented on SPARK-23148:
-

What do you mean?

> spark.read.csv with multiline=true gives FileNotFoundException if path 
> contains spaces
> --
>
> Key: SPARK-23148
> URL: https://issues.apache.org/jira/browse/SPARK-23148
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Bogdan Raducanu
>Priority: Major
>
> Repro code:
> {code:java}
> spark.range(10).write.csv("/tmp/a b c/a.csv")
> spark.read.option("multiLine", false).csv("/tmp/a b c/a.csv").count
> 10
> spark.read.option("multiLine", true).csv("/tmp/a b c/a.csv").count
> java.io.FileNotFoundException: File 
> file:/tmp/a%20b%20c/a.csv/part-0-cf84f9b2-5fe6-4f54-a130-a1737689db00-c000.csv
>  does not exist
> {code}






[jira] [Created] (SPARK-23148) spark.read.csv with multiline=true gives FileNotFoundException if path contains spaces

2018-01-18 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-23148:
---

 Summary: spark.read.csv with multiline=true gives 
FileNotFoundException if path contains spaces
 Key: SPARK-23148
 URL: https://issues.apache.org/jira/browse/SPARK-23148
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Bogdan Raducanu


Repro code:
{code:java}
spark.range(10).write.csv("/tmp/a b c/a.csv")
spark.read.option("multiLine", false).csv("/tmp/a b c/a.csv").count
10
spark.read.option("multiLine", true).csv("/tmp/a b c/a.csv").count
java.io.FileNotFoundException: File 
file:/tmp/a%20b%20c/a.csv/part-0-cf84f9b2-5fe6-4f54-a130-a1737689db00-c000.csv
 does not exist
{code}






[jira] [Created] (SPARK-22398) Partition directories with leading 0s cause wrong results

2017-10-30 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-22398:
---

 Summary: Partition directories with leading 0s cause wrong results
 Key: SPARK-22398
 URL: https://issues.apache.org/jira/browse/SPARK-22398
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Bogdan Raducanu


Repro case:
{code}
spark.range(8).selectExpr("'0' || cast(id as string) as id", "id as 
b").write.mode("overwrite").partitionBy("id").parquet("/tmp/bug1")
spark.read.parquet("/tmp/bug1").where("id in ('01')").show
+---+---+
|  b| id|
+---+---+
+---+---+
spark.read.parquet("/tmp/bug1").where("id = '01'").show
+---+---+
|  b| id|
+---+---+
|  1|  1|
+---+---+
{code}

I think somewhere there is special handling of this case for equals, but not 
for IN.
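
A side check worth noting (a sketch, not from the original report): the partition column comes back as a numeric type, so '01' is read as 1 (note the result above prints 1, not 01), which is likely why the equality predicate still matches while IN does not.
{code}
spark.read.parquet("/tmp/bug1").printSchema()
// root
//  |-- b: long (nullable = true)
//  |-- id: integer (nullable = true)   <- partition column inferred as an integer
{code}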






[jira] [Updated] (SPARK-21969) CommandUtils.updateTableStats should call refreshTable

2017-09-10 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21969:

Description: 
The table is cached so even though statistics are removed, they will still be 
used by the existing sessions.


{code}
spark.range(100).write.saveAsTable("tab1")
sql("analyze table tab1 compute statistics")
sql("explain cost select distinct * from tab1").show(false)
{code}

Produces:
{code}
Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
{code}


{code}
spark.range(100).write.mode("append").saveAsTable("tab1")
sql("explain cost select distinct * from tab1").show(false)
{code}

After appending something, the same stats are used
{code}
Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
{code}

Manually refreshing the table removes the stats
{code}
spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
sql("explain cost select distinct * from tab1").show(false)
{code}

{code}
Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
{code}

  was:
The table is cached so even though statistics are removed, they will still be 
used by the existing sessions.


{{
spark.range(100).write.saveAsTable("tab1")
sql("analyze table tab1 compute statistics")
sql("explain cost select distinct * from tab1").show(false)
}}

Produces:
{{
Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
}}


{{
spark.range(100).write.mode("append").saveAsTable("tab1")
sql("explain cost select distinct * from tab1").show(false)
}}

After append something, the same stats are used
{{
Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
}}

Manually refreshing the table removes the stats
{{
spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
sql("explain cost select distinct * from tab1").show(false)
}}

{{
Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
}}


> CommandUtils.updateTableStats should call refreshTable
> --
>
> Key: SPARK-21969
> URL: https://issues.apache.org/jira/browse/SPARK-21969
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The table is cached so even though statistics are removed, they will still be 
> used by the existing sessions.
> {code}
> spark.range(100).write.saveAsTable("tab1")
> sql("analyze table tab1 compute statistics")
> sql("explain cost select distinct * from tab1").show(false)
> {code}
> Produces:
> {code}
> Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
> hints=none)
> {code}
> {code}
> spark.range(100).write.mode("append").saveAsTable("tab1")
> sql("explain cost select distinct * from tab1").show(false)
> {code}
> After appending something, the same stats are used
> {code}
> Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
> hints=none)
> {code}
> Manually refreshing the table removes the stats
> {code}
> spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
> sql("explain cost select distinct * from tab1").show(false)
> {code}
> {code}
> Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
> {code}






[jira] [Updated] (SPARK-21969) CommandUtils.updateTableStats should call refreshTable

2017-09-10 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21969:

Description: 
The table is cached so even though statistics are removed, they will still be 
used by the existing sessions.


{{
spark.range(100).write.saveAsTable("tab1")
sql("analyze table tab1 compute statistics")
sql("explain cost select distinct * from tab1").show(false)
}}

Produces:
{{
Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
}}


{{
spark.range(100).write.mode("append").saveAsTable("tab1")
sql("explain cost select distinct * from tab1").show(false)
}}

After append something, the same stats are used
{{
Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
}}

Manually refreshing the table removes the stats
{{
spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
sql("explain cost select distinct * from tab1").show(false)
}}

{{
Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
}}

  was:
The table is cached so even though statistics are removed, they will still be 
used by the existing sessions.


{{code}}
spark.range(100).write.saveAsTable("tab1")
sql("analyze table tab1 compute statistics")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

Produces:
{{code}}
Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
{{code}}


{{code}}
spark.range(100).write.mode("append").saveAsTable("tab1")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

After append something, the same stats are used
{{code}}
Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
{{code}}

Manually refreshing the table removes the stats
{{code}}
spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
sql("explain cost select distinct * from tab1").show(false)
{{code}}

{{code}}
Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
{{code}}


> CommandUtils.updateTableStats should call refreshTable
> --
>
> Key: SPARK-21969
> URL: https://issues.apache.org/jira/browse/SPARK-21969
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The table is cached so even though statistics are removed, they will still be 
> used by the existing sessions.
> {{
> spark.range(100).write.saveAsTable("tab1")
> sql("analyze table tab1 compute statistics")
> sql("explain cost select distinct * from tab1").show(false)
> }}
> Produces:
> {{
> Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
> hints=none)
> }}
> {{
> spark.range(100).write.mode("append").saveAsTable("tab1")
> sql("explain cost select distinct * from tab1").show(false)
> }}
> After append something, the same stats are used
> {{
> Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
> hints=none)
> }}
> Manually refreshing the table removes the stats
> {{
> spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
> sql("explain cost select distinct * from tab1").show(false)
> }}
> {{
> Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
> }}






[jira] [Created] (SPARK-21969) CommandUtils.updateTableStats should call refreshTable

2017-09-10 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-21969:
---

 Summary: CommandUtils.updateTableStats should call refreshTable
 Key: SPARK-21969
 URL: https://issues.apache.org/jira/browse/SPARK-21969
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Bogdan Raducanu


The table is cached so even though statistics are removed, they will still be 
used by the existing sessions.


{{code}}
spark.range(100).write.saveAsTable("tab1")
sql("analyze table tab1 compute statistics")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

Produces:
{{code}}
Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
{{code}}


{{code}}
spark.range(100).write.mode("append").saveAsTable("tab1")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

After append something, the same stats are used
{{code}}
Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, 
hints=none)
{{code}}

Manually refreshing the table removes the stats
{{code}}
spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
sql("explain cost select distinct * from tab1").show(false)
{{code}}

{{code}}
Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
{{code}}






[jira] [Closed] (SPARK-21627) analyze hive table compute stats for columns with mixed case exception

2017-08-04 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu closed SPARK-21627.
---
Resolution: Duplicate

> analyze hive table compute stats for columns with mixed case exception
> --
>
> Key: SPARK-21627
> URL: https://issues.apache.org/jira/browse/SPARK-21627
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> {code}
> sql("create table tabel1(b int) partitioned by (partColumn int)")
> sql("analyze table tabel1 compute statistics for columns partColumn, b")
> {code}
> {code}
> java.util.NoSuchElementException: key not found: partColumn
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at scala.collection.AbstractMap.default(Map.scala:59)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at scala.collection.AbstractMap.apply(Map.scala:59)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
>   at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
>   at 
> org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035)
>   at org.apache.spark.sql.Dataset.(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
>   ... 39 elided
> {code}
> Looks like a regression introduced by https://github.com/apache/spark/pull/18248
> In {{HiveExternalCatalog.alterTableStats}}, {{colNameTypeMap}} contains lower 
> case column names.






[jira] [Commented] (SPARK-21627) analyze hive table compute stats for columns with mixed case exception

2017-08-04 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114163#comment-16114163
 ] 

Bogdan Raducanu commented on SPARK-21627:
-

You're right, it's fixed.

> analyze hive table compute stats for columns with mixed case exception
> --
>
> Key: SPARK-21627
> URL: https://issues.apache.org/jira/browse/SPARK-21627
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> {code}
> sql("create table tabel1(b int) partitioned by (partColumn int)")
> sql("analyze table tabel1 compute statistics for columns partColumn, b")
> {code}
> {code}
> java.util.NoSuchElementException: key not found: partColumn
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at scala.collection.AbstractMap.default(Map.scala:59)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at scala.collection.AbstractMap.apply(Map.scala:59)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
>   at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
>   at 
> org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035)
>   at org.apache.spark.sql.Dataset.(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
>   ... 39 elided
> {code}
> Looks like a regression introduced by https://github.com/apache/spark/pull/18248
> In {{HiveExternalCatalog.alterTableStats}}, {{colNameTypeMap}} contains lower 
> case column names.






[jira] [Updated] (SPARK-21627) analyze hive table compute stats for columns with mixed case exception

2017-08-03 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21627:

Summary: analyze hive table compute stats for columns with mixed case 
exception  (was: hive compute stats for columns exception with column name 
camel case)

> analyze hive table compute stats for columns with mixed case exception
> --
>
> Key: SPARK-21627
> URL: https://issues.apache.org/jira/browse/SPARK-21627
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> {code}
> sql("create table tabel1(b int) partitioned by (partColumn int)")
> sql("analyze table tabel1 compute statistics for columns partColumn, b")
> {code}
> {code}
> java.util.NoSuchElementException: key not found: partColumn
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at scala.collection.AbstractMap.default(Map.scala:59)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at scala.collection.AbstractMap.apply(Map.scala:59)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
>   at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
>   at 
> org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035)
>   at org.apache.spark.sql.Dataset.(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
>   ... 39 elided
> {code}
> Looks like a regression introduced by https://github.com/apache/spark/pull/18248
> In {{HiveExternalCatalog.alterTableStats}}, {{colNameTypeMap}} contains lower 
> case column names.






[jira] [Commented] (SPARK-21627) hive compute stats for columns exception with column name camel case

2017-08-03 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112851#comment-16112851
 ] 

Bogdan Raducanu commented on SPARK-21627:
-

I expect it fails only in the master branch; that's why the affected version is 3.0.0.

> hive compute stats for columns exception with column name camel case
> 
>
> Key: SPARK-21627
> URL: https://issues.apache.org/jira/browse/SPARK-21627
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Bogdan Raducanu
>
> {code}
> sql("create table tabel1(b int) partitioned by (partColumn int)")
> sql("analyze table tabel1 compute statistics for columns partColumn, b")
> {code}
> {code}
> java.util.NoSuchElementException: key not found: partColumn
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at scala.collection.AbstractMap.default(Map.scala:59)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at scala.collection.AbstractMap.apply(Map.scala:59)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
>   at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
>   at 
> org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035)
>   at org.apache.spark.sql.Dataset.(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
>   ... 39 elided
> {code}
> Looks like a regression introduced by https://github.com/apache/spark/pull/18248
> In {{HiveExternalCatalog.alterTableStats}}, {{colNameTypeMap}} contains lower 
> case column names.






[jira] [Updated] (SPARK-21627) hive compute stats for columns exception with column name camel case

2017-08-03 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21627:

Description: 
{code}
sql("create table tabel1(b int) partitioned by (partColumn int)")
sql("analyze table tabel1 compute statistics for columns partColumn, b")
{code}
{code}
java.util.NoSuchElementException: key not found: partColumn
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:59)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
  at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
  at 
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035)
  at org.apache.spark.sql.Dataset.(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
  ... 39 elided
{code}

Looks like a regression introduced by https://github.com/apache/spark/pull/18248

In {{HiveExternalCatalog.alterTableStats}}, {{colNameTypeMap}} contains lower 
case column names.
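
A standalone illustration of the mismatch in plain Scala (a sketch; {{colNameTypeMap}} and the column name come from this report, the literal map below is made up): a map keyed by lower-cased column names misses a mixed-case key unless the lookup is normalized as well.
{code}
val colNameTypeMap  = Map("b" -> "int", "partcolumn" -> "int") // keys stored lower-cased
val requestedColumn = "partColumn"                             // mixed case from the ANALYZE command

// colNameTypeMap(requestedColumn)             // java.util.NoSuchElementException: key not found: partColumn
colNameTypeMap(requestedColumn.toLowerCase)    // "int" -- a case-normalized lookup succeeds
{code}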



  was:
{code}
sql("create table tabel1(b int) partitioned by (partColumn int)")
sql("analyze table tabel1 compute statistics for columns partColumn, b")
{code}
{code}
java.util.NoSuchElementException: key not found: partColumn
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:59)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
  at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
  at 
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
  at 

[jira] [Updated] (SPARK-21627) hive compute stats for columns exception with column name camel case

2017-08-03 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21627:

Description: 
{code}
sql("create table tabel1(b int) partitioned by (partColumn int)")
sql("analyze table tabel1 compute statistics for columns partColumn, b")
{code}
{code}
java.util.NoSuchElementException: key not found: partColumn
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:59)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
  at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
  at 
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035)
  at org.apache.spark.sql.Dataset.(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
  ... 39 elided
{code}

Looks like a regression introduced by https://github.com/apache/spark/pull/18248

In {{HiveExternalCatalog.alterTableStats}}, {{colNameTypeMap}} contains lower-case 
column names.



  was:
{code}
sql("create table tabel1(b int) partitioned by (partColumn int)")
sql("analyze table tabel1 compute statistics for columns partColumn, b")
{code}
{code}
java.util.NoSuchElementException: key not found: partColumn
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:59)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
  at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
  at 
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
  at 

[jira] [Created] (SPARK-21627) hive compute stats for columns exception with column name camel case

2017-08-03 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-21627:
---

 Summary: hive compute stats for columns exception with column name 
camel case
 Key: SPARK-21627
 URL: https://issues.apache.org/jira/browse/SPARK-21627
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Bogdan Raducanu


{code}
sql("create table tabel1(b int) partitioned by (partColumn int)")
sql("analyze table tabel1 compute statistics for columns partColumn, b")
{code}
{code}
java.util.NoSuchElementException: key not found: partColumn
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:59)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647)
  at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375)
  at 
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035)
  at org.apache.spark.sql.Dataset.(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636)
  ... 39 elided
{code}

Looks like a regression introduced by https://github.com/apache/spark/pull/18248
in {{HiveExternalCatalog.alterTable}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21271) UnsafeRow.hashCode assertion when sizeInBytes not multiple of 8

2017-06-30 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-21271:
---

 Summary: UnsafeRow.hashCode assertion when sizeInBytes not 
multiple of 8
 Key: SPARK-21271
 URL: https://issues.apache.org/jira/browse/SPARK-21271
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Bogdan Raducanu


The method is:

{code}
public int hashCode() {
  return Murmur3_x86_32.hashUnsafeWords(baseObject, baseOffset, sizeInBytes, 42);
}
{code}

but {{sizeInBytes}} is not always a multiple of 8 (in which case {{hashUnsafeWords}} 
throws an assertion error) - for example in 
{{FixedLengthRowBasedKeyValueBatch.appendRow}}.

The fix could be to use {{hashUnsafeBytes}}, or to keep {{hashUnsafeWords}} but 
apply it only to a prefix whose length is a multiple of 8.
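
A small sketch of the second option, assuming it is enough to round the length 
down to the nearest multiple of 8 before calling the word-based hash (the helper 
below is illustrative, not existing Spark code):

{code}
// Illustrative helper: the longest prefix length that a word-aligned hash accepts.
def alignedPrefixLength(sizeInBytes: Int): Int = sizeInBytes - (sizeInBytes % 8)

assert(alignedPrefixLength(28) == 24)   // 28-byte row: hash only the first 24 bytes
assert(alignedPrefixLength(32) == 32)   // already a multiple of 8, unchanged
{code}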




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21228) InSet incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064970#comment-16064970
 ] 

Bogdan Raducanu commented on SPARK-21228:
-

InSubquery.doCodeGen uses InSet directly (although InSubquery itself is never 
used), so a fix should take this into account too.

> InSet incorrect handling of structs
> ---
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet uses hset.contains (both in 
> doCodeGen and eval) which will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
> UnsafeRows while the list of structs that will become hset will be 
> GenericInternalRows
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen uses compareStructs and seems to work. In.eval might not work 
> but not sure how to reproduce.
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet or not 
> trigger InSet optimization at all in this case.
> Need to investigate if In.eval is affected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21228) InSet incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064945#comment-16064945
 ] 

Bogdan Raducanu commented on SPARK-21228:
-

I tested manually (since there is no flag to disable codegen for expressions) 
that In.eval also fails, so only In.doCodeGen appears correct.

> InSet incorrect handling of structs
> ---
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet uses hset.contains (both in 
> doCodeGen and eval) which will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
> UnsafeRows while the list of structs that will become hset will be 
> GenericInternalRows
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen uses compareStructs and seems to work. In.eval might not work 
> but not sure how to reproduce.
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet or not 
> trigger InSet optimization at all in this case.
> Need to investigate if In.eval is affected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21228) InSet incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:

Description: 
In InSet it's possible that hset contains GenericInternalRows while the child 
returns UnsafeRows (and vice versa). InSet uses hset.contains (both in 
doCodeGen and eval), which will always return false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
UnsafeRows while the list of structs that will become hset will be 
GenericInternalRows
++
|minA|
++
++
{code}

In.doCodeGen uses compareStructs and seems to work. In.eval might not work, but it 
is not clear how to reproduce that.

{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

A solution could be either to do safe<->unsafe conversion in InSet, or to not 
trigger the InSet optimization at all in this case.
Need to investigate whether In.eval is affected.
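
A self-contained sketch (plain Scala, not Spark internals) of why hset.contains 
always returns false here: two representations of the same logical row that do not 
share equals never match in a set, which mirrors the GenericInternalRow vs 
UnsafeRow situation above.

{code}
// Two row representations of the same logical values, with class-sensitive
// equals, mimicking GenericInternalRow vs UnsafeRow.
final class SafeRow(val values: Seq[Long]) {
  override def equals(o: Any): Boolean = o match {
    case s: SafeRow => s.values == values
    case _          => false            // a different representation never matches
  }
  override def hashCode: Int = values.hashCode
}
final class UnsafeLikeRow(val values: Seq[Long]) {
  override def equals(o: Any): Boolean = o match {
    case u: UnsafeLikeRow => u.values == values
    case _                => false
  }
  override def hashCode: Int = values.hashCode
}

val hset: Set[Any] = Set(new SafeRow(Seq(1L, 1L)))   // the IN-list becomes "safe" rows
val probe = new UnsafeLikeRow(Seq(1L, 1L))           // the Aggregate produces "unsafe" rows
assert(!hset.contains(probe))                        // always false, so every row is filtered out
{code}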


  was:
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
UnsafeRows while the list of structs that will become hset will be 
GenericInternalRows
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.



> InSet incorrect handling of structs
> ---
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet uses hset.contains (both in 
> doCodeGen and eval) which will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
> UnsafeRows while the list of structs that will become hset will be 
> GenericInternalRows
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen uses compareStructs and seems to work. In.eval might not work 
> but not sure how to reproduce.
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet or not 
> trigger InSet optimization at all in this case.
> Need to investigate if In.eval is affected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[jira] [Updated] (SPARK-21228) InSet incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:

Summary: InSet incorrect handling of structs  (was: InSet.doCodeGen 
incorrect handling of structs)

> InSet incorrect handling of structs
> ---
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
> will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
> UnsafeRows while the list of structs that will become hset will be 
> GenericInternalRows
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen appears to be correct:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
> not trigger InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21228) InSet.doCodeGen incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:

Description: 
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
UnsafeRows while the list of structs that will become hset will be 
GenericInternalRows
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.


  was:
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
UnsafeRows while the list of structs that will become hset will be 
GenericInternalRows
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.


> InSet.doCodeGen incorrect handling of structs
> -
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
> will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
> UnsafeRows while the list of structs that will become hset will be 
> GenericInternalRows
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen appears to be correct:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
> not trigger InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21228) InSet.doCodeGen incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:

Description: 
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
UnsafeRows while the list of structs that will become hset will be 
GenericInternalRows
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.

  was:
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show -- the Aggregate here will return 
UnsafeRows while the list of structs that will become hset will be 
GenericInternalRows
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.


> InSet.doCodeGen incorrect handling of structs
> -
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
> will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return 
> UnsafeRows while the list of structs that will become hset will be 
> GenericInternalRows
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen appears to be correct:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
> not trigger InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21228) InSet.doCodeGen incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:

Description: 
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show -- the Aggregate here will return 
UnsafeRows while the list of structs that will become hset will be 
GenericInternalRows
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.

  was:
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.


> InSet.doCodeGen incorrect handling of structs
> -
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
> will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show -- the Aggregate here will return 
> UnsafeRows while the list of structs that will become hset will be 
> GenericInternalRows
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen appears to be correct:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
> not trigger InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21228) InSet.doCodeGen incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:

Description: 
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.

  was:
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
```
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show
++
|minA|
++
++
```
In.doCodeGen appears to be correct:
```
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
+-+
| minA|
+-+
|[1,1]|
+-+
```

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.


> InSet.doCodeGen incorrect handling of structs
> -
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
> will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen appears to be correct:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
> not trigger InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21228) InSet.doCodeGen incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:

Description: 
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show

+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.

  was:
In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show
++
|minA|
++
++
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
+-+
| minA|
+-+
|[1,1]|
+-+
{code}

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.


> InSet.doCodeGen incorrect handling of structs
> -
>
> Key: SPARK-21228
> URL: https://issues.apache.org/jira/browse/SPARK-21228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while child 
> returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
> will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
> default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
> a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> ++
> |minA|
> ++
> ++
> {code}
> In.doCodeGen appears to be correct:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
> will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in 
> (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
> 2L),named_struct('a', 3L, 'b', 3L))").show
> +-+
> | minA|
> +-+
> |[1,1]|
> +-+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
> not trigger InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21228) InSet.doCodeGen incorrect handling of structs

2017-06-27 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-21228:
---

 Summary: InSet.doCodeGen incorrect handling of structs
 Key: SPARK-21228
 URL: https://issues.apache.org/jira/browse/SPARK-21228
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Bogdan Raducanu


In InSet it's possible that hset contains GenericInternalRows while child 
returns UnsafeRows (and vice versa). InSet.doCodeGen uses hset.contains which 
will always be false in this case.

The following code reproduces the problem:
```
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the 
default is 10 which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as 
a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in 
(named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 
2L),named_struct('a', 3L, 'b', 3L))").show
++
|minA|
++
++
```
In.doCodeGen appears to be correct:
```
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it 
will not use InSet
+-+
| minA|
+-+
|[1,1]|
+-+
```

Solution could be either to do safe<->unsafe conversion in InSet.doCodeGen or 
not trigger InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21009) SparkListenerTaskEnd.taskInfo.accumulables might not be accurate

2017-06-07 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu resolved SPARK-21009.
-
Resolution: Duplicate

> SparkListenerTaskEnd.taskInfo.accumulables might not be accurate
> 
>
> Key: SPARK-21009
> URL: https://issues.apache.org/jira/browse/SPARK-21009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces it:
> {code}
>   test("test") {
> val foundMetrics = mutable.Set.empty[String]
> spark.sparkContext.addSparkListener(new SparkListener {
>   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
> taskEnd.taskInfo.accumulables.foreach { a =>
>   if (a.name.isDefined) {
> foundMetrics.add(a.name.get)
>   }
> }
>   }
> })
> for (iter <- 0 until 100) {
>   foundMetrics.clear()
>   println(s"iter = $iter")
>   spark.range(10).groupBy().agg("id" -> "sum").collect
>   spark.sparkContext.listenerBus.waitUntilEmpty(3000)
>   assert(foundMetrics.size > 0)
> }
>   }
> {code}
> The problem comes from DAGScheduler.handleTaskCompletion.
> The SparkListenerTaskEnd event is sent before updateAccumulators is called, 
> so it might not be up to date.
> The code there looks like it needs refactoring.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21009) SparkListenerTaskEnd.taskInfo.accumulables might not be accurate

2017-06-07 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041055#comment-16041055
 ] 

Bogdan Raducanu commented on SPARK-21009:
-

Yes, it looks like a duplicate. I posted the repro code in that one. I'll close 
this one.

> SparkListenerTaskEnd.taskInfo.accumulables might not be accurate
> 
>
> Key: SPARK-21009
> URL: https://issues.apache.org/jira/browse/SPARK-21009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces it:
> {code}
>   test("test") {
> val foundMetrics = mutable.Set.empty[String]
> spark.sparkContext.addSparkListener(new SparkListener {
>   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
> taskEnd.taskInfo.accumulables.foreach { a =>
>   if (a.name.isDefined) {
> foundMetrics.add(a.name.get)
>   }
> }
>   }
> })
> for (iter <- 0 until 100) {
>   foundMetrics.clear()
>   println(s"iter = $iter")
>   spark.range(10).groupBy().agg("id" -> "sum").collect
>   spark.sparkContext.listenerBus.waitUntilEmpty(3000)
>   assert(foundMetrics.size > 0)
> }
>   }
> {code}
> The problem comes from DAGScheduler.handleTaskCompletion.
> The SparkListenerTaskEnd event is sent before updateAccumulators is called, 
> so it might not be up to date.
> The code there looks like it needs refactoring.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20342) DAGScheduler sends SparkListenerTaskEnd before updating task's accumulators

2017-06-07 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041054#comment-16041054
 ] 

Bogdan Raducanu commented on SPARK-20342:
-

This code fails because of this issue:

{code}
test("test") {
val foundMetrics = mutable.Set.empty[String]
spark.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
taskEnd.taskInfo.accumulables.foreach { a =>
  if (a.name.isDefined) {
foundMetrics.add(a.name.get)
  }
}
  }
})
for (iter <- 0 until 100) {
  foundMetrics.clear()
  println(s"iter = $iter")
  spark.range(10).groupBy().agg("id" -> "sum").collect
  spark.sparkContext.listenerBus.waitUntilEmpty(3000)
  assert(foundMetrics.size > 0)
}
  }
{code}

> DAGScheduler sends SparkListenerTaskEnd before updating task's accumulators
> ---
>
> Key: SPARK-20342
> URL: https://issues.apache.org/jira/browse/SPARK-20342
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>
> Hit this on 2.2, but probably has been there forever. This is similar in 
> spirit to SPARK-20205.
> Event is sent here, around L1154:
> {code}
> listenerBus.post(SparkListenerTaskEnd(
>stageId, task.stageAttemptId, taskType, event.reason, event.taskInfo, 
> taskMetrics))
> {code}
> Accumulators are updated later, around L1173:
> {code}
> val stage = stageIdToStage(task.stageId)
> event.reason match {
>   case Success =>
> task match {
>   case rt: ResultTask[_, _] =>
> // Cast to ResultStage here because it's part of the ResultTask
> // TODO Refactor this out to a function that accepts a ResultStage
> val resultStage = stage.asInstanceOf[ResultStage]
> resultStage.activeJob match {
>   case Some(job) =>
> if (!job.finished(rt.outputId)) {
>   updateAccumulators(event)
> {code}
> Same thing applies here; UI shows correct info because it's pointing at the 
> mutable {{TaskInfo}} structure. But the event log, for example, may record 
> the wrong information.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21009) SparkListenerTaskEnd.taskInfo.accumulables might not be accurate

2017-06-07 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21009:

Description: 
The following code reproduces it:
{code}
  test("test") {
val foundMetrics = mutable.Set.empty[String]
spark.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
taskEnd.taskInfo.accumulables.foreach { a =>
  if (a.name.isDefined) {
foundMetrics.add(a.name.get)
  }
}
  }
})
for (iter <- 0 until 100) {
  foundMetrics.clear()
  println(s"iter = $iter")
  spark.range(10).groupBy().agg("id" -> "sum").collect
  spark.sparkContext.listenerBus.waitUntilEmpty(3000)
  assert(foundMetrics.size > 0)
}
  }
{code}

The problem comes from DAGScheduler.handleTaskCompletion.
The SparkListenerTaskEnd event is posted before updateAccumulators is called, so 
the accumulables it carries might not be up to date.
The code there looks like it needs refactoring.
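
A conceptual, self-contained sketch of the ordering issue (the names below are 
illustrative, not the real DAGScheduler API): merging the task's accumulator 
updates before posting the event is what would give listeners up-to-date values.

{code}
// Illustrative model only: a task-end handler that merges accumulator
// updates first and notifies listeners second.
final case class TaskEnd(accumUpdates: Map[String, Long])

object OrderingSketch {
  private var merged = Map.empty[String, Long]

  private def updateAccumulators(e: TaskEnd): Unit =
    e.accumUpdates.foreach { case (k, v) =>
      merged += k -> (merged.getOrElse(k, 0L) + v)
    }

  private def postTaskEnd(): Unit =
    println(s"listener sees accumulables: $merged")

  def handleTaskCompletion(e: TaskEnd): Unit = {
    updateAccumulators(e)   // merge first, so the posted event is accurate
    postTaskEnd()           // posting before the merge is what this issue describes
  }
}

OrderingSketch.handleTaskCompletion(TaskEnd(Map("records read" -> 10L)))
{code}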

  was:
The following code reproduces it:
{code}
  test("test") {
val foundMetrics = mutable.Set.empty[String]
spark.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
taskEnd.taskInfo.accumulables.foreach { a =>
  if (a.name.isDefined) {
foundMetrics.add(a.name.get)
  }
}
  }
})
for (iter <- 0 until 100) {
  foundMetrics.clear()
  println(s"iter = $iter")
  spark.range(10).groupBy().agg("id" -> "sum").collect
  spark.sparkContext.listenerBus.waitUntilEmpty(3000)
  assert(foundMetrics.size > 0)
}
  }
{code}


> SparkListenerTaskEnd.taskInfo.accumulables might not be accurate
> 
>
> Key: SPARK-21009
> URL: https://issues.apache.org/jira/browse/SPARK-21009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces it:
> {code}
>   test("test") {
> val foundMetrics = mutable.Set.empty[String]
> spark.sparkContext.addSparkListener(new SparkListener {
>   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
> taskEnd.taskInfo.accumulables.foreach { a =>
>   if (a.name.isDefined) {
> foundMetrics.add(a.name.get)
>   }
> }
>   }
> })
> for (iter <- 0 until 100) {
>   foundMetrics.clear()
>   println(s"iter = $iter")
>   spark.range(10).groupBy().agg("id" -> "sum").collect
>   spark.sparkContext.listenerBus.waitUntilEmpty(3000)
>   assert(foundMetrics.size > 0)
> }
>   }
> {code}
> The problem comes from DAGScheduler.handleTaskCompletion.
> The SparkListenerTaskEnd event is sent before updateAccumulators is called, 
> so it might not be up to date.
> The code there looks like it needs refactoring.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21009) SparkListenerTaskEnd.taskInfo.accumulables might not be accurate

2017-06-07 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-21009:
---

 Summary: SparkListenerTaskEnd.taskInfo.accumulables might not be 
accurate
 Key: SPARK-21009
 URL: https://issues.apache.org/jira/browse/SPARK-21009
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Bogdan Raducanu


The following code reproduces it:
{code}
  test("test") {
val foundMetrics = mutable.Set.empty[String]
spark.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
taskEnd.taskInfo.accumulables.foreach { a =>
  if (a.name.isDefined) {
foundMetrics.add(a.name.get)
  }
}
  }
})
for (iter <- 0 until 100) {
  foundMetrics.clear()
  println(s"iter = $iter")
  spark.range(10).groupBy().agg("id" -> "sum").collect
  spark.sparkContext.listenerBus.waitUntilEmpty(3000)
  assert(foundMetrics.size > 0)
}
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20744) Predicates with multiple columns do not work

2017-06-01 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032679#comment-16032679
 ] 

Bogdan Raducanu commented on SPARK-20744:
-

Array generally needs all components to be same type. Casts are added 
automatically but it's not always possible:

```sql("select array(now(), 1)").show```

```org.apache.spark.sql.AnalysisException: cannot resolve 
'array(current_timestamp(), 1)' due to data type mismatch: input to function 
array should all be the same type, but it's [timestamp, int]; line 1 pos 7;```

> Predicates with multiple columns do not work
> 
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type 
> mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>+- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly it won't work from SQL either, which is something that other SQL DB 
> support:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as 
> b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
> named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments 
> must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
> 1),named_struct(col1, 2, col2, 2))
>+- SubqueryAlias tab1
>   +- Project [id#47L AS a#50L, id#47L AS b#51L]
>  +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Other examples:
> {code}
> scala> sql("select * from tab1 where (a,b) =(1,1)").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data 
> type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', 
> tab1.`b`) = named_struct('col1', 1, 'col2', 1))' (struct 
> and struct).; line 1 pos 25;
> 'Project [*]
> +- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
>+- SubqueryAlias tab1
>   +- Project [id#47L AS a#50L, id#47L AS b#51L]
>  +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Expressions such as (1,1) are apparently read as structs and then the types 
> do not match. Perhaps they should be arrays.
> The following code works:
> {code}
> sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
> {code}
> This also works, but requires the cast:
> {code}
> sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint), 
> 'b', cast(1 as bigint)))").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20744) Predicates with multiple columns do not work

2017-06-01 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032679#comment-16032679
 ] 

Bogdan Raducanu edited comment on SPARK-20744 at 6/1/17 9:19 AM:
-

An array generally needs all of its components to be the same type. Casts are added 
automatically, but that is not always possible:

{code}
sql("select array(now(), 1)").show
{code}

{code}
org.apache.spark.sql.AnalysisException: cannot resolve 
'array(current_timestamp(), 1)' due to data type mismatch: input to function 
array should all be the same type, but it's [timestamp, int]; line 1 pos 7;
{code}


was (Author: bograd):
Array generally needs all components to be same type. Casts are added 
automatically but it's not always possible:

```sql("select array(now(), 1)").show```

```org.apache.spark.sql.AnalysisException: cannot resolve 
'array(current_timestamp(), 1)' due to data type mismatch: input to function 
array should all be the same type, but it's [timestamp, int]; line 1 pos 7;```

> Predicates with multiple columns do not work
> 
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type 
> mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>+- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it won't work from SQL either, which is something that other SQL 
> databases support:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as 
> b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
> named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments 
> must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
> 1),named_struct(col1, 2, col2, 2))
>+- SubqueryAlias tab1
>   +- Project [id#47L AS a#50L, id#47L AS b#51L]
>  +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Other examples:
> {code}
> scala> sql("select * from tab1 where (a,b) =(1,1)").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data 
> type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', 
> tab1.`b`) = named_struct('col1', 1, 'col2', 1))' (struct 
> and struct).; line 1 pos 25;
> 'Project [*]
> +- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
>+- SubqueryAlias tab1
>   +- Project [id#47L AS a#50L, id#47L AS b#51L]
>  +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Expressions such as (1,1) are apparently read as structs and then the types 
> do not match. Perhaps they should be arrays.
> The following code works:
> {code}
> sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
> {code}
> This also works, but requires the cast:
> {code}
> sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint), 
> 'b', cast(1 as bigint)))").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20854) extend hint syntax to support any expression, not just identifiers or strings

2017-05-23 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-20854:
---

 Summary: extend hint syntax to support any expression, not just 
identifiers or strings
 Key: SPARK-20854
 URL: https://issues.apache.org/jira/browse/SPARK-20854
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Bogdan Raducanu


Currently, the SQL hint syntax supports only identifiers as parameters, while the 
Dataset hint syntax supports only strings.

They should support any expression as parameters, for example numbers. This is 
useful for implementing other hints in the future.

Examples:
{code}
df.hint("hint1", Seq(1, 2, 3))
df.hint("hint2", "A", 1)

sql("select /*+ hint1((1,2,3)) */")
sql("select /*+ hint2('A', 1) */")
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20744) Predicates with multiple columns do not work

2017-05-15 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20744:

Description: 
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}

Similarly, it won't work from SQL either, which is something that other SQL 
databases support:

{code}
scala> spark.range(10).selectExpr("id as a", "id as 
b").createOrReplaceTempView("tab1")

scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must 
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
1),named_struct(col1, 2, col2, 2))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Other examples:
{code}
scala> sql("select * from tab1 where (a,b) =(1,1)").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data 
type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) 
= named_struct('col1', 1, 'col2', 1))' (struct and 
struct).; line 1 pos 25;
'Project [*]
+- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Expressions such as (1,1) are apparently read as structs and then the types do 
not match. Perhaps they should be arrays.
The following code works:
{code}
sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
{code}

This also works, but requires the cast:
{code}
sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint), 
'b', cast(1 as bigint)))").show
{code}


  was:
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}

Similarly it won't work from SQL either, which is something that other SQL DB 
support:

{code}
scala> spark.range(10).selectExpr("id as a", "id as 
b").createOrReplaceTempView("tab1")

scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must 
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
1),named_struct(col1, 2, col2, 2))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Other examples:
{code}
scala> sql("select * from tab1 where (a,b) =(1,1)").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data 
type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) 
= named_struct('col1', 1, 'col2', 1))' (struct and 
struct).; line 1 pos 25;
'Project [*]
+- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Expressions such as (1,1) are apparently read as structs and then the types do 
not match. Perhaps they should be arrays.
The following code works:
{code}
sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
{code}



> Predicates with multiple columns do not work
> 
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  

[jira] [Updated] (SPARK-20744) Predicates with multiple columns do not work

2017-05-15 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20744:

Description: 
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}

Similarly, it won't work from SQL either, which is something that other SQL 
databases support:

{code}
scala> spark.range(10).selectExpr("id as a", "id as 
b").createOrReplaceTempView("tab1")

scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must 
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
1),named_struct(col1, 2, col2, 2))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Other examples:
{code}
scala> sql("select * from tab1 where (a,b) =(1,1)").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data 
type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) 
= named_struct('col1', 1, 'col2', 1))' (struct and 
struct).; line 1 pos 25;
'Project [*]
+- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Expressions such as (1,1) are apparently read as structs and then the types do 
not match. Perhaps they should be arrays.
The following code works:
{code}
sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
{code}


  was:
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}

Similarly it won't work from SQL either, which is something that other SQL DB 
support:

{code}
scala> spark.range(10).selectExpr("id as a", "id as 
b").createOrReplaceTempView("tab1")

scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must 
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
1),named_struct(col1, 2, col2, 2))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Other examples:
{code}
scala> sql("select * from tab1 where (a,b) =(1,1)").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data 
type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) 
= named_struct('col1', 1, 'col2', 1))' (struct and 
struct).; line 1 pos 25;
'Project [*]
+- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}


> Predicates with multiple columns do not work
> 
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data 

[jira] [Updated] (SPARK-20744) Predicates with multiple columns do not work

2017-05-15 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20744:

Description: 
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}

Similarly, it won't work from SQL either, which is something that other SQL 
databases support:

{code}
scala> spark.range(10).selectExpr("id as a", "id as 
b").createOrReplaceTempView("tab1")

scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must 
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
1),named_struct(col1, 2, col2, 2))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

Other examples:
{code}
scala> sql("select * from tab1 where (a,b) =(1,1)").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data 
type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) 
= named_struct('col1', 1, 'col2', 1))' (struct and 
struct).; line 1 pos 25;
'Project [*]
+- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

  was:
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}

Similarly it won't work from SQL either, which is something that other SQL DB 
support:

{code}
scala> spark.range(10).selectExpr("id as a", "id as 
b").createOrReplaceTempView("tab1")

scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must 
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
1),named_struct(col1, 2, col2, 2))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}


> Predicates with multiple columns do not work
> 
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type 
> mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>+- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it won't work from SQL either, which is something that other SQL 
> databases support:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as 
> b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
> named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments 
> must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
> 1),named_struct(col1, 2, col2, 

[jira] [Updated] (SPARK-20744) Predicates with multiple columns does not work

2017-05-15 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20744:

Summary: Predicates with multiple columns does not work  (was: IN with 
multiple columns does not work)

> Predicates with multiple columns does not work
> --
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type 
> mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>+- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it won't work from SQL either, which is something that other SQL 
> databases support:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as 
> b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
> named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments 
> must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
> 1),named_struct(col1, 2, col2, 2))
>+- SubqueryAlias tab1
>   +- Project [id#47L AS a#50L, id#47L AS b#51L]
>  +- Range (0, 10, step=1, splits=Some(1))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20744) Predicates with multiple columns do not work

2017-05-15 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20744:

Summary: Predicates with multiple columns do not work  (was: Predicates 
with multiple columns does not work)

> Predicates with multiple columns do not work
> 
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type 
> mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>+- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it won't work from SQL either, which is something that other SQL 
> databases support:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as 
> b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
> named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments 
> must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
> 1),named_struct(col1, 2, col2, 2))
>+- SubqueryAlias tab1
>   +- Project [id#47L AS a#50L, id#47L AS b#51L]
>  +- Range (0, 10, step=1, splits=Some(1))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20744) IN with multiple columns does not work

2017-05-15 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20744:

Description: 
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}

Similarly, it won't work from SQL either, which is something that other SQL 
databases support:

{code}
scala> spark.range(10).selectExpr("id as a", "id as 
b").createOrReplaceTempView("tab1")

scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must 
be same type; line 1 pos 31;
'Project [*]
+- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
1),named_struct(col1, 2, col2, 2))
   +- SubqueryAlias tab1
  +- Project [id#47L AS a#50L, id#47L AS b#51L]
 +- Range (0, 10, step=1, splits=Some(1))
{code}

  was:
The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}


> IN with multiple columns does not work
> --
>
> Key: SPARK-20744
> URL: https://issues.apache.org/jira/browse/SPARK-20744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
> ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type 
> mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>+- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it won't work from SQL either, which is something that other SQL 
> databases support:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as 
> b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', 
> tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), 
> named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments 
> must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 
> 1),named_struct(col1, 2, col2, 2))
>+- SubqueryAlias tab1
>   +- Project [id#47L AS a#50L, id#47L AS b#51L]
>  +- Range (0, 10, step=1, splits=Some(1))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20744) IN with multiple columns does not work

2017-05-15 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-20744:
---

 Summary: IN with multiple columns does not work
 Key: SPARK-20744
 URL: https://issues.apache.org/jira/browse/SPARK-20744
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Bogdan Raducanu


The following code reproduces the problem:

{code}
scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in 
((1,1))").show
org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 
'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: 
Arguments must be same type; line 1 pos 6;
'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
+- Project [id#39L AS a#42L, id#39L AS b#43L]
   +- Range (0, 10, step=1, splits=Some(1))
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20407) ParquetQuerySuite 'Enabling/disabling ignoreCorruptFiles' flaky test

2017-04-20 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20407:

Summary: ParquetQuerySuite 'Enabling/disabling ignoreCorruptFiles' flaky 
test  (was: ParquetQuerySuite flaky test)

> ParquetQuerySuite 'Enabling/disabling ignoreCorruptFiles' flaky test
> 
>
> Key: SPARK-20407
> URL: https://issues.apache.org/jira/browse/SPARK-20407
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> ParquetQuerySuite test "Enabling/disabling ignoreCorruptFiles" can sometimes 
> fail. This is caused by the fact that when one task fails, the driver call 
> returns and test code continues, but there might still be tasks running that 
> will be killed at the next killing point.
> There are 2 specific issues created by this:
> 1. Files can be closed some time after the test finishes, so 
> DebugFilesystem.assertNoOpenStreams fails. One solution for this is to change 
> SharedSqlContext and call assertNoOpenStreams inside eventually {}
> 2. ParquetFileReader constructor from apache parquet 1.8.2 can leak a stream 
> at line 538. This happens when the next line throws an exception. So, the 
> constructor fails and Spark doesn't have any way to close the file.
> This happens in this test because the test deletes the temporary directory at 
> the end (but while tasks might still be running). Deleting the directory 
> causes the constructor to fail.
> The solution for this could be to Thread.sleep at the end of the test or to 
> somehow wait for all tasks to be definitely killed before finishing the test.
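
A minimal sketch of the first suggestion (wrapping the assertion in ScalaTest's 
eventually so straggler tasks get a grace period to close their streams); the trait 
name and the timeout are illustrative, not the actual patch:

{code}
import org.apache.spark.DebugFilesystem
import org.scalatest.{BeforeAndAfterEach, Suite}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Hypothetical mix-in showing the idea; the real change would go into SharedSQLContext.
trait EventuallyNoOpenStreams extends BeforeAndAfterEach { self: Suite =>
  protected override def afterEach(): Unit = {
    try super.afterEach() finally {
      // Retry for a short period instead of asserting immediately, so tasks that
      // are still being killed have time to close their streams.
      eventually(timeout(10.seconds)) {
        DebugFilesystem.assertNoOpenStreams()
      }
    }
  }
}
{code}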



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20407) ParquetQuerySuite flaky test

2017-04-20 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-20407:
---

 Summary: ParquetQuerySuite flaky test
 Key: SPARK-20407
 URL: https://issues.apache.org/jira/browse/SPARK-20407
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.2.0
Reporter: Bogdan Raducanu


ParquetQuerySuite test "Enabling/disabling ignoreCorruptFiles" can sometimes 
fail. This is caused by the fact that when one task fails, the driver call 
returns and test code continues, but there might still be tasks running that 
will be killed at the next killing point.

There are 2 specific issues created by this:
1. Files are closed after the test finishes, so 
DebugFilesystem.assertNoOpenStreams fails. One solution for this is to change 
SharedSqlContext and call assertNoOpenStreams inside eventually {}

2. ParquetFileReader constructor from apache parquet 1.8.2 can leak a stream at 
line 538. This happens when the next line throws an exception. So, the 
constructor fails and Spark doesn't have any way to close the file.
This happens in this test because the test deletes the temporary directory at 
the end (but while tasks might still be running). Deleting the directory causes 
the constructor to fail.
The solution for this could be to Thread.sleep at the end of the test or to 
somehow wait for all tasks to be definitely killed before finishing the test



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20407) ParquetQuerySuite flaky test

2017-04-20 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20407:

Description: 
ParquetQuerySuite test "Enabling/disabling ignoreCorruptFiles" can sometimes 
fail. This is caused by the fact that when one task fails, the driver call 
returns and test code continues, but there might still be tasks running that 
will be killed at the next killing point.

There are 2 specific issues created by this:
1. Files can be closed some time after the test finishes, so 
DebugFilesystem.assertNoOpenStreams fails. One solution for this is to change 
SharedSqlContext and call assertNoOpenStreams inside eventually {}

2. ParquetFileReader constructor from apache parquet 1.8.2 can leak a stream at 
line 538. This happens when the next line throws an exception. So, the 
constructor fails and Spark doesn't have any way to close the file.
This happens in this test because the test deletes the temporary directory at 
the end (but while tasks might still be running). Deleting the directory causes 
the constructor to fail.
The solution for this could be to Thread.sleep at the end of the test or to 
somehow wait for all tasks to be definitely killed before finishing the test

  was:
ParquetQuerySuite test "Enabling/disabling ignoreCorruptFiles" can sometimes 
fail. This is caused by the fact that when one task fails the driver call 
returns and test code continues, but there might still be tasks running that 
will be killed at the next killing point.

There are 2 specific issues creates by this:
1. Files are closed after the test finishes, so 
DebugFilesystem.assertNoOpenStreams fails. One solution for this is to change 
SharedSqlContext and call assertNoOpenStreams inside eventually {}

2. ParquetFileReader constructor from apache parquet 1.8.2 can leak a stream at 
line 538. This happens when the next line throws an exception. So, the 
constructor fails and Spark doesn't have any way to close the file.
This happens in this test because the test deletes the temporary directory at 
the end (but while tasks might still be running). Deleting the directory causes 
the constructor to fail.
The solution for this could be to Thread.sleep at the end of the test or to 
somehow wait for all tasks to be definitely killed before finishing the test


> ParquetQuerySuite flaky test
> 
>
> Key: SPARK-20407
> URL: https://issues.apache.org/jira/browse/SPARK-20407
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> ParquetQuerySuite test "Enabling/disabling ignoreCorruptFiles" can sometimes 
> fail. This is caused by the fact that when one task fails, the driver call 
> returns and test code continues, but there might still be tasks running that 
> will be killed at the next killing point.
> There are 2 specific issues created by this:
> 1. Files can be closed some time after the test finishes, so 
> DebugFilesystem.assertNoOpenStreams fails. One solution for this is to change 
> SharedSqlContext and call assertNoOpenStreams inside eventually {}
> 2. ParquetFileReader constructor from apache parquet 1.8.2 can leak a stream 
> at line 538. This happens when the next line throws an exception. So, the 
> constructor fails and Spark doesn't have any way to close the file.
> This happens in this test because the test deletes the temporary directory at 
> the end (but while tasks might still be running). Deleting the directory 
> causes the constructor to fail.
> The solution for this could be to Thread.sleep at the end of the test or to 
> somehow wait for all tasks to be definitely killed before finishing the test



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20280) SharedInMemoryCache Weigher integer overflow

2017-04-10 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20280:

Description: 
in FileStatusCache.scala:
{code}
.weigher(new Weigher[(ClientId, Path), Array[FileStatus]] {
  override def weigh(key: (ClientId, Path), value: Array[FileStatus]): Int 
= {
(SizeEstimator.estimate(key) + SizeEstimator.estimate(value)).toInt
  }})
{code}

Weigher.weigh returns Int, but the estimated size of an Array[FileStatus] can be bigger 
than Int.MaxValue. Then a negative value is returned, leading to this 
exception:

{code}
* [info]   java.lang.IllegalStateException: Weights must be non-negative
* [info]   at 
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
* [info]   at 
com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2223)
* [info]   at 
com.google.common.cache.LocalCache$Segment.put(LocalCache.java:2944)
* [info]   at com.google.common.cache.LocalCache.put(LocalCache.java:4212)
* [info]   at 
com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:4804)
* [info]   at 
org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$3.putLeafFiles(FileStatusCache.scala:131)

{code}

  was:
{code}
.weigher(new Weigher[(ClientId, Path), Array[FileStatus]] {
  override def weigh(key: (ClientId, Path), value: Array[FileStatus]): Int 
= {
(SizeEstimator.estimate(key) + SizeEstimator.estimate(value)).toInt
  }})
{code}

Weigher.weigh returns Int but the size of an Array[FileStatus] could be bigger 
than Int.maxValue. Then, a negative value is returned, leading to this 
exception:

{code}
* [info]   java.lang.IllegalStateException: Weights must be non-negative
* [info]   at 
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
* [info]   at 
com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2223)
* [info]   at 
com.google.common.cache.LocalCache$Segment.put(LocalCache.java:2944)
* [info]   at com.google.common.cache.LocalCache.put(LocalCache.java:4212)
* [info]   at 
com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:4804)
* [info]   at 
org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$3.putLeafFiles(FileStatusCache.scala:131)

{code}


> SharedInMemoryCache Weigher integer overflow
> 
>
> Key: SPARK-20280
> URL: https://issues.apache.org/jira/browse/SPARK-20280
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Bogdan Raducanu
>
> in FileStatusCache.scala:
> {code}
> .weigher(new Weigher[(ClientId, Path), Array[FileStatus]] {
>   override def weigh(key: (ClientId, Path), value: Array[FileStatus]): 
> Int = {
> (SizeEstimator.estimate(key) + SizeEstimator.estimate(value)).toInt
>   }})
> {code}
> Weigher.weigh returns Int, but the estimated size of an Array[FileStatus] can be 
> bigger than Int.MaxValue. Then a negative value is returned, leading to this 
> exception:
> {code}
> * [info]   java.lang.IllegalStateException: Weights must be non-negative
> * [info]   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> * [info]   at 
> com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2223)
> * [info]   at 
> com.google.common.cache.LocalCache$Segment.put(LocalCache.java:2944)
> * [info]   at com.google.common.cache.LocalCache.put(LocalCache.java:4212)
> * [info]   at 
> com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:4804)
> * [info]   at 
> org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$3.putLeafFiles(FileStatusCache.scala:131)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20280) SharedInMemoryCache Weigher integer overflow

2017-04-10 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-20280:
---

 Summary: SharedInMemoryCache Weigher integer overflow
 Key: SPARK-20280
 URL: https://issues.apache.org/jira/browse/SPARK-20280
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0, 2.2.0
Reporter: Bogdan Raducanu


{code}
.weigher(new Weigher[(ClientId, Path), Array[FileStatus]] {
  override def weigh(key: (ClientId, Path), value: Array[FileStatus]): Int 
= {
(SizeEstimator.estimate(key) + SizeEstimator.estimate(value)).toInt
  }})
{code}

Weigher.weigh returns Int, but the estimated size of an Array[FileStatus] can be bigger 
than Int.MaxValue. Then a negative value is returned, leading to this 
exception:

{code}
* [info]   java.lang.IllegalStateException: Weights must be non-negative
* [info]   at 
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
* [info]   at 
com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2223)
* [info]   at 
com.google.common.cache.LocalCache$Segment.put(LocalCache.java:2944)
* [info]   at com.google.common.cache.LocalCache.put(LocalCache.java:4212)
* [info]   at 
com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:4804)
* [info]   at 
org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$3.putLeafFiles(FileStatusCache.scala:131)

{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20243) DebugFilesystem.assertNoOpenStreams thread race

2017-04-06 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-20243:

Description: 
Introduced by SPARK-19946.

DebugFilesystem.assertNoOpenStreams gets the size of the openStreams 
ConcurrentHashMap and then later, if the size was > 0, accesses the first 
element in openStreams.values. But the ConcurrentHashMap might be cleared by 
another thread between getting its size and accessing it, resulting in an 
exception when trying to call .head on an empty collection.


  was:Introduced by SPARK-19946


> DebugFilesystem.assertNoOpenStreams thread race
> ---
>
> Key: SPARK-20243
> URL: https://issues.apache.org/jira/browse/SPARK-20243
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> Introduced by SPARK-19946.
> DebugFilesystem.assertNoOpenStreams gets the size of the openStreams 
> ConcurrentHashMap and then later, if the size was > 0, accesses the first 
> element in openStreams.values. But the ConcurrentHashMap might be cleared by 
> another thread between getting its size and accessing it, resulting in an 
> exception when trying to call .head on an empty collection.
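
A sketch of a race-free version, using hypothetical stand-ins for DebugFilesystem's 
internals: snapshot the map's values once instead of reading the size and then the 
first element separately.

{code}
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

// hypothetical stand-in for DebugFilesystem's bookkeeping map (stream -> Throwable)
val openStreams = new ConcurrentHashMap[AnyRef, Throwable]()

def assertNoOpenStreams(): Unit = {
  // One snapshot of the values: a concurrent clear() between a size check and a
  // later values access can no longer leave us calling .head on an empty collection.
  val leaked = openStreams.values().asScala.toList
  if (leaked.nonEmpty) {
    throw new IllegalStateException(
      s"There are ${leaked.size} possibly leaked file streams.", leaked.head)
  }
}
{code}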



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20243) DebugFilesystem.assertNoOpenStreams thread race

2017-04-06 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-20243:
---

 Summary: DebugFilesystem.assertNoOpenStreams thread race
 Key: SPARK-20243
 URL: https://issues.apache.org/jira/browse/SPARK-20243
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.2.0
Reporter: Bogdan Raducanu


Introduced by SPARK-19946



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19946) DebugFilesystem.assertNoOpenStreams should report the open streams to help debugging

2017-03-14 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-19946:

Summary: DebugFilesystem.assertNoOpenStreams should report the open streams 
to help debugging  (was: DebugFilesystem.assertNoOpenStreams should report open 
streams to help debugging)

> DebugFilesystem.assertNoOpenStreams should report the open streams to help 
> debugging
> 
>
> Key: SPARK-19946
> URL: https://issues.apache.org/jira/browse/SPARK-19946
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.1.0
>Reporter: Bogdan Raducanu
>
> In DebugFilesystem.assertNoOpenStreams, if there are open streams, an exception 
> is thrown showing the number of open streams. This doesn't help much in debugging 
> where the open streams were leaked.
> The exception should also report where the stream was leaked. This can be 
> done through a cause exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19946) DebugFilesystem.assertNoOpenStreams should report open streams to help debugging

2017-03-14 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-19946:
---

 Summary: DebugFilesystem.assertNoOpenStreams should report open 
streams to help debugging
 Key: SPARK-19946
 URL: https://issues.apache.org/jira/browse/SPARK-19946
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.1.0
Reporter: Bogdan Raducanu


In DebugFilesystem.assertNoOpenStreams, if there are open streams, an exception 
is thrown showing the number of open streams. This doesn't help much in debugging 
where the open streams were leaked.
The exception should also report where the stream was leaked. This can be done 
through a cause exception.
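
A minimal sketch of the idea with a hypothetical FileSystem subclass (not the actual 
DebugFilesystem code): record a Throwable when each stream is opened, so its stack 
trace can be attached as the cause of the assertion failure.

{code}
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._
import org.apache.hadoop.fs.{FSDataInputStream, Path, RawLocalFileSystem}

class LeakTrackingFileSystem extends RawLocalFileSystem {
  // stream -> Throwable recording where the stream was opened
  private val openStreams = new ConcurrentHashMap[FSDataInputStream, Throwable]()

  override def open(f: Path, bufferSize: Int): FSDataInputStream = {
    val stream = super.open(f, bufferSize)
    // new Throwable() captures the current stack trace, i.e. the open site;
    // a full implementation would also remove the entry when the stream is closed.
    openStreams.put(stream, new Throwable(s"Stream for $f opened here"))
    stream
  }

  def assertNoOpenStreams(): Unit = {
    openStreams.asScala.headOption.foreach { case (_, openedAt) =>
      // the cause carries the stack trace of whoever opened the leaked stream
      throw new IllegalStateException(
        s"There are ${openStreams.size()} possibly leaked file streams.", openedAt)
    }
  }
}
{code}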



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19512) codegen for compare structs fails

2017-02-08 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-19512:

Description: 
This (1 struct field)

{code:java|title=1 struct field}
spark.range(10)
  .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) as 
col2")
  .filter("col1 = col2").count
{code}

fails with

{code}
[info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
failed to compile: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 144, Column 32: Expression "range_value" is not an rvalue
{code}

This (2 struct fields)
{code:java|title=2 struct fields}
spark.range(10)
.selectExpr("named_struct('a', id, 'b', id) as col1", 
"named_struct('a',id+2, 'b',id+2) as col2")
.filter($"col1" === $"col2").count
{code}

fails with 
{code}

Caused by: java.lang.IndexOutOfBoundsException: 1
  at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at 
org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
{code}


  was:
This (1 struct field)

{code:java|title=1 struct field}
spark.range(10)
  .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) as 
col2")
  .filter("col1 = col2").count
{code}

fails with

{code}
[info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
failed to compile: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 144, Column 32: Expression "range_value" is not an rvalue
{code}

This (2 struct fields)
{code:java|title=2 struct fields}
spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
"named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
{code}

fails with 
{code}

Caused by: java.lang.IndexOutOfBoundsException: 1
  at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at 
org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
{code}



> codegen for compare structs fails
> -
>
> Key: SPARK-19512
> URL: https://issues.apache.org/jira/browse/SPARK-19512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bogdan Raducanu
>
> This (1 struct field)
> {code:java|title=1 struct field}
> spark.range(10)
>   .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) 
> as col2")
>   .filter("col1 = col2").count
> {code}
> fails with
> {code}
> [info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 144, Column 32: Expression "range_value" is not an 
> rvalue
> {code}
> This (2 struct fields)
> {code:java|title=2 struct fields}
> spark.range(10)
> .selectExpr("named_struct('a', id, 'b', id) as col1", 
> "named_struct('a',id+2, 'b',id+2) as col2")
> .filter($"col1" === $"col2").count
> {code}
> fails with 
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.List.apply(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
> {code}
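
A possible workaround, offered as an assumption rather than something from the ticket: 
compare the struct fields individually so the failing whole-struct equality codegen 
path is never exercised.

{code}
// hypothetical workaround: field-by-field comparison instead of struct equality
spark.range(10)
  .selectExpr("named_struct('a', id, 'b', id) as col1",
              "named_struct('a', id+2, 'b', id+2) as col2")
  .filter("col1.a = col2.a and col1.b = col2.b")
  .count
{code}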



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19512) codegen for compare structs fails

2017-02-08 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-19512:

Description: 
This (1 struct field)

{code:scala|title=1 struct field}
spark.range(10)
  .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) as 
col2")
  .filter("col1 = col2").count
{code}

fails with

{code}
[info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
failed to compile: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 144, Column 32: Expression "range_value" is not an rvalue
{code}

This (2 struct fields)
{code:scala|title=2 struct fields}
spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
"named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
{code}

fails with 
{code}

Caused by: java.lang.IndexOutOfBoundsException: 1
  at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at 
org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
{code}


  was:
This (1 struct field)

{{
spark.range(10)
  .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) as 
col2")
  .filter("col1 = col2").count
}}

fails with

{{
[info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
failed to compile: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 144, Column 32: Expression "range_value" is not an rvalue
}}

This (2 struct fields)
{{
spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
"named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
}}

fails with 
{{

Caused by: java.lang.IndexOutOfBoundsException: 1
  at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at 
org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
}}



> codegen for compare structs fails
> -
>
> Key: SPARK-19512
> URL: https://issues.apache.org/jira/browse/SPARK-19512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bogdan Raducanu
>
> This (1 struct field)
> {code:scala|title=1 struct field}
> spark.range(10)
>   .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) 
> as col2")
>   .filter("col1 = col2").count
> {code}
> fails with
> {code}
> [info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 144, Column 32: Expression "range_value" is not an 
> rvalue
> {code}
> This (2 struct fields)
> {code:scala|title=2 struct fields}
> spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
> "named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
> {code}
> fails with 
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.List.apply(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19512) codegen for compare structs fails

2017-02-08 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-19512:

Description: 
This (1 struct field)

{code:java|title=1 struct field}
spark.range(10)
  .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) as 
col2")
  .filter("col1 = col2").count
{code}

fails with

{code}
[info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
failed to compile: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 144, Column 32: Expression "range_value" is not an rvalue
{code}

This (2 struct fields)
{code:java|title=2 struct fields}
spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
"named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
{code}

fails with 
{code}

Caused by: java.lang.IndexOutOfBoundsException: 1
  at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at 
org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
{code}


  was:
This (1 struct field)

{code:scala|title=1 struct field}
spark.range(10)
  .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) as 
col2")
  .filter("col1 = col2").count
{code}

fails with

{code}
[info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
failed to compile: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 144, Column 32: Expression "range_value" is not an rvalue
{code}

This (2 struct fields)
{code:scala|title=2 struct fields}
spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
"named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
{code}

fails with 
{code}

Caused by: java.lang.IndexOutOfBoundsException: 1
  at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at 
org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
{code}



> codegen for compare structs fails
> -
>
> Key: SPARK-19512
> URL: https://issues.apache.org/jira/browse/SPARK-19512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bogdan Raducanu
>
> This (1 struct field)
> {code:java|title=1 struct field}
> spark.range(10)
>   .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) 
> as col2")
>   .filter("col1 = col2").count
> {code}
> fails with
> {code}
> [info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 144, Column 32: Expression "range_value" is not an 
> rvalue
> {code}
> This (2 struct fields)
> {code:java|title=2 struct fields}
> spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
> "named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
> {code}
> fails with 
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.List.apply(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19512) codegen for compare structs fails

2017-02-08 Thread Bogdan Raducanu (JIRA)
Bogdan Raducanu created SPARK-19512:
---

 Summary: codegen for compare structs fails
 Key: SPARK-19512
 URL: https://issues.apache.org/jira/browse/SPARK-19512
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0, 2.0.0
Reporter: Bogdan Raducanu


This (1 struct field)

{{
spark.range(10)
  .selectExpr("named_struct('a', id) as col1", "named_struct('a', id+2) as 
col2")
  .filter("col1 = col2").count
}}

fails with

{{
[info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
failed to compile: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 144, Column 32: Expression "range_value" is not an rvalue
}}

This (2 struct fields)
{{
spark.range(10).selectExpr("named_struct('a', id, 'b', id) as col1", 
"named_struct('a',id+2, 'b',id+2) as col2").filter($"col1" === $"col2").count
}}

fails with 
{{

Caused by: java.lang.IndexOutOfBoundsException: 1
  at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at 
org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
}}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org