[jira] [Commented] (SPARK-28450) When scanning Hive data for a non-existent partition, an error is returned
[ https://issues.apache.org/jira/browse/SPARK-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890713#comment-16890713 ] angerszhu commented on SPARK-28450: --- [~shivuson...@gmail.com] Just select from a partitioned table with a partition value that does not exist: select * from partitiontable where part_col='not exist'; Hive simply returns a result of 0 rows, but Spark catches this as an error in HiveExternalCatalog. Maybe we should make it behave the same as Hive? > When scanning Hive data for a non-existent partition, an error is returned > -- > > Key: SPARK-28450 > URL: https://issues.apache.org/jira/browse/SPARK-28450 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: angerszhu >Priority: Major > Attachments: image-2019-07-19-20-51-12-861.png > > > When we select data from a non-existent partition of a partitioned Hive table, it returns an error, but it should just return an empty result. > !image-2019-07-19-20-51-12-861.png! -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
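The behavioral difference discussed in this comment can be sketched in isolation. The class and method names below are illustrative stand-ins only, not Spark's actual HiveExternalCatalog API:

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy contrast (illustrative names, not Spark's HiveExternalCatalog API):
// for an unknown partition value, Hive-style lookup returns an empty result
// so the query yields 0 rows, while the behavior reported in this ticket
// surfaces the missing partition as an error instead.
public class PartitionLookupDemo {
    static final Map<String, List<String>> PARTITIONS = Map.of(
        "2019-07-01", List.of("file1"),
        "2019-07-02", List.of("file2"));

    // Hive-style: unknown partition value -> empty scan, query returns 0 rows.
    public static List<String> lenientLookup(String partValue) {
        return PARTITIONS.getOrDefault(partValue, List.of());
    }

    // Reported Spark behavior: unknown partition value -> error from the catalog.
    public static List<String> strictLookup(String partValue) {
        List<String> files = PARTITIONS.get(partValue);
        if (files == null) {
            throw new NoSuchElementException("Partition not found: " + partValue);
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(lenientLookup("not exist").size()); // 0 rows scanned
        try {
            strictLookup("not exist");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage()); // the error path the ticket reports
        }
    }
}
```

Making the lookup lenient, as suggested, would align the Spark behavior with Hive's.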
[jira] [Commented] (SPARK-28450) When scanning Hive data for a non-existent partition, an error is returned
[ https://issues.apache.org/jira/browse/SPARK-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890710#comment-16890710 ] Shivu Sondur commented on SPARK-28450: -- [~angerszhuuu] I want to check this issue. Can you give me the detailed steps to reproduce it?
[jira] [Commented] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result
[ https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890694#comment-16890694 ] Ivan Tsukanov commented on SPARK-28480: --- ok, let's close the ticket. [~shivuson...@gmail.com], thanks for the help! > Types of input parameters of a UDF affect the ability to cache the result > - > > Key: SPARK-28480 > URL: https://issues.apache.org/jira/browse/SPARK-28480 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Ivan Tsukanov >Priority: Major > Fix For: 2.4.3 > > Attachments: image-2019-07-23-10-58-45-768.png > > > When I define a parameter in a UDF as Boolean or Int, the resulting DataFrame can't be cached > {code:java} > import org.apache.spark.sql.functions.{lit, udf} > val empty = sparkSession.emptyDataFrame > val table = "table" > def test(customUDF: UserDefinedFunction, col: Column): Unit = { > val df = empty.select(customUDF(col)) > df.cache() > df.createOrReplaceTempView(table) > println(sparkSession.catalog.isCached(table)) > } > test(udf { _: String => 42 }, lit("")) // true > test(udf { _: Any => 42 }, lit("")) // true > test(udf { _: Int => 42 }, lit(42)) // false > test(udf { _: Boolean => 42 }, lit(false)) // false > {code} > or sparkSession.catalog.isCached gives incorrect information.
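One plausible mechanism for the primitive-parameter cases above (an assumption on our part, not something stated in this thread): cached plans are found by plan equality, and if analysis wraps a primitive-typed (null-intolerant) UDF input in an extra null-check each time it runs, the plan used for the isCached lookup no longer equals the cached one. A toy Java model of that failure mode, with purely illustrative names:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (illustrative, not Spark's API) of a cache keyed by plan equality:
// if "analysis" is not idempotent for primitive-typed UDF inputs and adds
// another null-check wrapper on each pass, the re-analyzed plan used for the
// cache lookup no longer matches the plan that was cached.
public class CacheMissDemo {
    // A "plan" is just a string here; each analysis pass wraps primitive-typed
    // inputs in a null-check.
    public static String analyze(String plan, boolean primitiveParam) {
        return primitiveParam ? "ifnull(" + plan + ")" : plan;
    }

    public static void main(String[] args) {
        Set<String> cache = new HashSet<>();

        // Reference-typed parameter: analysis leaves the plan alone, lookup hits.
        cache.add(analyze("udf(col)", false));
        System.out.println(cache.contains(analyze("udf(col)", false))); // true

        // Primitive-typed parameter: a second analysis pass adds another
        // wrapper, so the lookup misses the cached plan.
        cache.add(analyze("udf(col)", true));
        System.out.println(cache.contains(
            analyze(analyze("udf(col)", true), true))); // false
    }
}
```

This would also explain why the issue no longer reproduces on the snapshot shown later in the thread and is marked fixed for 2.4.3, though the thread itself does not confirm the mechanism.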
[jira] [Updated] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result
[ https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Tsukanov updated SPARK-28480: -- Fix Version/s: 2.4.3
[jira] [Commented] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result
[ https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890685#comment-16890685 ] Shivu Sondur commented on SPARK-28480: -- [~itsukanov] In the latest master branch it works fine. See the snapshot below: !image-2019-07-23-10-58-45-768.png!
[jira] [Updated] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result
[ https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivu Sondur updated SPARK-28480: - Attachment: image-2019-07-23-10-58-45-768.png
[jira] [Commented] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result
[ https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890674#comment-16890674 ] Shivu Sondur commented on SPARK-28480: -- I am checking this issue.
[jira] [Updated] (SPARK-27282) Spark returns incorrect results when using UNION with GROUP BY clause
[ https://issues.apache.org/jira/browse/SPARK-27282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27282: --- Labels: correctness (was: ) > Spark returns incorrect results when using UNION with GROUP BY clause > - > > Key: SPARK-27282 > URL: https://issues.apache.org/jira/browse/SPARK-27282 > Project: Spark > Issue Type: Bug > Components: Spark Shell, Spark Submit, SQL >Affects Versions: 2.3.2 > Environment: I'm using: > IntelliJ IDEA ==> 2018.1.4 > spark-sql and spark-core ==> 2.3.2.3.1.0.0-78 (for HDP 3.1) > scala ==> 2.11.8 >Reporter: Sofia >Priority: Major > Labels: correctness > > When using a UNION clause after a GROUP BY clause in Spark, the results obtained are wrong. > The following example illustrates the issue: > {code:java} > CREATE TABLE test_un ( > col1 varchar(255), > col2 varchar(255), > col3 varchar(255), > col4 varchar(255) > ); > INSERT INTO test_un (col1, col2, col3, col4) > VALUES (1,1,2,4), > (1,1,2,4), > (1,1,3,5), > (2,2,2,null); > {code} > I used the following code: > {code:java} > val x = Toolkit.HiveToolkit.getDataFromHive("test","test_un") > val y = x > .filter(col("col4")isNotNull) > .groupBy("col1", "col2","col3") > .agg(count(col("col3")).alias("cnt")) > .withColumn("col_name", lit("col3")) > .select(col("col1"), col("col2"), > col("col_name"),col("col3").alias("col_value"), col("cnt")) > val z = x > .filter(col("col4")isNotNull) > .groupBy("col1", "col2","col4") > .agg(count(col("col4")).alias("cnt")) > .withColumn("col_name", lit("col4")) > .select(col("col1"), col("col2"), > col("col_name"),col("col4").alias("col_value"), col("cnt")) > y.union(z).show() > {code} > And I obtained the following results: > ||col1||col2||col_name||col_value||cnt|| > |1|1|col3|5|1| > |1|1|col3|4|2| > |1|1|col4|5|1| > |1|1|col4|4|2| > Expected results: > ||col1||col2||col_name||col_value||cnt|| > |1|1|col3|3|1| > |1|1|col3|2|2| > |1|1|col4|4|2| > |1|1|col4|5|1| > But when I remove the last row of the table > {code:java} > (2,2,2,null){code} > I obtain the correct results.
[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates
[ https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890672#comment-16890672 ] Josh Rosen commented on SPARK-24079: Update: I just realized that SPARK-27915 is actually a closer duplicate of SPARK-24080, which is very closely related to this ticket. > Update the nullability of Join output based on inferred predicates > -- > > Key: SPARK-24079 > URL: https://issues.apache.org/jira/browse/SPARK-24079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Takeshi Yamamuro >Priority: Minor > > In the master, a logical `Join` node does not respect the nullability that > the optimizer rule `InferFiltersFromConstraints` > might change when inferred predicates have `IsNotNull`, e.g., > {code} > scala> val df1 = Seq((Some(1), Some(2))).toDF("k", "v0") > scala> val df2 = Seq((Some(1), Some(3))).toDF("k", "v1") > scala> val joinedDf = df1.join(df2, df1("k") === df2("k"), "inner") > scala> joinedDf.explain > == Physical Plan == > *(2) BroadcastHashJoin [k#83], [k#92], Inner, BuildRight > :- *(2) Project [_1#80 AS k#83, _2#81 AS v0#84] > : +- *(2) Filter isnotnull(_1#80) > : +- LocalTableScan [_1#80, _2#81] > +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > true] as bigint))) >+- *(1) Project [_1#89 AS k#92, _2#90 AS v1#93] > +- *(1) Filter isnotnull(_1#89) > +- LocalTableScan [_1#89, _2#90] > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(true, true, true, true) > {code} > But, these `nullable` values should be: > {code} > scala> joinedDf.queryExecution.optimizedPlan.output.map(_.nullable) > res15: Seq[Boolean] = List(false, true, false, true) > {code} > This ticket comes from the previous discussion: > https://github.com/apache/spark/pull/18576#pullrequestreview-107585997
[jira] [Updated] (SPARK-28481) More expressions should extend NullIntolerant
[ https://issues.apache.org/jira/browse/SPARK-28481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28481: --- Description: SPARK-13995 introduced the {{NullIntolerant}} trait to generalize the logic for inferring {{IsNotNull}} constraints from expressions. An expression is _null-intolerant_ if it returns {{null}} when any of its input expressions are {{null}}. I've noticed that _most_ expressions are null-intolerant: anything which extends UnaryExpression / BinaryExpression and keeps the default {{eval}} method will be null-intolerant. However, only a subset of these expressions mix in the {{NullIntolerant}} trait. As a result, we're missing out on the opportunity to infer certain types of non-null constraints: for example, if we see a {{WHERE length(x) > 10}} condition then we know that the column {{x}} must be non-null and can push this non-null filter down to our datasource scan. I can think of a few ways to fix this: # Modify every relevant expression to mix in the {{NullIntolerant}} trait. We can use IDEs or other code-analysis tools (e.g. {{ClassUtil}} plus reflection) to help automate the process of identifying expressions which do not override the default {{eval}}. # Make a backwards-incompatible change to our abstract base class hierarchy to add {{NullSafe*aryExpression}} abstract base classes which define the {{nullSafeEval}} method and implement a {{final eval}} method, then leave {{eval}} unimplemented in the regular {{*aryExpression}} base classes. ** This would fix the somewhat weird code smell that we have today where {{nullSafeEval}} has a default implementation which calls {{sys.error}}. ** This would negatively impact users who have implemented custom Catalyst expressions. # Use runtime reflection to determine whether expressions are null-intolerant by virtue of using one of the default null-intolerant {{eval}} implementations. 
We can then use this in an {{isNullIntolerant}} helper method which checks that classes either (a) extend {{NullIntolerant}} or (b) are null-intolerant according to the reflective check (which is basically just figuring out which concrete implementation the {{eval}} method resolves to). ** We only need to perform the reflection once _per-class_ and can cache the result for the lifetime of the JVM, so the performance overheads would be pretty small (especially compared to other non-cacheable reflection / traversal costs in Catalyst). ** The downside is additional complexity in the code which pattern-matches / checks for null-intolerance. Of these approaches, I'm currently leaning towards option 1 (semi-automated identification and manual update of hundreds of expressions): if we go with that approach then we can perform a one-time catch-up to fix all existing expressions. To handle ongoing maintenance (as we add new expressions), I'd propose to add "is this null-intolerant?" to a checklist to use when reviewing PRs which add new Catalyst expressions. /cc [~maropu] [~viirya]
[jira] [Updated] (SPARK-28481) More expressions should extend NullIntolerant
[ https://issues.apache.org/jira/browse/SPARK-28481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28481: --- Description: SPARK-13995 introduced the {{NullIntolerant}} trait to generalize the logic for inferring {{IsNotNull}} constraints from expressions. An expression is _null-intolerant_ if it returns {{null}} when any of its input expressions are {{null}}. I've noticed that _most_ expressions are null-intolerant: anything which extends UnaryExpression / BinaryExpression and keeps the default {{eval}} method will be null-intolerant. However, only a subset of these expressions mix in the {{NullIntolerant}} trait. As a result, we're missing out on the opportunity to infer certain types of non-null constraints: for example, if we see a {{WHERE length\(x\) > 10}} condition then we know that the column {{x}} must be non-null and can push this non-null filter down to our datasource scan. I can think of a few ways to fix this: # Modify every relevant expression to mix in the {{NullIntolerant}} trait. We can use IDEs or other code-analysis tools (e.g. {{ClassUtil}} plus reflection) to help automate the process of identifying expressions which do not override the default {{eval}}. # Make a backwards-incompatible change to our abstract base class hierarchy to add {{NullSafe*aryExpression}} abstract base classes which define the {{nullSafeEval}} method and implement a {{final eval}} method, then leave {{eval}} unimplemented in the regular {{*aryExpression}} base classes. ** This would fix the somewhat weird code smell that we have today where {{nullSafeEval}} has a default implementation which calls {{sys.error}}. ** This would negatively impact users who have implemented custom Catalyst expressions. # Use runtime reflection to determine whether expressions are null-intolerant by virtue of using one of the default null-intolerant {{eval}} implementations. 
We can then use this in an {{isNullIntolerant}} helper method which checks that classes either (a) extend {{NullIntolerant}} or (b) are null-intolerant according to the reflective check (which is basically just figuring out which concrete implementation the {{eval}} method resolves to). ** We only need to perform the reflection once _per-class_ and can cache the result for the lifetime of the JVM, so the performance overheads would be pretty small (especially compared to other non-cacheable reflection / traversal costs in Catalyst). ** The downside is additional complexity in the code which pattern-matches / checks for null-intolerance. Of these approaches, I'm currently leaning towards option 1 (semi-automated identification and manual update of hundreds of expressions): if we go with that approach then we can perform a one-time catch-up to fix all existing expressions. To handle ongoing maintenance (as we add new expressions), I'd propose to add "is this null-intolerant?" to a checklist to use when reviewing PRs which add new Catalyst expressions. /cc [~maropu] [~viirya]
[jira] [Updated] (SPARK-28481) More expressions should extend NullIntolerant
[ https://issues.apache.org/jira/browse/SPARK-28481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28481: --- Description: SPARK-13995 introduced the {{NullIntolerant}} trait to generalize the logic for inferring {{IsNotNull}} constraints from expressions. An expression is _null-intolerant_ if it returns {{null}} when any of its input expressions are {{null}}. I've noticed that _most_ expressions are null-intolerant: anything which extends UnaryExpression / BinaryExpression and keeps the default {{eval}} method will be null-intolerant. However, only a subset of these expressions mix in the {{NullIntolerant}} trait. As a result, we're missing out on the opportunity to infer certain types of non-null constraints: for example, if we see a {{WHERE length(x) > 10}} condition then we know that the column {{x}} must be non-null and can push this non-null filter down to our datasource scan. I can think of a few ways to fix this: # Modify every relevant expression to mix in the {{NullIntolerant}} trait. We can use IDEs or other code-analysis tools (e.g. {{ClassUtil}} plus reflection) to help automate the process of identifying expressions which do not override the default {{eval}}. # Make a backwards-incompatible change to our abstract base class hierarchy to add {{NullSafe*aryExpression}} abstract base classes which define the {{nullSafeEval}} method and implement a {{final eval}} method, then leave {{eval}} unimplemented in the regular {{*aryExpression}} base classes. ** This would fix the somewhat weird code smell that we have today where {{nullSafeEval}} has a default implementation which calls {{sys.error}}. ** This would negatively impact users who have implemented custom Catalyst expressions. # Use runtime reflection to determine whether expressions are null-intolerant by virtue of using one of the default null-intolerant {{eval}} implementations. 
We can then use this in an {{isNullIntolerant}} helper method which checks that classes either (a) extend {{NullIntolerant}} or (b) are null-intolerant according to the reflective check (which is basically just figuring out which concrete implementation the {{eval}} method resolves to). ** We only need to perform the reflection once _per-class_ and can cache the result for the lifetime of the JVM, so the performance overheads would be pretty small (especially compared to other non-cacheable reflection / traversal costs in Catalyst). ** The downside is additional complexity in the code which pattern-matches / checks for null-intolerance. Of these approaches, I'm currently leaning towards option 1 (semi-automated identification and manual update of hundreds of expressions): if we go with that approach then we can perform a one-time catch-up to fix all existing expressions. To handle ongoing maintenance (as we add new expressions), I'd propose to add "is this null-intolerant?" to a checklist to use when reviewing PRs which add new Catalyst expressions. /cc [~maropu] [~viirya]
[jira] [Created] (SPARK-28481) More expressions should extend NullIntolerant
Josh Rosen created SPARK-28481: -- Summary: More expressions should extend NullIntolerant Key: SPARK-28481 URL: https://issues.apache.org/jira/browse/SPARK-28481 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Josh Rosen SPARK-13995 introduced the {{NullIntolerant}} trait to generalize the logic for inferring {{IsNotNull}} constraints from expressions. An expression is _null-intolerant_ if it returns {{null}} when any of its input expressions are {{null}}. I've noticed that _most_ expressions are null-intolerant: anything which extends UnaryExpression / BinaryExpression and keeps the default {{eval}} method will be null-intolerant. However, only a subset of these expressions mix in the {{NullIntolerant}} trait. As a result, we're missing out on the opportunity to infer certain types of non-null constraints: for example, if we see a {{WHERE length(x) > 10}} condition then we know that the column {{x}} must be non-null and can push this non-null filter down to our datasource scan. I can think of a few ways to fix this: # Modify every relevant expression to mix in the {{NullIntolerant}} trait. We can use IDEs or other code-analysis tools (e.g. {{ClassUtil}} plus reflection) to help automate the process of identifying expressions which do not override the default {{eval}}. # Make a backwards-incompatible change to our abstract base class hierarchy to add {{NullSafe*aryExpression}} abstract base classes which define the {{nullSafeEval}} method and implement a {{final eval}} method, then leave {{eval}} unimplemented in the regular {{*aryExpression}} base classes. ** This would fix the somewhat weird code smell that we have today where {{nullSafeEval}} has a default implementation which calls {{sys.error}}. ** This would negatively impact users who have implemented custom Catalyst expressions. 
# Use runtime reflection to determine whether expressions are null-intolerant by virtue of using one of the default null-intolerant {{eval}} implementations. We can then use this in an {{isNullIntolerant}} helper method which checks that classes either (a) extend {{NullIntolerant}} or (b) are null-intolerant according to the reflective check (which is basically just figuring out which concrete implementation the {{eval}} method resolves to). ** We only need to perform the reflection once _per-class_ and can cache the result for the lifetime of the JVM, so the performance overheads would be pretty small (especially compared to other non-cacheable reflection / traversal costs in Catalyst). ** The downside is additional complexity in the code which pattern-matches / checks for null-intolerance. Of these approaches, I'm currently leaning towards option 1 (semi-automated identification and manual update of hundreds of expressions): if we go with that approach then we can perform a one-time catch-up to fix all existing expressions. To handle ongoing maintenance (as we add new expressions), I'd propose to add "is this null-intolerant?" to a checklist to use when reviewing PRs which add new Catalyst expressions. /cc [~maropu] [~viirya]
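Option 3 above can be sketched with plain JVM reflection. The classes below are illustrative stand-ins written in Java for self-containment, not Spark's actual Catalyst hierarchy: decide null-intolerance by checking, once per class, whether the concrete `eval` implementation is still the default one declared on the base class.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the reflective null-intolerance check (stand-in classes only).
public class NullIntoleranceCheck {
    // Stand-in base class: the default eval propagates null via nullSafeEval,
    // mirroring the UnaryExpression/BinaryExpression pattern described above.
    public static abstract class BaseExpr {
        public Object eval(Object input) {
            return input == null ? null : nullSafeEval(input);
        }
        protected Object nullSafeEval(Object input) {
            throw new UnsupportedOperationException("nullSafeEval not implemented");
        }
    }

    // Keeps the default eval => null-intolerant.
    public static class LengthExpr extends BaseExpr {
        @Override protected Object nullSafeEval(Object input) {
            return input.toString().length();
        }
    }

    // Overrides eval with its own null handling => not detectably null-intolerant.
    public static class CoalesceLikeExpr extends BaseExpr {
        @Override public Object eval(Object input) {
            return input == null ? "fallback" : input;
        }
    }

    private static final Map<Class<?>, Boolean> CACHE = new ConcurrentHashMap<>();

    // Reflect once per class and cache the answer for the lifetime of the JVM.
    public static boolean usesDefaultEval(Class<?> cls) {
        return CACHE.computeIfAbsent(cls, c -> {
            try {
                Method eval = c.getMethod("eval", Object.class);
                // If the concrete eval resolves to the base-class declaration,
                // the expression inherited the default null-propagating path.
                return eval.getDeclaringClass() == BaseExpr.class;
            } catch (NoSuchMethodException e) {
                return false;
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(usesDefaultEval(LengthExpr.class));       // true
        System.out.println(usesDefaultEval(CoalesceLikeExpr.class)); // false
    }
}
```

An `isNullIntolerant` helper along the lines of option 3 would combine this reflective answer with an explicit `instanceof NullIntolerant`-style check; the per-class cache keeps the reflection cost to a one-time lookup.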
[jira] [Resolved] (SPARK-28469) Change CalendarIntervalType's readable string representation from calendarinterval to interval
[ https://issues.apache.org/jira/browse/SPARK-28469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28469. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/25225 > Change CalendarIntervalType's readable string representation from > calendarinterval to interval > -- > > Key: SPARK-28469 > URL: https://issues.apache.org/jira/browse/SPARK-28469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > We should update CalendarIntervalType's simpleString from calendarinterval to > interval. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890652#comment-16890652 ] Dongjoon Hyun commented on SPARK-28451: --- Thank you for the explanation, [~maropu]. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
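For reference, the PostgreSQL behaviour quoted in the ticket can be made concrete with a small helper. This is a hypothetical illustration, not Spark's implementation: positions are 1-indexed, a non-positive start shifts the window left and is clipped at position 1, and a negative length is an error.

```scala
// Hypothetical helper mimicking PostgreSQL's substr semantics; shown only to
// make the expected results concrete (NOT Spark's implementation).
def pgSubstr(s: String, start: Int, length: Int): String = {
  require(length >= 0, "negative substring length not allowed")
  val end = start + length                         // exclusive end, 1-indexed
  val lo  = math.max(start, 1)                     // clip the window at position 1
  val hi  = math.min(math.max(end, 1), s.length + 1)
  if (lo >= hi) "" else s.substring(lo - 1, hi - 1)
}
```

Under these rules `pgSubstr("1234567890", -1, 5)` returns `"123"` (the window covers positions -1..3, of which 1..3 exist), and `pgSubstr("1234567890", 1, -1)` throws, matching the PostgreSQL session in the ticket.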
[jira] [Created] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result
Ivan Tsukanov created SPARK-28480: - Summary: Types of input parameters of a UDF affect the ability to cache the result Key: SPARK-28480 URL: https://issues.apache.org/jira/browse/SPARK-28480 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: Ivan Tsukanov When I define a parameter in a UDF as Boolean or Int the result DataFrame can't be cached {code:java} import org.apache.spark.sql.functions.{lit, udf} val empty = sparkSession.emptyDataFrame val table = "table" def test(customUDF: UserDefinedFunction, col: Column): Unit = { val df = empty.select(customUDF(col)) df.cache() df.createOrReplaceTempView(table) println(sparkSession.catalog.isCached(table)) } test(udf { _: String => 42 }, lit("")) // true test(udf { _: Any => 42 }, lit("")) // true test(udf { _: Int => 42 }, lit(42)) // false test(udf { _: Boolean => 42 }, lit(false)) // false {code} or sparkSession.catalog.isCached gives irrelevant information. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result
[ https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Tsukanov updated SPARK-28480: -- Description: When I define a parameter in a UDF as Boolean or Int the result DataFrame can't be cached {code:java} import org.apache.spark.sql.functions.{lit, udf} val empty = sparkSession.emptyDataFrame val table = "table" def test(customUDF: UserDefinedFunction, col: Column): Unit = { val df = empty.select(customUDF(col)) df.cache() df.createOrReplaceTempView(table) println(sparkSession.catalog.isCached(table)) } test(udf { _: String => 42 }, lit("")) // true test(udf { _: Any => 42 }, lit("")) // true test(udf { _: Int => 42 }, lit(42)) // false test(udf { _: Boolean => 42 }, lit(false)) // false {code} or sparkSession.catalog.isCached gives irrelevant information. was: When I define a parameter in a UDF as Boolean or Int the result DataFrame can't be cached {code:java} import org.apache.spark.sql.functions.{lit, udf} val empty = sparkSession.emptyDataFrame val table = "table" def test(customUDF: UserDefinedFunction, col: Column): Unit = { val df = empty.select(customUDF(col)) df.cache() df.createOrReplaceTempView(table) println(sparkSession.catalog.isCached(table)) } test(udf { _: String => 42 }, lit("")) // true test(udf { _: Any => 42 }, lit("")) // true test(udf { _: Int => 42 }, lit(42)) // false test(udf { _: Boolean => 42 }, lit(false)) // false {code} or sparkSession.catalog.isCached gives irrelevant information. 
> Types of input parameters of a UDF affect the ability to cache the result > - > > Key: SPARK-28480 > URL: https://issues.apache.org/jira/browse/SPARK-28480 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Ivan Tsukanov >Priority: Major > > When I define a parameter in a UDF as Boolean or Int the result DataFrame > can't be cached > {code:java} > import org.apache.spark.sql.functions.{lit, udf} > val empty = sparkSession.emptyDataFrame > val table = "table" > def test(customUDF: UserDefinedFunction, col: Column): Unit = { > val df = empty.select(customUDF(col)) > df.cache() > df.createOrReplaceTempView(table) > println(sparkSession.catalog.isCached(table)) > } > test(udf { _: String => 42 }, lit("")) // true > test(udf { _: Any => 42 }, lit("")) // true > test(udf { _: Int => 42 }, lit(42)) // false > test(udf { _: Boolean => 42 }, lit(false)) // false > {code} > or sparkSession.catalog.isCached gives irrelevant information. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28479) Parser error when enabling ANSI mode
Yuming Wang created SPARK-28479: --- Summary: Parser error when enabling ANSI mode Key: SPARK-28479 URL: https://issues.apache.org/jira/browse/SPARK-28479 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang Case 1: {code:sql} spark-sql> set spark.sql.parser.ansi.enabled=true; spark.sql.parser.ansi.enabled true spark-sql> select extract(year from timestamp '2001-02-16 20:38:40') ; Error in query: no viable alternative at input 'year'(line 1, pos 15) == SQL == select extract(year from timestamp '2001-02-16 20:38:40') ---^^^ spark-sql> set spark.sql.parser.ansi.enabled=false; spark.sql.parser.ansi.enabled false spark-sql> select extract(year from timestamp '2001-02-16 20:38:40') ; 2001 {code} Case 2: {code:sql} spark-sql> select left('12345', 2); 12 spark-sql> set spark.sql.parser.ansi.enabled=true; spark.sql.parser.ansi.enabled true spark-sql> select left('12345', 2); Error in query: no viable alternative at input 'left'(line 1, pos 7) == SQL == select left('12345', 2) ---^^^ {code} https://github.com/apache/spark/pull/25114#issuecomment-512229758 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25349) Support sample pushdown in Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-25349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890623#comment-16890623 ] Weichen Xu commented on SPARK-25349: I will work on this. Thanks! > Support sample pushdown in Data Source V2 > - > > Key: SPARK-25349 > URL: https://issues.apache.org/jira/browse/SPARK-25349 > Project: Spark > Issue Type: Story > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > > Supporting sample pushdown would help a file-based data source implementation > save I/O cost significantly if it can decide whether to read a file or not. > > cc: [~cloud_fan] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28478) Optimizer rule to remove unnecessary explicit null checks for null-intolerant expressions (e.g. if(x is null, x, f(x)))
Josh Rosen created SPARK-28478: -- Summary: Optimizer rule to remove unnecessary explicit null checks for null-intolerant expressions (e.g. if(x is null, x, f(x))) Key: SPARK-28478 URL: https://issues.apache.org/jira/browse/SPARK-28478 Project: Spark Issue Type: Improvement Components: Optimizer, SQL Affects Versions: 3.0.0 Reporter: Josh Rosen I ran across a family of expressions like {code:java} if(x is null, x, substring(x, 0, 1024)){code} or {code:java} when($"x".isNull, $"x", substring($"x", 0, 1024)){code} that were written this way because the query author was unsure about whether {{substring}} would return {{null}} when its input string argument is null. This explicit null-handling is unnecessary and adds bloat to the generated code, especially if it's done via a {{CASE}} statement (which compiles down to a {{do-while}} loop). In another case I saw a query compiler which automatically generated this type of code. It would be cool if Spark could automatically optimize such queries to remove these redundant null checks. Here's a sketch of what such a rule might look like (assuming that SPARK-28477 has been implemented, so we only need to worry about the {{IF}} case): * In the pattern match, check the following three conditions in the following order (to benefit from short-circuiting) ** The {{IF}} condition is an explicit null-check of a column {{c}} ** The {{true}} expression returns either {{c}} or {{null}} ** The {{false}} expression is a _null-intolerant_ expression with {{c}} as a _direct_ child. * If this condition matches, replace the entire {{If}} with the {{false}} branch's expression. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
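The three-condition pattern match sketched in this ticket can be written out on a toy expression tree (illustrative names, not the real Catalyst API):

```scala
// Toy expression tree showing the proposed rewrite:
// IF(c IS NULL, c | null, f(c)) => f(c), when f is null-intolerant.
sealed trait Expr
trait NullIntolerant extends Expr
case class Col(name: String) extends Expr
case object NullLit extends Expr
case class IsNull(child: Expr) extends Expr
case class Substring(str: Expr, pos: Int, len: Int) extends Expr with NullIntolerant
case class If(cond: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr

def directChildren(e: Expr): Seq[Expr] = e match {
  case Substring(s, _, _) => Seq(s)
  case IsNull(c)          => Seq(c)
  case If(c, t, f)        => Seq(c, t, f)
  case _                  => Nil
}

// Checks the ticket's three conditions in order: (1) the IF condition is a
// null check on a column c, (2) the true branch returns c or null, (3) the
// false branch is null-intolerant with c as a direct child.
def removeRedundantNullCheck(e: Expr): Expr = e match {
  case If(IsNull(c: Col), ifTrue, ifFalse: NullIntolerant)
      if (ifTrue == c || ifTrue == NullLit) && directChildren(ifFalse).contains(c) =>
    ifFalse
  case other => other
}
```

Applied to the ticket's example, `If(IsNull(Col("x")), Col("x"), Substring(Col("x"), 0, 1024))` collapses to just the `Substring` expression.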
[jira] [Created] (SPARK-28477) Rewrite `CASE WHEN cond THEN ifTrue OTHERWISE ifFalse` END into `IF(cond, ifTrue, ifFalse`
Josh Rosen created SPARK-28477: -- Summary: Rewrite `CASE WHEN cond THEN ifTrue OTHERWISE ifFalse` END into `IF(cond, ifTrue, ifFalse` Key: SPARK-28477 URL: https://issues.apache.org/jira/browse/SPARK-28477 Project: Spark Issue Type: Improvement Components: Optimizer, SQL Affects Versions: 3.0.0 Reporter: Josh Rosen Spark SQL has both {{CASE WHEN}} and {{IF}} expressions. I've seen many cases where end-users write {code:java} when(x, ifTrue).otherwise(ifFalse){code} because Spark doesn't have a {{org.apache.spark.sql.functions._}} method for the {{If}} expression. Unfortunately, {{CASE WHEN}} generates substantial code bloat because its codegen is implemented using a {{do-while}} loop. In some performance-critical frameworks, I've modified our code to directly construct the Catalyst {{If}} expression, but this is toilsome and confusing to end-users. If we have a {{CASE WHEN}} which has only two branches, like the example given above, then Spark should automatically rewrite it into a simple {{IF}} expression. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
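The rewrite is mechanical for the two-branch case. A toy sketch (these are not the real Catalyst `CaseWhen`/`If` classes):

```scala
// Toy sketch of the proposed rule: a CASE WHEN with exactly one
// (condition, value) branch plus an OTHERWISE value is just an IF.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(v: Any) extends Expr
case class If(cond: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr
case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr

def rewriteToIf(e: Expr): Expr = e match {
  case CaseWhen(Seq((cond, value)), Some(elseValue)) => If(cond, value, elseValue)
  case other => other // more than one branch, or no else clause: leave unchanged
}
```

Only the single-branch-with-else shape is rewritten, which is exactly the `when(x, ifTrue).otherwise(ifFalse)` pattern described above; multi-branch `CASE WHEN` expressions are left alone.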
[jira] [Updated] (SPARK-28477) Rewrite `CASE WHEN cond THEN ifTrue OTHERWISE ifFalse` END into `IF(cond, ifTrue, ifFalse)`
[ https://issues.apache.org/jira/browse/SPARK-28477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28477: --- Summary: Rewrite `CASE WHEN cond THEN ifTrue OTHERWISE ifFalse` END into `IF(cond, ifTrue, ifFalse)` (was: Rewrite `CASE WHEN cond THEN ifTrue OTHERWISE ifFalse` END into `IF(cond, ifTrue, ifFalse`) > Rewrite `CASE WHEN cond THEN ifTrue OTHERWISE ifFalse` END into `IF(cond, > ifTrue, ifFalse)` > --- > > Key: SPARK-28477 > URL: https://issues.apache.org/jira/browse/SPARK-28477 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Priority: Major > > Spark SQL has both {{CASE WHEN}} and {{IF}} expressions. > I've seen many cases where end-users write > {code:java} > when(x, ifTrue).otherwise(ifFalse){code} > because Spark doesn't have a {{org.apache.spark.sql.functions._}} method for > the {{If}} expression. > Unfortunately, {{CASE WHEN}} generates substantial code bloat because its > codegen is implemented using a {{do-while}} loop. In some performance-critical > frameworks, I've modified our code to directly construct the Catalyst {{If}} > expression, but this is toilsome and confusing to end-users. > If we have a {{CASE WHEN}} which has only two branches, like the example > given above, then Spark should automatically rewrite it into a simple {{IF}} > expression. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28431) CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
[ https://issues.apache.org/jira/browse/SPARK-28431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28431. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25184 [https://github.com/apache/spark/pull/25184] > CSV datasource throw com.univocity.parsers.common.TextParsingException with > large size message > --- > > Key: SPARK-28431 > URL: https://issues.apache.org/jira/browse/SPARK-28431 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Minor > Fix For: 3.0.0 > > > CSV datasource throw com.univocity.parsers.common.TextParsingException with > large size message, which will make log output consume large disk space. > Reproduce code > {code:java} > val s = "a" * 40 * 100 > Seq(s).toDF.write.mode("overwrite").csv("/tmp/bogdan/es4196.csv") > spark.read .option("maxCharsPerColumn", 3000) > .csv("/tmp/bogdan/es4196.csv").count{code} > Because of maxCharsPerColumn limit of 30M, there will be a > TextParsingException. The message of this exception actually includes what > was parsed so far, in this case 30M chars. > > This issue is troublesome when we sometimes need to parse CSV with large > columns. We should truncate the large size message in the TextParsingException. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
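The fix this ticket asks for is a cap on the exception message before it reaches the logs. A minimal sketch of that kind of truncation (a hypothetical helper, not the actual patch in PR 25184):

```scala
// Hypothetical truncation helper: keep the head of an oversized message and
// note how much was dropped, so a multi-megabyte parse buffer never reaches
// the logs verbatim.
def truncateMessage(msg: String, maxLen: Int = 1000): String =
  if (msg == null || msg.length <= maxLen) msg
  else msg.take(maxLen) + s"... (${msg.length - maxLen} more characters truncated)"
```

A short message passes through untouched; only messages over the cap are cut, with the dropped length recorded so the truncation itself is visible in the log.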
[jira] [Assigned] (SPARK-28431) CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
[ https://issues.apache.org/jira/browse/SPARK-28431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28431: Assignee: Weichen Xu > CSV datasource throw com.univocity.parsers.common.TextParsingException with > large size message > --- > > Key: SPARK-28431 > URL: https://issues.apache.org/jira/browse/SPARK-28431 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Minor > > CSV datasource throw com.univocity.parsers.common.TextParsingException with > large size message, which will make log output consume large disk space. > Reproduce code > {code:java} > val s = "a" * 40 * 100 > Seq(s).toDF.write.mode("overwrite").csv("/tmp/bogdan/es4196.csv") > spark.read .option("maxCharsPerColumn", 3000) > .csv("/tmp/bogdan/es4196.csv").count{code} > Because of maxCharsPerColumn limit of 30M, there will be a > TextParsingException. The message of this exception actually includes what > was parsed so far, in this case 30M chars. > > This issue is troublesome when we sometimes need to parse CSV with large > columns. We should truncate the large size message in the TextParsingException. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28085) Spark Scala API documentation URLs not working properly in Chrome
[ https://issues.apache.org/jira/browse/SPARK-28085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890564#comment-16890564 ] Hyukjin Kwon commented on SPARK-28085: -- My Chrome is 75 and I am seeing this issue FWIW. > Spark Scala API documentation URLs not working properly in Chrome > - > > Key: SPARK-28085 > URL: https://issues.apache.org/jira/browse/SPARK-28085 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.3 >Reporter: Andrew Leverentz >Priority: Minor > > In Chrome version 75, URLs in the Scala API documentation are not working > properly, which makes them difficult to bookmark. > For example, URLs like the following get redirected to a generic "root" > package page: > [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html] > [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset] > Here's the URL that I get redirected to: > [https://spark.apache.org/docs/latest/api/scala/index.html#package] > This issue seems to have appeared between versions 74 and 75 of Chrome, but > the documentation URLs still work in Safari. I suspect that this has > something to do with security-related changes to how Chrome 75 handles frames > and/or redirects. I've reported this issue to the Chrome team via the > in-browser help menu, but I don't have any visibility into their response, so > it's not clear whether they'll consider this a bug or "working as intended". -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890558#comment-16890558 ] Takeshi Yamamuro commented on SPARK-28451: -- I don't have any standard reference for this behaviour though, +1 for Dongjoon's opinion; if the standard defines this behaviour explicitly, it might be worth fixing this. btw, the current ansi mode we have (spark.sql.parser.ansi.enabled) only affects the spark parser behaviour now, so we might need another new option for this kind of behaviour change to follow the standard. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890552#comment-16890552 ] Dongjoon Hyun commented on SPARK-28451: --- We already have `ansi` mode and default(non-ansi) mode. Do you have a reference for the standard? (cc [~smilegator] and [~maropu].) If there is no reference, I'd like to stick to the current existing Spark behavior only. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28474) Lower JDBC client cannot read binary type
[ https://issues.apache.org/jira/browse/SPARK-28474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890549#comment-16890549 ] Yuming Wang commented on SPARK-28474: - I'm working on it. > Lower JDBC client cannot read binary type > - > > Key: SPARK-28474 > URL: https://issues.apache.org/jira/browse/SPARK-28474 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Logs: > {noformat} > java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible > with java.lang.String > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at > java.security.AccessController.doPrivileged(AccessController.java:770) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy26.fetchResults(Unknown Source) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String > at > org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198) > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:148) > at > org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220) > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ... 18 more > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28432) Date/Time Functions: make_date
[ https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28432. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25210 [https://github.com/apache/spark/pull/25210] > Date/Time Functions: make_date > -- > > Key: SPARK-28432 > URL: https://issues.apache.org/jira/browse/SPARK-28432 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > ||Function||Return Type||Description||Example||Result|| > |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ > }}{{int}}{{)}}|{{date}}|Create date from year, month and day > fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| > https://www.postgresql.org/docs/11/functions-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28432) Add `make_date` function
[ https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28432: -- Summary: Add `make_date` function (was: Date/Time Functions: make_date) > Add `make_date` function > > > Key: SPARK-28432 > URL: https://issues.apache.org/jira/browse/SPARK-28432 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > ||Function||Return Type||Description||Example||Result|| > |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ > }}{{int}}{{)}}|{{date}}|Create date from year, month and day > fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| > https://www.postgresql.org/docs/11/functions-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28432) Date/Time Functions: make_date
[ https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28432: - Assignee: Maxim Gekk > Date/Time Functions: make_date > -- > > Key: SPARK-28432 > URL: https://issues.apache.org/jira/browse/SPARK-28432 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Maxim Gekk >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ > }}{{int}}{{)}}|{{date}}|Create date from year, month and day > fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| > https://www.postgresql.org/docs/11/functions-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28469) Change CalendarIntervalType's readable string representation from calendarinterval to interval
[ https://issues.apache.org/jira/browse/SPARK-28469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28469: -- Summary: Change CalendarIntervalType's readable string representation from calendarinterval to interval (was: Add simpleString for CalendarIntervalType) > Change CalendarIntervalType's readable string representation from > calendarinterval to interval > -- > > Key: SPARK-28469 > URL: https://issues.apache.org/jira/browse/SPARK-28469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > We should update CalendarIntervalType's simpleString from calendarinterval to > interval. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28455) Executor may be timed out too soon because of overflow in tracking code
[ https://issues.apache.org/jira/browse/SPARK-28455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28455: - Assignee: Marcelo Vanzin > Executor may be timed out too soon because of overflow in tracking code > --- > > Key: SPARK-28455 > URL: https://issues.apache.org/jira/browse/SPARK-28455 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Major > > This affects the new code added in SPARK-27963 (so normal dynamic allocation > is fine). There's an overflow issue in that code that may cause executors to > be timed out early with the default configuration. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28455) Executor may be timed out too soon because of overflow in tracking code
[ https://issues.apache.org/jira/browse/SPARK-28455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28455. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25208 [https://github.com/apache/spark/pull/25208] > Executor may be timed out too soon because of overflow in tracking code > --- > > Key: SPARK-28455 > URL: https://issues.apache.org/jira/browse/SPARK-28455 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Major > Fix For: 3.0.0 > > > This affects the new code added in SPARK-27963 (so normal dynamic allocation > is fine). There's an overflow issue in that code that may cause executors to > be timed out early with the default configuration. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28476) Support ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/SPARK-28476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28476: Target Version/s: 3.0.0 > Support ALTER DATABASE SET LOCATION > --- > > Key: SPARK-28476 > URL: https://issues.apache.org/jira/browse/SPARK-28476 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > We can support the syntax of ALTER (DATABASE|SCHEMA) database_name SET > LOCATION path > Ref: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL] > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28476) Support ALTER DATABASE SET LOCATION
Xiao Li created SPARK-28476: --- Summary: Support ALTER DATABASE SET LOCATION Key: SPARK-28476 URL: https://issues.apache.org/jira/browse/SPARK-28476 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Xiao Li We can support the syntax of ALTER (DATABASE|SCHEMA) database_name SET LOCATION path Ref: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28468) `do-release-docker.sh` fails at `sphinx` installation to `Python 2.7`
[ https://issues.apache.org/jira/browse/SPARK-28468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28468. --- Resolution: Fixed Fix Version/s: 2.4.4 This is resolved via https://github.com/apache/spark/pull/25226 > `do-release-docker.sh` fails at `sphinx` installation to `Python 2.7` > - > > Key: SPARK-28468 > URL: https://issues.apache.org/jira/browse/SPARK-28468 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.4 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 2.4.4 > > > `do-release-docker.sh` fails at `sphinx` installation to `Python 2.7`. > {code} > $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n > {code} > The following is the same reproducible step. > {code} > $ docker build -t spark-rm-test2 --build-arg UID=501 > dev/create-release/spark-rm > {code} > This happens in `branch-2.4` only. > {code} > root@4e196b3d7611:/# lsb_release -a > No LSB modules are available. > Distributor ID: Ubuntu > Description:Ubuntu 16.04.6 LTS > Release:16.04 > Codename: xenial > root@4e196b3d7611:/# pip install sphinx > Collecting sphinx > Downloading > https://files.pythonhosted.org/packages/89/1e/64c77163706556b647f99d67b42fced9d39ae6b1b86673965a2cd28037b5/Sphinx-2.1.2.tar.gz > (6.3MB) > 100% || 6.3MB 316kB/s > Complete output from command python setup.py egg_info: > ERROR: Sphinx requires at least Python 3.5 to run. > > Command "python setup.py egg_info" failed with error code 1 in > /tmp/pip-build-7usNN9/sphinx/ > You are using pip version 8.1.1, however version 19.1.1 is available. > You should consider upgrading via the 'pip install --upgrade pip' command. > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28468) Upgrade pip to fix `sphinx` install error
[ https://issues.apache.org/jira/browse/SPARK-28468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28468: -- Summary: Upgrade pip to fix `sphinx` install error (was: `do-release-docker.sh` fails at `sphinx` installation to `Python 2.7`) > Upgrade pip to fix `sphinx` install error > - > > Key: SPARK-28468 > URL: https://issues.apache.org/jira/browse/SPARK-28468 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.4 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 2.4.4 > > > `do-release-docker.sh` fails at `sphinx` installation to `Python 2.7`. > {code} > $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n > {code} > The following is the same reproducible step. > {code} > $ docker build -t spark-rm-test2 --build-arg UID=501 > dev/create-release/spark-rm > {code} > This happens in `branch-2.4` only. > {code} > root@4e196b3d7611:/# lsb_release -a > No LSB modules are available. > Distributor ID: Ubuntu > Description:Ubuntu 16.04.6 LTS > Release:16.04 > Codename: xenial > root@4e196b3d7611:/# pip install sphinx > Collecting sphinx > Downloading > https://files.pythonhosted.org/packages/89/1e/64c77163706556b647f99d67b42fced9d39ae6b1b86673965a2cd28037b5/Sphinx-2.1.2.tar.gz > (6.3MB) > 100% || 6.3MB 316kB/s > Complete output from command python setup.py egg_info: > ERROR: Sphinx requires at least Python 3.5 to run. > > Command "python setup.py egg_info" failed with error code 1 in > /tmp/pip-build-7usNN9/sphinx/ > You are using pip version 8.1.1, however version 19.1.1 is available. > You should consider upgrading via the 'pip install --upgrade pip' command. > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28475) Add regex MetricFilter to GraphiteSink
Nick Karpov created SPARK-28475: --- Summary: Add regex MetricFilter to GraphiteSink Key: SPARK-28475 URL: https://issues.apache.org/jira/browse/SPARK-28475 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.3 Reporter: Nick Karpov Today all registered metric sources are reported to GraphiteSink with no filtering mechanism, although the codahale project does support it. GraphiteReporter (ScheduledReporter) from the codahale project requires you implement and supply the MetricFilter interface (there is only a single implementation by default in the codahale project, MetricFilter.ALL). Propose to add an additional regex config to match and filter metrics to the GraphiteSink -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
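The proposal above is to pass a regex-driven filter where GraphiteSink currently hard-codes `MetricFilter.ALL`. A codahale `MetricFilter` decides per metric in its `matches(name, metric)` callback; the sketch below (hypothetical class name, no codahale dependency so it stays self-contained) shows the name-based regex decision such a filter would make.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch of the proposed idea: keep only metrics whose registry name matches
// a user-supplied regex, the way a codahale MetricFilter implementation would
// decide in its matches(name, metric) callback. The Metric argument is omitted
// here because the decision is purely name-based.
public class RegexMetricFilter {
    private final Pattern pattern;

    public RegexMetricFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    public boolean matches(String name) {
        return pattern.matcher(name).find();
    }

    public List<String> filter(List<String> metricNames) {
        return metricNames.stream().filter(this::matches).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        RegexMetricFilter f = new RegexMetricFilter("jvm\\.");
        // keeps only the jvm.* metric name
        System.out.println(f.filter(java.util.Arrays.asList(
            "app-1.driver.jvm.heap.used",
            "app-1.driver.BlockManager.disk.diskSpaceUsed_MB")));
    }
}
```

A real GraphiteSink integration would implement `com.codahale.metrics.MetricFilter` and hand it to the `GraphiteReporter` builder; the regex string itself would come from the sink's properties-file config.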
[jira] [Commented] (SPARK-28085) Spark Scala API documentation URLs not working properly in Chrome
[ https://issues.apache.org/jira/browse/SPARK-28085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890314#comment-16890314 ] Andrew Leverentz commented on SPARK-28085: -- This issue still remains, more than a month after the Chrome update that caused it. It's not clear whether Google considers it a bug that needs fixing. I've reported the issue to Google, as mentioned above, but if anyone else has a better way of contacting the Chrome team, I'd appreciate it if you could try to get in touch with them to see whether they are aware of this bug and planning to fix it. > Spark Scala API documentation URLs not working properly in Chrome > - > > Key: SPARK-28085 > URL: https://issues.apache.org/jira/browse/SPARK-28085 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.3 >Reporter: Andrew Leverentz >Priority: Minor > > In Chrome version 75, URLs in the Scala API documentation are not working > properly, which makes them difficult to bookmark. > For example, URLs like the following get redirected to a generic "root" > package page: > [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html] > [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset] > Here's the URL that I get redirected to: > [https://spark.apache.org/docs/latest/api/scala/index.html#package] > This issue seems to have appeared between versions 74 and 75 of Chrome, but > the documentation URLs still work in Safari. I suspect that this has > something to do with security-related changes to how Chrome 75 handles frames > and/or redirects. I've reported this issue to the Chrome team via the > in-browser help menu, but I don't have any visibility into their response, so > it's not clear whether they'll consider this a bug or "working as intended". 
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28225) Unexpected behavior for Window functions
[ https://issues.apache.org/jira/browse/SPARK-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Leverentz resolved SPARK-28225. -- Resolution: Not A Problem > Unexpected behavior for Window functions > > > Key: SPARK-28225 > URL: https://issues.apache.org/jira/browse/SPARK-28225 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Andrew Leverentz >Priority: Major > > I've noticed some odd behavior when combining the "first" aggregate function > with an ordered Window. > In particular, I'm working with columns created using the syntax > {code} > first($"y", ignoreNulls = true).over(Window.orderBy($"x")) > {code} > Below, I'm including some code which reproduces this issue in a Databricks > notebook. > *Code:* > {code:java} > import org.apache.spark.sql.functions.first > import org.apache.spark.sql.expressions.Window > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.{StructType,StructField,IntegerType} > val schema = StructType(Seq( > StructField("x", IntegerType, false), > StructField("y", IntegerType, true), > StructField("z", IntegerType, true) > )) > val input = > spark.createDataFrame(sc.parallelize(Seq( > Row(101, null, 11), > Row(102, null, 12), > Row(103, null, 13), > Row(203, 24, null), > Row(201, 26, null), > Row(202, 25, null) > )), schema = schema) > input.show > val output = input > .withColumn("u1", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u2", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u3", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u4", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > .withColumn("u5", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u6", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u7", first($"z", ignoreNulls = > 
true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u8", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > output.show > {code} > *Expectation:* > Based on my understanding of how ordered-Window and aggregate functions work, > the results I expected to see were: > * u1 = u2 = constant value of 26 > * u3 = u4 = constant value of 24 > * u5 = u6 = constant value of 11 > * u7 = u8 = constant value of 13 > However, columns u1, u2, u7, and u8 contain some unexpected nulls. > *Results:* > {code:java} > +---+++++---+---+---+---+++ > | x| y| z| u1| u2| u3| u4| u5| u6| u7| u8| > +---+++++---+---+---+---+++ > |203| 24|null| 26| 26| 24| 24| 11| 11|null|null| > |202| 25|null| 26| 26| 24| 24| 11| 11|null|null| > |201| 26|null| 26| 26| 24| 24| 11| 11|null|null| > |103|null| 13|null|null| 24| 24| 11| 11| 13| 13| > |102|null| 12|null|null| 24| 24| 11| 11| 13| 13| > |101|null| 11|null|null| 24| 24| 11| 11| 13| 13| > +---+++++---+---+---+---+++ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28225) Unexpected behavior for Window functions
[ https://issues.apache.org/jira/browse/SPARK-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890310#comment-16890310 ] Andrew Leverentz commented on SPARK-28225: -- Marco, thanks for the explanation. In this case, the workaround in Scala is to use {{Window.orderBy($"x").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)}} This issue can be marked resolved. > Unexpected behavior for Window functions > > > Key: SPARK-28225 > URL: https://issues.apache.org/jira/browse/SPARK-28225 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Andrew Leverentz >Priority: Major > > I've noticed some odd behavior when combining the "first" aggregate function > with an ordered Window. > In particular, I'm working with columns created using the syntax > {code} > first($"y", ignoreNulls = true).over(Window.orderBy($"x")) > {code} > Below, I'm including some code which reproduces this issue in a Databricks > notebook. 
> *Code:* > {code:java} > import org.apache.spark.sql.functions.first > import org.apache.spark.sql.expressions.Window > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.{StructType,StructField,IntegerType} > val schema = StructType(Seq( > StructField("x", IntegerType, false), > StructField("y", IntegerType, true), > StructField("z", IntegerType, true) > )) > val input = > spark.createDataFrame(sc.parallelize(Seq( > Row(101, null, 11), > Row(102, null, 12), > Row(103, null, 13), > Row(203, 24, null), > Row(201, 26, null), > Row(202, 25, null) > )), schema = schema) > input.show > val output = input > .withColumn("u1", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u2", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u3", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u4", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > .withColumn("u5", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u6", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u7", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u8", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > output.show > {code} > *Expectation:* > Based on my understanding of how ordered-Window and aggregate functions work, > the results I expected to see were: > * u1 = u2 = constant value of 26 > * u3 = u4 = constant value of 24 > * u5 = u6 = constant value of 11 > * u7 = u8 = constant value of 13 > However, columns u1, u2, u7, and u8 contain some unexpected nulls. 
> *Results:* > {code:java} > +---+++++---+---+---+---+++ > | x| y| z| u1| u2| u3| u4| u5| u6| u7| u8| > +---+++++---+---+---+---+++ > |203| 24|null| 26| 26| 24| 24| 11| 11|null|null| > |202| 25|null| 26| 26| 24| 24| 11| 11|null|null| > |201| 26|null| 26| 26| 24| 24| 11| 11|null|null| > |103|null| 13|null|null| 24| 24| 11| 11| 13| 13| > |102|null| 12|null|null| 24| 24| 11| 11| 13| 13| > |101|null| 11|null|null| 24| 24| 11| 11| 13| 13| > +---+++++---+---+---+---+++ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28225) Unexpected behavior for Window functions
[ https://issues.apache.org/jira/browse/SPARK-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890310#comment-16890310 ] Andrew Leverentz edited comment on SPARK-28225 at 7/22/19 4:54 PM: --- Marco, thanks for the explanation. In this case, the solution in Scala is to use {{Window.orderBy($"x").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)}} This issue can be marked resolved. was (Author: alev_etx): Marco, thanks for the explanation. In this case, the workaround in Scala is to use {{Window.orderBy($"x").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)}} This issue can be marked resolved. > Unexpected behavior for Window functions > > > Key: SPARK-28225 > URL: https://issues.apache.org/jira/browse/SPARK-28225 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Andrew Leverentz >Priority: Major > > I've noticed some odd behavior when combining the "first" aggregate function > with an ordered Window. > In particular, I'm working with columns created using the syntax > {code} > first($"y", ignoreNulls = true).over(Window.orderBy($"x")) > {code} > Below, I'm including some code which reproduces this issue in a Databricks > notebook. 
> *Code:* > {code:java} > import org.apache.spark.sql.functions.first > import org.apache.spark.sql.expressions.Window > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.{StructType,StructField,IntegerType} > val schema = StructType(Seq( > StructField("x", IntegerType, false), > StructField("y", IntegerType, true), > StructField("z", IntegerType, true) > )) > val input = > spark.createDataFrame(sc.parallelize(Seq( > Row(101, null, 11), > Row(102, null, 12), > Row(103, null, 13), > Row(203, 24, null), > Row(201, 26, null), > Row(202, 25, null) > )), schema = schema) > input.show > val output = input > .withColumn("u1", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u2", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u3", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u4", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > .withColumn("u5", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u6", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u7", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u8", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > output.show > {code} > *Expectation:* > Based on my understanding of how ordered-Window and aggregate functions work, > the results I expected to see were: > * u1 = u2 = constant value of 26 > * u3 = u4 = constant value of 24 > * u5 = u6 = constant value of 11 > * u7 = u8 = constant value of 13 > However, columns u1, u2, u7, and u8 contain some unexpected nulls. 
> *Results:* > {code:java} > +---+++++---+---+---+---+++ > | x| y| z| u1| u2| u3| u4| u5| u6| u7| u8| > +---+++++---+---+---+---+++ > |203| 24|null| 26| 26| 24| 24| 11| 11|null|null| > |202| 25|null| 26| 26| 24| 24| 11| 11|null|null| > |201| 26|null| 26| 26| 24| 24| 11| 11|null|null| > |103|null| 13|null|null| 24| 24| 11| 11| 13| 13| > |102|null| 12|null|null| 24| 24| 11| 11| 13| 13| > |101|null| 11|null|null| 24| 24| 11| 11| 13| 13| > +---+++++---+---+---+---+++ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
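The nulls in u1/u2 (and u7/u8) follow from the default window frame: with an ORDER BY and no explicit frame, the frame runs from the start of the partition only up to the current row, so `first(y, ignoreNulls = true)` sees nothing non-null for rows ordered before the first non-null y. Widening the frame to the whole partition, as the `rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)` workaround in the comment does, yields the constant the reporter expected. A plain-Java simulation (hypothetical helper name) over the ticket's y column, ordered by x ascending:

```java
import java.util.Arrays;
import java.util.List;

// Simulation of the two frames: first non-null value among rows[0..end].
// end = current row index models the default running frame; end = last index
// models the whole-partition frame of the rowsBetween workaround.
public class FrameDemo {
    static Integer firstNonNull(List<Integer> rows, int end) {
        for (int i = 0; i <= end; i++) {
            if (rows.get(i) != null) {
                return rows.get(i);
            }
        }
        return null; // no non-null value inside the frame
    }

    public static void main(String[] args) {
        // column y for x = 101,102,103,201,202,203 (ascending), per the ticket
        List<Integer> y = Arrays.asList(null, null, null, 26, 25, 24);
        System.out.println(firstNonNull(y, 0));            // default frame at first row: null
        System.out.println(firstNonNull(y, y.size() - 1)); // whole-partition frame: 26
    }
}
```

This matches the observed output: u1/u2 are null for x in {101, 102, 103} and 26 once the running frame reaches the first non-null y.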
[jira] [Updated] (SPARK-28474) Lower JDBC client cannot read binary type
[ https://issues.apache.org/jira/browse/SPARK-28474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28474: Summary: Lower JDBC client cannot read binary type (was: Lower JDBC client version cannot read binary type) > Lower JDBC client cannot read binary type > - > > Key: SPARK-28474 > URL: https://issues.apache.org/jira/browse/SPARK-28474 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Logs: > {noformat} > java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible > with java.lang.String > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at > java.security.AccessController.doPrivileged(AccessController.java:770) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy26.fetchResults(Unknown Source) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String > at > org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198) > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:148) > at > org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220) > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ... 18 more > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
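In the stack trace above, `[B` is the JVM's runtime name for `byte[]`, so "`[B incompatible with java.lang.String`" means a raw `byte[]` column value reached `ColumnValue.toTColumnValue`, which casts to `String` for the row-based result set that older clients use. A minimal sketch (hypothetical method names, not the actual server code) of the failing cast and the kind of pre-conversion that avoids it:

```java
import java.nio.charset.StandardCharsets;

// Illustration of the failure above: a byte[] handed to code that blindly
// casts to String throws ClassCastException ("[B incompatible with
// java.lang.String"). Converting the bytes to a string form first avoids it.
public class BinaryCastDemo {
    static String unsafeRead(Object columnValue) {
        return (String) columnValue; // throws ClassCastException for byte[]
    }

    static String safeRead(Object columnValue) {
        if (columnValue instanceof byte[]) {
            return new String((byte[]) columnValue, StandardCharsets.UTF_8);
        }
        return (String) columnValue;
    }

    public static void main(String[] args) {
        Object binary = "spark".getBytes(StandardCharsets.UTF_8);
        System.out.println(binary.getClass().getName()); // prints "[B"
        System.out.println(safeRead(binary));            // prints "spark"
    }
}
```

Whether the real fix belongs on the server (converting binary columns before `addRow`) or requires a newer client is for the ticket to decide; the sketch only shows why the exception message names `[B`.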
[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890281#comment-16890281 ] Xiao Li commented on SPARK-28457: - [~shaneknapp] Thanks for fixing it! > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: shane knapp >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). 
> If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. > gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp reassigned SPARK-28457: --- Assignee: shane knapp > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: shane knapp >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. 
> gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp resolved SPARK-28457. - Resolution: Fixed > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: shane knapp >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. 
> gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890279#comment-16890279 ] shane knapp commented on SPARK-28457: - ok, the error i'm seeing in the lint job is most definitely not related to the SSL certs: [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10613/console] {noformat} starting python compilation test... python compilation succeeded. downloading pycodestyle from https://raw.githubusercontent.com/PyCQA/pycodestyle/2.4.0/pycodestyle.py... starting pycodestyle test... pycodestyle checks failed: File "/home/jenkins/workspace/spark-master-lint/dev/pycodestyle-2.4.0.py", line 1 500: Internal Server Error ^ SyntaxError: invalid syntax{noformat} i went to PyCQA's repo on github and i'm seeing a LOT of 500 errors. this is out of scope of this ticket, and actually not a localized (to our jenkins) issue, so i will notify dev@ and mark this as resolved. > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > > Build broke since this afternoon. 
> [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. 
> gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890275#comment-16890275 ] shane knapp commented on SPARK-28457: - ok, curl was unhappy w/the old cacert.pem, so i updated to the latest from [https://curl.haxx.se/ca/cacert.pem] and things look to be better, tho the lint job is failing. once i get that sorted i will mark this as resolved. > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. 
> If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. > gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
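The failure mode quoted above has two parts: curl rejects the TLS handshake, and the partially written file then breaks tar ("gzip: stdin: unexpected end of file"). A minimal shell sketch of a guard a build script could use — the paths and the simulated bad download are illustrative, not Spark's actual build/mvn code:

```shell
# Sketch only: when TLS verification fails, the "tarball" on disk can be
# an error page, and tar then dies mid-stream.  Check gzip integrity
# before extracting.
set -u
TARBALL=$(mktemp)
# The real download step (elided) would look something like:
#   curl --cacert /path/to/cacert.pem -fsSL -o "$TARBALL" "$MAVEN_URL"
printf '500: Internal Server Error' > "$TARBALL"   # simulate a bad download
if gzip -t "$TARBALL" 2>/dev/null; then
  echo "tarball looks valid, extracting"
else
  echo "download corrupt, refusing to extract"
fi
rm -f "$TARBALL"
```

Refreshing the CA bundle (as shane did with cacert.pem) fixes the handshake itself; the gzip -t check merely keeps a bad download from masquerading as a Maven install.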
[jira] [Updated] (SPARK-28474) Lower JDBC client version cannot read binary type
[ https://issues.apache.org/jira/browse/SPARK-28474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28474: Summary: Lower JDBC client version cannot read binary type (was: Hive 0.12's JDBC client cannot read binary type) > Lower JDBC client version cannot read binary type > - > > Key: SPARK-28474 > URL: https://issues.apache.org/jira/browse/SPARK-28474 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Logs: > {noformat} > java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible > with java.lang.String > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at > java.security.AccessController.doPrivileged(AccessController.java:770) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy26.fetchResults(Unknown Source) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String > at > org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198) > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:148) > at > org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220) > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ... 18 more > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890255#comment-16890255 ] shane knapp commented on SPARK-28457: - looking in to it now. > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. 
> gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28474) Hive 0.12's JDBC client cannot read binary type
Yuming Wang created SPARK-28474: --- Summary: Hive 0.12's JDBC client cannot read binary type Key: SPARK-28474 URL: https://issues.apache.org/jira/browse/SPARK-28474 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang Logs: {noformat} java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible with java.lang.String at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) at java.security.AccessController.doPrivileged(AccessController.java:770) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) at com.sun.proxy.$Proxy26.fetchResults(Unknown Source) at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455) at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:819) Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String at 
org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198) at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:148) at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220) at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ... 18 more {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28473) Build command in README should start with ./
Douglas Colkitt created SPARK-28473: --- Summary: Build command in README should start with ./ Key: SPARK-28473 URL: https://issues.apache.org/jira/browse/SPARK-28473 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 2.4.3 Reporter: Douglas Colkitt In the top-level README, the build command does *not* begin with a ./ prefix: build/mvn -DskipTests clean package All the other commands in the README begin with a ./ prefix, e.g. ./bin/spark-shell To be consistent, the build command should be changed to match the style of the other commands in the README: ./build/mvn -DskipTests clean package Although the non-prefixed command still works, the ./ prefix makes it clear that the command depends on being executed from inside the repository as the CWD. It's a minor change, but it makes things less confusing for new users. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28280) Convert and port 'group-by.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28280. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25098 [https://github.com/apache/spark/pull/25098] > Convert and port 'group-by.sql' into UDF test base > -- > > Key: SPARK-28280 > URL: https://issues.apache.org/jira/browse/SPARK-28280 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Stavros Kontopoulos >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28280) Convert and port 'group-by.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28280: Assignee: Stavros Kontopoulos > Convert and port 'group-by.sql' into UDF test base > -- > > Key: SPARK-28280 > URL: https://issues.apache.org/jira/browse/SPARK-28280 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Stavros Kontopoulos >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890099#comment-16890099 ] Hyukjin Kwon edited comment on SPARK-28451 at 7/22/19 11:32 AM: Personally I don't think it's worth it, but let's see what other committers think. was (Author: hyukjin.kwon): Personally I don't think it'w worth but let's see what other committers like. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890099#comment-16890099 ] Hyukjin Kwon commented on SPARK-28451: -- Personally I don't think it's worth it, but let's see what other committers think. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890084#comment-16890084 ] Shivu Sondur commented on SPARK-28451: -- [~hyukjin.kwon], [~dongjoon] Here is one more PostgreSQL compatibility issue. Is it required to handle it? After checking, I found the following: > Spark's behavior is the same as *Oracle* and *MySQL*, > while *MS SQL*'s behavior is the same as *PostgreSQL*. I think we should have a global setting like a postgresql_Flavor or sql_Flavor parameter; if it is set to a given flavor, all functions should behave according to that database flavor. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
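To make the behavioral split concrete, here is a hedged Python re-implementation of the two substr conventions shown in the ticket's examples — illustrative only, not Spark's or PostgreSQL's actual code:

```python
def substr_spark_like(s: str, pos: int, length: int) -> str:
    # Sketch of the Spark/Oracle/MySQL-style behavior from the ticket:
    # 1-based positions, negative pos counts from the end, and a
    # non-positive length yields '' instead of raising an error.
    if length <= 0:
        return ""
    start = len(s) + pos if pos < 0 else max(pos - 1, 0)
    if start < 0:  # pos points before the start of the string
        return ""
    return s[start:start + length]


def substr_pg_like(s: str, pos: int, length: int) -> str:
    # Sketch of the PostgreSQL-style behavior: the 1-based window
    # [pos, pos + length) is clipped to the string, and a negative
    # length is an error.
    if length < 0:
        raise ValueError("negative substring length not allowed")
    start = max(pos, 1)   # first 1-based position inside the string
    end = pos + length    # one past the last 1-based position
    return s[start - 1:max(end - 1, start - 1)]
```

With these, substr_spark_like('1234567890', -1, 5) gives '0' while substr_pg_like('1234567890', -1, 5) gives '123', matching the outputs quoted in the issue.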
[jira] [Commented] (SPARK-22213) Spark to detect slow executors on nodes with problematic hardware
[ https://issues.apache.org/jira/browse/SPARK-22213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890081#comment-16890081 ] Yuri Ronin commented on SPARK-22213: [~hyukjin.kwon] thanks > Spark to detect slow executors on nodes with problematic hardware > - > > Key: SPARK-22213 > URL: https://issues.apache.org/jira/browse/SPARK-22213 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.0.0 > Environment: - AWS EMR clusters > - window time is 60s > - several millions of events processed per minute >Reporter: Oleksandr Konopko >Priority: Major > Labels: bulk-closed > > Sometimes when a new cluster is created it contains 1-2 slow nodes. When the > average task finishes in 5 seconds, it takes up to 50 seconds to finish on a > slow node. As a result, batch processing time increases by 45s. > To avoid that we could use the `speculation` feature, but it seems that > it can be improved: > > - The 1st issue with `speculation` is that we do not want to use `speculation` on > all tasks, since we have tens of thousands of them during processing of one > batch; spawning several thousand extra would not be resource-efficient. I > suggest creating a new parameter, `spark.speculation.mintime`, which would > specify the minimal task run time for speculation to be enabled for that task. > - The 2nd issue is that even if Spark spawns speculative tasks only for > long-running ones (longer than 10s, for example), a task on a slow node will still > run for some significant time before it is killed, which still makes batch > processing time bigger than it should be. The solution is to enable > `blacklisting` for slow nodes. With speculation and blacklisting combined, > only the first 1-2 batches would take more time than expected. After the faulty node > is blacklisted, batch processing time is as expected. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
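For reference, the proposal in the quoted description can be pictured as a spark-defaults.conf fragment. The first five keys are existing Spark settings; `spark.speculation.mintime` is only the ticket's suggestion and does not exist as of this writing:

```properties
# Existing speculation/blacklisting knobs (Spark 2.x):
spark.speculation            true    # re-launch suspiciously slow tasks
spark.speculation.interval   100ms   # how often to check for stragglers
spark.speculation.multiplier 1.5     # "slow" = 1.5x the median task time
spark.speculation.quantile   0.75    # fraction of tasks done before checking
spark.blacklist.enabled      true    # stop scheduling onto misbehaving nodes
# Proposed in this ticket (hypothetical, not an existing setting):
# spark.speculation.mintime  10s     # only speculate tasks running >= 10s
```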
[jira] [Created] (SPARK-28472) Add a test for testing different protocol versions
Yuming Wang created SPARK-28472: --- Summary: Add a test for testing different protocol versions Key: SPARK-28472 URL: https://issues.apache.org/jira/browse/SPARK-28472 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22213) Spark to detect slow executors on nodes with problematic hardware
[ https://issues.apache.org/jira/browse/SPARK-22213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890079#comment-16890079 ] Hyukjin Kwon commented on SPARK-22213: -- It was closed because the affected versions indicated EOL releases. > Spark to detect slow executors on nodes with problematic hardware > - > > Key: SPARK-22213 > URL: https://issues.apache.org/jira/browse/SPARK-22213 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.0.0 > Environment: - AWS EMR clusters > - window time is 60s > - several millions of events processed per minute >Reporter: Oleksandr Konopko >Priority: Major > Labels: bulk-closed > > Sometimes when a new cluster is created it contains 1-2 slow nodes. When the > average task finishes in 5 seconds, it takes up to 50 seconds to finish on a > slow node. As a result, batch processing time increases by 45s. > To avoid that we could use the `speculation` feature, but it seems that > it can be improved: > > - The 1st issue with `speculation` is that we do not want to use `speculation` on > all tasks, since we have tens of thousands of them during processing of one > batch; spawning several thousand extra would not be resource-efficient. I > suggest creating a new parameter, `spark.speculation.mintime`, which would > specify the minimal task run time for speculation to be enabled for that task. > - The 2nd issue is that even if Spark spawns speculative tasks only for > long-running ones (longer than 10s, for example), a task on a slow node will still > run for some significant time before it is killed, which still makes batch > processing time bigger than it should be. The solution is to enable > `blacklisting` for slow nodes. With speculation and blacklisting combined, > only the first 1-2 batches would take more time than expected. After the faulty node > is blacklisted, batch processing time is as expected. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22213) Spark to detect slow executors on nodes with problematic hardware
[ https://issues.apache.org/jira/browse/SPARK-22213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890048#comment-16890048 ] Yuri Ronin commented on SPARK-22213: [~hyukjin.kwon], why did you close it? Is it resolved? Can you please provide a PR? Thanks > Spark to detect slow executors on nodes with problematic hardware > - > > Key: SPARK-22213 > URL: https://issues.apache.org/jira/browse/SPARK-22213 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.0.0 > Environment: - AWS EMR clusters > - window time is 60s > - several millions of events processed per minute >Reporter: Oleksandr Konopko >Priority: Major > Labels: bulk-closed > > Sometimes when a new cluster is created it contains 1-2 slow nodes. When the > average task finishes in 5 seconds, it takes up to 50 seconds to finish on a > slow node. As a result, batch processing time increases by 45s. > To avoid that we could use the `speculation` feature, but it seems that > it can be improved: > > - The 1st issue with `speculation` is that we do not want to use `speculation` on > all tasks, since we have tens of thousands of them during processing of one > batch; spawning several thousand extra would not be resource-efficient. I > suggest creating a new parameter, `spark.speculation.mintime`, which would > specify the minimal task run time for speculation to be enabled for that task. > - The 2nd issue is that even if Spark spawns speculative tasks only for > long-running ones (longer than 10s, for example), a task on a slow node will still > run for some significant time before it is killed, which still makes batch > processing time bigger than it should be. The solution is to enable > `blacklisting` for slow nodes. With speculation and blacklisting combined, > only the first 1-2 batches would take more time than expected. After the faulty node > is blacklisted, batch processing time is as expected. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28451) substr returns different values
[ https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890026#comment-16890026 ] Shivu Sondur commented on SPARK-28451: -- I will check this issue. > substr returns different values > --- > > Key: SPARK-28451 > URL: https://issues.apache.org/jira/browse/SPARK-28451 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > PostgreSQL: > {noformat} > postgres=# select substr('1234567890', -1, 5); > substr > > 123 > (1 row) > postgres=# select substr('1234567890', 1, -1); > ERROR: negative substring length not allowed > {noformat} > Spark SQL: > {noformat} > spark-sql> select substr('1234567890', -1, 5); > 0 > spark-sql> select substr('1234567890', 1, -1); > spark-sql> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28471) Formatting dates with negative years
[ https://issues.apache.org/jira/browse/SPARK-28471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890024#comment-16890024 ] Shivu Sondur commented on SPARK-28471: -- [~maxgekk] According to your discussion link, it is not required to change any code for this issue, right? > Formatting dates with negative years > > > Key: SPARK-28471 > URL: https://issues.apache.org/jira/browse/SPARK-28471 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.3 >Reporter: Maxim Gekk >Priority: Minor > > While converting dates with negative years to strings, Spark skips the era > sub-field by default. That can confuse users, since years from the BC era are > mirrored into the current era. For example: > {code} > spark-sql> select make_date(-44, 3, 15); > 0045-03-15 > {code} > Even though negative years are outside the range supported by the DATE type, it would be > nice to indicate the era for such dates. > PostgreSQL outputs the era for such inputs: > {code} > # select make_date(-44, 3, 15); >make_date > --- > 0044-03-15 BC > (1 row) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
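A small Python sketch of the year-of-era mapping behind both outputs in the ticket (a hypothetical helper, not Spark's or PostgreSQL's formatter):

```python
def era_label(iso_year: int) -> str:
    # In the proleptic ISO calendar, year 1 is 1 AD and year 0 is 1 BC,
    # so ISO year -44 is year-of-era 45 BC.  Printing the year-of-era
    # WITHOUT the era marker is what makes BC dates look mirrored into
    # the current era ("0045-..." for ISO year -44).
    if iso_year >= 1:
        return f"{iso_year:04d}"        # era omitted, as Spark does today
    return f"{1 - iso_year:04d} BC"     # proposed: append the era marker
```

Note the two systems also seem to read the input differently: PostgreSQL appears to treat make_date(-44, ...) as the BC year directly (hence 0044-03-15 BC), while Spark treats -44 as a proleptic ISO year, whose year-of-era is 45.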
[jira] [Updated] (SPARK-28467) Tests failed if there are not enough executors up before running
[ https://issues.apache.org/jira/browse/SPARK-28467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28467: -- Component/s: (was: Spark Core) Tests > Tests failed if there are not enough executors up before running > > > Key: SPARK-28467 > URL: https://issues.apache.org/jira/browse/SPARK-28467 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > We ran unit tests on an arm64 instance, and there are tests that failed because the > executors can't come up within the timeout of 1 ms: > - test driver discovery under local-cluster mode *** FAILED *** > java.util.concurrent.TimeoutException: Can't find 1 executors before 1 > milliseconds elapsed > at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293) > at > org.apache.spark.SparkContextSuite.$anonfun$new$78(SparkContextSuite.scala:753) > at > org.apache.spark.SparkContextSuite.$anonfun$new$78$adapted(SparkContextSuite.scala:741) > at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161) > at > org.apache.spark.SparkContextSuite.$anonfun$new$77(SparkContextSuite.scala:741) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > > - test gpu driver resource files and discovery under local-cluster mode *** > FAILED *** > java.util.concurrent.TimeoutException: Can't find 1 executors before 1 > milliseconds elapsed > at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293) > at > org.apache.spark.SparkContextSuite.$anonfun$new$80(SparkContextSuite.scala:781) > at > org.apache.spark.SparkContextSuite.$anonfun$new$80$adapted(SparkContextSuite.scala:761) > at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161) > at > org.apache.spark.SparkContextSuite.$anonfun$new$79(SparkContextSuite.scala:761) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > When we increased the timeout to 2 (or 3), the tests passed. I found > there were other issues about increasing the timeout before; see: > https://issues.apache.org/jira/browse/SPARK-7989 and > https://issues.apache.org/jira/browse/SPARK-10651 > I think the timeout doesn't work well, and there seems to be no principle behind the > timeout setting. How can I fix this? Could I increase the timeout for these > two tests? -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
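The failing waits above follow a generic poll-until-deadline pattern; this Python sketch (not the actual Scala TestUtils.waitUntilExecutorsUp code) shows why the fixed deadline is the only tuning knob, and why slower hardware simply needs a larger value:

```python
import time

def wait_until(predicate, timeout_s, poll_s=0.1):
    # Keep checking the condition until it holds or the deadline passes.
    # There is no adaptive behavior: hardware that brings executors up
    # slowly can only be accommodated by raising timeout_s.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    raise TimeoutError(f"condition not met within {timeout_s}s")
```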
[jira] [Commented] (SPARK-28467) Tests failed if there are not enough executors up before running
[ https://issues.apache.org/jira/browse/SPARK-28467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889995#comment-16889995 ]

Dongjoon Hyun commented on SPARK-28467:
---------------------------------------

I tested on `a1.4xlarge` and I cannot reproduce the failure. I'd recommend using more powerful machines like `a1.4xlarge` for testing.

> Tests failed if there are not enough executors up before running
> ----------------------------------------------------------------
>
>                 Key: SPARK-28467
>                 URL: https://issues.apache.org/jira/browse/SPARK-28467
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: huangtianhua
>            Priority: Minor
>
> We ran the unit tests on an arm64 instance, and some tests failed because the executors could not come up within the timeout of 1 ms:
> - test driver discovery under local-cluster mode *** FAILED ***
>   java.util.concurrent.TimeoutException: Can't find 1 executors before 1 milliseconds elapsed
>   at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$78(SparkContextSuite.scala:753)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$78$adapted(SparkContextSuite.scala:741)
>   at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$77(SparkContextSuite.scala:741)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>
> - test gpu driver resource files and discovery under local-cluster mode *** FAILED ***
>   java.util.concurrent.TimeoutException: Can't find 1 executors before 1 milliseconds elapsed
>   at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$80(SparkContextSuite.scala:781)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$80$adapted(SparkContextSuite.scala:761)
>   at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$79(SparkContextSuite.scala:761)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>
> After we increased the timeout to 2 (or 3), the tests passed. I found earlier issues about increasing this timeout; see https://issues.apache.org/jira/browse/SPARK-7989 and https://issues.apache.org/jira/browse/SPARK-10651.
> I think the timeout does not work well, and there seems to be no principle behind the timeout setting. How can I fix this? Could I increase the timeout for these two tests?
[jira] [Commented] (SPARK-28471) Formatting dates with negative years
[ https://issues.apache.org/jira/browse/SPARK-28471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889981#comment-16889981 ]

Maxim Gekk commented on SPARK-28471:
------------------------------------

Here is my explanation of the difference in years between Spark's and PostgreSQL's outputs: [https://github.com/apache/spark/pull/25210#discussion_r305609274]

> Formatting dates with negative years
> ------------------------------------
>
>                 Key: SPARK-28471
>                 URL: https://issues.apache.org/jira/browse/SPARK-28471
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Maxim Gekk
>            Priority: Minor
>
> While converting dates with negative years to strings, Spark skips the era sub-field by default. That can confuse users, since years from the BC era are mirrored into the current era. For example:
> {code}
> spark-sql> select make_date(-44, 3, 15);
> 0045-03-15
> {code}
> Even though negative years are out of the supported range of the DATE type, it would be nice to indicate the era for such dates. PostgreSQL outputs the era for such inputs:
> {code}
> # select make_date(-44, 3, 15);
>    make_date
> ---------------
>  0044-03-15 BC
> (1 row)
> {code}
[jira] [Commented] (SPARK-28470) Honor spark.sql.decimalOperations.nullOnOverflow in Cast
[ https://issues.apache.org/jira/browse/SPARK-28470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889979#comment-16889979 ]

Marco Gaido commented on SPARK-28470:
-------------------------------------

Thanks for checking this, Wenchen! I will work on this ASAP.

> Honor spark.sql.decimalOperations.nullOnOverflow in Cast
> --------------------------------------------------------
>
>                 Key: SPARK-28470
>                 URL: https://issues.apache.org/jira/browse/SPARK-28470
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> Casting long to decimal or decimal to decimal can overflow; we should respect the new config if overflow happens.
[jira] [Created] (SPARK-28471) Formatting dates with negative years
Maxim Gekk created SPARK-28471:
-------------------------------

             Summary: Formatting dates with negative years
                 Key: SPARK-28471
                 URL: https://issues.apache.org/jira/browse/SPARK-28471
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.4.3
            Reporter: Maxim Gekk

While converting dates with negative years to strings, Spark skips the era sub-field by default. That can confuse users, since years from the BC era are mirrored into the current era. For example:

{code}
spark-sql> select make_date(-44, 3, 15);
0045-03-15
{code}

Even though negative years are out of the supported range of the DATE type, it would be nice to indicate the era for such dates. PostgreSQL outputs the era for such inputs:

{code}
# select make_date(-44, 3, 15);
   make_date
---------------
 0044-03-15 BC
(1 row)
{code}
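The mirroring described above can be reproduced directly with `java.time`, which Spark 3.0's datetime formatting is based on: in the proleptic ISO calendar, year -44 is 45 BC, so a year-of-era pattern (`yyyy`) without an era field (`G`) prints `0045` with no hint of the era. A small illustration (plain `java.time`, not Spark code):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class EraDemo {
    // Proleptic ISO year -44 corresponds to 45 BC.
    static final LocalDate D = LocalDate.of(-44, 3, 15);

    // Year-of-era ('yyyy') without an era field: the era is silently dropped.
    static String noEra() {
        return D.format(DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ENGLISH));
    }

    // Appending 'GG' makes the era explicit, matching PostgreSQL's output.
    static String withEra() {
        return D.format(DateTimeFormatter.ofPattern("yyyy-MM-dd GG", Locale.ENGLISH));
    }

    // The signed proleptic year ('uuuu') is the unambiguous alternative.
    static String signed() {
        return D.format(DateTimeFormatter.ofPattern("uuuu-MM-dd", Locale.ENGLISH));
    }

    public static void main(String[] args) {
        System.out.println(noEra());   // 0045-03-15
        System.out.println(withEra()); // 0045-03-15 BC
        System.out.println(signed());  // -0044-03-15
    }
}
```

So the proposal amounts to choosing between an explicit era field and a signed year when the date falls before year 1.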
[jira] [Created] (SPARK-28470) Honor spark.sql.decimalOperations.nullOnOverflow in Cast
Wenchen Fan created SPARK-28470:
--------------------------------

             Summary: Honor spark.sql.decimalOperations.nullOnOverflow in Cast
                 Key: SPARK-28470
                 URL: https://issues.apache.org/jira/browse/SPARK-28470
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Wenchen Fan
[jira] [Updated] (SPARK-28470) Honor spark.sql.decimalOperations.nullOnOverflow in Cast
[ https://issues.apache.org/jira/browse/SPARK-28470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan updated SPARK-28470:
--------------------------------
    Description: Casting long to decimal or decimal to decimal can overflow; we should respect the new config if overflow happens.

> Honor spark.sql.decimalOperations.nullOnOverflow in Cast
> --------------------------------------------------------
>
>                 Key: SPARK-28470
>                 URL: https://issues.apache.org/jira/browse/SPARK-28470
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> Casting long to decimal or decimal to decimal can overflow; we should respect the new config if overflow happens.
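A value overflows a target `DECIMAL(precision, scale)` when it needs more total digits than the precision allows, and the config decides between returning NULL and raising an error. A sketch of the intended semantics for the long-to-decimal case (illustrative code modeling NULL as `Optional.empty()`; not Spark's actual `Cast` implementation):

```java
import java.math.BigDecimal;
import java.util.Optional;

public class DecimalCast {
    // Illustrative sketch of overflow-aware casting of a long to
    // DECIMAL(precision, scale); not Spark's actual Cast implementation.
    public static Optional<BigDecimal> castLongToDecimal(
            long v, int precision, int scale, boolean nullOnOverflow) {
        // Rescaling to the target scale never loses digits, only adds them.
        BigDecimal d = BigDecimal.valueOf(v).setScale(scale);
        if (d.precision() <= precision) {
            return Optional.of(d);       // fits the target type
        }
        if (nullOnOverflow) {
            return Optional.empty();     // overflow -> NULL
        }
        throw new ArithmeticException(
            v + " cannot be represented as Decimal(" + precision + ", " + scale + ")");
    }
}
```

For example, 123 fits DECIMAL(3,0), while 12345 overflows it: with `nullOnOverflow` the cast yields NULL, and without it the cast fails.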
[jira] [Commented] (SPARK-28470) Honor spark.sql.decimalOperations.nullOnOverflow in Cast
[ https://issues.apache.org/jira/browse/SPARK-28470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889975#comment-16889975 ]

Wenchen Fan commented on SPARK-28470:
-------------------------------------

cc [~mgaido]

> Honor spark.sql.decimalOperations.nullOnOverflow in Cast
> --------------------------------------------------------
>
>                 Key: SPARK-28470
>                 URL: https://issues.apache.org/jira/browse/SPARK-28470
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> Casting long to decimal or decimal to decimal can overflow; we should respect the new config if overflow happens.