[jira] [Updated] (SPARK-31854) Different results of query execution with wholestage codegen on and off
[ https://issues.apache.org/jira/browse/SPARK-31854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-31854: Component/s: (was: Spark Core) SQL

> Different results of query execution with wholestage codegen on and off
> -----------------------------------------------------------------------
>
> Key: SPARK-31854
> URL: https://issues.apache.org/jira/browse/SPARK-31854
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.5, 3.0.0
> Reporter: Pasha Finkeshteyn
> Priority: Major
>
> Preface: I'm creating a Kotlin API for Spark to take the best parts from three
> worlds — Spark Scala, Spark Java, and Kotlin.
> What is nice — it works in most scenarios.
> But I've hit the following corner case:
> {code:scala}
> withSpark(props = mapOf("spark.sql.codegen.wholeStage" to true)) {
>     dsOf(1, null, 2)
>         .map { c(it) }
>         .debugCodegen()
>         .show()
> }
> {code}
> c(it) is the creation of an unnamed tuple.
> It fails with this exception:
> {code}
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> top level Product or row object
> If the schema is inferred from a Scala tuple/case class, or a Java bean,
> please try to use scala.Option[_] or other nullable types (e.g.
> java.lang.Integer instead of int/scala.Int).
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.serializefromobject_doConsume_0$(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.deserializetoobject_doConsume_0$(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> 	…
> {code}
> I know it won't work in Scala, so I could stop here. But it works in Kotlin
> if I turn wholestage codegen off!
> Moreover, if we dig into the generated code (when wholestage codegen is on),
> we'll see that the flow is basically the following:
> if one of the elements in the source dataset was null, we will throw an NPE no matter what.
> The flow is as follows:
> {code}
> private void serializefromobject_doConsume_0(org.jetbrains.spark.api.Arity1 serializefromobject_expr_0_0, boolean serializefromobject_exprIsNull_0_0) throws java.io.IOException {
> serializefromobject_doConsume_0(mapelements_value_1, mapelements_isNull_1);
> mapelements_isNull_1 = mapelements_resultIsNull_0;
> mapelements_resultIsNull_0 = mapelements_exprIsNull_0_0;
> private void mapelements_doConsume_0(java.lang.Integer mapelements_expr_0_0, boolean mapelements_exprIsNull_0_0) throws java.io.IOException {
> mapelements_doConsume_0(deserializetoobject_value_0, deserializetoobject_isNull_0);
> deserializetoobject_resultIsNull_0 = deserializetoobject_exprIsNull_0_0;
> private void deserializetoobject_doConsume_0(InternalRow localtablescan_row_0, int deserializetoobject_expr_0_0, boolean deserializetoobject_exprIsNull_0_0) throws java.io.IOException {
> deserializetoobject_doConsume_0(localtablescan_row_0, localtablescan_value_0, localtablescan_isNull_0);
> boolean localtablescan_isNull_0 = localtablescan_row_0.isNullAt(0);
> mapelements_isNull_1 = true;
> {code}
> You can find the generated code in its original form, and in a slightly simplified and
> refactored version, [here|https://gist.github.com/asm0dey/5c0fa4c985ab999b383d16257b515100]
> I believe that Spark should not behave differently with wholestage codegen
> on and off; the difference in behavior looks like a bug.
> My Spark version is 3.0.0-preview2

--
This message was sent by Atlassian Jira (v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
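The divergence the reporter describes can be sketched as a toy Python model (not Spark's actual generated code; the function names and the exact failure ordering are invented for illustration): the interpreted path evaluates the map per row and lets a null element flow through, while a fused pipeline that asserts non-nullability at the serialization step fails the whole partition as soon as one null input appears.

```python
# Toy model of the two evaluation modes. All names are invented; this is
# not Spark's generated code, only an illustration of the reported behavior.

def interpreted_map(rows, f):
    # Null-safe per-row evaluation: a null input stays null in the output.
    return [None if r is None else f(r) for r in rows]

def fused_map(rows, f):
    # Fused pipeline with the reported behavior: the serialization step
    # treats the top-level object as non-nullable and raises instead of
    # propagating the null flag downstream.
    out = []
    for r in rows:
        value = None if r is None else f(r)
        if value is None:
            raise ValueError(
                "Null value appeared in non-nullable field: "
                "top level Product or row object")
        out.append(value)
    return out
```

With `[1, None, 2]` as input, `interpreted_map` yields three rows (one of them null), while `fused_map` raises on the null element, which mirrors the on/off difference described above.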
[jira] [Updated] (SPARK-31854) Different results of query execution with wholestage codegen on and off
[ https://issues.apache.org/jira/browse/SPARK-31854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-31854: Affects Version/s: 2.4.5
[jira] [Assigned] (SPARK-31854) Different results of query execution with wholestage codegen on and off
[ https://issues.apache.org/jira/browse/SPARK-31854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31854: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-31854) Different results of query execution with wholestage codegen on and off
[ https://issues.apache.org/jira/browse/SPARK-31854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31854: Assignee: Apache Spark
[jira] [Commented] (SPARK-31854) Different results of query execution with wholestage codegen on and off
[ https://issues.apache.org/jira/browse/SPARK-31854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120413#comment-17120413 ] Apache Spark commented on SPARK-31854: User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/28681
[jira] [Commented] (SPARK-31854) Different results of query execution with wholestage codegen on and off
[ https://issues.apache.org/jira/browse/SPARK-31854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120412#comment-17120412 ] Takeshi Yamamuro commented on SPARK-31854: Thanks for your report. Yeah, that should be a bug in the whole-stage codegen, as you said.
[jira] [Commented] (SPARK-31836) input_file_name() gives wrong value following Python UDF usage
[ https://issues.apache.org/jira/browse/SPARK-31836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120375#comment-17120375 ] Adam Binford commented on SPARK-31836: Confirmed this is also an issue on 2.4.5. I could also recreate it with just two files, without streaming, using
{code:java}
spark.sql.files.openCostInBytes 0{code}
to make sure both files ended up in a single partition. The behavior seems to be that, after a Python UDF, all rows in a partition get the input_file_name of the last row in the partition. But that's an assumption based on a tiny test. Doing
{code:java}
df = (df
    .withColumn('before', input_file_name())
    .withColumn('during', udf(lambda x: x)(input_file_name()))
    .withColumn('after', input_file_name())
)
{code}
'before' and 'during' are correct, while 'after' is incorrect (all values are the last file in the partition).

> input_file_name() gives wrong value following Python UDF usage
> --------------------------------------------------------------
>
> Key: SPARK-31836
> URL: https://issues.apache.org/jira/browse/SPARK-31836
> Project: Spark
> Issue Type: Bug
> Components: SQL, Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Wesley Hildebrandt
> Priority: Major
>
> I'm using PySpark for Spark 3.0.0 RC1 with Python 3.6.8.
> The following commands demonstrate that the input_file_name() function
> sometimes returns the wrong filename following usage of a Python UDF:
> $ for i in `seq 5`; do echo $i > /tmp/test-file-$i; done
> $ pyspark
> >>> import pyspark.sql.functions as F
> >>> spark.readStream.text('file:///tmp/test-file-*', wholetext=True).withColumn('file1', F.input_file_name()).withColumn('udf', F.udf(lambda x:x)('value')).withColumn('file2', F.input_file_name()).writeStream.trigger(once=True).foreachBatch(lambda df,_: df.select('file1','file2').show(truncate=False, vertical=True)).start().awaitTermination()
> A few notes about this bug:
> * It happens with many different files, so it's not related to the file contents
> * It also happens loading files from HDFS, so storage location is not a factor
> * It also happens using .csv() to read the files instead of .text(), so input format is not a factor
> * I have not been able to cause the error without using readStream, so it seems to be related to streaming
> * The bug also happens using spark-submit to send a job to my cluster
> * I haven't tested an older version, but it's possible that pulls 24958 and 25321 ([https://github.com/apache/spark/pull/24958], [https://github.com/apache/spark/pull/25321]), which fixed SPARK-28153 (https://issues.apache.org/jira/browse/SPARK-28153), introduced this bug?
[jira] [Assigned] (SPARK-31874) Use `FastDateFormat` as the legacy fractional formatter
[ https://issues.apache.org/jira/browse/SPARK-31874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31874: Assignee: Apache Spark

> Use `FastDateFormat` as the legacy fractional formatter
> -------------------------------------------------------
>
> Key: SPARK-31874
> URL: https://issues.apache.org/jira/browse/SPARK-31874
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Maxim Gekk
> Assignee: Apache Spark
> Priority: Major
>
> By default, {{HiveResult}}.{{hiveResultString}} retrieves timestamp values as
> instances of {{java.sql.Timestamp}} and uses the legacy parser
> {{SimpleDateFormat}} to convert the timestamps to strings. After the fix
> [#28024|https://github.com/apache/spark/pull/28024], the fractional formatter
> and its companion, the legacy formatter {{SimpleDateFormat}}, are created for
> every value. By switching from {{LegacySimpleTimestampFormatter}} to
> {{LegacyFastTimestampFormatter}}, we can utilize the internal cache of
> {{FastDateFormat}} and avoid re-parsing the default pattern {{yyyy-MM-dd HH:mm:ss}}.
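The caching idea behind this change can be sketched outside Spark (a hypothetical Python analogy; `get_formatter` and `to_strings` are invented names, and Python's strftime stands in for the Java pattern parser): build the formatter once per distinct pattern and reuse it across values, instead of interpreting the pattern for every value.

```python
from datetime import datetime
from functools import lru_cache

# Hypothetical analogy: FastDateFormat.getInstance keeps an internal cache
# keyed on the pattern, so the pattern is interpreted once; constructing a
# SimpleDateFormat per value re-parses it every time.

@lru_cache(maxsize=None)
def get_formatter(pattern):
    # Built once per distinct pattern string, then reused for every value.
    return lambda ts: ts.strftime(pattern)

def to_strings(timestamps, pattern="%Y-%m-%d %H:%M:%S"):
    fmt = get_formatter(pattern)  # cache hit on every call after the first
    return [fmt(ts) for ts in timestamps]
```

The design choice is the same in both worlds: the per-pattern work (parsing the format string) is hoisted out of the per-value loop.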
[jira] [Assigned] (SPARK-31874) Use `FastDateFormat` as the legacy fractional formatter
[ https://issues.apache.org/jira/browse/SPARK-31874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31874: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-31874) Use `FastDateFormat` as the legacy fractional formatter
[ https://issues.apache.org/jira/browse/SPARK-31874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120368#comment-17120368 ] Apache Spark commented on SPARK-31874: User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/28678
[jira] [Created] (SPARK-31874) Use `FastDateFormat` as the legacy fractional formatter
Maxim Gekk created SPARK-31874: Summary: Use `FastDateFormat` as the legacy fractional formatter Key: SPARK-31874 URL: https://issues.apache.org/jira/browse/SPARK-31874 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk
[jira] [Resolved] (SPARK-31866) Add partitioning hints in SQL reference
[ https://issues.apache.org/jira/browse/SPARK-31866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31866. Fix Version/s: 3.0.0 Assignee: Huaxin Gao (was: Apache Spark) Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28672

> Add partitioning hints in SQL reference
> ---------------------------------------
>
> Key: SPARK-31866
> URL: https://issues.apache.org/jira/browse/SPARK-31866
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, SQL
> Affects Versions: 3.0.0
> Reporter: Huaxin Gao
> Assignee: Huaxin Gao
> Priority: Major
> Fix For: 3.0.0
>
> Add the Coalesce/Repartition/Repartition_By_Range partitioning hints to the SQL reference
[jira] [Updated] (SPARK-31866) Add partitioning hints in SQL reference
[ https://issues.apache.org/jira/browse/SPARK-31866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31866: - Priority: Minor (was: Major)
> Add partitioning hints in SQL reference > --- > > Key: SPARK-31866 > URL: https://issues.apache.org/jira/browse/SPARK-31866 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.0.0
[jira] [Commented] (SPARK-31873) Spark Sql Function year does not extract year from date/timestamp
[ https://issues.apache.org/jira/browse/SPARK-31873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120335#comment-17120335 ] Rakesh Raushan commented on SPARK-31873: Yeah, this is a problem with 2.4.5.
{code:java}
scala> val df = Seq(("1300-01-03 00:00:00")).toDF("date_val").withColumn("date_val_ts", to_timestamp(col("date_val"))).withColumn("year_val", year(to_timestamp(col("date_val"))))
df: org.apache.spark.sql.DataFrame = [date_val: string, date_val_ts: timestamp ... 1 more field]

scala> df.show
+-------------------+-------------------+--------+
|           date_val|        date_val_ts|year_val|
+-------------------+-------------------+--------+
|1300-01-03 00:00:00|1300-01-03 00:00:00|    1299|
+-------------------+-------------------+--------+
{code}
[~hyukjin.kwon] Does this need to be fixed in 2.4.5? If so, I can check this.
> Spark Sql Function year does not extract year from date/timestamp
[jira] [Commented] (SPARK-31873) Spark Sql Function year does not extract year from date/timestamp
[ https://issues.apache.org/jira/browse/SPARK-31873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120310#comment-17120310 ] Deepak Shingavi commented on SPARK-31873: - Have you tested it on 2.4.5? [~rakson]
> Spark Sql Function year does not extract year from date/timestamp
[jira] [Comment Edited] (SPARK-31873) Spark Sql Function year does not extract year from date/timestamp
[ https://issues.apache.org/jira/browse/SPARK-31873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120301#comment-17120301 ] Rakesh Raushan edited comment on SPARK-31873 at 5/30/20, 4:28 PM: --
{code:java}
scala> val df = Seq(("1300-01-03 00:00:00")).toDF("date_val").withColumn("date_val_ts", to_timestamp(col("date_val"))).withColumn("year_val", year(to_timestamp(col("date_val"))))
df: org.apache.spark.sql.DataFrame = [date_val: string, date_val_ts: timestamp ... 1 more field]

scala> df.show
+-------------------+-------------------+--------+
|           date_val|        date_val_ts|year_val|
+-------------------+-------------------+--------+
|1300-01-03 00:00:00|1300-01-03 00:00:00|    1300|
+-------------------+-------------------+--------+
{code}
This works fine with the master branch.
> Spark Sql Function year does not extract year from date/timestamp
[jira] [Commented] (SPARK-31873) Spark Sql Function year does not extract year from date/timestamp
[ https://issues.apache.org/jira/browse/SPARK-31873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120301#comment-17120301 ] Rakesh Raushan commented on SPARK-31873:
{code:java}
scala> val df = Seq(("1300-01-03 00:00:00")).toDF("date_val").withColumn("date_val_ts", to_timestamp(col("date_val"))).withColumn("year_val", year(to_timestamp(col("date_val"))))
df: org.apache.spark.sql.DataFrame = [date_val: string, date_val_ts: timestamp ... 1 more field]

scala> df.show
+-------------------+-------------------+--------+
|           date_val|        date_val_ts|year_val|
+-------------------+-------------------+--------+
|1300-01-03 00:00:00|1300-01-03 00:00:00|    1300|
+-------------------+-------------------+--------+
{code}
This works fine with the master branch.
> Spark Sql Function year does not extract year from date/timestamp
[jira] [Created] (SPARK-31873) Spark Sql Function year does not extract year from date/timestamp
Deepak Shingavi created SPARK-31873: --- Summary: Spark Sql Function year does not extract year from date/timestamp Key: SPARK-31873 URL: https://issues.apache.org/jira/browse/SPARK-31873 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5 Reporter: Deepak Shingavi
There is a Spark SQL function org.apache.spark.sql.functions.year which fails in the case below:
{code:java}
// Code to extract year from Timestamp
val df = Seq(
  ("1300-01-03 00:00:00")
).toDF("date_val")
  .withColumn("date_val_ts", to_timestamp(col("date_val")))
  .withColumn("year_val", year(to_timestamp(col("date_val"))))

df.show()

// Output of the above code
+-------------------+-------------------+--------+
|           date_val|        date_val_ts|year_val|
+-------------------+-------------------+--------+
|1300-01-03 00:00:00|1300-01-03 00:00:00|    1299|
+-------------------+-------------------+--------+
{code}
The above code works perfectly for all years greater than 1300.
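The likely explanation for the `1299` result is the calendar switch: Spark 2.4 works on the legacy hybrid (Julian, before 1582) calendar, while the input is interpreted as a proleptic Gregorian date, and Gregorian 1300-01-03 falls in the year 1299 of the Julian calendar. A sketch using the standard Julian Day Number conversion formulas (not Spark code) makes the eight-day shift visible:

```python
def gregorian_to_jdn(y, m, d):
    # Proleptic Gregorian calendar date -> Julian Day Number.
    a = (14 - m) // 12
    y2 = y + 4800 - a
    m2 = m + 12 * a - 3
    return d + (153 * m2 + 2) // 5 + 365 * y2 + y2 // 4 - y2 // 100 + y2 // 400 - 32045

def jdn_to_julian(jdn):
    # Julian Day Number -> date in the Julian calendar.
    c = jdn + 32082
    d2 = (4 * c + 3) // 1461
    e = c - (1461 * d2) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = d2 - 4800 + m // 10
    return year, month, day

# Gregorian 1300-01-03 is Julian 1299-12-27, which is where year_val = 1299
# comes from in the legacy calendar.
print(jdn_to_julian(gregorian_to_jdn(1300, 1, 3)))  # (1299, 12, 27)
```

Spark 3.0 moved to the proleptic Gregorian calendar throughout, which is why the master branch returns 1300.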
[jira] [Created] (SPARK-31872) NotNullSafe to get complementary set
Xiaoju Wu created SPARK-31872: - Summary: NotNullSafe to get complementary set Key: SPARK-31872 URL: https://issues.apache.org/jira/browse/SPARK-31872 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 2.3.0, 3.0.0 Reporter: Xiaoju Wu
If we have a filter expression that selects a subset of rows and we then want the complementary set, Not(expression) cannot work: Not is NullIntolerant, so if expression.eval(row) is null, the filter predicate is false for Not(expression) as well, and the row appears in neither the subset nor the complementary set. So we may need a NotNullSafe implementation that evaluates to true when expression.eval(row) is null, in order to get the complementary set.
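The proposal above is about SQL's three-valued logic, which can be sketched in Python with `None` standing in for NULL. Under the current NullIntolerant `Not`, a row whose predicate evaluates to NULL is kept by neither the filter nor its negation; the proposed behavior (here under the hypothetical name `not_null_safe`) maps NULL to true so the complement picks the row up:

```python
def sql_not(v):
    # SQL three-valued NOT: NOT NULL is NULL.
    return None if v is None else (not v)

def not_null_safe(v):
    # Proposed null-safe NOT: NULL counts as "not in the subset".
    return True if v is None else (not v)

def pred(x):
    # NULL-propagating predicate, like `x = 1` in SQL.
    return None if x is None else x == 1

rows = [1, None, 2]
# SQL filters keep only rows whose predicate is TRUE (not FALSE, not NULL).
subset = [r for r in rows if pred(r) is True]
complement = [r for r in rows if sql_not(pred(r)) is True]
complement_safe = [r for r in rows if not_null_safe(pred(r)) is True]

print(subset)           # [1]
print(complement)       # [2]        the None row is lost
print(complement_safe)  # [None, 2]  true complementary set
```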
[jira] [Commented] (SPARK-31871) Display the canvas element icon for sorting column
[ https://issues.apache.org/jira/browse/SPARK-31871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120219#comment-17120219 ] Apache Spark commented on SPARK-31871: -- User 'liucht-inspur' has created a pull request for this issue: https://github.com/apache/spark/pull/28680
> Display the canvas element icon for sorting column > -- > > Key: SPARK-31871 > URL: https://issues.apache.org/jira/browse/SPARK-31871 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 2.4.3, 2.4.4, 2.4.5 >Reporter: liucht-inspur >Priority: Minor > > On the History Server page and the Executor page, the sorting icon cannot be > displayed when a column header is clicked, due to a wrong canvas element > image path. The erroneous path is corrected to improve the user experience.
[jira] [Assigned] (SPARK-31871) Display the canvas element icon for sorting column
[ https://issues.apache.org/jira/browse/SPARK-31871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31871: Assignee: (was: Apache Spark)
> Display the canvas element icon for sorting column
[jira] [Commented] (SPARK-31871) Display the canvas element icon for sorting column
[ https://issues.apache.org/jira/browse/SPARK-31871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120218#comment-17120218 ] Apache Spark commented on SPARK-31871: -- User 'liucht-inspur' has created a pull request for this issue: https://github.com/apache/spark/pull/28680
> Display the canvas element icon for sorting column
[jira] [Assigned] (SPARK-31871) Display the canvas element icon for sorting column
[ https://issues.apache.org/jira/browse/SPARK-31871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31871: Assignee: Apache Spark
> Display the canvas element icon for sorting column
[jira] [Created] (SPARK-31871) Display the canvas element icon for sorting column
liucht-inspur created SPARK-31871: - Summary: Display the canvas element icon for sorting column Key: SPARK-31871 URL: https://issues.apache.org/jira/browse/SPARK-31871 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Affects Versions: 2.4.5, 2.4.4, 2.4.3 Reporter: liucht-inspur
On the History Server page and the Executor page, the sorting icon cannot be displayed when a column header is clicked, due to a wrong canvas element image path. The erroneous path is corrected to improve the user experience.
[jira] [Commented] (SPARK-31870) AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test has no skew join
[ https://issues.apache.org/jira/browse/SPARK-31870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120205#comment-17120205 ] Apache Spark commented on SPARK-31870: -- User 'manuzhang' has created a pull request for this issue: https://github.com/apache/spark/pull/28679
> AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional > shuffle" test has no skew join > - > > Key: SPARK-31870 > URL: https://issues.apache.org/jira/browse/SPARK-31870 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Priority: Minor > > The test has no skew join due to incorrect configurations of > - spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes > - spark.sql.adaptive.advisoryPartitionSizeInBytes
[jira] [Assigned] (SPARK-31870) AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test has no skew join
[ https://issues.apache.org/jira/browse/SPARK-31870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31870: Assignee: (was: Apache Spark)
> AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional > shuffle" test has no skew join
[jira] [Assigned] (SPARK-31870) AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test has no skew join
[ https://issues.apache.org/jira/browse/SPARK-31870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31870: Assignee: Apache Spark
> AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional > shuffle" test has no skew join
[jira] [Updated] (SPARK-31870) AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test has no skew join
[ https://issues.apache.org/jira/browse/SPARK-31870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-31870: --- Summary: AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test has no skew join (was: AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test doesn't optimize skew join at all)
> AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional > shuffle" test has no skew join
[jira] [Created] (SPARK-31870) AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test doesn't optimize skew join at all
Manu Zhang created SPARK-31870: -- Summary: AdaptiveQueryExecSuite: "Do not optimize skew join if introduce additional shuffle" test doesn't optimize skew join at all Key: SPARK-31870 URL: https://issues.apache.org/jira/browse/SPARK-31870 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Manu Zhang
The test has no skew join due to incorrect configurations of - spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes - spark.sql.adaptive.advisoryPartitionSizeInBytes
[jira] [Resolved] (SPARK-31864) Adjust AQE skew join trigger condition
[ https://issues.apache.org/jira/browse/SPARK-31864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31864. - Fix Version/s: 3.0.0 Assignee: Wei Xue Resolution: Fixed
> Adjust AQE skew join trigger condition > -- > > Key: SPARK-31864 > URL: https://issues.apache.org/jira/browse/SPARK-31864 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wei Xue >Assignee: Wei Xue >Priority: Minor > Fix For: 3.0.0 > > > Instead of using the raw partition sizes, we should use coalesced partition > sizes to test skew.
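The change above (test skew against coalesced rather than raw partition sizes) can be illustrated with a toy model, not Spark's actual code: treat a partition as skewed when it exceeds a factor times the median size. With many tiny raw partitions the median is dragged down and an ordinary partition looks skewed; after coalescing small partitions toward a target size, the median reflects the real post-shuffle layout. The threshold factor and target here are made-up numbers:

```python
from statistics import median

def skewed(sizes, skew_factor=5):
    # A partition is "skewed" if it is larger than skew_factor * median size.
    m = median(sizes)
    return [s for s in sizes if s > skew_factor * m]

def coalesce(sizes, target):
    # Greedily merge adjacent partitions until each bin reaches ~target bytes,
    # mimicking AQE's coalescing of small shuffle partitions.
    out, acc = [], 0
    for s in sizes:
        acc += s
        if acc >= target:
            out.append(acc)
            acc = 0
    if acc:
        out.append(acc)
    return out

raw = [2, 2, 2, 2, 2, 2, 2, 2, 100]   # many tiny partitions plus one larger one
print(skewed(raw))                     # [100] -> flagged as skew vs median 2
print(skewed(coalesce(raw, 16)))       # []    -> not skew vs coalesced sizes
```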
[jira] [Commented] (SPARK-31799) Spark Datasource Tables Creating Incorrect Hive Metadata
[ https://issues.apache.org/jira/browse/SPARK-31799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120136#comment-17120136 ] L. C. Hsieh commented on SPARK-31799: - This happens when Spark SQL thinks it cannot save the data source table in a Hive-compatible way. Such data source tables are then readable only by Spark.
> Spark Datasource Tables Creating Incorrect Hive Metadata > > > Key: SPARK-31799 > URL: https://issues.apache.org/jira/browse/SPARK-31799 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 >Reporter: Anoop Johnson >Priority: Major > > I found that if I create a CSV or JSON table using Spark SQL, it writes the > wrong Hive table metadata, breaking compatibility with other query engines > like Hive and Presto. Here is a very simple example:
> {code:sql}
> CREATE TABLE test_csv (id String, name String)
> USING csv
> LOCATION 's3://[...]'
> ;
> {code}
> If you describe the table using Presto, you will see:
> {code:sql}
> CREATE EXTERNAL TABLE `test_csv`(
>   `col` array<string> COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'path'='s3://[...]')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
> LOCATION
>   's3://[...]/test_csv-__PLACEHOLDER__'
> TBLPROPERTIES (
>   'spark.sql.create.version'='2.4.4',
>   'spark.sql.sources.provider'='csv',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}',
>   'transient_lastDdlTime'='1590196086')
> ;
> {code}
> The table location is set to a placeholder value, and the schema is always set > to _col array<string>_. The serde/inputformat is wrong: it says > _SequenceFileInputFormat_ and _LazySimpleSerDe_ even though the requested > format is CSV. > All the right metadata is written to the custom table properties with the > prefix _spark.sql_. However, Hive and Presto do not understand these table > properties, and this breaks them. I could reproduce this with JSON too, but > not with Parquet. > I root-caused this issue to CSV and JSON tables not being handled > [here|https://github.com/apache/spark/blob/721cba540292d8d76102b18922dabe2a7d918dc5/sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala#L31-L66] > in HiveSerDe.scala. As a result, these default values are written. > Is there a reason why CSV and JSON are not handled? I could send a patch to > fix this, but the caveat is that the CSV and JSON Hive serdes would need to be in > the Spark classpath, otherwise the table creation will fail.
[jira] [Assigned] (SPARK-31863) Thriftserver not setting active SparkSession, SQLConf.get not getting session configs correctly
[ https://issues.apache.org/jira/browse/SPARK-31863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31863: --- Assignee: Juliusz Sompolski (was: Apache Spark)
> Thriftserver not setting active SparkSession, SQLConf.get not getting session > configs correctly > --- > > Key: SPARK-31863 > URL: https://issues.apache.org/jira/browse/SPARK-31863 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.0.0 > > > Thriftserver is not setting the active SparkSession. > Because of that, configuration obtained with SQLConf.get is not the session > configuration. > As a result, many configs set via "set" in the session do not take effect correctly.
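The failure mode above can be modeled in a few lines. This is a toy model, not Spark's code: a thread-local "active session" holds per-session overrides, and a `SQLConf.get`-style lookup falls back to defaults whenever the caller's thread never registered the session, so `SET` values become invisible:

```python
import threading

DEFAULTS = {"spark.sql.shuffle.partitions": "200"}
_active = threading.local()

class Session:
    def __init__(self):
        self.conf = dict(DEFAULTS)
    def set(self, k, v):
        # Analogue of a session-level SET command.
        self.conf[k] = v

def set_active_session(s):
    _active.session = s

def sqlconf_get(key):
    # Analogue of SQLConf.get: uses the active session if one is set on
    # this thread, otherwise silently falls back to the defaults.
    s = getattr(_active, "session", None)
    return s.conf[key] if s is not None else DEFAULTS[key]

s = Session()
s.set("spark.sql.shuffle.partitions", "8")

# Without the active session registered (the Thriftserver bug), the
# session override is invisible:
print(sqlconf_get("spark.sql.shuffle.partitions"))  # 200
set_active_session(s)
print(sqlconf_get("spark.sql.shuffle.partitions"))  # 8
```

Because the registration is per-thread, a worker thread that never calls `set_active_session` still sees the defaults, which mirrors why the Thriftserver's execution threads missed session configs.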
[jira] [Resolved] (SPARK-31861) Thriftserver collecting timestamp not using spark.sql.session.timeZone
[ https://issues.apache.org/jira/browse/SPARK-31861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31861. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28671 [https://github.com/apache/spark/pull/28671]
> Thriftserver collecting timestamp not using spark.sql.session.timeZone > -- > > Key: SPARK-31861 > URL: https://issues.apache.org/jira/browse/SPARK-31861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.0.0 > > > If the JDBC client is in TimeZone PST, sets spark.sql.session.timeZone to > PST, and sends the query "SELECT timestamp '2020-05-20 12:00:00'", and the JVM > timezone of the Spark cluster is e.g. CET, then: > - the timestamp literal in the query is interpreted as 12:00:00 PST, i.e. > 21:00:00 CET > - but currently, when it's returned, the timestamps are collected from the > query with a collect() in > https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L299, > and then in the end Timestamps are turned into strings using a t.toString() > in > https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v2.3/src/main/java/org/apache/hive/service/cli/ColumnValue.java#L138 > This uses the Spark cluster TimeZone, so "21:00:00" is > returned to the JDBC application.
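The nine-hour shift described above can be reproduced with Python's `zoneinfo` (3.9+). This is only an illustration of the arithmetic, not Spark code; the zone names are stand-ins for the loosely-stated "PST" session zone and "CET" cluster JVM zone (on 2020-05-20 both are on daylight time, PDT/CEST):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

session_tz = ZoneInfo("America/Los_Angeles")   # stand-in for spark.sql.session.timeZone
cluster_tz = ZoneInfo("Europe/Berlin")         # stand-in for the cluster JVM zone

# The literal is interpreted as wall-clock 12:00 in the session zone...
ts = datetime(2020, 5, 20, 12, 0, 0, tzinfo=session_tz)
# ...but rendering the same instant in the cluster zone gives 21:00,
# which is the string the JDBC client receives.
print(ts.astimezone(cluster_tz).strftime("%Y-%m-%d %H:%M:%S"))  # 2020-05-20 21:00:00
```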
[jira] [Resolved] (SPARK-31863) Thriftserver not setting active SparkSession, SQLConf.get not getting session configs correctly
[ https://issues.apache.org/jira/browse/SPARK-31863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31863. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28671 [https://github.com/apache/spark/pull/28671]
> Thriftserver not setting active SparkSession, SQLConf.get not getting session > configs correctly
[jira] [Assigned] (SPARK-31861) Thriftserver collecting timestamp not using spark.sql.session.timeZone
[ https://issues.apache.org/jira/browse/SPARK-31861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31861: --- Assignee: Juliusz Sompolski
> Thriftserver collecting timestamp not using spark.sql.session.timeZone
[jira] [Resolved] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true
[ https://issues.apache.org/jira/browse/SPARK-31859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31859. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28671 [https://github.com/apache/spark/pull/28671]
> Thriftserver with spark.sql.datetime.java8API.enabled=true > -- > > Key: SPARK-31859 > URL: https://issues.apache.org/jira/browse/SPARK-31859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Fix For: 3.0.0 > >
> {code}
> test("spark.sql.datetime.java8API.enabled=true") {
>   withJdbcStatement() { st =>
>     st.execute("set spark.sql.datetime.java8API.enabled=true")
>     val rs = st.executeQuery("select timestamp '2020-05-28 00:00:00'")
>     rs.next()
>     // scalastyle:off
>     println(rs.getObject(1))
>   }
> }
> {code}
> fails with
> {code}
> HiveThriftBinaryServerSuite:
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
>   at java.sql.Timestamp.valueOf(Timestamp.java:204)
>   at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:444)
>   at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
>   at org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464)
> {code}
> It seems it might need to be handled in HiveResult.toHiveString?
> cc [~maxgekk]
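The stack trace suggests the shape of the failure: `java.sql.Timestamp.valueOf` only accepts `yyyy-mm-dd hh:mm:ss[.fffffffff]`, while `java.time` values render in ISO-8601 with a `T` separator, so if the server sends such a string the client-side parse blows up. A Python analogue of that mismatch (an illustration only, not the actual Hive JDBC code; the ISO-style input string is an assumption about what the server produced):

```python
from datetime import datetime

# A parser fixed to the "yyyy-MM-dd HH:mm:ss" shape, like Timestamp.valueOf.
PATTERN = "%Y-%m-%d %H:%M:%S"

# The space-separated rendering parses fine...
assert datetime.strptime("2020-05-28 00:00:00", PATTERN).year == 2020

# ...but an ISO-8601 rendering with 'T' (java.time-style) is rejected,
# mirroring the IllegalArgumentException above.
try:
    datetime.strptime("2020-05-28T00:00:00Z", PATTERN)
except ValueError as e:
    print("parse failed:", e)
```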
[jira] [Assigned] (SPARK-31859) Thriftserver with spark.sql.datetime.java8API.enabled=true
[ https://issues.apache.org/jira/browse/SPARK-31859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31859: --- Assignee: Juliusz Sompolski
> Thriftserver with spark.sql.datetime.java8API.enabled=true