[jira] [Resolved] (SPARK-30881) Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold

2020-02-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-30881.

Resolution: Fixed

The issue is resolved in https://github.com/apache/spark/pull/27639

> Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold
> 
>
> Key: SPARK-30881
> URL: https://issues.apache.org/jira/browse/SPARK-30881
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> The doc of the configuration 
> "spark.sql.sources.parallelPartitionDiscovery.threshold" is not accurate in 
> the part "This applies to Parquet, ORC, CSV, JSON and LibSVM data sources".
> We should revise it to say that it is effective on all file-based data sources.
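For context, a minimal sketch of the setting in question (the value shown is the documented default, 32; below the threshold, partition discovery runs on the driver, above it Spark parallelizes the file listing):

{code:scala}
// Hypothetical session illustrating the config this ticket documents.
spark.conf.set("spark.sql.sources.parallelPartitionDiscovery.threshold", "32")
{code}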






[jira] [Updated] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Jackey Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jackey Lee updated SPARK-30868:
---
Description: 
At present, HiveClientImpl.runHive does not throw an exception when a statement 
fails, which prevents it from reporting error information properly.
Example:
{code:scala}
spark.sql("add jar file:///tmp/test.jar")
spark.sql("show databases").show()
{code}
/tmp/test.jar doesn't exist, so the add jar statement fails. However, this code 
runs to completion without causing an application failure.

  was:
At present, HiveClientImpl.runHive does not throw an exception when a statement 
fails, which prevents it from reporting error information properly.
Example:
{code:scala}
spark.sql("add jar file:///tmp/test.jar").show()
spark.sql("show databases").show()
{code}
/tmp/test.jar doesn't exist, so the add jar statement fails. However, this code 
runs to completion without causing an application failure.


> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Commented] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040795#comment-17040795
 ] 

Ankit Raj Boudh commented on SPARK-30868:
-

No, I have not created any JIRA.

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Commented] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040803#comment-17040803
 ] 

Ankit Raj Boudh commented on SPARK-30868:
-

[~srowen] and [~cloud_fan], please share your suggestions; I think this 
behaviour is OK.

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Comment Edited] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040803#comment-17040803
 ] 

Ankit Raj Boudh edited comment on SPARK-30868 at 2/20/20 9:43 AM:
--

[~srowen] and [~cloud_fan], please share your suggestions; in my view this 
behaviour is correct.


was (Author: ankitraj):
[~srowen] and [~cloud_fan], please share your suggestions; I think this 
behaviour is OK.

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Comment Edited] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040803#comment-17040803
 ] 

Ankit Raj Boudh edited comment on SPARK-30868 at 2/20/20 9:44 AM:
--

[~srowen] and [~cloud_fan], please share your suggestions on this JIRA; in my 
view the current behaviour is correct.


was (Author: ankitraj):
[~srowen] and [~cloud_fan], please share your suggestions; in my view this 
behaviour is correct.

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Commented] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Jackey Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040822#comment-17040822
 ] 

Jackey Lee commented on SPARK-30868:


So your opinion is that we should return normal results even if the statement 
fails? That is a bit weird, and really unfriendly to users.

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Updated] (SPARK-30883) Tests that use setWritable,setReadable and setExecutable should be cancel when user is root

2020-02-20 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-30883:
---
Environment: The Java APIs *setWritable, setReadable and setExecutable* don't 
work well because root can read, write, or execute every file. Maybe we could 
cancel these tests, or fail fast when the mvn test starts.  
(was: The Java APIs *setWritable, setReadable and setExecutable* don't work 
well when the user is root. Maybe we could cancel these tests, or fail fast 
when the mvn test starts.)
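A minimal sketch of the proposed guard, assuming a ScalaTest suite (the suite and test names are hypothetical); assume() cancels the test instead of failing it:

{code:scala}
import java.io.File
import org.scalatest.FunSuite

class FilePermissionSuite extends FunSuite {
  test("setReadable(false) makes the file unreadable") {
    // Cancel rather than fail: root bypasses POSIX permissions, so
    // setReadable(false) has no observable effect for that user.
    assume(System.getProperty("user.name") != "root")
    val f = File.createTempFile("perm-test", ".txt")
    try {
      assert(f.setReadable(false, false))
      assert(!f.canRead)
    } finally f.delete()
  }
}
{code}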

> Tests that use setWritable,setReadable and setExecutable should be cancel 
> when user is root
> ---
>
> Key: SPARK-30883
> URL: https://issues.apache.org/jira/browse/SPARK-30883
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.0.0
> Environment: The Java APIs *setWritable, setReadable and setExecutable* 
> don't work well because root can read, write, or execute every file. Maybe 
> we could cancel these tests, or fail fast when the mvn test starts.
>Reporter: deshanxiao
>Priority: Minor
>







[jira] [Commented] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040834#comment-17040834
 ] 

Ankit Raj Boudh commented on SPARK-30868:
-

In my view, the user should handle the exception; during exception handling 
they can terminate the application, or log the error and continue execution.
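For illustration, a minimal sketch of that pattern, assuming runHive is changed to throw as proposed (the handling choices below are the hypothetical part):

{code:scala}
import scala.util.control.NonFatal

// Assuming "add jar" now raises an exception on failure, the caller decides:
try {
  spark.sql("add jar file:///tmp/test.jar")
} catch {
  case NonFatal(e) =>
    // Option 1: terminate the application by rethrowing.
    // throw e
    // Option 2: log the failure and continue with the remaining statements.
    println(s"add jar failed: ${e.getMessage}")
}
spark.sql("show databases").show()
{code}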

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Resolved] (SPARK-30878) improve the CREATE TABLE document

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30878.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27638
[https://github.com/apache/spark/pull/27638]

> improve the CREATE TABLE document
> -
>
> Key: SPARK-30878
> URL: https://issues.apache.org/jira/browse/SPARK-30878
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Comment Edited] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040834#comment-17040834
 ] 

Ankit Raj Boudh edited comment on SPARK-30868 at 2/20/20 10:45 AM:
---

In my view, the user should handle the exception; during exception handling 
they can terminate the application, or log the error and continue execution.

Anyhow, you have already raised a PR; once it is reviewed, things will be 
clearer to us.


was (Author: ankitraj):
In my view, the user should handle the exception; during exception handling 
they can terminate the application, or log the error and continue execution.

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Commented] (SPARK-28024) Incorrect numeric values when out of range

2020-02-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040848#comment-17040848
 ] 

Wenchen Fan commented on SPARK-28024:
-

These are the behaviors of Java:
{code}
scala> java.lang.Float.valueOf("10e-70")
res0: Float = 0.0
scala> java.lang.StrictMath.exp(-1.2345678901234E200)
res1: Double = 0.0
{code}

Although it's not officially documented, Spark arithmetic has followed Java 
since the very beginning. I wouldn't treat these as correctness bugs simply 
because they are not ANSI-compliant. You wouldn't report this as a correctness 
bug against the JDK, right?

I'd suggest we close this ticket. These behaviors are well defined (they 
follow Java), though we do need to improve our documentation. BTW, when we 
complete ANSI mode and turn it on by default, these problems will go away.
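For reference, a rough sketch of how the ANSI switch mentioned above surfaces such overflows, assuming Spark 3.0's spark.sql.ansi.enabled flag (the exact error raised is version-dependent):

{code:scala}
// With ANSI mode off (the default), the multiplication silently wraps to 0,
// as in Case 1 of this ticket.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("select tinyint(128) * tinyint(2)").show()

// With ANSI mode on, the out-of-range value is expected to raise an
// arithmetic/overflow error instead of returning a wrong result.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("select tinyint(128) * tinyint(2)").show()
{code}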

> Incorrect numeric values when out of range
> --
>
> Key: SPARK-28024
> URL: https://issues.apache.org/jira/browse/SPARK-28024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Yuming Wang
>Priority: Blocker
>  Labels: correctness
> Attachments: SPARK-28024.png
>
>
> For example
> Case 1:
> {code:sql}
> select tinyint(128) * tinyint(2); -- 0
> select smallint(2147483647) * smallint(2); -- -2
> select int(2147483647) * int(2); -- -2
> SELECT smallint((-32768)) * smallint(-1); -- -32768
> {code}
> Case 2:
> {code:sql}
> spark-sql> select cast('10e-70' as float), cast('-10e-70' as float);
> 0.0   -0.0
> {code}
> Case 3:
> {code:sql}
> spark-sql> select cast('10e-400' as double), cast('-10e-400' as double);
> 0.0   -0.0
> {code}
> Case 4:
> {code:sql}
> spark-sql> select exp(-1.2345678901234E200);
> 0.0
> postgres=# select exp(-1.2345678901234E200);
> ERROR:  value overflows numeric format
> {code}






[jira] [Commented] (SPARK-26836) Columns get switched in Spark SQL using Avro backed Hive table if schema evolves

2020-02-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040853#comment-17040853
 ] 

Wenchen Fan commented on SPARK-26836:
-

cc [~Gengliang.Wang]

> Columns get switched in Spark SQL using Avro backed Hive table if schema 
> evolves
> 
>
> Key: SPARK-26836
> URL: https://issues.apache.org/jira/browse/SPARK-26836
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
> Environment: I tested with Hive and HCatalog which runs on version 
> 2.3.4 and with Spark 2.3.1 and 2.4
>Reporter: Tamás Németh
>Priority: Blocker
>  Labels: correctness
> Attachments: doctors.avro, doctors_evolved.avro, 
> doctors_evolved.json, original.avsc
>
>
> I have a Hive Avro table where the Avro schema is stored on S3 next to the 
> Avro files. 
> In the table definition, the avro.schema.url always points to the latest 
> partition's _schema.avsc file, which is always the latest schema. (Avro 
> schemas are backward and forward compatible in a table.)
> When new data comes in, I always add a new partition whose avro.schema.url 
> property is also set to the _schema.avsc that was used when it was added, 
> and of course I always update the table's avro.schema.url property to the 
> latest one.
> Querying this table works fine until the schema evolves in a way that adds 
> a new optional property in the middle. 
> When this happens, after the Spark SQL query the columns in the old 
> partition get mixed up and show the wrong data.
> If I query the table with Hive, everything is perfectly fine: it gives me 
> back the correct columns both for the partitions created with the old schema 
> and for the new ones created with the evolved schema.
>  
> Here is how I could reproduce with the 
> [doctors.avro|https://github.com/apache/spark/blob/master/sql/hive/src/test/resources/data/files/doctors.avro]
>  example data in sql test suite.
>  # I have created two partition folder:
> {code:java}
> [hadoop@ip-192-168-10-158 hadoop]$ hdfs dfs -ls s3://somelocation/doctors/*/
> Found 2 items
> -rw-rw-rw- 1 hadoop hadoop 418 2019-02-06 12:48 s3://somelocation/doctors
> /dt=2019-02-05/_schema.avsc
> -rw-rw-rw- 1 hadoop hadoop 521 2019-02-06 12:13 s3://somelocation/doctors
> /dt=2019-02-05/doctors.avro
> Found 2 items
> -rw-rw-rw- 1 hadoop hadoop 580 2019-02-06 12:49 s3://somelocation/doctors
> /dt=2019-02-06/_schema.avsc
> -rw-rw-rw- 1 hadoop hadoop 577 2019-02-06 12:13 s3://somelocation/doctors
> /dt=2019-02-06/doctors_evolved.avro{code}
> Here the first partition had data created with the schema before evolving, 
> and the second one had the evolved data. (The evolved schema is the same as 
> in your test case, except that I moved the extra_field column from second 
> place to last, and I generated two lines of Avro data with the evolved 
> schema.)
>  # I have created a hive table with the following command:
>  
> {code:java}
> CREATE EXTERNAL TABLE `default.doctors`
>  PARTITIONED BY (
>  `dt` string
>  )
>  ROW FORMAT SERDE
>  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>  WITH SERDEPROPERTIES (
>  'avro.schema.url'='s3://somelocation/doctors/
> /dt=2019-02-06/_schema.avsc')
>  STORED AS INPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>  OUTPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>  LOCATION
>  's3://somelocation/doctors/'
>  TBLPROPERTIES (
>  'transient_lastDdlTime'='1538130975'){code}
>  
> Here, as you can see, the table schema URL points to the latest schema.
> 3. I ran an _msck repair table_ to pick up all the partitions.
> Fyi: if I run my select * query at this point, everything is fine and no 
> column switch happens.
> 4. Then I changed the first partition's avro.schema.url to point to the 
> schema which is under the partition folder (the non-evolved one -> 
> s3://somelocation/doctors/
> /dt=2019-02-05/_schema.avsc)
> Then if you run a _select * from default.spark_test_, the columns will be 
> mixed up (in the data below, the first_name column becomes the extra_field 
> column; I guess because in the latest schema it is the second column):
>  
> {code:java}
> number,extra_field,first_name,last_name,dt 
> 6,Colin,Baker,null,2019-02-05 
> 3,Jon,Pertwee,null,2019-02-05 
> 4,Tom,Baker,null,2019-02-05 
> 5,Peter,Davison,null,2019-02-05 
> 11,Matt,Smith,null,2019-02-05 
> 1,William,Hartnell,null,2019-02-05 
> 7,Sylvester,McCoy,null,2019-02-05 
> 8,Paul,McGann,null,2019-02-05 
> 2,Patrick,Troughton,null,2019-02-05 
> 9,Christopher,Eccleston,null,2019-02-05 
> 10,David,Tennant,null,2019-02-05 
> 21,fishfinger,Jim,Baker,2019-02-06 
> 24,fishfinger,Bean,Pertwee,2019-02-06
> {code}
> If I try

[jira] [Created] (SPARK-30892) Exclude spark.sql.variable.substitute.depth from removedSQLConfigs

2020-02-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30892:
--

 Summary: Exclude spark.sql.variable.substitute.depth from 
removedSQLConfigs
 Key: SPARK-30892
 URL: https://issues.apache.org/jira/browse/SPARK-30892
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


The spark.sql.variable.substitute.depth SQL config has not been used since 
Spark 2.4 inclusive. By [https://github.com/apache/spark/pull/27169], the 
config was placed in SQLConf.removedSQLConfigs. As a consequence, when a user 
sets it to a non-default value (1, for example), they will get an exception. 
That is acceptable for configs that could impact behavior, but not for this 
particular config. Raising such an exception just makes migration to Spark 
more difficult.
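For illustration, the user-facing symptom looks roughly like this (the exception text is paraphrased, not quoted from Spark):

{code:scala}
// Setting the long-unused config after it landed in removedSQLConfigs:
spark.conf.set("spark.sql.variable.substitute.depth", "1")
// => AnalysisException: the config was removed and can no longer be set,
//    even though its value has had no effect since Spark 2.4.
{code}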






[jira] [Commented] (SPARK-30868) Throw Exception if runHive(sql) failed

2020-02-20 Thread Jackey Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040868#comment-17040868
 ] 

Jackey Lee commented on SPARK-30868:


Couldn't agree more. It is up to the user to choose how to handle the 
exception. What Spark should do is inform the user about the exception, not 
mask it.

> Throw Exception if runHive(sql) failed
> --
>
> Key: SPARK-30868
> URL: https://issues.apache.org/jira/browse/SPARK-30868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> At present, HiveClientImpl.runHive does not throw an exception when a 
> statement fails, which prevents it from reporting error information properly.
> Example:
> {code:scala}
> spark.sql("add jar file:///tmp/test.jar")
> spark.sql("show databases").show()
> {code}
> /tmp/test.jar doesn't exist, so the add jar statement fails. However, this 
> code runs to completion without causing an application failure.






[jira] [Updated] (SPARK-30892) Exclude spark.sql.variable.substitute.depth from removedSQLConfigs

2020-02-20 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30892:
---
Description: The spark.sql.variable.substitute.depth SQL config has not been 
used since Spark 2.4 inclusive. By 
[https://github.com/apache/spark/pull/27169], the config was placed in 
SQLConf.removedSQLConfigs. As a consequence, when a user sets it to a 
non-default value (1, for example), they will get an exception. That is 
acceptable for configs that could impact behavior, but not for this particular 
config. Raising such an exception just makes migration to Spark 3.0 more 
difficult.  (was: The spark.sql.variable.substitute.depth SQL config has not 
been used since Spark 2.4 inclusive. By 
[https://github.com/apache/spark/pull/27169], the config was placed in 
SQLConf.removedSQLConfigs. As a consequence, when a user sets it to a 
non-default value (1, for example), they will get an exception. That is 
acceptable for configs that could impact behavior, but not for this particular 
config. Raising such an exception just makes migration to Spark more 
difficult.)

> Exclude spark.sql.variable.substitute.depth from removedSQLConfigs
> --
>
> Key: SPARK-30892
> URL: https://issues.apache.org/jira/browse/SPARK-30892
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The spark.sql.variable.substitute.depth SQL config has not been used since 
> Spark 2.4 inclusive. By [https://github.com/apache/spark/pull/27169], the 
> config was placed in SQLConf.removedSQLConfigs. As a consequence, when a 
> user sets it to a non-default value (1, for example), they will get an 
> exception. That is acceptable for configs that could impact behavior, but 
> not for this particular config. Raising such an exception just makes 
> migration to Spark 3.0 more difficult.






[jira] [Created] (SPARK-30893) expressions should not change its data type/nullability after it's created

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30893:
---

 Summary: expressions should not change its data type/nullability 
after it's created
 Key: SPARK-30893
 URL: https://issues.apache.org/jira/browse/SPARK-30893
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan


This is a problem because the configuration can change between different phases 
of planning, and this can silently break a query plan which can lead to crashes 
or data corruption.
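A minimal, self-contained sketch of the hazard, stripped of Catalyst details (all names here are illustrative, not Spark's):

{code:scala}
// Illustrative only: a data type derived from a mutable config instead of
// being fixed when the expression is created.
object Conf { var integralDivideReturnLong: Boolean = true }

case class IntDiv(left: Long, right: Long) {
  // Re-evaluated on every access, so it can change between planning phases.
  def dataTypeName: String =
    if (Conf.integralDivideReturnLong) "long" else "decimal"
}

val e = IntDiv(7L, 2L)
val before = e.dataTypeName            // "long"
Conf.integralDivideReturnLong = false  // the config flips mid-planning
val after = e.dataTypeName             // "decimal": the plan is now inconsistent
{code}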






[jira] [Updated] (SPARK-30858) IntegralDivide's dataType should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30858:

Parent: SPARK-30893
Issue Type: Sub-task  (was: Bug)

> IntegralDivide's dataType should not depend on SQLConf.get
> --
>
> Key: SPARK-30858
> URL: https://issues.apache.org/jira/browse/SPARK-30858
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Herman van Hövell
>Priority: Blocker
>
> {{IntegralDivide}}'s dataType depends on the value of 
> {{SQLConf.get.integralDivideReturnLong}}. This is a problem because the 
> configuration can change between different phases of planning, and this can 
> silently break a query plan which can lead to crashes or data corruption. 






[jira] [Created] (SPARK-30894) the behavior of Size operation should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30894:
---

 Summary: the behavior of Size operation should not depend on 
SQLConf.get
 Key: SPARK-30894
 URL: https://issues.apache.org/jira/browse/SPARK-30894
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan









[jira] [Updated] (SPARK-30893) expressions should not change its data type/nullability/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30893:

Summary: expressions should not change its data type/nullability/behavior 
after it's created  (was: expressions should not change its data 
type/nullability after it's created)

> expressions should not change its data type/nullability/behavior after it's 
> created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.






[jira] [Created] (SPARK-30895) The behavior of CsvToStructs should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30895:
---

 Summary: The behavior of CsvToStructs should not depend on 
SQLConf.get
 Key: SPARK-30895
 URL: https://issues.apache.org/jira/browse/SPARK-30895
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan









[jira] [Created] (SPARK-30896) The behavior of JsonToStructs should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30896:
---

 Summary: The behavior of JsonToStructs should not depend on 
SQLConf.get
 Key: SPARK-30896
 URL: https://issues.apache.org/jira/browse/SPARK-30896
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan









[jira] [Updated] (SPARK-30893) Expressions should not change its data type/nullability/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30893:

Summary: Expressions should not change its data type/nullability/behavior 
after it's created  (was: expressions should not change its data 
type/nullability/behavior after it's created)

> Expressions should not change its data type/nullability/behavior after it's 
> created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.






[jira] [Updated] (SPARK-30894) The behavior of Size function should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30894:

Summary: The behavior of Size function should not depend on SQLConf.get  
(was: the behavior of Size operation should not depend on SQLConf.get)

> The behavior of Size function should not depend on SQLConf.get
> --
>
> Key: SPARK-30894
> URL: https://issues.apache.org/jira/browse/SPARK-30894
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Created] (SPARK-30897) The behavior of ArrayExists should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30897:
---

 Summary: The behavior of ArrayExists should not depend on 
SQLConf.get
 Key: SPARK-30897
 URL: https://issues.apache.org/jira/browse/SPARK-30897
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan









[jira] [Created] (SPARK-30898) The behavior of MakeDecimal should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30898:
---

 Summary: The behavior of MakeDecimal should not depend on 
SQLConf.get
 Key: SPARK-30898
 URL: https://issues.apache.org/jira/browse/SPARK-30898
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan









[jira] [Updated] (SPARK-30893) Expressions should not change its data type/nullability/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30893:

Priority: Critical  (was: Blocker)

> Expressions should not change its data type/nullability/behavior after it's 
> created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.






[jira] [Created] (SPARK-30899) CreateArray/CreateMap's data type should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30899:
---

 Summary: CreateArray/CreateMap's data type should not depend on 
SQLConf.get
 Key: SPARK-30899
 URL: https://issues.apache.org/jira/browse/SPARK-30899
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan









[jira] [Updated] (SPARK-30893) Expressions should not change its data type/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30893:

Summary: Expressions should not change its data type/behavior after it's 
created  (was: Expressions should not change its data type/nullability/behavior 
after it's created)

> Expressions should not change its data type/behavior after it's created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.






[jira] [Resolved] (SPARK-30858) IntegralDivide's dataType should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30858.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27628
[https://github.com/apache/spark/pull/27628]

> IntegralDivide's dataType should not depend on SQLConf.get
> --
>
> Key: SPARK-30858
> URL: https://issues.apache.org/jira/browse/SPARK-30858
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Herman van Hövell
>Assignee: Maxim Gekk
>Priority: Blocker
> Fix For: 3.0.0
>
>
> {{IntegralDivide}}'s dataType depends on the value of 
> {{SQLConf.get.integralDivideReturnLong}}. This is a problem because the 
> configuration can change between different phases of planning, and this can 
> silently break a query plan which can lead to crashes or data corruption. 






[jira] [Assigned] (SPARK-30858) IntegralDivide's dataType should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30858:
---

Assignee: Maxim Gekk

> IntegralDivide's dataType should not depend on SQLConf.get
> --
>
> Key: SPARK-30858
> URL: https://issues.apache.org/jira/browse/SPARK-30858
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Herman van Hövell
>Assignee: Maxim Gekk
>Priority: Blocker
>
> {{IntegralDivide}}'s dataType depends on the value of 
> {{SQLConf.get.integralDivideReturnLong}}. This is a problem because the 
> configuration can change between different phases of planning, and this can 
> silently break a query plan which can lead to crashes or data corruption. 






[jira] [Created] (SPARK-30900) FileStreamSource: Avoid reading compact metadata log twice if the query stops from compact batch and restarts

2020-02-20 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-30900:


 Summary: FileStreamSource: Avoid reading compact metadata log 
twice if the query stops from compact batch and restarts
 Key: SPARK-30900
 URL: https://issues.apache.org/jira/browse/SPARK-30900
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.0.0
Reporter: Jungtaek Lim


When restarting a query, there is a case in which the query starts from a 
compaction batch and the batch has a source metadata file to read. One such 
case is when the previous run succeeded in reading the inputs but did not 
finalize the batch, for various reasons.

In this case FileStreamSource will read the compact metadata file twice: once 
to retrieve all files and build the seen-file map, and once more to retrieve 
the entries in the batch. If the query has processed a huge number of inputs 
so far, the compact metadata file becomes considerably bigger, so reading it 
once more adds unnecessary latency to processing the startup batch.

This issue tracks the effort to address this case.
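A minimal sketch of the single-read approach this suggests (types and names are illustrative, not FileStreamSource's real internals):

{code:scala}
// Illustrative only: derive both views from one read of the compact log.
case class Entry(path: String, batchId: Long)

def loadCompactBatch(compactLog: Seq[Entry], restartBatchId: Long)
    : (Set[String], Seq[Entry]) = {
  val seenFiles = compactLog.map(_.path).toSet                      // seen-file map
  val batchEntries = compactLog.filter(_.batchId == restartBatchId) // entries to replay
  (seenFiles, batchEntries)                                         // one pass, two views
}
{code}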






[jira] [Created] (SPARK-30901) [DOC] In streaming-kinesis-integration.md, the initialPosition method changed

2020-02-20 Thread DavidXU (Jira)
DavidXU created SPARK-30901:
---

 Summary: [DOC] In streaming-kinesis-integration.md, the 
initialPosition method changed
 Key: SPARK-30901
 URL: https://issues.apache.org/jira/browse/SPARK-30901
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 2.4.4
Reporter: DavidXU


In the Spark documentation: 
[https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]

I find that we still give an example like:

{{_".initialPositionInStream([initial position])"_}}

But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:

_@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
"2.3.0")_
 _def initialPositionInStream(initialPosition: InitialPositionInStream): 
Builder_

So I think that the doc here should be updated to give an example like:

{{_".initialPosition([initial position])"_}}

And the same thing goes for the description below:

"_{{[initial position]}}: Can be either 
{{InitialPositionInStream.TRIM_HORIZON}} or {{InitialPositionInStream.LATEST}}_"

In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
{{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
(by using initialPosition in place of initialPositionInStream).
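To make the suggested update concrete, a hedged sketch of the newer builder call (the stream name, endpoint, and ssc are placeholders; verify the exact kinesis-asl signatures before relying on this):

{code:scala}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.kinesis.{KinesisInitialPositions, KinesisInputDStream}

// ssc is assumed to be an existing StreamingContext.
val stream = KinesisInputDStream.builder
  .streamingContext(ssc)
  .streamName("myKinesisStream")
  .endpointUrl("https://kinesis.us-east-1.amazonaws.com")
  .regionName("us-east-1")
  // New API: initialPosition instead of the deprecated initialPositionInStream.
  .initialPosition(new KinesisInitialPositions.Latest())
  .checkpointAppName("myKinesisApp")
  .checkpointInterval(Seconds(10))
  .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
  .build()
{code}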






[jira] [Updated] (SPARK-30901) [DOC] In streaming-kinesis-integration.md, the initialPosition method changed

2020-02-20 Thread DavidXU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DavidXU updated SPARK-30901:

Description: 
In the Spark documentation: 
[https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]

I find that we still give an example like:

{{_".initialPositionInStream([initial position])"_}}

But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:

_@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
"2.3.0")_
 _def initialPositionInStream(initialPosition: InitialPositionInStream): 
Builder_

So I think that the doc here should be updated to give an example like:

{{_".initialPosition([initial position])"_}}

And the same thing goes for the description below:

"_{{[initial position]}}: Can be either 
{{InitialPositionInStream.TRIM_HORIZON}} or {{InitialPositionInStream.LATEST}}_"

In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
{{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
(by using initialPosition in place of initialPositionInStream).

 

  was:
In the Spark documentation: 
[https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]

I find that we still give an example like:

{{_".initialPositionInStream([initial position])"_}}

But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:

_@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
"2.3.0")_
 _def initialPositionInStream(initialPosition: InitialPositionInStream): 
Builder_

So I think that the doc here should be updated to give an example like:

{{_".initialPosition([initial position])"_}}

And the same thing goes for the description below:

"_{{[initial position]}}: Can be either 
{{InitialPositionInStream.TRIM_HORIZON}} or {{InitialPositionInStream.LATEST}}_"

In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
{{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
(by using initialPosition in place of initialPositionInStream).


> [DOC] In streaming-kinesis-integration.md, the initialPosition method changed
> -
>
> Key: SPARK-30901
> URL: https://issues.apache.org/jira/browse/SPARK-30901
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.4.4
>Reporter: DavidXU
>Priority: Major
>
> In the Spark documentation: 
> [https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]
> I find that we still give an example like:
> {{_".initialPositionInStream([initial position])"_}}
> But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:
> _@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
> "2.3.0")_
>  _def initialPositionInStream(initialPosition: InitialPositionInStream): 
> Builder_
> So I think that the doc here should be updated to give an example like:
> {{_".initialPosition([initial position])"_}}
> And the same thing goes for the description below:
> "_{{[initial position]}}: Can be either 
> {{InitialPositionInStream.TRIM_HORIZON}} or 
> {{InitialPositionInStream.LATEST}}_"
> In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
> {{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
> (by using initialPosition in place of initialPositionInStream).
>  






[jira] [Updated] (SPARK-30901) [DOC] In streaming-kinesis-integration.md, the initialPosition method changed

2020-02-20 Thread DavidXU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DavidXU updated SPARK-30901:

Description: 
In the Spark documentation: 
[https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]

I find that we still give an example like:

{{_".initialPositionInStream([initial position])"_}}

But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:

_@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
"2.3.0")_
 _def initialPositionInStream(initialPosition: InitialPositionInStream): 
Builder_

So I think that the doc here should be updated to give an example like:

{{_".initialPosition([initial position])"_}}

And the same thing goes for the description below:

"_{{[initial position]}}: Can be either 
{{InitialPositionInStream.TRIM_HORIZON}} or {{InitialPositionInStream.LATEST}}_"

In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
{{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
(by using initialPosition in place of initialPositionInStream).

 

  was:
In the Spark documentation: 
[https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]

I find that we still give an example like:

{{_".initialPositionInStream([initial position])"_}}

But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:

_@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
"2.3.0")_
 _def initialPositionInStream(initialPosition: InitialPositionInStream): 
Builder_

So I think that the doc here should be updated to give an example like:

{{_".initialPosition([initial position])"_}}

And the same thing goes for the description below:

"_{{[initial position]}}: Can be either 
{{InitialPositionInStream.TRIM_HORIZON}} or {{InitialPositionInStream.LATEST}}_"

In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
{{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
(by using initialPosition in place of initialPositionInStream).

 


> [DOC] In streaming-kinesis-integration.md, the initialPosition method changed
> -
>
> Key: SPARK-30901
> URL: https://issues.apache.org/jira/browse/SPARK-30901
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.4.4
>Reporter: DavidXU
>Priority: Major
>
> In the Spark documentation: 
> [https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]
> I find that we still give an example like:
> {{_".initialPositionInStream([initial position])"_}}
> But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:
> _@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
> "2.3.0")_
>  _def initialPositionInStream(initialPosition: InitialPositionInStream): 
> Builder_
> So I think that the doc here should be updated to give an example like:
> {{_".initialPosition([initial position])"_}}
> And the same thing goes for the description below:
> "_{{[initial position]}}: Can be either 
> {{InitialPositionInStream.TRIM_HORIZON}} or 
> {{InitialPositionInStream.LATEST}}_"
> In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
> {{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
> (by using initialPosition in place of initialPositionInStream).
>  






[jira] [Updated] (SPARK-30901) [DOC] In streaming-kinesis-integration.md, the initialPosition method changed

2020-02-20 Thread DavidXU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DavidXU updated SPARK-30901:

Description: 
In the Spark documentation: 
[https://spark.apache.org/docs/2.4.4/streaming-kinesis-integration.html#configuring-spark-streaming-application]

I find that we still give an example like:

{{_".initialPositionInStream([initial position])"_}}

But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:

_@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
"2.3.0")_
 _def initialPositionInStream(initialPosition: InitialPositionInStream): 
Builder_

So I think that the doc here should be updated to give an example like:

{{_".initialPosition([initial position])"_}}

And the same thing goes for the description below:

"_{{[initial position]}}: Can be either 
{{InitialPositionInStream.TRIM_HORIZON}} or {{InitialPositionInStream.LATEST}}_"

In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
{{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
(by using initialPosition in place of initialPositionInStream).

 

  was:
In the Spark documentation: 
[https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html#configuring-spark-streaming-application]

I find that we still give an example like:

{{_".initialPositionInStream([initial position])"_}}

But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:

_@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
"2.3.0")_
 _def initialPositionInStream(initialPosition: InitialPositionInStream): 
Builder_

So I think that the doc here should be updated to give an example like:

{{_".initialPosition([initial position])"_}}

And the same thing goes for the description below:

"_{{[initial position]}}: Can be either 
{{InitialPositionInStream.TRIM_HORIZON}} or {{InitialPositionInStream.LATEST}}_"

In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
{{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
(by using initialPosition in place of initialPositionInStream).

 


> [DOC] In streaming-kinesis-integration.md, the initialPosition method changed
> -
>
> Key: SPARK-30901
> URL: https://issues.apache.org/jira/browse/SPARK-30901
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.4.4
>Reporter: DavidXU
>Priority: Major
>
> In the Spark documentation: 
> [https://spark.apache.org/docs/2.4.4/streaming-kinesis-integration.html#configuring-spark-streaming-application]
> I find that we still give an example like:
> {{_".initialPositionInStream([initial position])"_}}
> But in the source code of _spark-streaming-kinesis-asl_2.11-2.4.4_ I noticed:
> _@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", 
> "2.3.0")_
>  _def initialPositionInStream(initialPosition: InitialPositionInStream): 
> Builder_
> So I think that the doc here should be updated to give an example like:
> {{_".initialPosition([initial position])"_}}
> And the same thing goes for the description below:
> "_{{[initial position]}}: Can be either 
> {{InitialPositionInStream.TRIM_HORIZON}} or 
> {{InitialPositionInStream.LATEST}}_"
> In fact, we can now use {{KinesisInitialPositions.TRIM_HORIZON}}, 
> {{KinesisInitialPositions.LATEST}} or {{KinesisInitialPositions.AT_TIMESTAMP}} 
> (by using initialPosition in place of initialPositionInStream).
>  






[jira] [Created] (SPARK-30902) default table provider should be decided by catalog implementations

2020-02-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-30902:
---

 Summary: default table provider should be decided by catalog 
implementations
 Key: SPARK-30902
 URL: https://issues.apache.org/jira/browse/SPARK-30902
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan









[jira] [Created] (SPARK-30903) Fail fast on duplicate columns when analyze columns

2020-02-20 Thread wuyi (Jira)
wuyi created SPARK-30903:


 Summary: Fail fast on duplicate columns when analyze columns
 Key: SPARK-30903
 URL: https://issues.apache.org/jira/browse/SPARK-30903
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: wuyi


We should fail fast on duplicate columns when analyzing columns, to avoid 
duplicate computation on the same column.
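A minimal sketch of such a pre-check (a hypothetical helper, not the actual Spark code):

{code:scala}
// Hypothetical fail-fast guard for ANALYZE ... COMPUTE STATISTICS FOR COLUMNS.
def assertNoDuplicateColumns(columns: Seq[String]): Unit = {
  val dups = columns.groupBy(_.toLowerCase)
    .collect { case (name, vs) if vs.size > 1 => name }
  require(dups.isEmpty, s"Found duplicate column(s): ${dups.mkString(", ")}")
}

assertNoDuplicateColumns(Seq("id", "name"))   // passes
// assertNoDuplicateColumns(Seq("id", "ID")) // fails fast before any scan
{code}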






[jira] [Resolved] (SPARK-30892) Exclude spark.sql.variable.substitute.depth from removedSQLConfigs

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30892.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27646
[https://github.com/apache/spark/pull/27646]

> Exclude spark.sql.variable.substitute.depth from removedSQLConfigs
> --
>
> Key: SPARK-30892
> URL: https://issues.apache.org/jira/browse/SPARK-30892
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> The spark.sql.variable.substitute.depth SQL config has not been used since 
> Spark 2.4 inclusive. By [https://github.com/apache/spark/pull/27169], the 
> config was placed in SQLConf.removedSQLConfigs. As a consequence, when a user 
> sets it to a non-default value (1, for example), they will get an exception. 
> That is acceptable for configs that could impact behavior, but not for this 
> particular config. Raising such an exception will just make migration to 
> Spark 3.0 more difficult.
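> A hypothetical illustration (the exact exception type and message are 
> approximate):
> {code:scala}
> // In 3.0, setting a config listed in SQLConf.removedSQLConfigs throws:
> spark.conf.set("spark.sql.variable.substitute.depth", "1")
> // => AnalysisException: The SQL config 'spark.sql.variable.substitute.depth'
> //    was removed ...
> {code}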



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30892) Exclude spark.sql.variable.substitute.depth from removedSQLConfigs

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30892:
---

Assignee: Maxim Gekk

> Exclude spark.sql.variable.substitute.depth from removedSQLConfigs
> --
>
> Key: SPARK-30892
> URL: https://issues.apache.org/jira/browse/SPARK-30892
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The spark.sql.variable.substitute.depth SQL config has not been used since 
> Spark 2.4 inclusive. By [https://github.com/apache/spark/pull/27169], the 
> config was placed in SQLConf.removedSQLConfigs. As a consequence, when a user 
> sets it to a non-default value (1, for example), they will get an exception. 
> That is acceptable for configs that could impact behavior, but not for this 
> particular config. Raising such an exception will just make migration to 
> Spark 3.0 more difficult.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30884) Upgrade to Py4J 0.10.9

2020-02-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30884.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27641
[https://github.com/apache/spark/pull/27641]

> Upgrade to Py4J 0.10.9
> --
>
> Key: SPARK-30884
> URL: https://issues.apache.org/jira/browse/SPARK-30884
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade Py4J from 0.10.8.1 to 0.10.9.
> Py4J 0.10.9 is released with the following fixes.
> - https://www.py4j.org/changelog.html#py4j-0-10-9



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30904) Thrift RowBasedSet serialization throws NullPointerException on NULL BigDecimal

2020-02-20 Thread Christian Stuart (Jira)
Christian Stuart created SPARK-30904:


 Summary: Thrift RowBasedSet serialization throws 
NullPointerException on NULL BigDecimal
 Key: SPARK-30904
 URL: https://issues.apache.org/jira/browse/SPARK-30904
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0, 3.1.0
Reporter: Christian Stuart


Adding the following test to {{SparkThriftServerProtocolVersionsSuite}} 
reproduces the issue:
{code:scala}
test(s"$version get null as decimal") {
  testExecuteStatementWithProtocolVersion(version,
    "SELECT cast(null as decimal)") { rs =>
    assert(rs.next())
    assert(rs.getBigDecimal(1) === null)
  }
}{code}

The bug was introduced in https://github.com/apache/spark/commit/163f4a4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30905) Execute the TIMESTAMP roadmap

2020-02-20 Thread H. Vetinari (Jira)
H. Vetinari created SPARK-30905:
---

 Summary: Execute the TIMESTAMP roadmap
 Key: SPARK-30905
 URL: https://issues.apache.org/jira/browse/SPARK-30905
 Project: Spark
  Issue Type: Task
  Components: Input/Output, Spark Core
Affects Versions: 2.4.5
Reporter: H. Vetinari


This issue is intended for tracking the addition and/or alteration of different 
TIMESTAMP types in order to eventually reach the desired state as specified in 
the [design 
doc|https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit]
 for TIMESTAMP types.

It is a sister issue to HIVE-21348 & IMPALA-9408 (I found no comparable issue 
for Spark, and I was hoping to find out the status of this roadmap on the 
Spark side), and it is related to SPARK-26797.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30861) Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark

2020-02-20 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned SPARK-30861:


Assignee: Hyukjin Kwon

> Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark
> 
>
> Key: SPARK-30861
> URL: https://issues.apache.org/jira/browse/SPARK-30861
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Those were removed as of SPARK-25908. We should deprecate them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30861) Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark

2020-02-20 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041268#comment-17041268
 ] 

Bryan Cutler commented on SPARK-30861:
--

Issue resolved by pull request 27614
https://github.com/apache/spark/pull/27614

> Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark
> 
>
> Key: SPARK-30861
> URL: https://issues.apache.org/jira/browse/SPARK-30861
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Those were removed as of SPARK-25908. We should deprecate them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30861) Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark

2020-02-20 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-30861:
-
Fix Version/s: 3.0.0

> Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark
> 
>
> Key: SPARK-30861
> URL: https://issues.apache.org/jira/browse/SPARK-30861
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> Those were removed as of SPARK-25908. We should deprecate them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30861) Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark

2020-02-20 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-30861.
--
Resolution: Fixed

> Deprecate constructor of SQLContext and getOrCreate in SQLContext at PySpark
> 
>
> Key: SPARK-30861
> URL: https://issues.apache.org/jira/browse/SPARK-30861
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> Those were removed as of SPARK-25908. We should deprecate them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30667) Support simple all gather in barrier task context

2020-02-20 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-30667:
--
Fix Version/s: (was: 3.0.0)

> Support simple all gather in barrier task context
> -
>
> Key: SPARK-30667
> URL: https://issues.apache.org/jira/browse/SPARK-30667
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Sarth Frey
>Priority: Major
>
> Currently we offer task.barrier() to coordinate tasks in barrier mode. Tasks 
> can see all IP addresses from BarrierTaskContext. It would be simpler to 
> integrate with distributed frameworks like TensorFlow DistributionStrategy if 
> we provide an all gather that lets tasks share additional information with 
> others, e.g., an available port.
> Note that with all gather, tasks share their IP addresses as well.
> {code}
> port = ... # get an available port
> ports = context.all_gather(port) # get all available ports, ordered by task ID
> ...  # set up distributed training service
> {code}
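> A minimal Scala sketch of the same idea, assuming the API lands as 
> {{allGather(message: String)}} on BarrierTaskContext; {{pickFreePort}} is a 
> hypothetical helper:
> {code:scala}
> import org.apache.spark.BarrierTaskContext
> 
> rdd.barrier().mapPartitions { iter =>
>   val context = BarrierTaskContext.get()
>   val port = pickFreePort()  // hypothetical: ask the OS for a free port
>   // Each task contributes one string; results come back ordered by task ID.
>   val ports = context.allGather(port.toString)
>   // ... set up the distributed training service from ports ...
>   iter
> }.count()
> {code}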



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-30667) Support simple all gather in barrier task context

2020-02-20 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reopened SPARK-30667:
---

> Support simple all gather in barrier task context
> -
>
> Key: SPARK-30667
> URL: https://issues.apache.org/jira/browse/SPARK-30667
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Sarth Frey
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently we offer task.barrier() to coordinate tasks in barrier mode. Tasks 
> can see all IP addresses from BarrierTaskContext. It would be simpler to 
> integrate with distributed frameworks like TensorFlow DistributionStrategy if 
> we provide an all gather that lets tasks share additional information with 
> others, e.g., an available port.
> Note that with all gather, tasks share their IP addresses as well.
> {code}
> port = ... # get an available port
> ports = context.all_gather(port) # get all available ports, ordered by task ID
> ...  # set up distributed training service
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30894) The behavior of Size function should not depend on SQLConf.get

2020-02-20 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041284#comment-17041284
 ] 

Maxim Gekk commented on SPARK-30894:


I am working on it.

> The behavior of Size function should not depend on SQLConf.get
> --
>
> Key: SPARK-30894
> URL: https://issues.apache.org/jira/browse/SPARK-30894
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30906) Turning off AQE in CacheManager is not thread-safe

2020-02-20 Thread Wei Xue (Jira)
Wei Xue created SPARK-30906:
---

 Summary: Turning off AQE in CacheManager is not thread-safe
 Key: SPARK-30906
 URL: https://issues.apache.org/jira/browse/SPARK-30906
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wei Xue


This is a fix for https://issues.apache.org/jira/browse/SPARK-30188.

AQE should also have been turned off for "recache".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30768) Constraints inferred from inequality attributes

2020-02-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-30768:

Summary: Constraints inferred from inequality attributes  (was: Constraints 
should be inferred from inequality attributes)

> Constraints inferred from inequality attributes
> ---
>
> Key: SPARK-30768
> URL: https://issues.apache.org/jira/browse/SPARK-30768
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> create table SPARK_30768_1(c1 int, c2 int);
> create table SPARK_30768_2(c1 int, c2 int);
> {code}
> *Spark SQL*:
> {noformat}
> spark-sql> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on 
> (t1.c1 > t2.c1) where t1.c1 = 3;
> == Physical Plan ==
> *(3) Project [c1#5, c2#6]
> +- BroadcastNestedLoopJoin BuildRight, Inner, (c1#5 > c1#7)
>:- *(1) Project [c1#5, c2#6]
>:  +- *(1) Filter (isnotnull(c1#5) AND (c1#5 = 3))
>: +- *(1) ColumnarToRow
>:+- FileScan parquet default.spark_30768_1[c1#5,c2#6] Batched: 
> true, DataFilters: [isnotnull(c1#5), (c1#5 = 3)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,3)], 
> ReadSchema: struct<c1:int,c2:int>
>+- BroadcastExchange IdentityBroadcastMode, [id=#60]
>   +- *(2) Project [c1#7]
>  +- *(2) Filter isnotnull(c1#7)
> +- *(2) ColumnarToRow
>+- FileScan parquet default.spark_30768_2[c1#7] Batched: true, 
> DataFilters: [isnotnull(c1#7)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: 
> struct<c1:int>
> {noformat}
> *Hive* supports this feature:
> {noformat}
> hive> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on 
> (t1.c1 > t2.c1) where t1.c1 = 3;
> Warning: Map Join MAPJOIN[13][bigTable=?] in task 'Stage-3:MAPRED' is a cross 
> product
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> $hdt$_0:t1
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> $hdt$_0:t1
>   TableScan
> alias: t1
> filterExpr: (c1 = 3) (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: NONE
> Filter Operator
>   predicate: (c1 = 3) (type: boolean)
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE
>   Select Operator
> expressions: c2 (type: int)
> outputColumnNames: _col1
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE
> HashTable Sink Operator
>   keys:
> 0
> 1
>   Stage: Stage-3
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: t2
> filterExpr: (c1 < 3) (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: NONE
> Filter Operator
>   predicate: (c1 < 3) (type: boolean)
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE
>   Select Operator
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   keys:
> 0
> 1
>   outputColumnNames: _col1
>   Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL 
> Column stats: NONE
>   Select Operator
> expressions: 3 (type: int), _col1 (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> PARTIAL Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   

[jira] [Created] (SPARK-30907) Revise the doc of spark.ui.retainedTasks

2020-02-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-30907:
--

 Summary: Revise the doc of spark.ui.retainedTasks
 Key: SPARK-30907
 URL: https://issues.apache.org/jira/browse/SPARK-30907
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 3.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


There are configurations that limit the amount of retained UI data. 
`spark.ui.retainedJobs`, `spark.ui.retainedStages` and 
`spark.worker.ui.retainedExecutors` are total maximums for one 
application, while `spark.ui.retainedTasks` is the maximum number 
for one stage.

We should revise the documentation to make it clear.
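
A sketch using the current default values, to make the scopes concrete:
{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "1000")              // total for one application
  .set("spark.ui.retainedStages", "1000")            // total for one application
  .set("spark.worker.ui.retainedExecutors", "1000")  // total for the worker UI
  .set("spark.ui.retainedTasks", "100000")           // per stage, not per application
{code}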



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30907) Revise the doc of spark.ui.retainedTasks

2020-02-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-30907.
--
Fix Version/s: 3.0.0
   2.4.6
   Resolution: Fixed

Issue resolved by pull request 27660
[https://github.com/apache/spark/pull/27660]

> Revise the doc of spark.ui.retainedTasks
> 
>
> Key: SPARK-30907
> URL: https://issues.apache.org/jira/browse/SPARK-30907
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 2.4.6, 3.0.0
>
>
> There are configurations that limit the amount of retained UI data. 
> `spark.ui.retainedJobs`, `spark.ui.retainedStages` and 
> `spark.worker.ui.retainedExecutors` are total maximums for one 
> application, while `spark.ui.retainedTasks` is the maximum 
> number for one stage.
> We should revise the documentation to make it clear.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30841) Display the version of Spark Sql configurations into page

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041447#comment-17041447
 ] 

jiaan.geng commented on SPARK-30841:


I'm working on it.

> Display the version of Spark Sql configurations into page
> -
>
> Key: SPARK-30841
> URL: https://issues.apache.org/jira/browse/SPARK-30841
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30840) Add version property for ConfigEntry and ConfigBuilder

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041446#comment-17041446
 ] 

jiaan.geng commented on SPARK-30840:


I'm working on it.

> Add version property for ConfigEntry and ConfigBuilder
> --
>
> Key: SPARK-30840
> URL: https://issues.apache.org/jira/browse/SPARK-30840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30887) Arrange version info of deploy

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041448#comment-17041448
 ] 

jiaan.geng commented on SPARK-30887:


I'm working on it.

> Arrange version info of deploy
> --
>
> Key: SPARK-30887
> URL: https://issues.apache.org/jira/browse/SPARK-30887
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Deploy.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30889) Arrange version info of worker

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041451#comment-17041451
 ] 

jiaan.geng commented on SPARK-30889:


I'm working on it.

> Arrange version info of worker
> --
>
> Key: SPARK-30889
> URL: https://issues.apache.org/jira/browse/SPARK-30889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Worker.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30888) Arrange version info of network

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041450#comment-17041450
 ] 

jiaan.geng commented on SPARK-30888:


I'm working on it.

> Arrange version info of network
> ---
>
> Key: SPARK-30888
> URL: https://issues.apache.org/jira/browse/SPARK-30888
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> spark/core/src/main/scala/org/apache/spark/internal/config/Network.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30891) Arrange version info of history

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041452#comment-17041452
 ] 

jiaan.geng commented on SPARK-30891:


I'm working on it.

> Arrange version info of history
> ---
>
> Key: SPARK-30891
> URL: https://issues.apache.org/jira/browse/SPARK-30891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/History.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30908) Arrange version info of Kryo

2020-02-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30908:
--

 Summary: Arrange version info of Kryo
 Key: SPARK-30908
 URL: https://issues.apache.org/jira/browse/SPARK-30908
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: jiaan.geng


core/src/main/scala/org/apache/spark/internal/config/Kryo.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30908) Arrange version info of Kryo

2020-02-20 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-30908:
---
Component/s: (was: SQL)
 Spark Core

> Arrange version info of Kryo
> 
>
> Key: SPARK-30908
> URL: https://issues.apache.org/jira/browse/SPARK-30908
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Kryo.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30909) Arrange version info of Python

2020-02-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30909:
--

 Summary: Arrange version info of Python
 Key: SPARK-30909
 URL: https://issues.apache.org/jira/browse/SPARK-30909
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: jiaan.geng


core/src/main/scala/org/apache/spark/internal/config/Python.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30910) Arrange version info of R

2020-02-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30910:
--

 Summary: Arrange version info of R
 Key: SPARK-30910
 URL: https://issues.apache.org/jira/browse/SPARK-30910
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: jiaan.geng


core/src/main/scala/org/apache/spark/internal/config/R.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30911) Arrange version info of Status

2020-02-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30911:
--

 Summary: Arrange version info of Status
 Key: SPARK-30911
 URL: https://issues.apache.org/jira/browse/SPARK-30911
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: jiaan.geng


core/src/main/scala/org/apache/spark/internal/config/Status.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30913) Arrange version info of Tests.scala

2020-02-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30913:
--

 Summary: Arrange version info of Tests.scala
 Key: SPARK-30913
 URL: https://issues.apache.org/jira/browse/SPARK-30913
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: jiaan.geng


core/src/main/scala/org/apache/spark/internal/config/Tests.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30912) Arrange version info of Streaming.scala

2020-02-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30912:
--

 Summary: Arrange version info of Streaming.scala
 Key: SPARK-30912
 URL: https://issues.apache.org/jira/browse/SPARK-30912
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: jiaan.geng


core/src/main/scala/org/apache/spark/internal/config/Streaming.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30908) Arrange version info of Kryo

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041457#comment-17041457
 ] 

jiaan.geng commented on SPARK-30908:


I'm working on it.

> Arrange version info of Kryo
> 
>
> Key: SPARK-30908
> URL: https://issues.apache.org/jira/browse/SPARK-30908
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Kryo.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30909) Arrange version info of Python

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041459#comment-17041459
 ] 

jiaan.geng commented on SPARK-30909:


I'm working on it.

> Arrange version info of Python
> --
>
> Key: SPARK-30909
> URL: https://issues.apache.org/jira/browse/SPARK-30909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Python.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30914) Arrange version info of UI

2020-02-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-30914:
--

 Summary: Arrange version info of UI
 Key: SPARK-30914
 URL: https://issues.apache.org/jira/browse/SPARK-30914
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: jiaan.geng


core/src/main/scala/org/apache/spark/internal/config/UI.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30910) Arrange version info of R

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041460#comment-17041460
 ] 

jiaan.geng commented on SPARK-30910:


I'm working on it.

> Arrange version info of R
> -
>
> Key: SPARK-30910
> URL: https://issues.apache.org/jira/browse/SPARK-30910
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/R.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30911) Arrange version info of Status

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041461#comment-17041461
 ] 

jiaan.geng commented on SPARK-30911:


I'm working on it.

> Arrange version info of Status
> --
>
> Key: SPARK-30911
> URL: https://issues.apache.org/jira/browse/SPARK-30911
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Status.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30913) Arrange version info of Tests.scala

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041463#comment-17041463
 ] 

jiaan.geng commented on SPARK-30913:


I'm working on it.

> Arrange version info of Tests.scala
> ---
>
> Key: SPARK-30913
> URL: https://issues.apache.org/jira/browse/SPARK-30913
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Tests.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30914) Arrange version info of UI

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041464#comment-17041464
 ] 

jiaan.geng commented on SPARK-30914:


I'm working on it.

> Arrange version info of UI
> --
>
> Key: SPARK-30914
> URL: https://issues.apache.org/jira/browse/SPARK-30914
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/UI.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30912) Arrange version info of Streaming.scala

2020-02-20 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041462#comment-17041462
 ] 

jiaan.geng commented on SPARK-30912:


I'm working on it.

> Arrange version info of Streaming.scala
> ---
>
> Key: SPARK-30912
> URL: https://issues.apache.org/jira/browse/SPARK-30912
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> core/src/main/scala/org/apache/spark/internal/config/Streaming.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-28093) Built-in function trim/ltrim/rtrim has bug when using trimStr

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reopened SPARK-28093:
-
  Assignee: (was: Yuming Wang)

> Built-in function trim/ltrim/rtrim has bug when using trimStr
> -
>
> Key: SPARK-28093
> URL: https://issues.apache.org/jira/browse/SPARK-28093
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3
>Reporter: Yuming Wang
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> {noformat}
> spark-sql> SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x');
> z
> spark-sql> SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy');
> xyz
> spark-sql> SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy');
> xy
> spark-sql>
> {noformat}
> {noformat}
> postgres=# SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x');
>  btrim | btrim
> -------+-------
>  Tom   | bar
> (1 row)
> postgres=# SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy');
>  ltrim |    ltrim
> -------+--------------
>  test  | XxyLAST WORD
> (1 row)
> postgres=# SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy');
>  rtrim |   rtrim
> -------+-----------
>  test  | TURNERyxX
> (1 row)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28093) Built-in function trim/ltrim/rtrim has bug when using trimStr

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28093.
-
Fix Version/s: (was: 3.0.0)
   Resolution: Won't Fix

> Built-in function trim/ltrim/rtrim has bug when using trimStr
> -
>
> Key: SPARK-28093
> URL: https://issues.apache.org/jira/browse/SPARK-28093
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3
>Reporter: Yuming Wang
>Priority: Major
>  Labels: release-notes
>
> {noformat}
> spark-sql> SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x');
> z
> spark-sql> SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy');
> xyz
> spark-sql> SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy');
> xy
> spark-sql>
> {noformat}
> {noformat}
> postgres=# SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x');
>  btrim | btrim
> -------+-------
>  Tom   | bar
> (1 row)
> postgres=# SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy');
>  ltrim |    ltrim
> -------+--------------
>  test  | XxyLAST WORD
> (1 row)
> postgres=# SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy');
>  rtrim |   rtrim
> -------+-----------
>  test  | TURNERyxX
> (1 row)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28093) Built-in function trim/ltrim/rtrim has bug when using trimStr

2020-02-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041556#comment-17041556
 ] 

Wenchen Fan commented on SPARK-28093:
-

This has been reverted in https://github.com/apache/spark/pull/27540

> Built-in function trim/ltrim/rtrim has bug when using trimStr
> -
>
> Key: SPARK-28093
> URL: https://issues.apache.org/jira/browse/SPARK-28093
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3
>Reporter: Yuming Wang
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> {noformat}
> spark-sql> SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x');
> z
> spark-sql> SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy');
> xyz
> spark-sql> SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy');
> xy
> spark-sql>
> {noformat}
> {noformat}
> postgres=# SELECT trim('yxTomxx', 'xyz'), trim('xxxbarxxx', 'x');
>  btrim | btrim
> -------+-------
>  Tom   | bar
> (1 row)
> postgres=# SELECT ltrim('zzzytest', 'xyz'), ltrim('xyxXxyLAST WORD', 'xy');
>  ltrim |    ltrim
> -------+--------------
>  test  | XxyLAST WORD
> (1 row)
> postgres=# SELECT rtrim('testxxzx', 'xyz'), rtrim('TURNERyxXxy', 'xy');
>  rtrim |   rtrim
> -------+-----------
>  test  | TURNERyxX
> (1 row)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30893) Expressions should not change its data type/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041567#comment-17041567
 ] 

Wenchen Fan commented on SPARK-30893:
-

My assumption is: after you create a `df`, calling `df.collect` multiple times 
should always return the same result even if you change some configs.

If this is not what we expect, then we should only fix the data type changes 
and leave the behavior ones.

cc [~hvanhovell] [~viirya] [~maropu] what's your opinion?
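
For concreteness, a hedged illustration of the concern using the legacy flag behind {{Size}} (exact outputs depend on the version):
{code:scala}
spark.conf.set("spark.sql.legacy.sizeOfNull", "true")
val df = spark.sql("SELECT size(cast(null AS array<int>)) AS s")
df.collect()  // Array([-1]) under the legacy setting

spark.conf.set("spark.sql.legacy.sizeOfNull", "false")
df.collect()  // may now evaluate differently, because Size consults
              // SQLConf.get at execution time, not at creation time
{code}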

> Expressions should not change its data type/behavior after it's created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30880) Delete Sphinx Makefile cruft

2020-02-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-30880.
--
Resolution: Fixed

> Delete Sphinx Makefile cruft
> 
>
> Key: SPARK-30880
> URL: https://issues.apache.org/jira/browse/SPARK-30880
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30880) Delete Sphinx Makefile cruft

2020-02-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041570#comment-17041570
 ] 

Hyukjin Kwon commented on SPARK-30880:
--

Fixed in https://github.com/apache/spark/pull/27625

> Delete Sphinx Makefile cruft
> 
>
> Key: SPARK-30880
> URL: https://issues.apache.org/jira/browse/SPARK-30880
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30880) Delete Sphinx Makefile cruft

2020-02-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-30880:
-
Fix Version/s: 3.1.0

> Delete Sphinx Makefile cruft
> 
>
> Key: SPARK-30880
> URL: https://issues.apache.org/jira/browse/SPARK-30880
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Priority: Minor
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30893) Expressions should not change its data type/behavior after it's created

2020-02-20 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041576#comment-17041576
 ] 

Takeshi Yamamuro commented on SPARK-30893:
--

Yeah, I've already seen someone opening PRs for this issue, and the idea looks 
pretty reasonable to me. We shouldn't change results in that case, because 
the change looks error-prone in applications. But I think this change has a 
big impact on existing Spark users. Can we simply change the behaviour even in 
a major release, e.g., 3.0?

> Expressions should not change its data type/behavior after it's created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)

2020-02-20 Thread Gan Wei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041602#comment-17041602
 ] 

Gan Wei commented on SPARK-18502:
-

Is there a resolution for this issue? I am also encountering the same issue 
when selecting a column name containing a backtick "`".
{code:java}
df.select("a`b`").show(1)
{code}
got error msg:
{code:java}
org.apache.spark.sql.AnalysisException: syntax error in attribute name
{code}
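
A possible workaround (untested sketch): escape an embedded backtick by doubling it inside a quoted identifier.
{code:scala}
// column name a`b` -> quote the whole name and double each inner backtick
df.select("`a``b```").show(1)
{code}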

> Spark does not handle columns that contain backquote (`)
> 
>
> Key: SPARK-18502
> URL: https://issues.apache.org/jira/browse/SPARK-18502
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Barry Becker
>Priority: Minor
>  Labels: bulk-closed
>
> I know that if a column contains dots or hyphens we can put 
> backquotes/backticks around it, but what if the column contains a backtick 
> (`)? Can the backtick be escaped by some means?
> Here is an example of the sort of error I see
> {code}
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
> `Invoice`Date`;org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:99)
>  
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:109)
>  
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.quotedString(unresolved.scala:90)
>  org.apache.spark.sql.Column.<init>(Column.scala:113) 
> org.apache.spark.sql.Column$.apply(Column.scala:36) 
> org.apache.spark.sql.functions$.min(functions.scala:407) 
> com.mineset.spark.vizagg.vizbin.strategies.DateBinStrategy.getDateExtent(DateBinStrategy.scala:158)
>  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18502) Spark does not handle columns that contain backquote (`)

2020-02-20 Thread Gan Wei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041602#comment-17041602
 ] 

Gan Wei edited comment on SPARK-18502 at 2/21/20 6:36 AM:
--

Is there a resolution for this issue?

I am also encountering the same issue when selecting a column name containing 
a backtick "`".
{code:java}
df.select("a`b`").show(1)
{code}
got error msg:
{code:java}
org.apache.spark.sql.AnalysisException: syntax error in attribute name
{code}


was (Author: wgan008):
Is there a resolution for this issue. I am also encountering the same issue 
when selecting a column name containing backtick "`" .
{code:java}
df.select("a`b`").show(1)
{code}
got error msg:
{code:java}
org.apache.spark.sql.AnalysisException: syntax error in attribute name
{code}

> Spark does not handle columns that contain backquote (`)
> 
>
> Key: SPARK-18502
> URL: https://issues.apache.org/jira/browse/SPARK-18502
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Barry Becker
>Priority: Minor
>  Labels: bulk-closed
>
> I know that if a column contains dots or hyphens we can put 
> backquotes/backticks around it, but what if the column contains a backtick 
> (`)? Can the backtick be escaped by some means?
> Here is an example of the sort of error I see
> {code}
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
> `Invoice`Date`;org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:99)
>  
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:109)
>  
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.quotedString(unresolved.scala:90)
>  org.apache.spark.sql.Column.<init>(Column.scala:113) 
> org.apache.spark.sql.Column$.apply(Column.scala:36) 
> org.apache.spark.sql.functions$.min(functions.scala:407) 
> com.mineset.spark.vizagg.vizbin.strategies.DateBinStrategy.getDateExtent(DateBinStrategy.scala:158)
>  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30852) Use Long instead of Int as argument type in Dataset limit method

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30852.
-
Resolution: Won't Fix

> Use Long instead of Int as argument type in Dataset limit method
> 
>
> Key: SPARK-30852
> URL: https://issues.apache.org/jira/browse/SPARK-30852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Damianos Christophides
>Priority: Minor
>
> The Dataset limit method takes an input of type Int, a 32-bit integer whose 
> numerical upper limit is 2,147,483,647. In my work I needed to apply a limit 
> higher than that to a Dataset, which gives an error:
> "py4j.Py4JException: Method limit([class java.lang.Long]) does not exist"
>  
> Could the input type of the limit method be changed to a Long (64-bit)?
>  
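> A possible workaround sketch while limit stays Int-typed (illustrative, not 
> optimized):
> {code:scala}
> val n = 3000000000L
> val limited = df.rdd.zipWithIndex().filter { case (_, i) => i < n }.map(_._1)
> {code}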



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30894) The nullability of Size function should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30894:

Priority: Blocker  (was: Major)

> The nullability of Size function should not depend on SQLConf.get
> -
>
> Key: SPARK-30894
> URL: https://issues.apache.org/jira/browse/SPARK-30894
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30893) Expressions should not change its data type/behavior after it's created

2020-02-20 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041617#comment-17041617
 ] 

L. C. Hsieh commented on SPARK-30893:
-

Hmm, an action like `df.collect` should re-evaluate if it is not cached, right? 
If so, changing a config and getting different results sounds reasonable too, 
although I agree that it is error-prone in practice. We should avoid 
inconsistency among different SQL configs, for example some configs changing 
behaviors across `df.collect` calls while others do not. It would be better to 
have a consistent rule about whether SQL configs can change behaviors across 
different calls of the same action.

> Expressions should not change its data type/behavior after it's created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30894) The nullability of Size function should not depend on SQLConf.get

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30894:

Summary: The nullability of Size function should not depend on SQLConf.get  
(was: The behavior of Size function should not depend on SQLConf.get)

> The nullability of Size function should not depend on SQLConf.get
> -
>
> Key: SPARK-30894
> URL: https://issues.apache.org/jira/browse/SPARK-30894
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30893) Expressions should not change its data type/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041618#comment-17041618
 ] 

Wenchen Fan commented on SPARK-30893:
-

For data type and nullability, I think we should fix before 3.0, as they can 
lead to data corruption.

For other behaviors, we should merge it to 3.1 with migration guide items.

> Expressions should not change its data type/behavior after it's created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-30893) Expressions should not change its data type/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041618#comment-17041618
 ] 

Wenchen Fan edited comment on SPARK-30893 at 2/21/20 7:14 AM:
--

For data type and nullability, I think we should fix before 3.0, as they can 
lead to data corruption.

For other behaviors, we can have more discussion and wait for 3.1


was (Author: cloud_fan):
For data type and nullability, I think we should fix before 3.0, as they can 
lead to data corruption.

For other behaviors, we should merge it to 3.1 with migration guide items.

> Expressions should not change its data type/behavior after it's created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30893) Expressions should not change its data type/behavior after it's created

2020-02-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30893:

Description: This is a problem because the configuration can change between 
different phases of planning, and this can silently break a query plan which 
can lead to crashes or data corruption, if data type/nullability gets changed.  
(was: This is a problem because the configuration can change between different 
phases of planning, and this can silently break a query plan which can lead to 
crashes or data corruption.)

> Expressions should not change its data type/behavior after it's created
> ---
>
> Key: SPARK-30893
> URL: https://issues.apache.org/jira/browse/SPARK-30893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Critical
>
> This is a problem because the configuration can change between different 
> phases of planning, and this can silently break a query plan which can lead 
> to crashes or data corruption, if data type/nullability gets changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org