[jira] [Assigned] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
[ https://issues.apache.org/jira/browse/SPARK-35813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35813: Assignee: Apache Spark > Add new adaptive config into sql-performance-tuning docs > > > Key: SPARK-35813 > URL: https://issues.apache.org/jira/browse/SPARK-35813 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and > `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the > sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
[ https://issues.apache.org/jira/browse/SPARK-35813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365252#comment-17365252 ] Apache Spark commented on SPARK-35813: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/32960 > Add new adaptive config into sql-performance-tuning docs > > > Key: SPARK-35813 > URL: https://issues.apache.org/jira/browse/SPARK-35813 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > > Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and > `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the > sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
[ https://issues.apache.org/jira/browse/SPARK-35813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35813: Assignee: (was: Apache Spark) > Add new adaptive config into sql-performance-tuning docs > > > Key: SPARK-35813 > URL: https://issues.apache.org/jira/browse/SPARK-35813 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > > Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and > `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the > sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
XiDuo You created SPARK-35813: - Summary: Add new adaptive config into sql-performance-tuning docs Key: SPARK-35813 URL: https://issues.apache.org/jira/browse/SPARK-35813 Project: Spark Issue Type: Improvement Components: docs Affects Versions: 3.2.0 Reporter: XiDuo You Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
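For context, the tuning docs could show these two adaptive-execution thresholds being passed at submit time. This is a hedged sketch: the threshold values below are illustrative assumptions, not documented defaults.

```python
# Illustrative values only; consult the sql-performance-tuning docs for
# the actual defaults and semantics of each threshold.
adaptive_confs = {
    "spark.sql.adaptive.autoBroadcastJoinThreshold": "10MB",
    "spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold": "64MB",
}

def spark_submit_args(confs):
    """Render the configs as --conf flags, as one would pass to spark-submit."""
    return [f"--conf {k}={v}" for k, v in sorted(confs.items())]

for arg in spark_submit_args(adaptive_confs):
    print(arg)
```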
[jira] [Created] (SPARK-35812) Throw an error if `version` and `timestamp` are used together in DataFrame.to_delta.
Haejoon Lee created SPARK-35812: --- Summary: Throw an error if `version` and `timestamp` are used together in DataFrame.to_delta. Key: SPARK-35812 URL: https://issues.apache.org/jira/browse/SPARK-35812 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee [read_delta|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.read_delta.html#databricks.koalas.read_delta] has arguments named `version` and `timestamp`, but they cannot be used together. We should raise a proper error message when both are specified. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
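The proposed mutual-exclusion check could look roughly like this. A minimal sketch, not the actual implementation: the real `read_delta` signature and error type may differ.

```python
def read_delta(path, version=None, timestamp=None, **options):
    """Sketch of the argument check proposed in SPARK-35812 (illustrative)."""
    if version is not None and timestamp is not None:
        raise ValueError(
            "'version' and 'timestamp' cannot be used together; "
            "specify at most one of them."
        )
    # ... the actual Delta read would happen here ...
    return {"path": path, "version": version, "timestamp": timestamp}

# Passing only one of the two is fine:
read_delta("/tmp/tbl", version=3)
```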
[jira] [Created] (SPARK-35811) Deprecate DataFrame.to_spark_io
Haejoon Lee created SPARK-35811: --- Summary: Deprecate DataFrame.to_spark_io Key: SPARK-35811 URL: https://issues.apache.org/jira/browse/SPARK-35811 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We should deprecate [DataFrame.to_spark_io|https://docs.google.com/document/d/1RxvQJVf736Vg9XU7uiCaRlNl-P7GdmFGa6U3Ab78JJk/edit#heading=h.todz8y4xdqrx] since it duplicates [DataFrame.spark.to_spark_io|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.to_spark_io.html] and does not exist in pandas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35810) Remove ps.broadcast API
Haejoon Lee created SPARK-35810: --- Summary: Remove ps.broadcast API Key: SPARK-35810 URL: https://issues.apache.org/jira/browse/SPARK-35810 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We have [ps.broadcast|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.broadcast.html] in pandas API on Spark, but it duplicates [DataFrame.spark.hint|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.hint.html] when that API is used with "broadcast". So we should remove it, along with the [broadcast|http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.broadcast.html] function in PySpark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
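Before removal, `ps.broadcast` could be kept briefly as a deprecated alias of the hint. The sketch below uses a tiny stand-in DataFrame (an assumption, just to make the delegation pattern concrete) rather than the real pandas-on-Spark classes.

```python
import warnings

class _SparkAccessor:
    """Minimal stand-in for the real `.spark` accessor (illustration only)."""
    def __init__(self, df):
        self._df = df
    def hint(self, name):
        self._df.hints.append(name)
        return self._df

class FakeDataFrame:
    """Stand-in recording applied hints (illustration only)."""
    def __init__(self):
        self.hints = []
    @property
    def spark(self):
        return _SparkAccessor(self)

def broadcast(df):
    """Sketch: ps.broadcast as a deprecated alias of spark.hint('broadcast')."""
    warnings.warn(
        "`broadcast` is deprecated; use `DataFrame.spark.hint('broadcast')`.",
        FutureWarning,
    )
    return df.spark.hint("broadcast")
```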
[jira] [Created] (SPARK-35809) Add `index_col` argument for ps.sql.
Haejoon Lee created SPARK-35809: --- Summary: Add `index_col` argument for ps.sql. Key: SPARK-35809 URL: https://issues.apache.org/jira/browse/SPARK-35809 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee The current behavior of [ps.sql|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.sql.html] always loses the index, so we should add an `index_col` argument to this API so that we can preserve the index. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
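The intended behavior can be sketched in plain Python over rows-as-dicts: without `index_col` a fresh default index is generated (the original index is lost); with it, the named column becomes the index. Names and shapes here are assumptions for illustration, not the ps.sql API.

```python
def apply_index_col(rows, index_col=None):
    """Sketch of the proposed behavior: with `index_col`, that column
    becomes the index instead of being dropped (illustration only)."""
    if index_col is None:
        # current behavior: a fresh default index; any original index is lost
        return list(range(len(rows))), rows
    index = [r[index_col] for r in rows]
    data = [{k: v for k, v in r.items() if k != index_col} for r in rows]
    return index, data

idx, data = apply_index_col(
    [{"id": 10, "x": 1}, {"id": 20, "x": 2}], index_col="id"
)
```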
[jira] [Created] (SPARK-35808) Always enable the `pandas_metadata` in DataFrame.parquet
Haejoon Lee created SPARK-35808: --- Summary: Always enable the `pandas_metadata` in DataFrame.parquet Key: SPARK-35808 URL: https://issues.apache.org/jira/browse/SPARK-35808 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We have an argument named `pandas_metadata` in [ps.read_parquet|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.read_parquet.html], but it seems we can simply always enable it, so that the pandas metadata is always respected when reading a Parquet file written by pandas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
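For background, pandas records its index information as a JSON blob under the `pandas` key of the Parquet file metadata; always honoring it is what this ticket proposes. A minimal sketch of reading that metadata (key names follow the pandas Parquet convention; error handling is omitted):

```python
import json

def pandas_index_columns(key_value_metadata):
    """Return the index columns recorded by pandas in Parquet file metadata.

    `key_value_metadata` is assumed to be a plain dict of the file's
    key/value metadata; 'index_columns' is part of the pandas convention.
    """
    raw = key_value_metadata.get("pandas")
    if raw is None:
        return []  # not written by pandas: nothing to restore
    return json.loads(raw).get("index_columns", [])
```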
[jira] [Updated] (SPARK-35807) Rename the `num_files` argument
[ https://issues.apache.org/jira/browse/SPARK-35807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35807: Description: We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], because `num_files` does not actually specify the number of files but the number of partitions. Or we can just remove it and use +[DataFrame.spark.repartition|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.repartition.html]+ as a workaround. was: We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], because `num_files` does not actually specify the number of files but the number of partitions. > Rename the `num_files` argument > --- > > Key: SPARK-35807 > URL: https://issues.apache.org/jira/browse/SPARK-35807 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and > [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], > because `num_files` does not actually specify the number of > files but the number of partitions.
> Or we can just remove it and use > +[DataFrame.spark.repartition|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.repartition.html]+ > as a workaround. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35807) Rename the `num_files` argument
Haejoon Lee created SPARK-35807: --- Summary: Rename the `num_files` argument Key: SPARK-35807 URL: https://issues.apache.org/jira/browse/SPARK-35807 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], because `num_files` does not actually specify the number of files but the number of partitions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
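The distinction can be made concrete with a small stand-in: the argument repartitions the data, so it only bounds the number of output files (empty partitions produce no file). The classes and names below are assumptions for illustration, not the pandas-on-Spark API.

```python
class FakeFrame:
    """Tiny stand-in for a pandas-on-Spark DataFrame (illustration only)."""
    def __init__(self, partitions=8):
        self.partitions = partitions
    def repartition(self, n):
        self.partitions = n
        return self

def to_csv(df, path, num_partitions=None):
    """Sketch of the rename: the argument controls partitioning, which
    bounds — but does not fix — the number of output files."""
    if num_partitions is not None:
        df = df.repartition(num_partitions)
    return f"wrote up to {df.partitions} files under {path}"
```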
[jira] [Created] (SPARK-35806) Rename the `mode` argument to avoid confusion with `mode` argument in pandas
Haejoon Lee created SPARK-35806: --- Summary: Rename the `mode` argument to avoid confusion with `mode` argument in pandas Key: SPARK-35806 URL: https://issues.apache.org/jira/browse/SPARK-35806 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee pandas on Spark has an argument named `mode` in the APIs below: * [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] * [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html] * [DataFrame.to_table|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_table.html] * [DataFrame.to_delta|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_delta.html] * [DataFrame.to_parquet|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_parquet.html] * [DataFrame.to_orc|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_orc.html] * [DataFrame.to_spark_io|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_spark_io.html] pandas has the same argument, but the usage is different, so we should rename the argument to avoid confusion with pandas'. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
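The clash is between pandas' file-open modes (`'w'` truncate, `'a'` append) and Spark's save modes (`'overwrite'`, `'append'`, ...). A rename, or a small translation like the one below, would avoid it; the mapping is an assumption for illustration, not the API's actual behavior.

```python
# Hypothetical translation between the two `mode` vocabularies.
PANDAS_TO_SPARK_MODE = {"w": "overwrite", "a": "append"}

def normalize_mode(mode):
    """Accept either spelling and return the Spark save mode."""
    return PANDAS_TO_SPARK_MODE.get(mode, mode)
```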
[jira] [Updated] (SPARK-35747) Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-35747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-35747: - Description: In a secure YARN cluster, even though the HBase, Kafka, or Hive services are not used in the user application, the YARN client unnecessarily tries to generate delegation tokens from these services. This adds delay when submitting a Spark application to a YARN cluster. Also, during the HBase delegation token generation step in the application submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning. {code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Hence, if these services are not used in the user application, it is better to add a WARN message suggesting that delegation token generation be disabled for those services, i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false was: In a secure YARN cluster, even though the HBase, Kafka, or Hive services are not used in the user application, the YARN client unnecessarily tries to generate delegation tokens from these services.
This adds delay when submitting a Spark application to a YARN cluster. Also, during the HBase delegation token generation step in the application submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning. {code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Hence, if these services are not used in the user application, it is better to add a WARN message suggesting that delegation token generation be disabled for those services, i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false > Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are > not running in a secure cluster > > > Key: SPARK-35747 > URL: https://issues.apache.org/jira/browse/SPARK-35747 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.1.2 >Reporter: Vinod KC >Priority: Minor > > In a secure YARN cluster, even though the HBase, Kafka, or Hive services are > not used in the user application, the YARN client unnecessarily tries to > generate delegation tokens from these services. This adds delay > when submitting a Spark application to a YARN cluster > Also, during the HBase delegation token generation step in the application > submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning.
> {code:java} > WARN security.HBaseDelegationTokenProvider: Failed to get token from service > hbase > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) > more than 100 further lines of stack trace{code} > Hence, if these services are not used in the user application, it is better > to add a WARN message suggesting that delegation token generation be disabled for those services, > i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false
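The three provider toggles mentioned above can be rendered as spark-submit flags. A sketch only: whether disabling each provider is safe depends on whether the application actually uses that service.

```python
# The three credential-provider toggles named in the ticket.
DISABLE_UNUSED_PROVIDERS = {
    "spark.security.credentials.hbase.enabled": "false",
    "spark.security.credentials.hive.enabled": "false",
    "spark.security.credentials.kafka.enabled": "false",
}

# Render as --conf flags for spark-submit.
flags = [f"--conf {k}={v}" for k, v in sorted(DISABLE_UNUSED_PROVIDERS.items())]
```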
[jira] [Assigned] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range
[ https://issues.apache.org/jira/browse/SPARK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35780: Assignee: (was: Apache Spark) > Support DATE/TIMESTAMP literals across the full range > - > > Key: SPARK-35780 > URL: https://issues.apache.org/jira/browse/SPARK-35780 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Priority: Major > > DATE/TIMESTAMP literals support years 0000 to 9999. > However, internally we support a range that is much larger. > I can add or subtract large intervals from a date/timestamp and the system > will happily process and display large negative and positive dates. > Since we obviously cannot put this genie back into the bottle, the only thing > we can do is allow matching DATE/TIMESTAMP literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
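The gap between the literal syntax and the internal range can be sketched numerically. Spark stores DATE as a day count since the Unix epoch (the storage detail is the only assumption here), and that count reaches far beyond the last date a 4-digit-year literal can spell:

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def days_since_epoch(d):
    """DATE as Spark stores it internally: days relative to 1970-01-01."""
    return (d - EPOCH).days

last_literal_day = days_since_epoch(date(9999, 12, 31))  # last 4-digit-year date
int32_max = 2**31 - 1  # the internal day count extends far past 9999-12-31
```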
[jira] [Assigned] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range
[ https://issues.apache.org/jira/browse/SPARK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35780: Assignee: Apache Spark > Support DATE/TIMESTAMP literals across the full range > - > > Key: SPARK-35780 > URL: https://issues.apache.org/jira/browse/SPARK-35780 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Apache Spark >Priority: Major > > DATE/TIMESTAMP literals support years 0000 to 9999. > However, internally we support a range that is much larger. > I can add or subtract large intervals from a date/timestamp and the system > will happily process and display large negative and positive dates. > Since we obviously cannot put this genie back into the bottle, the only thing > we can do is allow matching DATE/TIMESTAMP literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35805) Pandas API on Spark improvements
Haejoon Lee created SPARK-35805: --- Summary: Pandas API on Spark improvements Key: SPARK-35805 URL: https://issues.apache.org/jira/browse/SPARK-35805 Project: Spark Issue Type: Umbrella Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee There are several things that need improvement in pandas on Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range
[ https://issues.apache.org/jira/browse/SPARK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365234#comment-17365234 ] Apache Spark commented on SPARK-35780: -- User 'linhongliu-db' has created a pull request for this issue: https://github.com/apache/spark/pull/32959 > Support DATE/TIMESTAMP literals across the full range > - > > Key: SPARK-35780 > URL: https://issues.apache.org/jira/browse/SPARK-35780 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Priority: Major > > DATE/TIMESTAMP literals support years 0000 to 9999. > However, internally we support a range that is much larger. > I can add or subtract large intervals from a date/timestamp and the system > will happily process and display large negative and positive dates. > Since we obviously cannot put this genie back into the bottle, the only thing > we can do is allow matching DATE/TIMESTAMP literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-35065: --- Comment: was deleted (was: I'm working on.) > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35747) Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-35747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-35747: - Description: In a secure YARN cluster, even though the HBase, Kafka, or Hive services are not used in the user application, the YARN client unnecessarily tries to generate delegation tokens from these services. This adds delay when submitting a Spark application to a YARN cluster. Also, during the HBase delegation token generation step in the application submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning. {code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Hence, if these services are not used in the user application, it is better to add a WARN message suggesting that delegation token generation be disabled for those services, i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false was: In a secure YARN cluster where the HBase service is down, even if the Spark application is not using HBase, HBaseDelegationTokenProvider prints a full exception stack trace during the application submit stage, which causes a noisy warning.
{code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Also, application submission takes more time as `HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn` retries the connection to the HBase master multiple times before giving up. This slows down the application submission steps. Hence, if HBase is not used in the user application, it is better to suggest that users disable HBase delegation token generation, i.e. spark.security.credentials.hbase.enabled=false > Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are > not running in a secure cluster > > > Key: SPARK-35747 > URL: https://issues.apache.org/jira/browse/SPARK-35747 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.1.2 >Reporter: Vinod KC >Priority: Minor > > In a secure YARN cluster, even though the HBase, Kafka, or Hive services are > not used in the user application, the YARN client unnecessarily tries to > generate delegation tokens from these services. This adds delay > when submitting a Spark application to a YARN cluster > > Also, during the HBase delegation token generation step in the application submit > stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning.
> {code:java} > WARN security.HBaseDelegationTokenProvider: Failed to get token from service > hbase > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) > more than 100 further lines of stack trace{code} > Hence, if these services are not used in the user application, it is better > to add a WARN message suggesting that delegation token generation be disabled for those services, > i.e. spark.security.credentials.hbase.enabled=false, > spark.security.credentials.hive.enabled=false, > spark.security.credentials.kafka.enabled=false
[jira] [Commented] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365233#comment-17365233 ] Apache Spark commented on SPARK-35065: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/32958 > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35065: Assignee: Apache Spark > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35065: Assignee: (was: Apache Spark) > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35747) Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-35747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-35747: - Summary: Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster (was: Avoid printing full Exception stack trace, if HBase service is not running in a secure cluster ) > Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are > not running in a secure cluster > > > Key: SPARK-35747 > URL: https://issues.apache.org/jira/browse/SPARK-35747 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.1.2 >Reporter: Vinod KC >Priority: Minor > > In a secure Yarn cluster where HBase service is down, even if the spark > application is not using HBase, during the application submit stage, > HBaseDelegationTokenProvider prints full Exception Stack trace and it causes > a noisy warning. > > {code:java} > WARN security.HBaseDelegationTokenProvider: Failed to get token from service > hbase > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationT > okenProvider.scala:93) > more than 100 line exception stack trace{code} > Also, Application submission taking more time as > `HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn` retries > the connection to HBase master multiple times before it gives up. This slows > down the application submission steps. Hence, if HBase is not used in the > user Application, it is better to suggest user to disable HBase Delegation > Token generation. 
> i.e., spark.security.credentials.hbase.enabled=false > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
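The workaround suggested above can be sketched in code. This is a hedged illustration, not a verified recipe: the config key is the one named in the report, but the session-builder usage and app name are placeholders, and in practice the flag is normally passed at submit time via spark-submit --conf, since delegation tokens are obtained during application submission.

```python
from pyspark.sql import SparkSession

# Sketch only: if the application never touches HBase, disabling the HBase
# token provider stops submission from probing an absent HBase master (and
# from printing the long stack trace and retrying the connection).
spark = (
    SparkSession.builder
    .appName("no-hbase-tokens")  # hypothetical app name
    .config("spark.security.credentials.hbase.enabled", "false")
    .getOrCreate()
)
```

The analogous keys for the other providers mentioned in the title follow the same spark.security.credentials.<service>.enabled pattern.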
[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there is still some work remaining. After some discussions among [~huaxingao] [~srowen] [~weichenxu123] [~mengxr] [~podongfeng] , we decided to gather the related work under a sub-project Matrix, which includes: # *Blockification (vectorization of vectors)* ** vectors are stacked into matrices, so that high-level BLAS can be used for better performance (about ~3x faster on sparse datasets, up to ~18x faster on dense datasets; see SPARK-31783 for details). ** Since 3.1.1, LoR/SVC/LiR/AFT support blockification, and we need to blockify KMeans in the future. # *Standardization (virtual centering)* ** The existing implementation of standardization in linear models does NOT center the vectors by removing the means, in order to keep dataset _*sparsity*_. However, this causes feature values with small variance to be scaled to large values, and an underlying solver like LBFGS cannot handle this case efficiently; see SPARK-34448 for details. ** If the internal vectors are centered (as in the well-known GLMNET), the convergence rate is better. In the case in SPARK-34448, the number of iterations to convergence is reduced from 93 to 6. Moreover, the final solution is much closer to the one in GLMNET. ** Luckily, we found a new way to _*virtually*_ center the vectors without densifying the dataset. Good results have been observed in LoR, and we will take it into account in other linear models. 
# _*Initialization (To be discussed)*_ ** Initializing the model coefficients from a given model should be beneficial to: 1, convergence rate (should reduce the number of iterations); 2, model stability (may obtain a new solution closer to the previous one); # _*Early Stopping* *(To be discussed)*_ ** we can compute the test error during training (as tree models do), and stop the training procedure if the test error begins to increase; If you want to add other features to these models, please comment in the ticket. was: We had been refactoring linear models for a long time, and there is still some work remaining. After some discussions among [~huaxingao] [~srowen] [~weichenxu123] [~mengxr] [~podongfeng] , we decided to gather the related work under a sub-project Matrix, which includes: # *Blockification (vectorization of vectors)* ** vectors are stacked into matrices, so that high-level BLAS can be used for better performance (about ~3x faster on sparse datasets, up to ~18x faster on dense datasets; see SPARK-31783 for details). ** Since 3.1.1, LoR/SVC/LiR/AFT support blockification, and we need to blockify KMeans in the future. # *Standardization (virtual centering)* ** The existing implementation of standardization in linear models does NOT center the vectors by removing the means, in order to keep dataset _*sparsity*_. However, this causes feature values with small variance to be scaled to large values, and an underlying solver like LBFGS cannot handle this case efficiently; see SPARK-34448 for details. ** If the internal vectors are centered (as in other well-known implementations, i.e. GLMNET/Scikit-Learn), the convergence rate is better. In the case in SPARK-34448, the number of iterations to convergence is reduced from 93 to 6. Moreover, the final solution is much closer to the one in GLMNET. ** Luckily, we found a new way to _*virtually*_ center the vectors without densifying the dataset. Good results have been observed in LoR, and we will take it into account in other linear models. 
# _*Initialization (To be discussed)*_ ** Initializing the model coefficients from a given model should be beneficial to: 1, convergence rate (should reduce the number of iterations); 2, model stability (may obtain a new solution closer to the previous one); # _*Early Stopping* *(To be discussed)*_ ** we can compute the test error during training (as tree models do), and stop the training procedure if the test error begins to increase; If you want to add other features to these models, please comment in the ticket. > Project Matrix: Linear Models revisit and refactor > -- > > Key: SPARK-30641 > URL: https://issues.apache.org/jira/browse/SPARK-30641 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Affects Versions: 3.1.0, 3.2.0 >Reporter: zhengruifeng >Priority: Major > > We had been refactoring linear models for a long time, and there is still > some work remaining. After some discussions among [~huaxingao] [~srowen] > [~weichenxu123] [~mengxr] [~podongfeng] , we decided to gather the related work > under a sub-project Matrix, which includes: > # *Blockification (vectorization of vectors)* > ** ve
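The "virtual centering" idea described in the standardization item can be illustrated with a toy sketch in plain Python (this is not Spark's actual code): for a linear term w·((x − μ)/σ), the mean-removal part collapses into one precomputed constant, so the sparse vector x never has to be densified.

```python
# Toy illustration of virtual centering (all numbers are made up).
x = [0.0, 3.0, 0.0, 5.0]        # sparse feature vector: zeros stay untouched
mu = [1.0, 2.0, 0.5, 4.0]       # per-feature means
sigma = [2.0, 1.0, 0.5, 2.0]    # per-feature standard deviations
w = [0.3, -0.2, 0.1, 0.4]       # model coefficients

# Explicit standardization densifies x: every (x_i - mu_i) becomes non-zero.
dense = sum(wi * (xi - mi) / si for wi, xi, mi, si in zip(w, x, mu, sigma))

# Virtual centering: fold the means into a single precomputed offset, then
# iterate only over the non-zero entries of x.
scaled_w = [wi / si for wi, si in zip(w, sigma)]
offset = sum(swi * mi for swi, mi in zip(scaled_w, mu))
virtual = sum(swi * xi for swi, xi in zip(scaled_w, x) if xi != 0.0) - offset

assert abs(dense - virtual) < 1e-12  # identical result, no densification
```

The per-model offset is computed once, so each sparse instance costs work proportional to its non-zero count, which is what preserves dataset sparsity.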
[jira] [Resolved] (SPARK-35303) Enable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-35303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35303. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32429 [https://github.com/apache/spark/pull/32429] > Enable pinned thread mode by default > > > Key: SPARK-35303 > URL: https://issues.apache.org/jira/browse/SPARK-35303 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.2.0 > > > Pinned thread mode was added at SPARK-22340. We should enable it back to map > Python thread to JVM thread in order to prevent potential issues such as > thread local inheritance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35303) Enable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-35303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35303: Assignee: Hyukjin Kwon > Enable pinned thread mode by default > > > Key: SPARK-35303 > URL: https://issues.apache.org/jira/browse/SPARK-35303 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Pinned thread mode was added at SPARK-22340. We should enable it back to map > Python thread to JVM thread in order to prevent potential issues such as > thread local inheritance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35804) can't read external hive table on spark
[ https://issues.apache.org/jira/browse/SPARK-35804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cao zhiyu updated SPARK-35804: -- Description: I created an external Hive table over an HDFS file that is formatted as a JSON string. In the Hive shell I can read the data fields of this Hive table with the help of org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in hive-hcatalog-core.jar. But when I try to use Spark (pyspark, spark-shell, or whatever), I just can't read it. It gives me the error Table: Unable to get field from serde: org.apache.hive.hcatalog.data.JsonSerDe I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN libs and rerun; there is no effect, even with --jars $jar_path/hive-hcatalog-core.jar. But when I browse the web page of the Spark task, I can actually find the jar in the env list. was: I created an external Hive table over an HDFS file that is formatted as a JSON string. In the Hive shell I can read the data fields of this Hive table with the help of org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in hive-hcatalog-core.jar. But when I try to use Spark (pyspark, spark-shell, or whatever), I just can't read it. It gives me the error Table: Unable to get field from serde: org.apache.hive.hcatalog.data.JsonSerDe I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN libs and rerun; there is no effect. Even when I browse the web page of the Spark task, I can actually find the jar in the env list. > can't read external hive table on spark > --- > > Key: SPARK-35804 > URL: https://issues.apache.org/jira/browse/SPARK-35804 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, Spark Shell >Affects Versions: 2.3.2 > Environment: hdp 3.1.4 > hive-hcatalog-core-3.1.0.3.1.4.0-315.jar & hive-hcatalog-core-3.1.2 both I've > tried > >Reporter: cao zhiyu >Priority: Critical > Labels: JSON, external-tables, hive, spark > > I created an external Hive table over an HDFS file that is formatted as a JSON > string. 
> In the Hive shell I can read the data fields of this Hive table with the help of > org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in > hive-hcatalog-core.jar. > But when I try to use Spark (pyspark, spark-shell, or whatever), I just > can't read it. > It gives me the error Table: Unable to get field from serde: > org.apache.hive.hcatalog.data.JsonSerDe > I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN > libs and rerun; there is no effect, even with --jars > $jar_path/hive-hcatalog-core.jar. But when I browse the web page of the Spark task, > I can actually find the jar in the env list. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35804) can't read external hive table on spark
cao zhiyu created SPARK-35804: - Summary: can't read external hive table on spark Key: SPARK-35804 URL: https://issues.apache.org/jira/browse/SPARK-35804 Project: Spark Issue Type: Bug Components: PySpark, Spark Core, Spark Shell Affects Versions: 2.3.2 Environment: hdp 3.1.4 hive-hcatalog-core-3.1.0.3.1.4.0-315.jar & hive-hcatalog-core-3.1.2 both I've tried Reporter: cao zhiyu I created an external Hive table over an HDFS file that is formatted as a JSON string. In the Hive shell I can read the data fields of this Hive table with the help of org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in hive-hcatalog-core.jar. But when I try to use Spark (pyspark, spark-shell, or whatever), I just can't read it. It gives me the error Table: Unable to get field from serde: org.apache.hive.hcatalog.data.JsonSerDe I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN libs and rerun; there is no effect. Even when I browse the web page of the Spark task, I can actually find the jar in the env list. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
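As a hedged sketch of the configuration being attempted here (the jar path and table name below are placeholders, and this is not a confirmed fix for the HDP setup above), the SerDe jar has to be visible to both the driver and the executors before the Hive-enabled session starts:

```python
from pyspark.sql import SparkSession

# Illustrative only: ship the HCatalog SerDe jar with the application so
# executors can instantiate org.apache.hive.hcatalog.data.JsonSerDe.
spark = (
    SparkSession.builder
    .config("spark.jars", "/path/to/hive-hcatalog-core.jar")  # placeholder path
    .enableHiveSupport()  # needed so Spark resolves the Hive metastore table
    .getOrCreate()
)
df = spark.table("my_json_table")  # hypothetical table name
```

Setting the config after the session (or only on the driver) is a common way for the jar to show up in the environment list while the SerDe class still fails to load.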
[jira] [Commented] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365215#comment-17365215 ] Apache Spark commented on SPARK-35472: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/32957 > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35472: Assignee: Apache Spark > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365214#comment-17365214 ] Apache Spark commented on SPARK-35472: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/32957 > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35472: Assignee: (was: Apache Spark) > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35803) Spark SQL does not support creating views using DataSource v2 based data sources
David Rabinowitz created SPARK-35803: Summary: Spark SQL does not support creating views using DataSource v2 based data sources Key: SPARK-35803 URL: https://issues.apache.org/jira/browse/SPARK-35803 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.1.2, 2.4.8 Reporter: David Rabinowitz When a temporary view is created in Spark SQL using an external data source, Spark tries to create the relevant relation using the DataSource.resolveRelation() method. Unlike DataFrameReader.load(), resolveRelation() does not check whether the provided DataSource implements the DataSourceV2 interface and instead tries to use the RelationProvider trait in order to generate the Relation. Furthermore, DataSourceV2Relation is not a subclass of BaseRelation, so it cannot be used in resolveRelation(). Lastly, I tried to implement the RelationProvider trait in my Java implementation of DataSourceV2, but the match inside resolveRelation() did not detect it as a RelationProvider. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35802) Error loading the stages/stage/ page in spark UI
Helt Long created SPARK-35802: - Summary: Error loading the stages/stage/ page in spark UI Key: SPARK-35802 URL: https://issues.apache.org/jira/browse/SPARK-35802 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.1.2, 3.1.1, 3.0.1, 3.0.0 Reporter: Helt Long I try to load the sparkUI page for a specific stage, I get the following error: {quote}Unable to connect to the server. Looks like the Spark application must have ended. Please Switch to the history UI. {quote} Obviously the server is still alive and process new messages. Looking at the network tab shows one of the requests fails: {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable' Error 500 Request failed. HTTP ERROR 500 Problem accessing /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.http://eclipse.org/jetty";>Powered by Jetty:// 9.4.z-SNAPSHOT }} requests to any other object that I've tested seem to work, for example {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}} The exception is: {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable javax.servlet.ServletException: java.lang.NullPointerException at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.sparkproject.jetty.server.Server.handle(Server.java:505) at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at 
org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698) at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:107) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.BaseAppResour
[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365194#comment-17365194 ] Sarth Frey commented on SPARK-22674: I can confirm this is affecting PySpark 3.1.1 > PySpark breaks serialization of namedtuple subclasses > - > > Key: SPARK-22674 > URL: https://issues.apache.org/jira/browse/SPARK-22674 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0, 2.3.0, 3.1.1 >Reporter: Jonas Amrich >Priority: Major > > Pyspark monkey patches the namedtuple class to make it serializable, however > this breaks serialization of its subclasses. With current implementation, any > subclass will be serialized (and deserialized) as it's parent namedtuple. > Consider this code, which will fail with {{AttributeError: 'Point' object has > no attribute 'sum'}}: > {code} > from collections import namedtuple > Point = namedtuple("Point", "x y") > class PointSubclass(Point): > def sum(self): > return self.x + self.y > rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]]) > rdd.collect()[0][0].sum() > {code} > Moreover, as PySpark hijacks all namedtuples in the main module, importing > pyspark breaks serialization of namedtuple subclasses even in code which is > not related to spark / distributed execution. I don't see any clean solution > to this; a possible workaround may be to limit serialization hack only to > direct namedtuple subclasses like in > https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarth Frey updated SPARK-22674: --- Affects Version/s: 3.1.1 > PySpark breaks serialization of namedtuple subclasses > - > > Key: SPARK-22674 > URL: https://issues.apache.org/jira/browse/SPARK-22674 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0, 2.3.0, 3.1.1 >Reporter: Jonas Amrich >Priority: Major > > Pyspark monkey patches the namedtuple class to make it serializable, however > this breaks serialization of its subclasses. With current implementation, any > subclass will be serialized (and deserialized) as it's parent namedtuple. > Consider this code, which will fail with {{AttributeError: 'Point' object has > no attribute 'sum'}}: > {code} > from collections import namedtuple > Point = namedtuple("Point", "x y") > class PointSubclass(Point): > def sum(self): > return self.x + self.y > rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]]) > rdd.collect()[0][0].sum() > {code} > Moreover, as PySpark hijacks all namedtuples in the main module, importing > pyspark breaks serialization of namedtuple subclasses even in code which is > not related to spark / distributed execution. I don't see any clean solution > to this; a possible workaround may be to limit serialization hack only to > direct namedtuple subclasses like in > https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
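The failure mode described in this ticket can be reproduced without Spark at all. The sketch below is a stand-in for PySpark's real monkey patch (which lives in pyspark's serializers module and is more involved): it overrides __reduce__ on the parent namedtuple so that pickling any instance, including a subclass instance, rebuilds the parent type.

```python
import pickle
from collections import namedtuple

Point = namedtuple("Point", "x y")

# Stand-in for PySpark's patch (NOT its actual code): serialize every
# instance by reconstructing the *parent* namedtuple, discarding the
# subclass. Subclasses inherit this __reduce__.
Point.__reduce__ = lambda self: (Point, tuple(self))

class PointSubclass(Point):
    def total(self):
        return self.x + self.y

restored = pickle.loads(pickle.dumps(PointSubclass(1, 2)))
print(type(restored).__name__)  # prints "Point": the subclass identity is gone
assert type(restored) is Point
assert not hasattr(restored, "total")  # the ticket's AttributeError, in miniature
```

This mirrors why rdd.collect()[0][0].sum() in the report fails: the values survive the round trip, but the subclass methods do not.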
[jira] [Updated] (SPARK-35790) Spark Package Python Import does not work for namespace packages
[ https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamilton updated SPARK-35790: -- Description: If one includes Python files within several jars that together comprise a Python "namespace package" [https://www.python.org/dev/peps/pep-0420/] then only one of the packages is imported was: If one includes Python in a jar, it will automatically be added to the classpath, allowing for the distribution of Java + Python packages with a single jar. If one depends on a mixed jar, the Python is not properly loaded Summary: Spark Package Python Import does not work for namespace packages (was: Spark Package Python Import does not work for dependent jars) > Spark Package Python Import does not work for namespace packages > > > Key: SPARK-35790 > URL: https://issues.apache.org/jira/browse/SPARK-35790 > Project: Spark > Issue Type: Bug > Components: Build, PySpark, Spark Submit >Affects Versions: 3.0.0, 3.1.2 >Reporter: Mark Hamilton >Priority: Major > > If one includes Python files within several jars that together comprise a Python > "namespace package" > [https://www.python.org/dev/peps/pep-0420/] > Then only one of the packages is imported -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.
[ https://issues.apache.org/jira/browse/SPARK-35469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365138#comment-17365138 ] Apache Spark commented on SPARK-35469: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/32956 > Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors. > - > > Key: SPARK-35469 > URL: https://issues.apache.org/jira/browse/SPARK-35469 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.
[ https://issues.apache.org/jira/browse/SPARK-35469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35469: Assignee: (was: Apache Spark) > Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors. > - > > Key: SPARK-35469 > URL: https://issues.apache.org/jira/browse/SPARK-35469 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.
[ https://issues.apache.org/jira/browse/SPARK-35469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35469: Assignee: Apache Spark > Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors. > - > > Key: SPARK-35469 > URL: https://issues.apache.org/jira/browse/SPARK-35469 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35801) SPIP: Support MERGE in Data Source V2
Anton Okolnychyi created SPARK-35801: Summary: SPIP: Support MERGE in Data Source V2 Key: SPARK-35801 URL: https://issues.apache.org/jira/browse/SPARK-35801 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Anton Okolnychyi [MERGE INTO|https://en.wikipedia.org/wiki/Merge_(SQL)] is well suited to large-scale workloads because it can express operations to insert, update, or delete multiple rows in a single SQL command. Many updates can be expressed as MERGE INTO queries that would otherwise require much more SQL. Common patterns for updating partitions are to read, union, and overwrite or read, diff, and append. Using MERGE INTO, these operations are easier to express and can be more efficient to run. Hive supports [MERGE INTO|https://blog.cloudera.com/update-hive-tables-easy-way/] and Spark should implement similar support. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
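The kind of statement the SPIP targets has the SQL-standard shape sketched below (the table and column names are invented for illustration). A single command covers the insert, update, and delete cases that would otherwise need the read-union-overwrite or read-diff-append patterns mentioned above; it is kept as a Python string here so it could be handed to spark.sql(...) once MERGE is supported.

```python
# Illustrative MERGE INTO statement (hypothetical tables and columns).
merge_sql = """
MERGE INTO accounts t
USING account_updates s
  ON t.id = s.id
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.balance = s.balance
WHEN NOT MATCHED THEN INSERT (id, balance) VALUES (s.id, s.balance)
""".strip()
```

Each WHEN clause expresses one of the row-level actions, which is what makes MERGE more compact and potentially more efficient than the multi-statement alternatives.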
[jira] [Resolved] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35095. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32953 [https://github.com/apache/spark/pull/32953] > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-35095: Assignee: Kousuke Saruta > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18107) Insert overwrite statement runs much slower in spark-sql than it does in hive-client
[ https://issues.apache.org/jira/browse/SPARK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365072#comment-17365072 ] Hemanth commented on SPARK-18107: - We are seeing this issue still exists on Spark versions 2.3.1 and 2.4.7. > Insert overwrite statement runs much slower in spark-sql than it does in > hive-client > > > Key: SPARK-18107 > URL: https://issues.apache.org/jira/browse/SPARK-18107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: spark 2.0.0 > hive 2.0.1 >Reporter: snodawn >Assignee: L. C. Hsieh >Priority: Major > Fix For: 2.1.0 > > > I find insert overwrite statement running in spark-sql or spark-shell spends > much more time than it does in hive-client (i start it in > apache-hive-2.0.1-bin/bin/hive ), where spark costs about ten minutes but > hive-client just costs less than 20 seconds. > These are the steps I took. > Test sql is : > insert overwrite table login4game partition(pt='mix_en',dt='2016-10-21') > select distinct account_name,role_id,server,'1476979200' as recdate, 'mix' as > platform, 'mix' as pid, 'mix' as dev from tbllog_login where pt='mix_en' and > dt='2016-10-21' ; > there are 257128 lines of data in tbllog_login with > partition(pt='mix_en',dt='2016-10-21') > ps: > I'm sure it must be "insert overwrite" costing a lot of time in spark, may be > when doing overwrite, it need to spend a lot of time in io or in something > else. > I also compare the executing time between insert overwrite statement and > insert into statement. > 1. insert overwrite statement and insert into statement in spark: > insert overwrite statement costs about 10 minutes > insert into statement costs about 30 seconds > 2. 
insert into statement in spark and insert into statement in hive-client: > spark costs about 30 seconds > hive-client costs about 20 seconds > the difference is little that we can ignore > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1
[ https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35670. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32826 [https://github.com/apache/spark/pull/32826] > Upgrade ZSTD-JNI to 1.5.0-1 > --- > > Key: SPARK-35670 > URL: https://issues.apache.org/jira/browse/SPARK-35670 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: apache.org >Assignee: apache.org >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1
[ https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35670: - Assignee: apache.org > Upgrade ZSTD-JNI to 1.5.0-1 > --- > > Key: SPARK-35670 > URL: https://issues.apache.org/jira/browse/SPARK-35670 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: apache.org >Assignee: apache.org >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35720) Support casting of String to timestamp without time zone type
[ https://issues.apache.org/jira/browse/SPARK-35720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35720: -- Assignee: Gengliang Wang > Support casting of String to timestamp without time zone type > - > > Key: SPARK-35720 > URL: https://issues.apache.org/jira/browse/SPARK-35720 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Extend the Cast expression and support in casting StringType > toTimestampWithoutTZType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35720) Support casting of String to timestamp without time zone type
[ https://issues.apache.org/jira/browse/SPARK-35720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35720. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32936 [https://github.com/apache/spark/pull/32936] > Support casting of String to timestamp without time zone type > - > > Key: SPARK-35720 > URL: https://issues.apache.org/jira/browse/SPARK-35720 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Extend the Cast expression and support in casting StringType > toTimestampWithoutTZType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-35800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35800: Assignee: Apache Spark > Improving testability of GroupState in streaming flatMapGroupsWithState > --- > > Key: SPARK-35800 > URL: https://issues.apache.org/jira/browse/SPARK-35800 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Tathagata Das >Assignee: Apache Spark >Priority: Major > > GroupStateImpl is the internal implementation of the GroupState interface > which mean to be not exposed. Thus, it only has a private constructor. Such > access control does benefit encapsulation, however, this introduces > difficulties for unit tests and the users are calling the engine to construct > such GroupState instances in order to test their customized state transition > functions. > The solution is to introduce new interfaces that allow users to create > instances of GroupState but also access internal values of what they have set > (for example, has to state been updated, or removed). This would allow them > to write unit tests of the state transition function with custom GroupState > objects and then verifying whether the state was updated in an expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-35800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365056#comment-17365056 ] Apache Spark commented on SPARK-35800: -- User 'lizhangdatabricks' has created a pull request for this issue: https://github.com/apache/spark/pull/32938 > Improving testability of GroupState in streaming flatMapGroupsWithState > --- > > Key: SPARK-35800 > URL: https://issues.apache.org/jira/browse/SPARK-35800 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Tathagata Das >Priority: Major > > GroupStateImpl is the internal implementation of the GroupState interface > which mean to be not exposed. Thus, it only has a private constructor. Such > access control does benefit encapsulation, however, this introduces > difficulties for unit tests and the users are calling the engine to construct > such GroupState instances in order to test their customized state transition > functions. > The solution is to introduce new interfaces that allow users to create > instances of GroupState but also access internal values of what they have set > (for example, has to state been updated, or removed). This would allow them > to write unit tests of the state transition function with custom GroupState > objects and then verifying whether the state was updated in an expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-35800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35800: Assignee: (was: Apache Spark) > Improving testability of GroupState in streaming flatMapGroupsWithState > --- > > Key: SPARK-35800 > URL: https://issues.apache.org/jira/browse/SPARK-35800 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Tathagata Das >Priority: Major > > GroupStateImpl is the internal implementation of the GroupState interface > which mean to be not exposed. Thus, it only has a private constructor. Such > access control does benefit encapsulation, however, this introduces > difficulties for unit tests and the users are calling the engine to construct > such GroupState instances in order to test their customized state transition > functions. > The solution is to introduce new interfaces that allow users to create > instances of GroupState but also access internal values of what they have set > (for example, has to state been updated, or removed). This would allow them > to write unit tests of the state transition function with custom GroupState > objects and then verifying whether the state was updated in an expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
Tathagata Das created SPARK-35800: - Summary: Improving testability of GroupState in streaming flatMapGroupsWithState Key: SPARK-35800 URL: https://issues.apache.org/jira/browse/SPARK-35800 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.1.2 Reporter: Tathagata Das GroupStateImpl is the internal implementation of the GroupState interface and is not meant to be exposed; thus, it only has a private constructor. Such access control benefits encapsulation, but it makes unit testing difficult: users must rely on the engine to construct GroupState instances in order to test their customized state transition functions. The solution is to introduce new interfaces that allow users to create instances of GroupState and also to inspect what the function has done to them (for example, has the state been updated or removed). This would allow them to write unit tests of the state transition function with custom GroupState objects and then verify whether the state was updated in the expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
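GroupState is a Scala/Java interface, so any real fix lands in Spark's Scala API. Purely as a language-neutral illustration of the testing pattern the ticket asks for, here is a minimal, self-contained Python sketch (all names invented) of a user-constructible fake state object that records whether a state transition function updated or removed the state:

```python
# Hypothetical sketch of the proposal: let tests construct a state object
# directly, run the user's transition function against it, and inspect the
# result. All names here are invented; they do not mirror Spark's actual API.
class FakeGroupState:
    def __init__(self, initial=None):
        self._value = initial
        self.updated = False   # did the function call update()?
        self.removed = False   # did the function call remove()?

    @property
    def exists(self):
        return self._value is not None and not self.removed

    def get(self):
        if not self.exists:
            raise ValueError("state does not exist")
        return self._value

    def update(self, value):
        self._value = value
        self.updated = True
        self.removed = False

    def remove(self):
        self.removed = True


# An example user transition function: keep a running count per key.
def count_events(key, values, state):
    total = (state.get() if state.exists else 0) + len(values)
    state.update(total)
    return (key, total)


# The unit test needs no engine at all: construct the state, call, inspect.
state = FakeGroupState()
assert count_events("k", [1, 2, 3], state) == ("k", 3)
assert state.updated and state.get() == 3
```

The point of the ticket is exactly this last stanza: today the equivalent Scala test cannot be written, because only the engine can construct a GroupState.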
[jira] [Assigned] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35095: Assignee: Apache Spark > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365052#comment-17365052 ] Apache Spark commented on SPARK-35095: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32953 > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35095: Assignee: (was: Apache Spark) > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35799: Assignee: (was: Apache Spark) > Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35799: Assignee: Apache Spark > Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Assignee: Apache Spark >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365049#comment-17365049 ] Apache Spark commented on SPARK-35799: -- User 'vkorukanti' has created a pull request for this issue: https://github.com/apache/spark/pull/32952 > Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated SPARK-35799: Description: Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the operator {{FlatMapGroupsWithStateExec}}, but currently it just [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] the iterator creation time. Fix it to measure similar to how other stateful operators measure. Example one [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. This measurement is not perfect due to the nature of the lazy iterator and also includes the time the consumer operator spent in processing the current operator output, but it should give a good signal when comparing the metric in one microbatch to the metric in another microbatch. was: Metric `allUpdatesTimeMs` meant to capture the start to end walltime of the operator `FlatMapGroupsWithStateExec`, but currently it just [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] the iterator creation time. Fix it to measure similar to how other stateful operators measure. Example one [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. This measurement is not perfect due to the nature of the lazy iterator and also includes the time the consumer operator spent in processing the current operator output, but it should give a good signal when comparing the metric in one microbatch to the metric in another microbatch. 
> Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
Venki Korukanti created SPARK-35799: --- Summary: Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec Key: SPARK-35799 URL: https://issues.apache.org/jira/browse/SPARK-35799 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.1.2 Reporter: Venki Korukanti Metric `allUpdatesTimeMs` is meant to capture the start-to-end wall time of the operator `FlatMapGroupsWithStateExec`, but currently it just [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] the iterator creation time. Fix it to measure the way other stateful operators do; see the example [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. This measurement is not perfect: because of the lazy iterator it also includes the time the consumer operator spends processing the current operator's output, but it should still give a good signal when comparing the metric in one microbatch against another. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
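The lazy-iterator pitfall this ticket describes is language-agnostic: timing only iterator *creation* measures almost nothing, because the work happens while the iterator is consumed. A minimal, self-contained Python sketch (Spark's actual fix is in Scala; names here are invented) of wrapping an iterator so the accumulated time covers consumption:

```python
import time

def slow_items():
    # A generator: the work happens lazily, only during consumption.
    for i in range(3):
        time.sleep(0.05)
        yield i

class TimedIterator:
    """Accumulate wall time spent inside next(), the way a stateful
    operator would measure across the whole lazy iterator."""
    def __init__(self, it):
        self._it = it
        self.elapsed = 0.0
    def __iter__(self):
        return self
    def __next__(self):
        start = time.perf_counter()
        try:
            return next(self._it)
        finally:
            self.elapsed += time.perf_counter() - start

creation_start = time.perf_counter()
it = TimedIterator(slow_items())
creation_time = time.perf_counter() - creation_start  # near zero: nothing ran yet
list(it)  # consumption triggers the real work, recorded in it.elapsed
```

As the ticket notes, a wrapper like this also charges the operator for time the downstream consumer spends between `next()` calls' results, so the number is comparative rather than exact.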
[jira] [Resolved] (SPARK-34898) Send ExecutorMetricsUpdate EventLog appropriately
[ https://issues.apache.org/jira/browse/SPARK-34898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-34898. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31992 [https://github.com/apache/spark/pull/31992] > Send ExecutorMetricsUpdate EventLog appropriately > - > > Key: SPARK-34898 > URL: https://issues.apache.org/jira/browse/SPARK-34898 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > In current EventLoggingListener, we won't write > SparkListenerExecutorMetricsUpdate message at all > {code:java} > override def onExecutorMetricsUpdate(event: > SparkListenerExecutorMetricsUpdate): Unit = { > if (shouldLogStageExecutorMetrics) { > event.executorUpdates.foreach { case (stageKey1, newPeaks) => > liveStageExecutorMetrics.foreach { case (stageKey2, metricsPerExecutor) > => > // If the update came from the driver, stageKey1 will be the dummy > key (-1, -1), > // so record those peaks for all active stages. > // Otherwise, record the peaks for the matching stage. > if (stageKey1 == DRIVER_STAGE_KEY || stageKey1 == stageKey2) { > val metrics = metricsPerExecutor.getOrElseUpdate( > event.execId, new ExecutorMetrics()) > metrics.compareAndUpdatePeakValues(newPeaks) > } > } > } > } > } > {code} > It causes this effect that we can't get driver peakMemoryMetrics in SHS. We > can get executor's since it will update with TaskEnd events. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
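The listener logic quoted above is compact but easy to misread: an update arriving under the dummy driver stage key applies to every live stage, while any other update applies only to its matching stage, and the stored per-executor metrics keep element-wise peaks. A small self-contained Python sketch of just that matching/peak-update rule (names invented, metrics reduced to plain lists of numbers):

```python
# Hypothetical stand-alone model of the quoted onExecutorMetricsUpdate logic:
# (-1, -1) is the dummy key meaning "the update came from the driver, apply
# it to all active stages"; peaks are kept as the element-wise maximum.
DRIVER_STAGE_KEY = (-1, -1)

def update_peaks(live_stage_metrics, update_key, exec_id, new_peaks):
    """live_stage_metrics: {stage_key: {exec_id: [peak, peak, ...]}}"""
    for stage_key, per_executor in live_stage_metrics.items():
        if update_key == DRIVER_STAGE_KEY or update_key == stage_key:
            current = per_executor.setdefault(exec_id, [0] * len(new_peaks))
            # compareAndUpdatePeakValues analogue: element-wise max
            per_executor[exec_id] = [max(c, n) for c, n in zip(current, new_peaks)]

# Driver updates fan out to every active stage; stage-tagged ones do not.
live = {(1, 0): {}, (2, 0): {}}
update_peaks(live, DRIVER_STAGE_KEY, "driver", [5, 3])
update_peaks(live, (1, 0), "exec-1", [2, 9])
```

The bug the ticket fixes is upstream of this logic: the event-log listener never wrote the update events at all, so driver peaks could not be recovered in the history server.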
[jira] [Resolved] (SPARK-35789) Lateral join should only be used with subquery
[ https://issues.apache.org/jira/browse/SPARK-35789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35789. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32937 [https://github.com/apache/spark/pull/32937] > Lateral join should only be used with subquery > -- > > Key: SPARK-35789 > URL: https://issues.apache.org/jira/browse/SPARK-35789 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.2.0 > > > This is a follow up for SPARK-34382. Currently the keyword LATERAL can be > used in front of a `relationPrimary`, which consists of more than just > subqueries, for example: > select * from t1, lateral t2 > Such syntax is not allowed in Postgres. LATERAL should only be used in front > of a subquery. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35789) Lateral join should only be used with subquery
[ https://issues.apache.org/jira/browse/SPARK-35789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35789: --- Assignee: Allison Wang > Lateral join should only be used with subquery > -- > > Key: SPARK-35789 > URL: https://issues.apache.org/jira/browse/SPARK-35789 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > This is a follow up for SPARK-34382. Currently the keyword LATERAL can be > used in front of a `relationPrimary`, which consists of more than just > subqueries, for example: > select * from t1, lateral t2 > Such syntax is not allowed in Postgres. LATERAL should only be used in front > of a subquery. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35782) leveldbjni doesn't work in Apple Silicon on macOS
[ https://issues.apache.org/jira/browse/SPARK-35782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365014#comment-17365014 ] DB Tsai commented on SPARK-35782: - [~yikunkero] It will only work for Apple Silicon on Linux but not macOS. For macOS, we need to recompile for this specific OS. > leveldbjni doesn't work in Apple Silicon on macOS > - > > Key: SPARK-35782 > URL: https://issues.apache.org/jira/browse/SPARK-35782 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: DB Tsai >Priority: Major > > leveldbjni doesn't contain the native library for Apple Silicon on macOS. We > will need to build native library for Apple Silicon on macOS, and cut a new > release so Spark can use it. > However, it is not maintained for a long time, and the last release was in > 2016. Per > [discussion|http://apache-spark-developers-list.1001551.n3.nabble.com/leveldbjni-dependency-td30146.html] > in spark dev mailing list, other platform also runs into the same support > issue. Perhaps we should consider rocksdb as a replacement. > Note, here is the rocksdb task to support Apple Silicon, > https://github.com/facebook/rocksdb/issues/7720 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365011#comment-17365011 ] Apache Spark commented on SPARK-33603: -- User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/32951 > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365010#comment-17365010 ] Apache Spark commented on SPARK-33603: -- User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/32951 > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33603: Assignee: (was: Apache Spark) > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33603: Assignee: Apache Spark > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34054) BlockManagerDecommissioner cleanup
[ https://issues.apache.org/jira/browse/SPARK-34054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34054. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31102 [https://github.com/apache/spark/pull/31102] > BlockManagerDecommissioner cleanup > -- > > Key: SPARK-34054 > URL: https://issues.apache.org/jira/browse/SPARK-34054 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.2.0 > > > Code cleanup for BlockManagerDecommissioner to fix/improve some issues.
[jira] [Assigned] (SPARK-34054) BlockManagerDecommissioner cleanup
[ https://issues.apache.org/jira/browse/SPARK-34054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34054: --- Assignee: wuyi > BlockManagerDecommissioner cleanup > -- > > Key: SPARK-34054 > URL: https://issues.apache.org/jira/browse/SPARK-34054 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > Code cleanup for BlockManagerDecommissioner to fix/improve some issues.
[jira] [Resolved] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35798. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32947 [https://github.com/apache/spark/pull/32947] > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Minor > Fix For: 3.2.0 > > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Assigned] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35798: --- Assignee: Peter Toth > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Resolved] (SPARK-35792) View should not capture configs used in `RelationConversions`
[ https://issues.apache.org/jira/browse/SPARK-35792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35792. - Fix Version/s: 3.1.3 3.2.0 Resolution: Fixed Issue resolved by pull request 32941 [https://github.com/apache/spark/pull/32941] > View should not capture configs used in `RelationConversions` > - > > Key: SPARK-35792 > URL: https://issues.apache.org/jira/browse/SPARK-35792 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > RelationConversions is actually an optimization rule, yet it is executed in > the analysis phase. A view is designed to capture only semantic configs, > so we should ignore the configs related to `RelationConversions`
[jira] [Assigned] (SPARK-35792) View should not capture configs used in `RelationConversions`
[ https://issues.apache.org/jira/browse/SPARK-35792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35792: --- Assignee: Linhong Liu > View should not capture configs used in `RelationConversions` > - > > Key: SPARK-35792 > URL: https://issues.apache.org/jira/browse/SPARK-35792 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > > RelationConversions is actually an optimization rule, yet it is executed in > the analysis phase. A view is designed to capture only semantic configs, > so we should ignore the configs related to `RelationConversions`
[jira] [Assigned] (SPARK-35726) Truncate java.time.Duration by fields of day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-35726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35726: Assignee: (was: Apache Spark) > Truncate java.time.Duration by fields of day-time interval type > --- > > Key: SPARK-35726 > URL: https://issues.apache.org/jira/browse/SPARK-35726 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Truncate input java.time.Duration instances using fields of > DayTimeIntervalType. For example, if DayTimeIntervalType has the end field > HOUR, the granularity of DayTimeIntervalType values should be hours too.
[jira] [Commented] (SPARK-35726) Truncate java.time.Duration by fields of day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-35726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364923#comment-17364923 ] Apache Spark commented on SPARK-35726: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32950 > Truncate java.time.Duration by fields of day-time interval type > --- > > Key: SPARK-35726 > URL: https://issues.apache.org/jira/browse/SPARK-35726 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Truncate input java.time.Duration instances using fields of > DayTimeIntervalType. For example, if DayTimeIntervalType has the end field > HOUR, the granularity of DayTimeIntervalType values should be hours too.
[jira] [Commented] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364922#comment-17364922 ] Apache Spark commented on SPARK-35773: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Assigned] (SPARK-35726) Truncate java.time.Duration by fields of day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-35726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35726: Assignee: Apache Spark > Truncate java.time.Duration by fields of day-time interval type > --- > > Key: SPARK-35726 > URL: https://issues.apache.org/jira/browse/SPARK-35726 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Truncate input java.time.Duration instances using fields of > DayTimeIntervalType. For example, if DayTimeIntervalType has the end field > HOUR, the granularity of DayTimeIntervalType values should be hours too.
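The field-based truncation SPARK-35726 describes can be sketched with plain `java.time`. This is only an illustration of what "granularity by end field" means; the class name is hypothetical and `Duration.truncatedTo` is not Spark's actual implementation:

```java
import java.time.Duration;
import java.time.temporal.ChronoUnit;

public class DurationTruncationDemo {
    public static void main(String[] args) {
        // Anything finer than the interval type's end field is cut off.
        Duration d = Duration.ofHours(2).plusMinutes(35).plusSeconds(17);

        // End field HOUR: minutes and seconds are dropped.
        Duration hourGranularity = d.truncatedTo(ChronoUnit.HOURS);
        // End field MINUTE: only the seconds are dropped.
        Duration minuteGranularity = d.truncatedTo(ChronoUnit.MINUTES);

        System.out.println(hourGranularity);   // PT2H
        System.out.println(minuteGranularity); // PT2H35M
    }
}
```

`truncatedTo` rounds toward zero, which matches the "drop the finer fields" reading of the issue description.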
[jira] [Commented] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364921#comment-17364921 ] Apache Spark commented on SPARK-35773: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Assigned] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35773: Assignee: Apache Spark (was: Kousuke Saruta) > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Commented] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364920#comment-17364920 ] Apache Spark commented on SPARK-35773: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Assigned] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35773: Assignee: Kousuke Saruta (was: Apache Spark) > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Commented] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364919#comment-17364919 ] Apache Spark commented on SPARK-35749: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Assigned] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35749: Assignee: Apache Spark > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Assigned] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35749: Assignee: (was: Apache Spark) > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Commented] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364918#comment-17364918 ] Apache Spark commented on SPARK-35749: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Updated] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35749: --- Description: Currently, unit list interval literals like `interval '1' year '2' months` or `interval '1' day` or `interval '2' hours` are parsed as `CalendarIntervalType`. Such fields should be parsed as `YearMonthIntervalType` or `DayTimeIntervalType`. was: Currently, single unit field interval literals like `interval '1' year '2' months` or `interval '1' day` or `interval '2' hours` are parsed as `CalendarIntervalType`. Such fields should be parsed as `YearMonthIntervalType` or `DayTimeIntervalType`. > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Updated] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35749: --- Summary: Parse unit list interval literals as year-month/day-time interval types (was: Parse multiple unit fields interval literals as year-month/day-time interval types) > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, single unit field interval literals like `interval '1' year '2' > months` or `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Commented] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364909#comment-17364909 ] Apache Spark commented on SPARK-35796: -- User 'toujours33' has created a pull request for this issue: https://github.com/apache/spark/pull/32948 > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
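The behaviour SPARK-35796 reports can be reproduced with plain `java.io`; the file name is the hypothetical one from the report, and the /System/Volumes/Data prefix only appears on macOS 10.15+, where the read-only system volume firmlinks user data onto a separate volume (on Linux the canonical URI is unchanged):

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;

public class CanonicalUriDemo {
    public static void main(String[] args) throws IOException {
        // getCanonicalFile() resolves symlinks and firmlinks, so on
        // macOS >= 10.15 the path may gain a /System/Volumes/Data prefix.
        File f = new File("/home/testjars.jar");
        URI uri = f.getCanonicalFile().toURI();

        // macOS 10.15+: file:/System/Volumes/Data/home/testjars.jar
        // Linux:        file:/home/testjars.jar
        System.out.println(uri);
    }
}
```

A test that compares such URIs byte-for-byte is therefore platform-dependent, which is why the suite fails only on recent macOS.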
[jira] [Assigned] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35798: Assignee: (was: Apache Spark) > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Commented] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364903#comment-17364903 ] Apache Spark commented on SPARK-35798: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/32947 > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Assigned] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35798: Assignee: Apache Spark > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Commented] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364902#comment-17364902 ] Apache Spark commented on SPARK-35798: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/32947 > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Assigned] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35796: Assignee: Apache Spark > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Assignee: Apache Spark >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Assigned] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35796: Assignee: (was: Apache Spark) > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Commented] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364899#comment-17364899 ] Apache Spark commented on SPARK-35796: -- User 'toujours33' has created a pull request for this issue: https://github.com/apache/spark/pull/32946 > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Created] (SPARK-35798) Fix SparkPlan.sqlContext usage
Peter Toth created SPARK-35798: -- Summary: Fix SparkPlan.sqlContext usage Key: SPARK-35798 URL: https://issues.apache.org/jira/browse/SPARK-35798 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Peter Toth There might be SparkPlan nodes where canonicalization on the executor side can cause issues. More details here: https://github.com/apache/spark/pull/32885/files#r651019687