[jira] [Comment Edited] (SPARK-35000) Spark App in container will not exit when exception happen

2021-04-11 Thread KarlManong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319031#comment-17319031
 ] 

KarlManong edited comment on SPARK-35000 at 4/12/21, 5:36 AM:
--

Got an exception in 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager#createSchedulerBackend.
  
https://issues.apache.org/jira/browse/SPARK-34674 is reverted, so I haven't 
checked it.


was (Author: karlmanong):
[#https://issues.apache.org/jira/browse/SPARK-35000?focusedCommentId=17317622=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17317622]
 
Got Exception when 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager#createSchedulerBackend.
 
https://issues.apache.org/jira/browse/SPARK-34674 is reverted, so I haven't 
check it.

> Spark App in container will not exit when exception happen
> --
>
> Key: SPARK-35000
> URL: https://issues.apache.org/jira/browse/SPARK-35000
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.1.1
>Reporter: KarlManong
>Priority: Major
> Attachments: 21664.stack, screenshot-1.png
>
>
> When submitting an app on k8s, someone can submit using an unauthenticated 
> account, and an exception occurs. But the app will not exit until 5 minutes 
> later.
> the log:
>  !screenshot-1.png! 
> the trace:
>  [^21664.stack] 
> I have created a PR: https://github.com/apache/spark/pull/32101. It may help.
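The fail-fast behavior the report asks for can be sketched in miniature: if scheduler-backend creation throws, release resources immediately instead of letting the driver linger until a timeout. The names below (`make_backend`, `shutdown`) are hypothetical stand-ins, not taken from the Spark patch:

```python
def create_scheduler_backend(make_backend, shutdown):
    """Fail fast: if backend creation raises, trigger shutdown immediately
    instead of leaving the process to wait out a shutdown timeout."""
    try:
        return make_backend()
    except Exception:
        shutdown()  # e.g. stop the SparkContext so the JVM can exit promptly
        raise

# Tiny demonstration with stand-ins for the k8s pieces
calls = []
def bad_backend():
    raise RuntimeError("unauthenticated account")

try:
    create_scheduler_backend(bad_backend, lambda: calls.append("stopped"))
except RuntimeError:
    pass
print(calls)  # ['stopped']
```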



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35000) Spark App in container will not exit when exception happen

2021-04-11 Thread KarlManong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319031#comment-17319031
 ] 

KarlManong commented on SPARK-35000:


[#https://issues.apache.org/jira/browse/SPARK-35000?focusedCommentId=17317622=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17317622]
 
Got an exception in 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager#createSchedulerBackend.
 
https://issues.apache.org/jira/browse/SPARK-34674 is reverted, so I haven't 
checked it.

> Spark App in container will not exit when exception happen
> --
>
> Key: SPARK-35000
> URL: https://issues.apache.org/jira/browse/SPARK-35000
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.1.1
>Reporter: KarlManong
>Priority: Major
> Attachments: 21664.stack, screenshot-1.png
>
>
> When submitting an app on k8s, someone can submit using an unauthenticated 
> account, and an exception occurs. But the app will not exit until 5 minutes 
> later.
> the log:
>  !screenshot-1.png! 
> the trace:
>  [^21664.stack] 
> I have created a PR: https://github.com/apache/spark/pull/32101. It may help.






[jira] [Updated] (SPARK-35025) Move Parquet data source options from Python and Scala into a single page.

2021-04-11 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-35025:

Summary: Move Parquet data source options from Python and Scala into a 
single page.  (was: move Parquet data source options from Python and Scala into 
a single page.)

> Move Parquet data source options from Python and Scala into a single page.
> --
>
> Key: SPARK-35025
> URL: https://issues.apache.org/jira/browse/SPARK-35025
> Project: Spark
>  Issue Type: Sub-task
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Refer to https://issues.apache.org/jira/browse/SPARK-34491






[jira] [Updated] (SPARK-34494) Move other data source options than Parquet from Python and Scala into a single page.

2021-04-11 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-34494:

Summary: Move other data source options than Parquet from Python and Scala 
into a single page.  (was: Move other data source options from Python and Scala 
into a single page.)

> Move other data source options than Parquet from Python and Scala into a 
> single page.
> -
>
> Key: SPARK-34494
> URL: https://issues.apache.org/jira/browse/SPARK-34494
> Project: Spark
>  Issue Type: Sub-task
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Refer to https://issues.apache.org/jira/browse/SPARK-34491






[jira] [Updated] (SPARK-34494) Move other data source options from Python and Scala into a single page.

2021-04-11 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-34494:

Summary: Move other data source options from Python and Scala into a single 
page.  (was: Move data source options from Python and Scala into a single page.)

> Move other data source options from Python and Scala into a single page.
> 
>
> Key: SPARK-34494
> URL: https://issues.apache.org/jira/browse/SPARK-34494
> Project: Spark
>  Issue Type: Sub-task
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Refer to https://issues.apache.org/jira/browse/SPARK-34491






[jira] [Created] (SPARK-35025) move Parquet data source options from Python and Scala into a single page.

2021-04-11 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-35025:
---

 Summary: move Parquet data source options from Python and Scala 
into a single page.
 Key: SPARK-35025
 URL: https://issues.apache.org/jira/browse/SPARK-35025
 Project: Spark
  Issue Type: Sub-task
  Components: docs
Affects Versions: 3.2.0
Reporter: Haejoon Lee


Refer to https://issues.apache.org/jira/browse/SPARK-34491






[jira] [Updated] (SPARK-34491) Move data source options docs for Python and Scala into a single page

2021-04-11 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-34491:

Description: 
Currently we document the options of various data sources for Python and Scala 
on separate pages.
 * Python 
[here|http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=registerjava#pyspark.sql.DataFrameReader]
 * Scala 
[here|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html]

However, we'd better move these onto a single page so that we need to fix only 
one place when managing the data source options. See 
[Avro|https://spark.apache.org/docs/latest/sql-data-sources-avro.html#apache-avro-data-source-guide]
 for example.

Also, we need to add the missing "CSV Files" and "TEXT Files" pages to the [Data 
Sources 
documents|https://spark.apache.org/docs/latest/sql-data-sources.html#data-sources].

To clarify the task,
 # create "CSV Files" page for Data Source documents.
 # create "TEXT files" page for Data Source documents.
 # move Parquet data source options from Python and Scala into a single page.
 # move other data source options from Python and Scala into a single page.

  was:
Currently we documents options of various data sources for Python and Scala in 
a separated page.
 * Python 
[here|http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=registerjava#pyspark.sql.DataFrameReader]
 * Scala 
[here|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html]

However, we'd better to move this into a single page so that we can fix only 
one place when we need to manage the data source options. See 
[Avro|https://spark.apache.org/docs/latest/sql-data-sources-avro.html#apache-avro-data-source-guide]
 for example.

Also, we need to add missing "CSV Files" and "TEXT Files" page for [Data 
Sources 
documents|https://spark.apache.org/docs/latest/sql-data-sources.html#data-sources].

To clarify the task,
 # create "CSV Files" page for Data Source documents.
 # create "TEXT files" page for Data Source documents.
 # move data source options from Python and Scala into a single page.


> Move data source options docs for Python and Scala into a single page
> -
>
> Key: SPARK-34491
> URL: https://issues.apache.org/jira/browse/SPARK-34491
> Project: Spark
>  Issue Type: Documentation
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently we document the options of various data sources for Python and Scala 
> on separate pages.
>  * Python 
> [here|http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=registerjava#pyspark.sql.DataFrameReader]
>  * Scala 
> [here|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html]
> However, we'd better move these onto a single page so that we need to fix only 
> one place when managing the data source options. See 
> [Avro|https://spark.apache.org/docs/latest/sql-data-sources-avro.html#apache-avro-data-source-guide]
>  for example.
> Also, we need to add the missing "CSV Files" and "TEXT Files" pages to the [Data 
> Sources 
> documents|https://spark.apache.org/docs/latest/sql-data-sources.html#data-sources].
> To clarify the task,
>  # create "CSV Files" page for Data Source documents.
>  # create "TEXT files" page for Data Source documents.
>  # move Parquet data source options from Python and Scala into a single page.
>  # move other data source options from Python and Scala into a single page.






[jira] [Commented] (SPARK-35002) Fix the java.net.BindException when testing with Github Action

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319018#comment-17319018
 ] 

Apache Spark commented on SPARK-35002:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/32126

> Fix the java.net.BindException when testing with Github Action
> --
>
> Key: SPARK-35002
> URL: https://issues.apache.org/jira/browse/SPARK-35002
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> {noformat}
> [info] org.apache.spark.sql.kafka010.producer.InternalKafkaProducerPoolSuite 
> *** ABORTED *** (282 milliseconds)
> [info]   java.net.BindException: Cannot assign requested address: Service 
> 'sparkDriver' failed after 100 retries (on a random free port)! Consider 
> explicitly setting the appropriate binding address for the service 
> 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the 
> correct binding address.
> [info]   at sun.nio.ch.Net.bind0(Native Method)
> [info]   at sun.nio.ch.Net.bind(Net.java:461)
> [info]   at sun.nio.ch.Net.bind(Net.java:453)
> {noformat}
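As the error message itself suggests, the usual remedy is to pin an explicit bind address (e.g. `spark.driver.bindAddress=127.0.0.1` in the CI environment). The underlying socket behavior can be illustrated without Spark; this is a sketch of the mechanism, not the actual change in the PR:

```python
import socket

# Binding to an explicit loopback address sidesteps "Cannot assign requested
# address", which occurs when the resolved hostname maps to an interface the
# CI container doesn't actually have.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
host, port = sock.getsockname()
sock.close()
print(host, port)
```

Spark's retry loop ("failed after 100 retries on a random free port") never succeeds when the address itself is unassignable, which is why fixing the bind address, not the port, resolves it.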






[jira] [Commented] (SPARK-35002) Fix the java.net.BindException when testing with Github Action

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319017#comment-17319017
 ] 

Apache Spark commented on SPARK-35002:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/32126

> Fix the java.net.BindException when testing with Github Action
> --
>
> Key: SPARK-35002
> URL: https://issues.apache.org/jira/browse/SPARK-35002
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> {noformat}
> [info] org.apache.spark.sql.kafka010.producer.InternalKafkaProducerPoolSuite 
> *** ABORTED *** (282 milliseconds)
> [info]   java.net.BindException: Cannot assign requested address: Service 
> 'sparkDriver' failed after 100 retries (on a random free port)! Consider 
> explicitly setting the appropriate binding address for the service 
> 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the 
> correct binding address.
> [info]   at sun.nio.ch.Net.bind0(Native Method)
> [info]   at sun.nio.ch.Net.bind(Net.java:461)
> [info]   at sun.nio.ch.Net.bind(Net.java:453)
> {noformat}






[jira] [Resolved] (SPARK-34916) Reduce tree traversals in transform/resolve function families

2021-04-11 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-34916.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32060
[https://github.com/apache/spark/pull/32060]

> Reduce tree traversals in transform/resolve function families
> -
>
> Key: SPARK-34916
> URL: https://issues.apache.org/jira/browse/SPARK-34916
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yingyi Bu
>Assignee: Yingyi Bu
>Priority: Major
> Fix For: 3.2.0
>
>
> Transform/resolve functions are called ~280k times on average per TPC-DS 
> query, which is far more than necessary. We can reduce those calls 
> with early-exit information and conditions.






[jira] [Assigned] (SPARK-34916) Reduce tree traversals in transform/resolve function families

2021-04-11 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-34916:
--

Assignee: Yingyi Bu

> Reduce tree traversals in transform/resolve function families
> -
>
> Key: SPARK-34916
> URL: https://issues.apache.org/jira/browse/SPARK-34916
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yingyi Bu
>Assignee: Yingyi Bu
>Priority: Major
>
> Transform/resolve functions are called ~280k times on average per TPC-DS 
> query, which is far more than necessary. We can reduce those calls 
> with early-exit information and conditions.






[jira] [Resolved] (SPARK-35017) Transfer ANSI intervals via Hive Thrift server

2021-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35017.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32121
[https://github.com/apache/spark/pull/32121]

> Transfer ANSI intervals via Hive Thrift server
> --
>
> Key: SPARK-35017
> URL: https://issues.apache.org/jira/browse/SPARK-35017
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, the Hive Thrift server cannot recognize the new ANSI interval types:
> {code:java}
>  $ ./sbin/start-thriftserver.sh
>  $ ./bin/beeline
> Beeline version 2.3.8 by Apache Hive
> beeline> !connect jdbc:hive2://localhost:1/default "" "" ""
> Connecting to jdbc:hive2://localhost:1/default
> Connected to: Spark SQL (version 3.2.0-SNAPSHOT)
> 0: jdbc:hive2://localhost:1/default> select timestamp'2021-01-01 
> 01:02:03.01' - date'2020-12-31';
> Error: java.lang.IllegalArgumentException: Unrecognized type name: day-time 
> interval (state=,code=0)
> {code}






[jira] [Assigned] (SPARK-35023) Remove deprecated syntex in SBT build file

2021-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35023:


Assignee: Denis Pyshev

> Remove deprecated syntex in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Assignee: Denis Pyshev
>Priority: Major
>
> Related to SPARK-34959
> SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
>  is recommended.
>  See 
> [https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]






[jira] [Resolved] (SPARK-35023) Remove deprecated syntex in SBT build file

2021-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35023.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32115
[https://github.com/apache/spark/pull/32115]

> Remove deprecated syntex in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Assignee: Denis Pyshev
>Priority: Major
> Fix For: 3.2.0
>
>
> Related to SPARK-34959
> SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
>  is recommended.
>  See 
> [https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]






[jira] [Resolved] (SPARK-34986) Aggregate ordinal should judge is it contains agg function

2021-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34986.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32089
[https://github.com/apache/spark/pull/32089]

> Aggregate ordinal should judge is it contains agg function
> --
>
> Key: SPARK-34986
> URL: https://issues.apache.org/jira/browse/SPARK-34986
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Aggregate by ordinal should check whether the referenced expression contains 
> an aggregate function.
> Before this change it was:
> ```
> -- !query
> select a, b, sum(b) from data group by 3
> -- !query schema
> struct<>
> -- !query output
> org.apache.spark.sql.AnalysisException
> aggregate functions are not allowed in GROUP BY, but found sum(data.b)
> ```
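The check under discussion can be sketched outside Spark: resolve a 1-based GROUP BY ordinal to its select-list expression and reject it if that expression is an aggregate call. The function list and string-based "parsing" below are simplified illustrations, not Spark's analyzer:

```python
AGG_FUNCS = {"sum", "avg", "count", "min", "max"}

def resolve_group_by_ordinal(select_exprs, ordinal):
    """Map a 1-based GROUP BY ordinal to its select-list expression,
    rejecting aggregate expressions as the SQL standard requires."""
    expr = select_exprs[ordinal - 1]
    head = expr.split("(")[0].strip().lower()
    if head in AGG_FUNCS:
        raise ValueError(
            f"aggregate functions are not allowed in GROUP BY, but found {expr}")
    return expr

# 'group by 3' points at sum(b) -> rejected, matching the query output above
try:
    resolve_group_by_ordinal(["a", "b", "sum(b)"], 3)
except ValueError as e:
    print(e)

# 'group by 1' points at plain column a -> accepted
print(resolve_group_by_ordinal(["a", "b", "sum(b)"], 1))
```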






[jira] [Assigned] (SPARK-34986) Aggregate ordinal should judge is it contains agg function

2021-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-34986:


Assignee: angerszhu

> Aggregate ordinal should judge is it contains agg function
> --
>
> Key: SPARK-34986
> URL: https://issues.apache.org/jira/browse/SPARK-34986
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Aggregate by ordinal should check whether the referenced expression contains 
> an aggregate function.
> Before this change it was:
> ```
> -- !query
> select a, b, sum(b) from data group by 3
> -- !query schema
> struct<>
> -- !query output
> org.apache.spark.sql.AnalysisException
> aggregate functions are not allowed in GROUP BY, but found sum(data.b)
> ```






[jira] [Resolved] (SPARK-34983) Renaming the package alias from pp to ps

2021-04-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34983.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32108
[https://github.com/apache/spark/pull/32108]

> Renaming the package alias from pp to ps
> 
>
> Key: SPARK-34983
> URL: https://issues.apache.org/jira/browse/SPARK-34983
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.2.0
>
>
> Since the package alias for `pyspark.pandas` is fixed to `ps`, we should 
> rename it throughout the Koalas source code.






[jira] [Commented] (SPARK-34630) Add type hints of pyspark.__version__ and pyspark.sql.Column.contains

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318975#comment-17318975
 ] 

Apache Spark commented on SPARK-34630:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32125

> Add type hints of pyspark.__version__ and pyspark.sql.Column.contains
> -
>
> Key: SPARK-34630
> URL: https://issues.apache.org/jira/browse/SPARK-34630
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.2
>Reporter: Hyukjin Kwon
>Assignee: Danny Meijer
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> pyspark.__version__ and pyspark.sql.Column.contains are missing from the 
> Python type hints.






[jira] [Commented] (SPARK-35024) Refactor LinearSVC - support virtual centering

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318961#comment-17318961
 ] 

Apache Spark commented on SPARK-35024:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/32124

> Refactor LinearSVC - support virtual centering
> --
>
> Key: SPARK-35024
> URL: https://issues.apache.org/jira/browse/SPARK-35024
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: zhengruifeng
>Priority: Major
>







[jira] [Commented] (SPARK-35024) Refactor LinearSVC - support virtual centering

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318960#comment-17318960
 ] 

Apache Spark commented on SPARK-35024:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/32124

> Refactor LinearSVC - support virtual centering
> --
>
> Key: SPARK-35024
> URL: https://issues.apache.org/jira/browse/SPARK-35024
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: zhengruifeng
>Priority: Major
>







[jira] [Assigned] (SPARK-35024) Refactor LinearSVC - support virtual centering

2021-04-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35024:


Assignee: (was: Apache Spark)

> Refactor LinearSVC - support virtual centering
> --
>
> Key: SPARK-35024
> URL: https://issues.apache.org/jira/browse/SPARK-35024
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: zhengruifeng
>Priority: Major
>







[jira] [Assigned] (SPARK-35024) Refactor LinearSVC - support virtual centering

2021-04-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35024:


Assignee: Apache Spark

> Refactor LinearSVC - support virtual centering
> --
>
> Key: SPARK-35024
> URL: https://issues.apache.org/jira/browse/SPARK-35024
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-34494) Move data source options from Python and Scala into a single page.

2021-04-11 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318959#comment-17318959
 ] 

Haejoon Lee commented on SPARK-34494:
-

I'm working on this

> Move data source options from Python and Scala into a single page.
> --
>
> Key: SPARK-34494
> URL: https://issues.apache.org/jira/browse/SPARK-34494
> Project: Spark
>  Issue Type: Sub-task
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Refer to https://issues.apache.org/jira/browse/SPARK-34491






[jira] [Created] (SPARK-35024) Refactor LinearSVC - support virtual centering

2021-04-11 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-35024:


 Summary: Refactor LinearSVC - support virtual centering
 Key: SPARK-35024
 URL: https://issues.apache.org/jira/browse/SPARK-35024
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.2.0
Reporter: zhengruifeng









[jira] [Updated] (SPARK-34087) a memory leak occurs when we clone the spark session

2021-04-11 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-34087:

Attachment: screenshot-1.png

> a memory leak occurs when we clone the spark session
> 
>
> Key: SPARK-34087
> URL: https://issues.apache.org/jira/browse/SPARK-34087
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Fu Chen
>Assignee: wuyi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: 1610451044690.jpg
>
>
> In Spark 3.0.1, a memory leak occurs when we keep cloning the Spark session, 
> because a new ExecutionListenerBus instance is added to the AsyncEventQueue 
> each time we clone a session.
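The accumulation described above can be reproduced in miniature: each clone registers a fresh listener on a shared bus and nothing ever unregisters it. The class names below are stand-ins for illustration, not Spark's actual types:

```python
class EventBus:                          # stand-in for AsyncEventQueue
    def __init__(self):
        self.listeners = []

class Session:                           # stand-in for SparkSession
    def __init__(self, bus):
        self.bus = bus
        bus.listeners.append(object())   # new ExecutionListenerBus per session
    def clone(self):
        return Session(self.bus)         # every clone registers one more listener

bus = EventBus()
s = Session(bus)
for _ in range(100):
    s = s.clone()
print(len(bus.listeners))  # 101: listeners pile up on the shared bus -> the leak
```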






[jira] [Updated] (SPARK-34087) a memory leak occurs when we clone the spark session

2021-04-11 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-34087:

Attachment: (was: screenshot-1.png)

> a memory leak occurs when we clone the spark session
> 
>
> Key: SPARK-34087
> URL: https://issues.apache.org/jira/browse/SPARK-34087
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Fu Chen
>Assignee: wuyi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: 1610451044690.jpg
>
>
> In Spark 3.0.1, a memory leak occurs when we keep cloning the Spark session, 
> because a new ExecutionListenerBus instance is added to the AsyncEventQueue 
> each time we clone a session.






[jira] [Commented] (SPARK-34883) Setting CSV reader option "multiLine" to "true" causes URISyntaxException when colon is in file path

2021-04-11 Thread Vikas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318949#comment-17318949
 ] 

Vikas commented on SPARK-34883:
---

I was able to use the multiLine option without any issue.
It seems to be a cluster/infra environment issue, not related to Spark or the 
Spark CSV package.

 

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/

>>> inputFile="/Users/home_dir/Downloads/pageviews-by-second-tsv"
>>> tempDF = (spark.read.option("sep", "\t").option("multiLine", 
>>> "True").option("quote", "\"").option("escape", "\"").csv(inputFile))
>>> tempDF.show()
+-------------------+------+----+
|                _c0|   _c1| _c2|
+-------------------+------+----+
|2015-03-16T00:09:55|mobile|1595|
+-------------------+------+----+
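The reported failure is easier to see in isolation: a path segment containing ':' can be mistaken for a URI scheme, which is what produces "Relative path in absolute URI". The Python analogue below uses `urllib.parse` rather than Hadoop's `Path` class, purely for illustration:

```python
from urllib.parse import urlparse

# "test:dir" is parsed as scheme "test" plus path "dir",
# not as a single directory name containing a colon.
parts = urlparse("test:dir/pageviews_by_second.tsv")
print(parts.scheme)   # test
print(parts.path)     # dir/pageviews_by_second.tsv

# An unambiguous absolute URI keeps the colon inside the path component.
parts2 = urlparse("file:///FileStore/myDir/test:dir/pageviews_by_second.tsv")
print(parts2.scheme)  # file
print(parts2.path)    # /FileStore/myDir/test:dir/pageviews_by_second.tsv
```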

> Setting CSV reader option "multiLine" to "true" causes URISyntaxException 
> when colon is in file path
> 
>
> Key: SPARK-34883
> URL: https://issues.apache.org/jira/browse/SPARK-34883
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Brady Tello
>Priority: Major
>
> Setting the CSV reader's "multiLine" option to "True" throws the following 
> exception when a ':' character is in the file path.
>  
> {code:java}
> java.net.URISyntaxException: Relative path in absolute URI: test:dir
> {code}
> I've tested this in both Spark 3.0.0 and Spark 3.1.1 and I get the same error 
> whether I use Scala, Python, or SQL.
> The following code works fine:
>  
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv" 
> tempDF = (spark.read.option("sep", "\t").csv(csvFile)
> {code}
> While the following code fails:
>  
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
> tempDF = (spark.read.option("sep", "\t").option("multiLine", 
> "True").csv(csvFile)
> {code}
> Full Stack Trace from Python:
>  
> {code:java}
> --- 
> IllegalArgumentException Traceback (most recent call last)  
> in  
> 3 csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv" 
> 4 
> > 5  tempDF = (spark.read.option("sep", "\t").option("multiLine", "True") 
> /databricks/spark/python/pyspark/sql/readwriter.py in csv(self, path, schema, 
> sep, encoding, quote, escape, comment, header, inferSchema, 
> ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, nullValue, nanValue, 
> positiveInf, negativeInf, dateFormat, timestampFormat, maxColumns, 
> maxCharsPerColumn, maxMalformedLogPerPartition, mode, 
> columnNameOfCorruptRecord, multiLine, charToEscapeQuoteEscaping, 
> samplingRatio, enforceSchema, emptyValue, locale, lineSep, pathGlobFilter, 
> recursiveFileLookup, modifiedBefore, modifiedAfter, unescapedQuoteHandling) 
> 735 path = [path] 
> 736 if type(path) == list: 
> --> 737 return 
> self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path))) 
> 738 elif isinstance(path, RDD): 
> 739 def func(iterator): 
> /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in 
> __call__(self, *args) 
> 1302 
> 1303 answer = self.gateway_client.send_command(command) 
> -> 1304 return_value = get_return_value( 
> 1305 answer, self.gateway_client, self.target_id, self.name) 
> 1306 
> /databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 
> 114 # Hide where the exception came from that shows a non-Pythonic 
> 115 # JVM exception message. 
> --> 116 raise converted from None 
> 117 else: 
> 118 raise 
> IllegalArgumentException: java.net.URISyntaxException: Relative path in 
> absolute URI: test:dir
> {code}
>  
>  
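The reported error suggests the multiLine code path ends up treating `test:dir` as a relative URI, where a first segment containing ':' parses as a URI scheme. Python's `urlparse` illustrates the same ambiguity (illustration only, not Hadoop's actual Path code):

```python
from urllib.parse import urlparse

# A relative path whose first segment contains ':' is read as scheme:rest,
# which is why "test:dir" trips "Relative path in absolute URI".
relative = urlparse("test:dir/pageviews_by_second.tsv")
print(relative.scheme)  # 'test' -- the directory name is mistaken for a scheme

# When the colon segment is not first, no scheme is inferred from it.
absolute = urlparse("/FileStore/myDir/test:dir/pageviews_by_second.tsv")
print(absolute.scheme)  # '' -- '/' before the ':' rules out a scheme
```

This is also why the non-multiLine reader works on the same path: only a code path that re-parses a bare path segment as a URI hits the ambiguity.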



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34562) Leverage parquet bloom filters

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318822#comment-17318822
 ] 

Apache Spark commented on SPARK-34562:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/32123

> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a 
> [comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
>  from [~dongjoon] that points out the missing parquet support up until now.
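As background on what parquet bloom filters buy: a reader can probe a small per-column filter and skip row groups that definitely do not contain a queried value. A minimal standalone bloom filter sketch (hypothetical, not parquet-mr's split-block implementation):

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, value):
        # Derive k bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent" (no false negatives);
        # True means "possibly present" (false positives are allowed).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))

bf = BloomFilter()
for key in ("alice", "bob", "carol"):
    bf.add(key)
print(bf.might_contain("alice"))  # True: added keys are always reported
```

In parquet 1.12 the filter is stored per column chunk; the no-false-negatives property is what makes it safe to skip a row group when the probe returns False.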



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34562) Leverage parquet bloom filters

2021-04-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34562:


Assignee: (was: Apache Spark)

> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a 
> [comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
>  from [~dongjoon] that points out the missing parquet support up until now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34562) Leverage parquet bloom filters

2021-04-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34562:


Assignee: Apache Spark

> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Assignee: Apache Spark
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a 
> [comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
>  from [~dongjoon] that points out the missing parquet support up until now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34562) Leverage parquet bloom filters

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318821#comment-17318821
 ] 

Apache Spark commented on SPARK-34562:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/32123

> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a 
> [comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
>  from [~dongjoon] that points out the missing parquet support up until now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35023:


Assignee: Apache Spark

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35023:


Assignee: (was: Apache Spark)

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Priority: Major
>
> Related to SPARK-34959
> SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
>  is recommended.
>  See 
> [https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]
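The migration the issue refers to looks like this in a build.sbt (a generic sketch, not Spark's actual build file):

```scala
// sbt 0.13-style "in" syntax, deprecated since sbt 1.5.0:
//   javaOptions in Test += "-Xmx2g"
//   scalacOptions in Compile += "-deprecation"

// Equivalent sbt 1.x slash syntax:
Test / javaOptions += "-Xmx2g"
Compile / scalacOptions += "-deprecation"
```

The setting semantics are unchanged; only the scoping notation moves from `key in Scope` to `Scope / key`.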



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318730#comment-17318730
 ] 

Apache Spark commented on SPARK-35023:
--

User 'gemelen' has created a pull request for this issue:
https://github.com/apache/spark/pull/32115

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Priority: Major
>
> Related to SPARK-34959
> SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
>  is recommended.
>  See 
> [https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Denis Pyshev (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Pyshev updated SPARK-35023:
-
Comment: was deleted

(was: https://github.com/apache/spark/pull/32115)

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Priority: Major
>
> Related to SPARK-34959
> SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
>  is recommended.
>  See 
> [https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318728#comment-17318728
 ] 

Apache Spark commented on SPARK-35023:
--

User 'gemelen' has created a pull request for this issue:
https://github.com/apache/spark/pull/32115

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Denis Pyshev (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Pyshev updated SPARK-35023:
-
Description: 
Related to SPARK-34959

SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
 is recommended.
 See 
[https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Assignee: Apache Spark
>Priority: Major
>
> Related to SPARK-34959
> SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
>  is recommended.
>  See 
> [https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Denis Pyshev (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Pyshev updated SPARK-35023:
-
Environment: (was: 
[|https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax])

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Denis Pyshev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Denis Pyshev (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Pyshev updated SPARK-35023:
-
Environment: 
[|https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]
  (was: Related to SPARK-34959

SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
is recommended.
See 
[https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax])

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
> Environment: 
> [|https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]
>Reporter: Denis Pyshev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Denis Pyshev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318727#comment-17318727
 ] 

Denis Pyshev commented on SPARK-35023:
--

https://github.com/apache/spark/pull/32115

> Remove deprecated syntax in SBT build file
> --
>
> Key: SPARK-35023
> URL: https://issues.apache.org/jira/browse/SPARK-35023
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
> Environment: Related to SPARK-34959
> SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
> is recommended.
> See 
> [https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]
>Reporter: Denis Pyshev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35023) Remove deprecated syntax in SBT build file

2021-04-11 Thread Denis Pyshev (Jira)
Denis Pyshev created SPARK-35023:


 Summary: Remove deprecated syntax in SBT build file
 Key: SPARK-35023
 URL: https://issues.apache.org/jira/browse/SPARK-35023
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
 Environment: Related to SPARK-34959

SBT 1.5.0 deprecates {{in}} syntax from 0.13.x, so build file adjustment
is recommended.
See 
[https://www.scala-sbt.org/1.x/docs/Migrating-from-sbt-013x.html#Migrating+to+slash+syntax]
Reporter: Denis Pyshev






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35011) Avoid Block Manager registrations when StopExecutor msg is in-flight.

2021-04-11 Thread Sumeet (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318679#comment-17318679
 ] 

Sumeet commented on SPARK-35011:


[~holden], [~dongjoon], [~attilapiros] could you please take a look at this?

> Avoid Block Manager registrations when StopExecutor msg is in-flight.
> --
>
> Key: SPARK-35011
> URL: https://issues.apache.org/jira/browse/SPARK-35011
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Sumeet
>Priority: Major
>  Labels: BlockManager, core
>
> *Note:* This is a follow-up to SPARK-34949; even after the heartbeat fix, 
> the driver reports dead executors as alive.
> *Problem:*
> I was testing Dynamic Allocation on K8s with about 300 executors. While doing 
> so, when the executors were torn down due to 
> "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor 
> pods being removed from K8s, however, under the "Executors" tab in SparkUI, I 
> could see some executors listed as alive. 
> [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100]
>  also returned a value greater than 1. 
>  
> *Cause:*
>  * "CoarseGrainedSchedulerBackend" issues async "StopExecutor" on 
> executorEndpoint
>  * "CoarseGrainedSchedulerBackend" removes that executor from Driver's 
> internal data structures and publishes "SparkListenerExecutorRemoved" on the 
> "listenerBus".
>  * Executor has still not processed "StopExecutor" from the Driver
>  * Driver receives heartbeat from the Executor, since it cannot find the 
> "executorId" in its data structures, it responds with 
> "HeartbeatResponse(reregisterBlockManager = true)"
>  * "BlockManager" on the Executor reregisters with the "BlockManagerMaster" 
> and "SparkListenerBlockManagerAdded" is published on the "listenerBus"
>  * Executor starts processing the "StopExecutor" and exits
>  * "AppStatusListener" picks the "SparkListenerBlockManagerAdded" event and 
> updates "AppStatusStore"
>  * "statusTracker.getExecutorInfos" refers "AppStatusStore" to get the list 
> of executors which returns the dead executor as alive.
>  
> *Proposed Solution:*
> Maintain a Cache of recently removed executors on Driver. During the 
> registration in BlockManagerMasterEndpoint if the BlockManager belongs to a 
> recently removed executor, return None indicating the registration is ignored 
> since the executor will be shutting down soon.
> On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed 
> executor, return true indicating the driver knows about it, thereby 
> preventing re-registration.
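The proposed cache can be sketched as follows (hypothetical names; the real change would live in BlockManagerMasterEndpoint):

```python
import time

class BlockManagerMaster:
    """Sketch: ignore registrations and confirm heartbeats for
    executors that were removed within the last `ttl` seconds."""

    def __init__(self, ttl=60.0):
        self.registered = {}        # executor_id -> block manager info
        self.recently_removed = {}  # executor_id -> removal timestamp
        self.ttl = ttl

    def remove_executor(self, executor_id):
        self.registered.pop(executor_id, None)
        self.recently_removed[executor_id] = time.monotonic()

    def register(self, executor_id, info):
        removed_at = self.recently_removed.get(executor_id)
        if removed_at is not None and time.monotonic() - removed_at < self.ttl:
            return None  # ignore: this executor is shutting down soon
        self.registered[executor_id] = info
        return executor_id

    def heartbeat(self, executor_id):
        if executor_id in self.registered:
            return True
        removed_at = self.recently_removed.get(executor_id)
        # True here tells the executor the driver knows it, so it does
        # not try to re-register its BlockManager while shutting down.
        return removed_at is not None and time.monotonic() - removed_at < self.ttl


master = BlockManagerMaster()
master.register("exec-1", "bm-info")
master.remove_executor("exec-1")
print(master.register("exec-1", "bm-info"))  # None: late registration ignored
print(master.heartbeat("exec-1"))            # True: suppresses re-registration
```

The TTL value and eviction policy for the removed-executor cache are assumptions of this sketch, not values from the proposal.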



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35016) Format ANSI intervals in Hive style

2021-04-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35016.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32120
[https://github.com/apache/spark/pull/32120]

> Format ANSI intervals in Hive style
> ---
>
> Key: SPARK-35016
> URL: https://issues.apache.org/jira/browse/SPARK-35016
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Hive:
> {code:java}
> 0: jdbc:hive2://localhost:1/default> select timestamp'2021-01-01 
> 01:02:03.01' - date'2020-12-31';
> +-------------------+
> |        _c0        |
> +-------------------+
> | 1 01:02:03.01000  |
> +-------------------+
> {code}
> Spark:
> {code:java}
> spark-sql> select timestamp'2021-01-01 01:02:03.01' - date'2020-12-31';
> INTERVAL '1 01:02:03.01' DAY TO SECOND
> {code}
> Need to align Spark's output to Hive.
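Hive's day-time rendering is plain `days HH:MM:SS.fraction`. A sketch of producing that shape from a Python timedelta (non-negative intervals only; the fraction padding width is an assumption, and Hive's exact padding may differ):

```python
from datetime import timedelta

def hive_day_time_interval(td: timedelta) -> str:
    """Render a non-negative timedelta in Hive's 'D HH:MM:SS.ffffff' style."""
    total_seconds = int(td.total_seconds())
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days} {hours:02d}:{minutes:02d}:{seconds:02d}.{td.microseconds:06d}"

td = timedelta(days=1, hours=1, minutes=2, seconds=3, milliseconds=10)
print(hive_day_time_interval(td))  # 1 01:02:03.010000
```

This is the target shape for the example in the issue: `timestamp'2021-01-01 01:02:03.01' - date'2020-12-31'` is one day, one hour, two minutes, and 3.01 seconds.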



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35022) Task Scheduling Plugin in Spark

2021-04-11 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-35022:
---

Assignee: L. C. Hsieh

> Task Scheduling Plugin in Spark
> ---
>
> Key: SPARK-35022
> URL: https://issues.apache.org/jira/browse/SPARK-35022
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Spark's scheduler assigns tasks to executors in an indeterminate way. 
> Although there is locality configuration, it is used for data locality 
> purposes; generally we cannot hint to the scheduler where a task should be 
> scheduled. Normally this is not a problem because most tasks are 
> executor-agnostic. But for special tasks, for example stateful tasks in 
> Structured Streaming, the state store is maintained on the executor side. 
> Changing a task's location means reloading checkpoint data from the last 
> batch, which hurts performance and also imposes limitations when we want to 
> implement advanced features in Structured Streaming.
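One shape such a plugin could take for stateful streaming is a sticky partition-to-executor mapping, so a state-store partition keeps landing on the executor that already holds its state (illustrative sketch only; all names are hypothetical):

```python
class StateStorePlacement:
    """Hypothetical scheduling hint: pin each stateful partition
    to a fixed executor so its state store never has to move."""

    def __init__(self, executors):
        self.executors = list(executors)
        self.pinned = {}  # partition_id -> executor_id

    def preferred_executor(self, partition_id):
        # The first assignment is sticky: later attempts of the same
        # partition return the same executor, avoiding checkpoint reloads.
        if partition_id not in self.pinned:
            self.pinned[partition_id] = self.executors[
                partition_id % len(self.executors)
            ]
        return self.pinned[partition_id]


placement = StateStorePlacement(["exec-0", "exec-1", "exec-2"])
print(placement.preferred_executor(4))  # exec-1, and stays exec-1 on retries
```

A real plugin would also need a policy for executor loss; this sketch only shows the stickiness property the issue motivates.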



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35022) Task Scheduling Plugin in Spark

2021-04-11 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318666#comment-17318666
 ] 

L. C. Hsieh commented on SPARK-35022:
-

Design doc: 
https://docs.google.com/document/d/1wfEaAZA7t02P6uBH4F3NGuH_qjK5e4X05v1E5pWNhlQ/edit?usp=sharing

> Task Scheduling Plugin in Spark
> ---
>
> Key: SPARK-35022
> URL: https://issues.apache.org/jira/browse/SPARK-35022
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Spark's scheduler assigns tasks to executors in an indeterminate way. 
> Although there is locality configuration, it is used for data locality 
> purposes; generally we cannot hint to the scheduler where a task should be 
> scheduled. Normally this is not a problem because most tasks are 
> executor-agnostic. But for special tasks, for example stateful tasks in 
> Structured Streaming, the state store is maintained on the executor side. 
> Changing a task's location means reloading checkpoint data from the last 
> batch, which hurts performance and also imposes limitations when we want to 
> implement advanced features in Structured Streaming.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35022) Task Scheduling Plugin in Spark

2021-04-11 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-35022:
---

 Summary: Task Scheduling Plugin in Spark
 Key: SPARK-35022
 URL: https://issues.apache.org/jira/browse/SPARK-35022
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: L. C. Hsieh


Spark's scheduler assigns tasks to executors in an indeterminate way. Although 
there is locality configuration, it is used for data locality purposes; 
generally we cannot hint to the scheduler where a task should be scheduled. 
Normally this is not a problem because most tasks are executor-agnostic. But 
for special tasks, for example stateful tasks in Structured Streaming, the 
state store is maintained on the executor side. Changing a task's location 
means reloading checkpoint data from the last batch, which hurts performance 
and also imposes limitations when we want to implement advanced features in 
Structured Streaming.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org