[jira] [Resolved] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33455. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30379 [https://github.com/apache/spark/pull/30379] > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.1.0 > > > To have a benchmark for subexpression elimination. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33001) Why am I receiving this warning?
[ https://issues.apache.org/jira/browse/SPARK-33001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232192#comment-17232192 ] Wing Yew Poon commented on SPARK-33001: --- I may have been the last to touch ProcfsMetricsGetter.scala but it was authored by [~rezasafi]. [~xorz57] and [~dannylee8], are you encountering the warning when running Spark on Windows? The warning is harmless. ProcfsMetricsGetter is only meant to be run on Linux machines with a /proc filesystem. The warning happens because the command "getconf PAGESIZE" is run; it is not a valid command on Windows, so an exception is caught. ProcfsMetricsGetter is actually only used when spark.executor.processTreeMetrics.enabled=true. However, the class is instantiated regardless, and the warning occurs at that point even though the class is never used afterwards. Ideally, you should not see this warning: isProcfsAvailable should be checked before computePageSize() is called (the latter should not be called if procfs is not available, and it is not available on Windows). So it is a minor bug that you see this warning, but it can be safely ignored. > Why am I receiving this warning? > > > Key: SPARK-33001 > URL: https://issues.apache.org/jira/browse/SPARK-33001 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: George Fotopoulos >Priority: Major > > I am running Apache Spark Core using Scala 2.12.12 on IntelliJ IDEA 2020.2 > with Docker 2.3.0.5 > I am running Windows 10 build 2004 > Can somebody explain to me why I am receiving this warning and what I can do > about it? > I tried googling this warning but all I found was people asking about it, with > no answers.
> [screenshot|https://user-images.githubusercontent.com/1548352/94319642-c8102c80-ff93-11ea-9fea-f58de8da2268.png] > {code:scala} > WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a > result reporting of ProcessTree metrics is stopped > {code} > Thanks in advance!
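The fix described in the comment above can be sketched as follows. This is illustrative only: the method names mirror those mentioned in the comment (isProcfsAvailable, computePageSize), but the snippet is a sketch, not the actual ProcfsMetricsGetter code.

{code:scala}
// Hypothetical sketch: only probe the page size when procfs is usable,
// so no warning is logged on platforms without /proc (e.g. Windows).
val pageSize: Long =
  if (isProcfsAvailable) computePageSize() // runs "getconf PAGESIZE" on Linux
  else 0L // procfs absent: skip the probe instead of catching an exception
{code}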
[jira] [Commented] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232165#comment-17232165 ] Apache Spark commented on SPARK-33455: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30379 > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > To have a benchmark for subexpression elimination.
[jira] [Assigned] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33455: Assignee: Apache Spark (was: L. C. Hsieh) > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > > To have a benchmark for subexpression elimination.
[jira] [Assigned] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33455: Assignee: L. C. Hsieh (was: Apache Spark) > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > To have a benchmark for subexpression elimination.
[jira] [Created] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
L. C. Hsieh created SPARK-33455: --- Summary: Add SubExprEliminationBenchmark for benchmarking subexpression elimination Key: SPARK-33455 URL: https://issues.apache.org/jira/browse/SPARK-33455 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh To have a benchmark for subexpression elimination.
[jira] [Resolved] (SPARK-33432) SQL parser should use active SQLConf
[ https://issues.apache.org/jira/browse/SPARK-33432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33432. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30357 [https://github.com/apache/spark/pull/30357] > SQL parser should use active SQLConf > > > Key: SPARK-33432 > URL: https://issues.apache.org/jira/browse/SPARK-33432 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lu Lu >Assignee: Lu Lu >Priority: Major > Fix For: 3.1.0 > > > In ANSI mode, schema string parsing should fail if the schema uses an ANSI > reserved keyword as an attribute name: > {code:scala} > spark.conf.set("spark.sql.ansi.enabled", "true") > spark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > Cannot parse the data type: > no viable alternative at input 'time'(line 1, pos 0) > == SQL == > time Timestamp > ^^^ > {code} > But this query may accidentally succeed in certain cases because the DataType > parser sticks to the configs of the first created session in the current > thread: > {code:scala} > DataType.fromDDL("time Timestamp") > val newSpark = spark.newSession() > newSpark.conf.set("spark.sql.ansi.enabled", "true") > newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > ++ > |from_json({"time":"26/10/2015"})| > ++ > |{2015-10-26 00:00...| > ++ > {code}
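To make the failure mode above concrete: the fix makes schema-string parsing read the SQLConf of the active session rather than the conf captured by the first session created on the thread. A hedged sketch of the expected post-fix behavior follows (the exact error text may differ, and the date format is written out in full here purely for illustration):

{code:scala}
val newSpark = spark.newSession()
newSpark.conf.set("spark.sql.ansi.enabled", "true")
// With the fix, the schema string 'time Timestamp' should now be rejected
// in this session too, because 'time' is an ANSI reserved keyword and the
// parser consults the active session's conf instead of a stale one.
newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp',
  map('timestampFormat', 'dd/MM/yyyy'))""").show
{code}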
[jira] [Assigned] (SPARK-33432) SQL parser should use active SQLConf
[ https://issues.apache.org/jira/browse/SPARK-33432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33432: - Assignee: Lu Lu > SQL parser should use active SQLConf > > > Key: SPARK-33432 > URL: https://issues.apache.org/jira/browse/SPARK-33432 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lu Lu >Assignee: Lu Lu >Priority: Major > > In ANSI mode, schema string parsing should fail if the schema uses an ANSI > reserved keyword as an attribute name: > {code:scala} > spark.conf.set("spark.sql.ansi.enabled", "true") > spark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > Cannot parse the data type: > no viable alternative at input 'time'(line 1, pos 0) > == SQL == > time Timestamp > ^^^ > {code} > But this query may accidentally succeed in certain cases because the DataType > parser sticks to the configs of the first created session in the current > thread: > {code:scala} > DataType.fromDDL("time Timestamp") > val newSpark = spark.newSession() > newSpark.conf.set("spark.sql.ansi.enabled", "true") > newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > ++ > |from_json({"time":"26/10/2015"})| > ++ > |{2015-10-26 00:00...| > ++ > {code}
[jira] [Assigned] (SPARK-33454) Add GitHub Action job for Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-33454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33454: Assignee: (was: Apache Spark) > Add GitHub Action job for Hadoop 2 > -- > > Key: SPARK-33454 > URL: https://issues.apache.org/jira/browse/SPARK-33454 > Project: Spark > Issue Type: New Feature > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Commented] (SPARK-33454) Add GitHub Action job for Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-33454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232118#comment-17232118 ] Apache Spark commented on SPARK-33454: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30378 > Add GitHub Action job for Hadoop 2 > -- > > Key: SPARK-33454 > URL: https://issues.apache.org/jira/browse/SPARK-33454 > Project: Spark > Issue Type: New Feature > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Assigned] (SPARK-33454) Add GitHub Action job for Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-33454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33454: Assignee: Apache Spark > Add GitHub Action job for Hadoop 2 > -- > > Key: SPARK-33454 > URL: https://issues.apache.org/jira/browse/SPARK-33454 > Project: Spark > Issue Type: New Feature > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Created] (SPARK-33454) Add GitHub Action job for Hadoop 2
Dongjoon Hyun created SPARK-33454: - Summary: Add GitHub Action job for Hadoop 2 Key: SPARK-33454 URL: https://issues.apache.org/jira/browse/SPARK-33454 Project: Spark Issue Type: New Feature Components: Project Infra Affects Versions: 3.1.0 Reporter: Dongjoon Hyun This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Commented] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
[ https://issues.apache.org/jira/browse/SPARK-33453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232097#comment-17232097 ] Apache Spark commented on SPARK-33453: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/30377 > Unify v1 and v2 SHOW PARTITIONS tests > - > > Key: SPARK-33453 > URL: https://issues.apache.org/jira/browse/SPARK-33453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a > common trait. Mix this trait into datasource-specific test suites.
[jira] [Assigned] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
[ https://issues.apache.org/jira/browse/SPARK-33453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33453: Assignee: (was: Apache Spark) > Unify v1 and v2 SHOW PARTITIONS tests > - > > Key: SPARK-33453 > URL: https://issues.apache.org/jira/browse/SPARK-33453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a > common trait. Mix this trait into datasource-specific test suites.
[jira] [Assigned] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
[ https://issues.apache.org/jira/browse/SPARK-33453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33453: Assignee: Apache Spark > Unify v1 and v2 SHOW PARTITIONS tests > - > > Key: SPARK-33453 > URL: https://issues.apache.org/jira/browse/SPARK-33453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a > common trait. Mix this trait into datasource-specific test suites.
[jira] [Created] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
Maxim Gekk created SPARK-33453: -- Summary: Unify v1 and v2 SHOW PARTITIONS tests Key: SPARK-33453 URL: https://issues.apache.org/jira/browse/SPARK-33453 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a common trait. Mix this trait into datasource-specific test suites.
[jira] [Created] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node
Maxim Gekk created SPARK-33452: -- Summary: Create a V2 SHOW PARTITIONS execution node Key: SPARK-33452 URL: https://issues.apache.org/jira/browse/SPARK-33452 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The V1 SHOW PARTITIONS implementation is here: https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975 This ticket aims to add a V2 implementation with similar behavior.
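For context, the V1 behavior that the new V2 node should mirror can be exercised with plain SQL. A minimal, hypothetical illustration (the table and partition names are made up for this sketch):

{code:scala}
// Hypothetical example: a partitioned datasource table.
spark.sql("CREATE TABLE tbl (id INT, part STRING) USING parquet PARTITIONED BY (part)")
spark.sql("INSERT INTO tbl PARTITION (part = 'a') VALUES (1)")
// V1 returns one row per partition, e.g. 'part=a'; the V2 execution node
// should produce similar output for tables in v2 catalogs.
spark.sql("SHOW PARTITIONS tbl").show()
{code}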
[jira] [Commented] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node
[ https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232095#comment-17232095 ] Maxim Gekk commented on SPARK-33452: I plan to work on this soon. > Create a V2 SHOW PARTITIONS execution node > -- > > Key: SPARK-33452 > URL: https://issues.apache.org/jira/browse/SPARK-33452 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > The V1 SHOW PARTITIONS implementation is here: > https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975 > This ticket aims to add a V2 implementation with similar behavior.
[jira] [Commented] (SPARK-33393) Support SHOW TABLE EXTENDED in DSv2
[ https://issues.apache.org/jira/browse/SPARK-33393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232094#comment-17232094 ] Maxim Gekk commented on SPARK-33393: I plan to work on this soon. > Support SHOW TABLE EXTENDED in DSv2 > --- > > Key: SPARK-33393 > URL: https://issues.apache.org/jira/browse/SPARK-33393 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > The current implementation of DSv2 SHOW TABLE doesn't support the EXTENDED mode > in: > https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala#L33 > which is supported in DSv1: > https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L870 > The same functionality needs to be added to ShowTablesExec.
[jira] [Commented] (SPARK-33252) Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
[ https://issues.apache.org/jira/browse/SPARK-33252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232062#comment-17232062 ] Hyukjin Kwon commented on SPARK-33252: -- Thanks [~zero323]. > Migration to NumPy documentation style in MLlib (pyspark.mllib.*) > - > > Key: SPARK-33252 > URL: https://issues.apache.org/jira/browse/SPARK-33252 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > This JIRA targets migrating to the NumPy documentation style in MLlib > (pyspark.mllib.*). Please also see the parent JIRA.
[jira] [Commented] (SPARK-33252) Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
[ https://issues.apache.org/jira/browse/SPARK-33252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232059#comment-17232059 ] Maciej Szymkiewicz commented on SPARK-33252: I am starting to work on this one. > Migration to NumPy documentation style in MLlib (pyspark.mllib.*) > - > > Key: SPARK-33252 > URL: https://issues.apache.org/jira/browse/SPARK-33252 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > This JIRA targets migrating to the NumPy documentation style in MLlib > (pyspark.mllib.*). Please also see the parent JIRA.
[jira] [Assigned] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33451: Assignee: (was: Apache Spark) > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Assigned] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33451: Assignee: Apache Spark > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Assignee: Apache Spark >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Commented] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232028#comment-17232028 ] Apache Spark commented on SPARK-33451: -- User 'aof00' has created a pull request for this issue: https://github.com/apache/spark/pull/30376 > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Created] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
aof created SPARK-33451: --- Summary: change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' Key: SPARK-33451 URL: https://issues.apache.org/jira/browse/SPARK-33451 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.0.1, 3.0.0 Reporter: aof Fix For: 3.0.1, 3.0.0 In the 'Optimizing Skew Join' section of the following two pages: # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. The former is missing the 'skewJoin'.
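For reference, this is how the corrected key would appear in user code. The value shown is illustrative only (to the best of my knowledge it matches the documented Spark 3.0 default, but verify against your version's tuning guide):

{code:scala}
// Correct key: note the 'skewJoin' namespace segment.
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
// The key as currently printed in the tuning guide, without 'skewJoin',
// is not a recognized configuration and has no effect on skew join handling.
{code}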
[jira] [Updated] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aof updated SPARK-33451: Shepherd: (was: aof) Target Version/s: 3.0.1, 3.0.0 (was: 3.0.0, 3.0.1) > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Updated] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aof updated SPARK-33451: Shepherd: aof Target Version/s: 3.0.1, 3.0.0 (was: 3.0.0, 3.0.1) > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Commented] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232024#comment-17232024 ] Takeshi Yamamuro commented on SPARK-33450: -- Please write the description in English. > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Resolved] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-33450. -- Resolution: Invalid > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Assigned] (SPARK-33396) Spark SQL CLI not print application id in processing file mode
[ https://issues.apache.org/jira/browse/SPARK-33396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-33396: --- Assignee: Lichuanliang > Spark SQL CLI not print application id in processing file mode > -- > > Key: SPARK-33396 > URL: https://issues.apache.org/jira/browse/SPARK-33396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lichuanliang >Assignee: Lichuanliang >Priority: Minor > Fix For: 3.1.0 > > > Although SPARK-25043 already added printing of the application id, the print > function is never invoked when processing a SQL file.
[jira] [Updated] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BRUNO MOROZINI DOS SANTOS updated SPARK-33450: -- Description: h2. Engenharia de Dados Cognitivo.ai > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Updated] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BRUNO MOROZINI DOS SANTOS updated SPARK-33450: -- Attachment: load.csv > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Created] (SPARK-33450) Engenharia de Dados Cognitivo.ai
BRUNO MOROZINI DOS SANTOS created SPARK-33450: - Summary: Engenharia de Dados Cognitivo.ai Key: SPARK-33450 URL: https://issues.apache.org/jira/browse/SPARK-33450 Project: Spark Issue Type: Task Components: Examples Affects Versions: 3.0.1 Reporter: BRUNO MOROZINI DOS SANTOS
[jira] [Resolved] (SPARK-33396) Spark SQL CLI not print application id in processing file mode
[ https://issues.apache.org/jira/browse/SPARK-33396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-33396. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30301 [https://github.com/apache/spark/pull/30301] > Spark SQL CLI not print application id in processing file mode > -- > > Key: SPARK-33396 > URL: https://issues.apache.org/jira/browse/SPARK-33396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lichuanliang >Priority: Minor > Fix For: 3.1.0 > > > Although SPARK-25043 already added printing of the application id, the print > function is never invoked when processing a SQL file.
[jira] [Updated] (SPARK-33449) Add cache for Parquet Metadata
[ https://issues.apache.org/jira/browse/SPARK-33449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33449: Description: Getting Parquet metadata may take a lot of time; maybe we can cache it. Presto supports it: https://github.com/prestodb/presto/pull/15276 was: Get Parquet metadata takes a lot of time, maybe we can cache it. Presto support it: https://github.com/prestodb/presto/pull/15276 > Add cache for Parquet Metadata > -- > > Key: SPARK-33449 > URL: https://issues.apache.org/jira/browse/SPARK-33449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > Attachments: Get Parquet metadata.png > > > Getting Parquet metadata may take a lot of time; maybe we can cache it. Presto > supports it: > https://github.com/prestodb/presto/pull/15276
[jira] [Updated] (SPARK-33449) Add cache for Parquet Metadata
[ https://issues.apache.org/jira/browse/SPARK-33449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33449: Description: Getting Parquet metadata takes a lot of time; maybe we can cache it. Presto supports it: https://github.com/prestodb/presto/pull/15276 > Add cache for Parquet Metadata > -- > > Key: SPARK-33449 > URL: https://issues.apache.org/jira/browse/SPARK-33449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > Attachments: Get Parquet metadata.png > > > Getting Parquet metadata takes a lot of time; maybe we can cache it. Presto > supports it: > https://github.com/prestodb/presto/pull/15276
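The caching idea in the description above can be sketched independently of Spark's internals. Below is a minimal, hypothetical memoization of Parquet metadata reads keyed by (path, mtime); `loader` is a stand-in for a real metadata reader (for example something like `pyarrow.parquet.read_metadata`) and is not part of the actual proposal:

```python
import os
from typing import Any, Callable, Dict, Tuple


class ParquetMetadataCache:
    """Cache Parquet metadata per file, re-reading only when the file changes."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        # Keyed by (path, mtime) so an overwritten file is read again.
        self._cache: Dict[Tuple[str, float], Any] = {}

    def get(self, path: str) -> Any:
        key = (path, os.path.getmtime(path))
        if key not in self._cache:
            self._cache[key] = self._loader(path)
        return self._cache[key]
```

Invalidating on mtime is the simplest policy; bounding the cache size and handling concurrent reads are the concerns a production cache (like the linked Presto change) also has to address.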
[jira] [Created] (SPARK-33449) Add cache for Parquet Metadata
Yuming Wang created SPARK-33449: --- Summary: Add cache for Parquet Metadata Key: SPARK-33449 URL: https://issues.apache.org/jira/browse/SPARK-33449 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang Attachments: Get Parquet metadata.png
[jira] [Updated] (SPARK-33449) Add cache for Parquet Metadata
[ https://issues.apache.org/jira/browse/SPARK-33449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33449: Attachment: Get Parquet metadata.png > Add cache for Parquet Metadata > -- > > Key: SPARK-33449 > URL: https://issues.apache.org/jira/browse/SPARK-33449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > Attachments: Get Parquet metadata.png > >
[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231998#comment-17231998 ] Apache Spark commented on SPARK-33288: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30375 > Support k8s cluster manager with stage level scheduling > --- > > Key: SPARK-33288 > URL: https://issues.apache.org/jira/browse/SPARK-33288 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.1.0 > > > Kubernetes supports dynamic allocation via the > {{spark.dynamicAllocation.shuffleTracking.enabled}} config; we can add > support for stage level scheduling when that is turned on.
[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231999#comment-17231999 ] Apache Spark commented on SPARK-33288: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30375 > Support k8s cluster manager with stage level scheduling > --- > > Key: SPARK-33288 > URL: https://issues.apache.org/jira/browse/SPARK-33288 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.1.0 > > > Kubernetes supports dynamic allocation via the > {{spark.dynamicAllocation.shuffleTracking.enabled}} config; we can add > support for stage level scheduling when that is turned on.
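As context for the dynamic-allocation prerequisite mentioned in the issue description, a hypothetical submission enabling shuffle-tracking-based dynamic allocation on Kubernetes. The master URL, container image, and application file are placeholders, not values from the issue:

```shell
# Illustrative flags only; kubernetes.example.com, the image tag, and app.py
# are placeholders chosen for this sketch.
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --conf spark.kubernetes.container.image=spark:3.1.0 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  app.py
```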