[jira] [Resolved] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33455. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30379 [https://github.com/apache/spark/pull/30379] > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.1.0 > > > To have a benchmark for subexpression elimination. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33001) Why am I receiving this warning?
[ https://issues.apache.org/jira/browse/SPARK-33001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232192#comment-17232192 ] Wing Yew Poon commented on SPARK-33001: --- I may have been the last to touch ProcfsMetricsGetter.scala but it was authored by [~rezasafi]. [~xorz57] and [~dannylee8], are you encountering the warning when running Spark on Windows? The warning is harmless. ProcfsMetricsGetter is only meant to be run on Linux machines with a /proc filesystem. The warning happens because the command "getconf PAGESIZE" is run; it is not a valid command on Windows, so an exception is caught. ProcfsMetricsGetter is actually only used when spark.executor.processTreeMetrics.enabled=true. However, the class is instantiated regardless, and the warning occurs at that point even though the class is never used afterwards. Ideally, you should not see this warning: isProcfsAvailable should be checked before computePageSize() is called (the latter should not be called if procfs is not available, and it is not available on Windows). So it is a minor bug that you see this warning, but it can be safely ignored. > Why am I receiving this warning? > > > Key: SPARK-33001 > URL: https://issues.apache.org/jira/browse/SPARK-33001 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: George Fotopoulos >Priority: Major > > I am running Apache Spark Core using Scala 2.12.12 on IntelliJ IDEA 2020.2 > with Docker 2.3.0.5 > I am running Windows 10 build 2004 > Can somebody explain to me why I am receiving this warning and what I can do > about it? > I tried googling this warning but all I found was people asking about it, with > no answers.
> [screenshot|https://user-images.githubusercontent.com/1548352/94319642-c8102c80-ff93-11ea-9fea-f58de8da2268.png] > {code:scala} > WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a > result reporting of ProcessTree metrics is stopped > {code} > Thanks in advance!
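The fix described in the comment above can be sketched as follows. This is illustrative only: the method names mirror those mentioned in the comment (isProcfsAvailable, computePageSize), but the snippet is a sketch, not the actual ProcfsMetricsGetter code.

{code:scala}
// Hypothetical sketch: only probe the page size when procfs is usable,
// so no warning is logged on platforms without /proc (e.g. Windows).
val pageSize: Long =
  if (isProcfsAvailable) computePageSize() // runs "getconf PAGESIZE" on Linux
  else 0L // procfs absent: skip the probe instead of catching an exception
{code}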
[jira] [Commented] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232165#comment-17232165 ] Apache Spark commented on SPARK-33455: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30379 > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > To have a benchmark for subexpression elimination.
[jira] [Assigned] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33455: Assignee: Apache Spark (was: L. C. Hsieh) > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > > To have a benchmark for subexpression elimination.
[jira] [Assigned] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
[ https://issues.apache.org/jira/browse/SPARK-33455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33455: Assignee: L. C. Hsieh (was: Apache Spark) > Add SubExprEliminationBenchmark for benchmarking subexpression elimination > -- > > Key: SPARK-33455 > URL: https://issues.apache.org/jira/browse/SPARK-33455 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > To have a benchmark for subexpression elimination.
[jira] [Created] (SPARK-33455) Add SubExprEliminationBenchmark for benchmarking subexpression elimination
L. C. Hsieh created SPARK-33455: --- Summary: Add SubExprEliminationBenchmark for benchmarking subexpression elimination Key: SPARK-33455 URL: https://issues.apache.org/jira/browse/SPARK-33455 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh To have a benchmark for subexpression elimination.
[jira] [Resolved] (SPARK-33432) SQL parser should use active SQLConf
[ https://issues.apache.org/jira/browse/SPARK-33432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33432. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30357 [https://github.com/apache/spark/pull/30357] > SQL parser should use active SQLConf > > > Key: SPARK-33432 > URL: https://issues.apache.org/jira/browse/SPARK-33432 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lu Lu >Assignee: Lu Lu >Priority: Major > Fix For: 3.1.0 > > > In ANSI mode, schema string parsing should fail if the schema uses an ANSI > reserved keyword as an attribute name: > {code:scala} > spark.conf.set("spark.sql.ansi.enabled", "true") > spark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > Cannot parse the data type: > no viable alternative at input 'time'(line 1, pos 0) > == SQL == > time Timestamp > ^^^ > {code} > But this query may accidentally succeed in certain cases because the DataType > parser sticks to the configs of the first created session in the current > thread: > {code:scala} > DataType.fromDDL("time Timestamp") > val newSpark = spark.newSession() > newSpark.conf.set("spark.sql.ansi.enabled", "true") > newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > ++ > |from_json({"time":"26/10/2015"})| > ++ > |{2015-10-26 00:00...| > ++ > {code}
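To make the failure mode above concrete: the fix makes schema-string parsing read the SQLConf of the active session rather than the conf captured by the first session created on the thread. A hedged sketch of the expected post-fix behavior follows (the exact error text may differ, and the date format is written out in full here purely for illustration):

{code:scala}
val newSpark = spark.newSession()
newSpark.conf.set("spark.sql.ansi.enabled", "true")
// With the fix, the schema string 'time Timestamp' should now be rejected
// in this session too, because 'time' is an ANSI reserved keyword and the
// parser consults the active session's conf instead of a stale one.
newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp',
  map('timestampFormat', 'dd/MM/yyyy'))""").show
{code}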
[jira] [Assigned] (SPARK-33432) SQL parser should use active SQLConf
[ https://issues.apache.org/jira/browse/SPARK-33432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33432: - Assignee: Lu Lu > SQL parser should use active SQLConf > > > Key: SPARK-33432 > URL: https://issues.apache.org/jira/browse/SPARK-33432 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lu Lu >Assignee: Lu Lu >Priority: Major > > In ANSI mode, schema string parsing should fail if the schema uses an ANSI > reserved keyword as an attribute name: > {code:scala} > spark.conf.set("spark.sql.ansi.enabled", "true") > spark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > Cannot parse the data type: > no viable alternative at input 'time'(line 1, pos 0) > == SQL == > time Timestamp > ^^^ > {code} > But this query may accidentally succeed in certain cases because the DataType > parser sticks to the configs of the first created session in the current > thread: > {code:scala} > DataType.fromDDL("time Timestamp") > val newSpark = spark.newSession() > newSpark.conf.set("spark.sql.ansi.enabled", "true") > newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', > map('timestampFormat', 'dd/MM/'));""").show > output: > ++ > |from_json({"time":"26/10/2015"})| > ++ > |{2015-10-26 00:00...| > ++ > {code}
[jira] [Assigned] (SPARK-33454) Add GitHub Action job for Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-33454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33454: Assignee: (was: Apache Spark) > Add GitHub Action job for Hadoop 2 > -- > > Key: SPARK-33454 > URL: https://issues.apache.org/jira/browse/SPARK-33454 > Project: Spark > Issue Type: New Feature > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Commented] (SPARK-33454) Add GitHub Action job for Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-33454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232118#comment-17232118 ] Apache Spark commented on SPARK-33454: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30378 > Add GitHub Action job for Hadoop 2 > -- > > Key: SPARK-33454 > URL: https://issues.apache.org/jira/browse/SPARK-33454 > Project: Spark > Issue Type: New Feature > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Assigned] (SPARK-33454) Add GitHub Action job for Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-33454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33454: Assignee: Apache Spark > Add GitHub Action job for Hadoop 2 > -- > > Key: SPARK-33454 > URL: https://issues.apache.org/jira/browse/SPARK-33454 > Project: Spark > Issue Type: New Feature > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Created] (SPARK-33454) Add GitHub Action job for Hadoop 2
Dongjoon Hyun created SPARK-33454: - Summary: Add GitHub Action job for Hadoop 2 Key: SPARK-33454 URL: https://issues.apache.org/jira/browse/SPARK-33454 Project: Spark Issue Type: New Feature Components: Project Infra Affects Versions: 3.1.0 Reporter: Dongjoon Hyun This issue aims to prevent accidental compilation errors with the Hadoop 2 profile.
[jira] [Commented] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
[ https://issues.apache.org/jira/browse/SPARK-33453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232097#comment-17232097 ] Apache Spark commented on SPARK-33453: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/30377 > Unify v1 and v2 SHOW PARTITIONS tests > - > > Key: SPARK-33453 > URL: https://issues.apache.org/jira/browse/SPARK-33453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a > common trait. Mix this trait into datasource-specific test suites.
[jira] [Assigned] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
[ https://issues.apache.org/jira/browse/SPARK-33453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33453: Assignee: (was: Apache Spark) > Unify v1 and v2 SHOW PARTITIONS tests > - > > Key: SPARK-33453 > URL: https://issues.apache.org/jira/browse/SPARK-33453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a > common trait. Mix this trait into datasource-specific test suites.
[jira] [Assigned] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
[ https://issues.apache.org/jira/browse/SPARK-33453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33453: Assignee: Apache Spark > Unify v1 and v2 SHOW PARTITIONS tests > - > > Key: SPARK-33453 > URL: https://issues.apache.org/jira/browse/SPARK-33453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a > common trait. Mix this trait into datasource-specific test suites.
[jira] [Created] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
Maxim Gekk created SPARK-33453: -- Summary: Unify v1 and v2 SHOW PARTITIONS tests Key: SPARK-33453 URL: https://issues.apache.org/jira/browse/SPARK-33453 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a common trait. Mix this trait into datasource-specific test suites.
[jira] [Created] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node
Maxim Gekk created SPARK-33452: -- Summary: Create a V2 SHOW PARTITIONS execution node Key: SPARK-33452 URL: https://issues.apache.org/jira/browse/SPARK-33452 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The V1 SHOW PARTITIONS implementation is here: https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975 This ticket aims to add a V2 implementation with similar behavior.
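For context, the V1 behavior that the new V2 node should mirror can be exercised with plain SQL. A minimal, hypothetical illustration (the table and partition names are made up for this sketch):

{code:scala}
// Hypothetical example: a partitioned datasource table.
spark.sql("CREATE TABLE tbl (id INT, part STRING) USING parquet PARTITIONED BY (part)")
spark.sql("INSERT INTO tbl PARTITION (part = 'a') VALUES (1)")
// V1 returns one row per partition, e.g. 'part=a'; the V2 execution node
// should produce similar output for tables in v2 catalogs.
spark.sql("SHOW PARTITIONS tbl").show()
{code}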
[jira] [Commented] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node
[ https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232095#comment-17232095 ] Maxim Gekk commented on SPARK-33452: I plan to work on this soon. > Create a V2 SHOW PARTITIONS execution node > -- > > Key: SPARK-33452 > URL: https://issues.apache.org/jira/browse/SPARK-33452 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > The V1 SHOW PARTITIONS implementation is here: > https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975 > This ticket aims to add a V2 implementation with similar behavior.
[jira] [Commented] (SPARK-33393) Support SHOW TABLE EXTENDED in DSv2
[ https://issues.apache.org/jira/browse/SPARK-33393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232094#comment-17232094 ] Maxim Gekk commented on SPARK-33393: I plan to work on this soon. > Support SHOW TABLE EXTENDED in DSv2 > --- > > Key: SPARK-33393 > URL: https://issues.apache.org/jira/browse/SPARK-33393 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > The current implementation of DSv2 SHOW TABLE doesn't support the EXTENDED mode > in: > https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala#L33 > which is supported in DSv1: > https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L870 > The same functionality needs to be added to ShowTablesExec.
[jira] [Commented] (SPARK-33252) Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
[ https://issues.apache.org/jira/browse/SPARK-33252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232062#comment-17232062 ] Hyukjin Kwon commented on SPARK-33252: -- Thanks [~zero323]. > Migration to NumPy documentation style in MLlib (pyspark.mllib.*) > - > > Key: SPARK-33252 > URL: https://issues.apache.org/jira/browse/SPARK-33252 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > This JIRA targets migrating to the NumPy documentation style in MLlib > (pyspark.mllib.*). Please also see the parent JIRA.
[jira] [Commented] (SPARK-33252) Migration to NumPy documentation style in MLlib (pyspark.mllib.*)
[ https://issues.apache.org/jira/browse/SPARK-33252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232059#comment-17232059 ] Maciej Szymkiewicz commented on SPARK-33252: I am starting to work on this one. > Migration to NumPy documentation style in MLlib (pyspark.mllib.*) > - > > Key: SPARK-33252 > URL: https://issues.apache.org/jira/browse/SPARK-33252 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > This JIRA targets migrating to the NumPy documentation style in MLlib > (pyspark.mllib.*). Please also see the parent JIRA.
[jira] [Assigned] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33451: Assignee: (was: Apache Spark) > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Assigned] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33451: Assignee: Apache Spark > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Assignee: Apache Spark >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Commented] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232028#comment-17232028 ] Apache Spark commented on SPARK-33451: -- User 'aof00' has created a pull request for this issue: https://github.com/apache/spark/pull/30376 > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Created] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
aof created SPARK-33451: --- Summary: change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' Key: SPARK-33451 URL: https://issues.apache.org/jira/browse/SPARK-33451 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.0.1, 3.0.0 Reporter: aof Fix For: 3.0.1, 3.0.0 In the 'Optimizing Skew Join' section of the following two pages: # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. The former is missing the 'skewJoin'.
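For reference, this is how the corrected key would appear in user code. The value shown is illustrative only (to the best of my knowledge it matches the documented Spark 3.0 default, but verify against your version's tuning guide):

{code:scala}
// Correct key: note the 'skewJoin' namespace segment.
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
// The key as currently printed in the tuning guide, without 'skewJoin',
// is not a recognized configuration and has no effect on skew join handling.
{code}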
[jira] [Updated] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aof updated SPARK-33451: Shepherd: (was: aof) Target Version/s: 3.0.1, 3.0.0 (was: 3.0.0, 3.0.1) > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Updated] (SPARK-33451) change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'
[ https://issues.apache.org/jira/browse/SPARK-33451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aof updated SPARK-33451: Shepherd: aof Target Version/s: 3.0.1, 3.0.0 (was: 3.0.0, 3.0.1) > change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to > 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' > > > Key: SPARK-33451 > URL: https://issues.apache.org/jira/browse/SPARK-33451 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0, 3.0.1 >Reporter: aof >Priority: Major > Fix For: 3.0.0, 3.0.1 > > > In the 'Optimizing Skew Join' section of the following two pages: > # [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html] > # [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html] > The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should > be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. > The former is missing the 'skewJoin'.
[jira] [Commented] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232024#comment-17232024 ] Takeshi Yamamuro commented on SPARK-33450: -- Please write the description in English. > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Resolved] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-33450. -- Resolution: Invalid > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Assigned] (SPARK-33396) Spark SQL CLI not print application id in processing file mode
[ https://issues.apache.org/jira/browse/SPARK-33396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-33396: --- Assignee: Lichuanliang > Spark SQL CLI not print application id in processing file mode > -- > > Key: SPARK-33396 > URL: https://issues.apache.org/jira/browse/SPARK-33396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lichuanliang >Assignee: Lichuanliang >Priority: Minor > Fix For: 3.1.0 > > > Although SPARK-25043 already added printing of the application id, the print > function is never invoked when processing a SQL file.
[jira] [Updated] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BRUNO MOROZINI DOS SANTOS updated SPARK-33450: -- Description: h2. Engenharia de Dados Cognitivo.ai > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Updated] (SPARK-33450) Engenharia de Dados Cognitivo.ai
[ https://issues.apache.org/jira/browse/SPARK-33450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BRUNO MOROZINI DOS SANTOS updated SPARK-33450: -- Attachment: load.csv > Engenharia de Dados Cognitivo.ai > > > Key: SPARK-33450 > URL: https://issues.apache.org/jira/browse/SPARK-33450 > Project: Spark > Issue Type: Task > Components: Examples >Affects Versions: 3.0.1 >Reporter: BRUNO MOROZINI DOS SANTOS >Priority: Major > Attachments: load.csv > > > h2. Engenharia de Dados Cognitivo.ai >
[jira] [Created] (SPARK-33450) Engenharia de Dados Cognitivo.ai
BRUNO MOROZINI DOS SANTOS created SPARK-33450: - Summary: Engenharia de Dados Cognitivo.ai Key: SPARK-33450 URL: https://issues.apache.org/jira/browse/SPARK-33450 Project: Spark Issue Type: Task Components: Examples Affects Versions: 3.0.1 Reporter: BRUNO MOROZINI DOS SANTOS
[jira] [Resolved] (SPARK-33396) Spark SQL CLI not print application id in processing file mode
[ https://issues.apache.org/jira/browse/SPARK-33396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-33396. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30301 [https://github.com/apache/spark/pull/30301] > Spark SQL CLI not print application id in processing file mode > -- > > Key: SPARK-33396 > URL: https://issues.apache.org/jira/browse/SPARK-33396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Lichuanliang >Priority: Minor > Fix For: 3.1.0 > > > Although SPARK-25043 already added printing of the application id, the print > function is never invoked when processing a SQL file.
[jira] [Updated] (SPARK-33449) Add cache for Parquet Metadata
[ https://issues.apache.org/jira/browse/SPARK-33449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33449: Description: Getting Parquet metadata may take a lot of time; maybe we can cache it. Presto supports it: https://github.com/prestodb/presto/pull/15276 was: Get Parquet metadata takes a lot of time, maybe we can cache it. Presto support it: https://github.com/prestodb/presto/pull/15276 > Add cache for Parquet Metadata > -- > > Key: SPARK-33449 > URL: https://issues.apache.org/jira/browse/SPARK-33449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > Attachments: Get Parquet metadata.png > > > Getting Parquet metadata may take a lot of time; maybe we can cache it. Presto > supports it: > https://github.com/prestodb/presto/pull/15276
[jira] [Updated] (SPARK-33449) Add cache for Parquet Metadata
[ https://issues.apache.org/jira/browse/SPARK-33449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33449: Description: Getting Parquet metadata takes a lot of time; maybe we can cache it. Presto supports it: https://github.com/prestodb/presto/pull/15276 > Add cache for Parquet Metadata > -- > > Key: SPARK-33449 > URL: https://issues.apache.org/jira/browse/SPARK-33449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > Attachments: Get Parquet metadata.png > > > Getting Parquet metadata takes a lot of time; maybe we can cache it. Presto > supports it: > https://github.com/prestodb/presto/pull/15276
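The caching idea in the description above can be sketched independently of Spark's internals. Below is a minimal, hypothetical memoization of Parquet metadata reads keyed by (path, mtime); `loader` is a stand-in for a real metadata reader (for example something like `pyarrow.parquet.read_metadata`) and is not part of the actual proposal:

```python
import os
from typing import Any, Callable, Dict, Tuple


class ParquetMetadataCache:
    """Cache Parquet metadata per file, re-reading only when the file changes."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        # Keyed by (path, mtime) so an overwritten file is read again.
        self._cache: Dict[Tuple[str, float], Any] = {}

    def get(self, path: str) -> Any:
        key = (path, os.path.getmtime(path))
        if key not in self._cache:
            self._cache[key] = self._loader(path)
        return self._cache[key]
```

Invalidating on mtime is the simplest policy; bounding the cache size and handling concurrent reads are the concerns a production cache (like the linked Presto change) also has to address.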
[jira] [Created] (SPARK-33449) Add cache for Parquet Metadata
Yuming Wang created SPARK-33449: --- Summary: Add cache for Parquet Metadata Key: SPARK-33449 URL: https://issues.apache.org/jira/browse/SPARK-33449 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang Attachments: Get Parquet metadata.png
[jira] [Updated] (SPARK-33449) Add cache for Parquet Metadata
[ https://issues.apache.org/jira/browse/SPARK-33449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33449: Attachment: Get Parquet metadata.png > Add cache for Parquet Metadata > -- > > Key: SPARK-33449 > URL: https://issues.apache.org/jira/browse/SPARK-33449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > Attachments: Get Parquet metadata.png > >
[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231998#comment-17231998 ] Apache Spark commented on SPARK-33288: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30375 > Support k8s cluster manager with stage level scheduling > --- > > Key: SPARK-33288 > URL: https://issues.apache.org/jira/browse/SPARK-33288 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.1.0 > > > Kubernetes supports dynamic allocation via the > {{spark.dynamicAllocation.shuffleTracking.enabled}} config; we can add > support for stage level scheduling when that is turned on.
[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231999#comment-17231999 ] Apache Spark commented on SPARK-33288: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30375 > Support k8s cluster manager with stage level scheduling > --- > > Key: SPARK-33288 > URL: https://issues.apache.org/jira/browse/SPARK-33288 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.1.0 > > > Kubernetes supports dynamic allocation via the > {{spark.dynamicAllocation.shuffleTracking.enabled}} config; we can add > support for stage level scheduling when that is turned on.
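As context for the dynamic-allocation prerequisite mentioned in the issue description, a hypothetical submission enabling shuffle-tracking-based dynamic allocation on Kubernetes. The master URL, container image, and application file are placeholders, not values from the issue:

```shell
# Illustrative flags only; kubernetes.example.com, the image tag, and app.py
# are placeholders chosen for this sketch.
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --conf spark.kubernetes.container.image=spark:3.1.0 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  app.py
```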