[jira] [Resolved] (SPARK-37499) Close HiveClientImpl.sessionState when shutdown
[ https://issues.apache.org/jira/browse/SPARK-37499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu resolved SPARK-37499. --- Resolution: Not A Problem > Close HiveClientImpl.sessionState when shutdown > --- > > Key: SPARK-37499 > URL: https://issues.apache.org/jira/browse/SPARK-37499 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > HiveClientImpl does not close its sessionState after application shutdown, which > causes many session files to remain. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37494) Unify v1 and v2 options output of `SHOW CREATE TABLE` command
[ https://issues.apache.org/jira/browse/SPARK-37494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450909#comment-17450909 ] Apache Spark commented on SPARK-37494: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/34753 > Unify v1 and v2 options output of `SHOW CREATE TABLE` command > - > > Key: SPARK-37494 > URL: https://issues.apache.org/jira/browse/SPARK-37494 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37494) Unify v1 and v2 options output of `SHOW CREATE TABLE` command
[ https://issues.apache.org/jira/browse/SPARK-37494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37494: Assignee: (was: Apache Spark) > Unify v1 and v2 options output of `SHOW CREATE TABLE` command > - > > Key: SPARK-37494 > URL: https://issues.apache.org/jira/browse/SPARK-37494 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37494) Unify v1 and v2 options output of `SHOW CREATE TABLE` command
[ https://issues.apache.org/jira/browse/SPARK-37494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37494: Assignee: Apache Spark > Unify v1 and v2 options output of `SHOW CREATE TABLE` command > - > > Key: SPARK-37494 > URL: https://issues.apache.org/jira/browse/SPARK-37494 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36396) Implement DataFrame.cov
[ https://issues.apache.org/jira/browse/SPARK-36396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36396. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34213 [https://github.com/apache/spark/pull/34213] > Implement DataFrame.cov > --- > > Key: SPARK-36396 > URL: https://issues.apache.org/jira/browse/SPARK-36396 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36396) Implement DataFrame.cov
[ https://issues.apache.org/jira/browse/SPARK-36396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36396: Assignee: Xinrong Meng > Implement DataFrame.cov > --- > > Key: SPARK-36396 > URL: https://issues.apache.org/jira/browse/SPARK-36396 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37489) Skip hasnans check in numops if eager_check disable
[ https://issues.apache.org/jira/browse/SPARK-37489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37489: Assignee: Yikun Jiang > Skip hasnans check in numops if eager_check disable > --- > > Key: SPARK-37489 > URL: https://issues.apache.org/jira/browse/SPARK-37489 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37489) Skip hasnans check in numops if eager_check disable
[ https://issues.apache.org/jira/browse/SPARK-37489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37489. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34746 [https://github.com/apache/spark/pull/34746] > Skip hasnans check in numops if eager_check disable > --- > > Key: SPARK-37489 > URL: https://issues.apache.org/jira/browse/SPARK-37489 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37499) Close HiveClientImpl.sessionState when shutdown
[ https://issues.apache.org/jira/browse/SPARK-37499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450884#comment-17450884 ] angerszhu commented on SPARK-37499: --- Raise PR soon > Close HiveClientImpl.sessionState when shutdown > --- > > Key: SPARK-37499 > URL: https://issues.apache.org/jira/browse/SPARK-37499 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > HiveClientImpl does not close its sessionState after application shutdown, which > causes many session files to remain. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37499) Close HiveClientImpl.sessionState when shutdown
angerszhu created SPARK-37499: - Summary: Close HiveClientImpl.sessionState when shutdown Key: SPARK-37499 URL: https://issues.apache.org/jira/browse/SPARK-37499 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu HiveClientImpl does not close its sessionState after application shutdown, which causes many session files to remain. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
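The idea in the report, sketched very roughly below, is to close the Hive SessionState that HiveClientImpl attaches to the current thread when the application shuts down, so that the session's scratch files are removed. This is only an illustrative Scala sketch under that assumption, not an actual Spark patch (the issue was later resolved as Not A Problem):

{code}
import org.apache.hadoop.hive.ql.session.SessionState

// Illustrative sketch only: close the thread's Hive SessionState on JVM
// shutdown so the temporary session directories it created are cleaned up.
sys.addShutdownHook {
  val state = SessionState.get()   // SessionState attached to this thread, if any
  if (state != null) {
    state.close()                  // removes the session's scratch/session files
  }
}
{code}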
[jira] [Commented] (SPARK-37497) Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-37497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450883#comment-17450883 ] Apache Spark commented on SPARK-37497: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/34751 > Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi > - > > Key: SPARK-37497 > URL: https://issues.apache.org/jira/browse/SPARK-37497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37497) Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-37497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37497: Assignee: (was: Apache Spark) > Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi > - > > Key: SPARK-37497 > URL: https://issues.apache.org/jira/browse/SPARK-37497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37497) Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-37497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450882#comment-17450882 ] Apache Spark commented on SPARK-37497: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/34751 > Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi > - > > Key: SPARK-37497 > URL: https://issues.apache.org/jira/browse/SPARK-37497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37497) Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-37497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37497: Assignee: Apache Spark > Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi > - > > Key: SPARK-37497 > URL: https://issues.apache.org/jira/browse/SPARK-37497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37498) test_reuse_worker_of_parallelize_range is flaky
Yikun Jiang created SPARK-37498: --- Summary: test_reuse_worker_of_parallelize_range is flaky Key: SPARK-37498 URL: https://issues.apache.org/jira/browse/SPARK-37498 Project: Spark Issue Type: Bug Components: PySpark, Tests Affects Versions: 3.3.0 Reporter: Yikun Jiang {code:java} ERROR [2.132s]: test_reuse_worker_of_parallelize_range (pyspark.tests.test_worker.WorkerReuseTest) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/tests/test_worker.py", line 195, in test_reuse_worker_of_parallelize_range self.assertTrue(pid in previous_pids) AssertionError: False is not true -- Ran 12 tests in 22.589s {code} [1] https://github.com/apache/spark/runs/1182154542?check_suite_focus=true [2] https://github.com/apache/spark/pull/33657#issuecomment-893969310 [3] https://github.com/Yikun/spark/runs/4362783540?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37497) Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi
Dongjoon Hyun created SPARK-37497: - Summary: Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi Key: SPARK-37497 URL: https://issues.apache.org/jira/browse/SPARK-37497 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37495) Skip identical index checking of Series.compare when config 'compute.eager_check' is disabled
[ https://issues.apache.org/jira/browse/SPARK-37495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37495: Assignee: Apache Spark > Skip identical index checking of Series.compare when config > 'compute.eager_check' is disabled > - > > Key: SPARK-37495 > URL: https://issues.apache.org/jira/browse/SPARK-37495 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37495) Skip identical index checking of Series.compare when config 'compute.eager_check' is disabled
[ https://issues.apache.org/jira/browse/SPARK-37495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37495: Assignee: (was: Apache Spark) > Skip identical index checking of Series.compare when config > 'compute.eager_check' is disabled > - > > Key: SPARK-37495 > URL: https://issues.apache.org/jira/browse/SPARK-37495 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37495) Skip identical index checking of Series.compare when config 'compute.eager_check' is disabled
[ https://issues.apache.org/jira/browse/SPARK-37495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450875#comment-17450875 ] Apache Spark commented on SPARK-37495: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34750 > Skip identical index checking of Series.compare when config > 'compute.eager_check' is disabled > - > > Key: SPARK-37495 > URL: https://issues.apache.org/jira/browse/SPARK-37495 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37496) Migrate ReplaceTableAsSelectStatement to v2 command
Huaxin Gao created SPARK-37496: -- Summary: Migrate ReplaceTableAsSelectStatement to v2 command Key: SPARK-37496 URL: https://issues.apache.org/jira/browse/SPARK-37496 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Description: It is best exemplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-37487: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count and sum aggregates report twice the number of rows: {code} [info] - SPARK-37487: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happens. 
Hopefully the UT can help with debugging was: It is best exemplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count and sum aggregates report twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happens. Hopefully the UT can help with debugging > CollectMetrics is executed twice if it is followed by a sort > > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > Labels: correctness > > It is best exemplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-37487: get observable metrics with sort by callback") { > val df = spark.range(100) >
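For context on the "callback" in the test name: observed metrics are delivered to QueryExecutionListener instances, which is presumably how the suite's helper reads them back. A minimal sketch of that mechanism (assuming a SparkSession named spark and the "my_event" observation defined above; this is not the suite's validateObservedMetrics helper):

{code}
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Register a listener and print the metrics row of the "my_event" observation.
// Under the reported bug, the sum and count columns of this row come back
// doubled when the observe is followed by a sort.
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    qe.observedMetrics.get("my_event").foreach { row =>
      println(s"min=${row.get(0)} max=${row.get(1)} sum=${row.get(2)} num_even=${row.get(3)}")
    }
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
})
{code}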
[jira] [Resolved] (SPARK-37465) PySpark tests failing on Pandas 0.23
[ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37465. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34717 [https://github.com/apache/spark/pull/34717] > PySpark tests failing on Pandas 0.23 > > > Key: SPARK-37465 > URL: https://issues.apache.org/jira/browse/SPARK-37465 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Willi Raschkowski >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > I was running Spark tests with Pandas {{0.23.4}} and got the error below. The > minimum Pandas version is currently {{0.23.2}} > [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. > Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix > (Github)|https://github.com/pandas-dev/pandas/pull/21160/files#diff-1b7183f5b3970e2a1d39a82d71686e39c765d18a34231b54c857b0c4c9bb8222] > in Pandas. > {code:java} > $ python/run-tests --testnames > 'pyspark.pandas.tests.data_type_ops.test_boolean_ops > BooleanOpsTest.test_floordiv' > ... > == > ERROR [5.785s]: test_floordiv > (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest) > -- > Traceback (most recent call last): > File > "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", > line 128, in test_floordiv > self.assert_eq(b_pser // b_pser.astype(int), b_psser // > b_psser.astype(int)) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1069, in wrapper > result = safe_na_op(lvalues, rvalues) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1033, in safe_na_op > return na_op(lvalues, rvalues) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1027, in na_op > result = missing.fill_zeros(result, x, y, op_name, fill_zeros) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", > line 641, in fill_zeros > signs = np.sign(y if name.startswith(('r', '__r')) else x) > TypeError: ufunc 'sign' did not contain a loop with signature matching types > dtype('bool') dtype('bool') > {code} > These are my relevant package versions: > {code:java} > $ conda list | grep -e numpy -e pyarrow -e pandas -e python > # packages in environment at /home/circleci/miniconda/envs/python3: > numpy 1.16.6 py36h0a8e133_3 > numpy-base1.16.6 py36h41b4c56_3 > pandas0.23.4 py36h04863e7_0 > pyarrow 1.0.1 py36h6200943_36_cpuconda-forge > python3.6.12 hcff3b4d_2anaconda > python-dateutil 2.8.1 py_0anaconda > python_abi3.6 1_cp36mconda-forg > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37465) PySpark tests failing on Pandas 0.23
[ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37465: Assignee: Yikun Jiang (was: Hyukjin Kwon) > PySpark tests failing on Pandas 0.23 > > > Key: SPARK-37465 > URL: https://issues.apache.org/jira/browse/SPARK-37465 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Willi Raschkowski >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > > I was running Spark tests with Pandas {{0.23.4}} and got the error below. The > minimum Pandas version is currently {{0.23.2}} > [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. > Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix > (Github)|https://github.com/pandas-dev/pandas/pull/21160/files#diff-1b7183f5b3970e2a1d39a82d71686e39c765d18a34231b54c857b0c4c9bb8222] > in Pandas. > {code:java} > $ python/run-tests --testnames > 'pyspark.pandas.tests.data_type_ops.test_boolean_ops > BooleanOpsTest.test_floordiv' > ... > == > ERROR [5.785s]: test_floordiv > (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest) > -- > Traceback (most recent call last): > File > "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", > line 128, in test_floordiv > self.assert_eq(b_pser // b_pser.astype(int), b_psser // > b_psser.astype(int)) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1069, in wrapper > result = safe_na_op(lvalues, rvalues) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1033, in safe_na_op > return na_op(lvalues, rvalues) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1027, in na_op > result = missing.fill_zeros(result, x, y, op_name, fill_zeros) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", > line 641, in fill_zeros > signs = np.sign(y if name.startswith(('r', '__r')) else x) > TypeError: ufunc 'sign' did not contain a loop with signature matching types > dtype('bool') dtype('bool') > {code} > These are my relevant package versions: > {code:java} > $ conda list | grep -e numpy -e pyarrow -e pandas -e python > # packages in environment at /home/circleci/miniconda/envs/python3: > numpy 1.16.6 py36h0a8e133_3 > numpy-base1.16.6 py36h41b4c56_3 > pandas0.23.4 py36h04863e7_0 > pyarrow 1.0.1 py36h6200943_36_cpuconda-forge > python3.6.12 hcff3b4d_2anaconda > python-dateutil 2.8.1 py_0anaconda > python_abi3.6 1_cp36mconda-forg > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37465) PySpark tests failing on Pandas 0.23
[ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37465: Assignee: Hyukjin Kwon > PySpark tests failing on Pandas 0.23 > > > Key: SPARK-37465 > URL: https://issues.apache.org/jira/browse/SPARK-37465 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Willi Raschkowski >Assignee: Hyukjin Kwon >Priority: Major > > I was running Spark tests with Pandas {{0.23.4}} and got the error below. The > minimum Pandas version is currently {{0.23.2}} > [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. > Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix > (Github)|https://github.com/pandas-dev/pandas/pull/21160/files#diff-1b7183f5b3970e2a1d39a82d71686e39c765d18a34231b54c857b0c4c9bb8222] > in Pandas. > {code:java} > $ python/run-tests --testnames > 'pyspark.pandas.tests.data_type_ops.test_boolean_ops > BooleanOpsTest.test_floordiv' > ... > == > ERROR [5.785s]: test_floordiv > (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest) > -- > Traceback (most recent call last): > File > "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", > line 128, in test_floordiv > self.assert_eq(b_pser // b_pser.astype(int), b_psser // > b_psser.astype(int)) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1069, in wrapper > result = safe_na_op(lvalues, rvalues) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1033, in safe_na_op > return na_op(lvalues, rvalues) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", > line 1027, in na_op > result = missing.fill_zeros(result, x, y, op_name, fill_zeros) > File > "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", > line 641, in fill_zeros > signs = np.sign(y if name.startswith(('r', '__r')) else x) > TypeError: ufunc 'sign' did not contain a loop with signature matching types > dtype('bool') dtype('bool') > {code} > These are my relevant package versions: > {code:java} > $ conda list | grep -e numpy -e pyarrow -e pandas -e python > # packages in environment at /home/circleci/miniconda/envs/python3: > numpy 1.16.6 py36h0a8e133_3 > numpy-base1.16.6 py36h41b4c56_3 > pandas0.23.4 py36h04863e7_0 > pyarrow 1.0.1 py36h6200943_36_cpuconda-forge > python3.6.12 hcff3b4d_2anaconda > python-dateutil 2.8.1 py_0anaconda > python_abi3.6 1_cp36mconda-forg > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37495) Skip identical index checking of Series.compare when config 'compute.eager_check' is disabled
dch nguyen created SPARK-37495: -- Summary: Skip identical index checking of Series.compare when config 'compute.eager_check' is disabled Key: SPARK-37495 URL: https://issues.apache.org/jira/browse/SPARK-37495 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: dch nguyen -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37492) Optimize Orc test code with withAllNativeOrcReaders
[ https://issues.apache.org/jira/browse/SPARK-37492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37492: - Assignee: jiaan.geng > Optimize Orc test code with withAllNativeOrcReaders > --- > > Key: SPARK-37492 > URL: https://issues.apache.org/jira/browse/SPARK-37492 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37492) Optimize Orc test code with withAllNativeOrcReaders
[ https://issues.apache.org/jira/browse/SPARK-37492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37492. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34723 [https://github.com/apache/spark/pull/34723] > Optimize Orc test code with withAllNativeOrcReaders > --- > > Key: SPARK-37492 > URL: https://issues.apache.org/jira/browse/SPARK-37492 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36850) Migrate CreateTableStatement to v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36850. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34060 [https://github.com/apache/spark/pull/34060] > Migrate CreateTableStatement to v2 command framework > > > Key: SPARK-36850 > URL: https://issues.apache.org/jira/browse/SPARK-36850 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36850) Migrate CreateTableStatement to v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36850: --- Assignee: Huaxin Gao > Migrate CreateTableStatement to v2 command framework > > > Key: SPARK-36850 > URL: https://issues.apache.org/jira/browse/SPARK-36850 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37452) Char and Varchar breaks backward compatibility between v3 and v2
[ https://issues.apache.org/jira/browse/SPARK-37452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37452: -- Fix Version/s: 3.1.3 > Char and Varchar breaks backward compatibility between v3 and v2 > > > Key: SPARK-37452 > URL: https://issues.apache.org/jira/browse/SPARK-37452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.3, 3.2.1, 3.3.0 > > > We will store table schema in table properties for the read-side to restore. > In Spark 3.1, we add char/varchar support natively. In some commands like > `create table`, `alter table` with these types, the char(n) or varchar(n) > will be stored directly to those properties. If a user uses Spark 2 to read > such a table it will fail to parse the schema. > A table can be a newly created one by Spark 3.1 and later or an existing one > modified by Spark 3.1 and on. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37472) Missing functionality in spark.pandas
[ https://issues.apache.org/jira/browse/SPARK-37472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37472: - Parent: SPARK-36394 Issue Type: Sub-task (was: New Feature) > Missing functionality in spark.pandas > - > > Key: SPARK-37472 > URL: https://issues.apache.org/jira/browse/SPARK-37472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Rens Jochemsen >Priority: Major > > Dear, > > I am missing the functionality to include local variables in the query method. > ``` > seg = 'A' > psdf.query("segment == @seg") > > ``` > > or > ``` > seg = ['A', 'B'] > psdf.query("segment == @seg") > ``` > > Furthermore, I was wondering whether date-offset functionality such as > pd.offsets.MonthEnd will be added in future versions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35867) Enable vectorized read for VectorizedPlainValuesReader.readBooleans
[ https://issues.apache.org/jira/browse/SPARK-35867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-35867. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34611 [https://github.com/apache/spark/pull/34611] > Enable vectorized read for VectorizedPlainValuesReader.readBooleans > --- > > Key: SPARK-35867 > URL: https://issues.apache.org/jira/browse/SPARK-35867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.3.0 > > > Currently we decode PLAIN encoded booleans as follow: > {code:java} > public final void readBooleans(int total, WritableColumnVector c, int > rowId) { > // TODO: properly vectorize this > for (int i = 0; i < total; i++) { > c.putBoolean(rowId + i, readBoolean()); > } > } > {code} > Ideally we should vectorize this. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35867) Enable vectorized read for VectorizedPlainValuesReader.readBooleans
[ https://issues.apache.org/jira/browse/SPARK-35867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-35867: Assignee: Kazuyuki Tanimura > Enable vectorized read for VectorizedPlainValuesReader.readBooleans > --- > > Key: SPARK-35867 > URL: https://issues.apache.org/jira/browse/SPARK-35867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Kazuyuki Tanimura >Priority: Minor > > Currently we decode PLAIN encoded booleans as follow: > {code:java} > public final void readBooleans(int total, WritableColumnVector c, int > rowId) { > // TODO: properly vectorize this > for (int i = 0; i < total; i++) { > c.putBoolean(rowId + i, readBoolean()); > } > } > {code} > Ideally we should vectorize this. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
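Parquet's PLAIN encoding stores booleans bit-packed, eight values per byte with the least-significant bit first, so the per-row readBoolean() loop quoted above can instead index straight into the page bytes. The following is only a rough Scala sketch of that idea against plain arrays; it is not the code from the linked PR, and it assumes the read starts on a byte boundary:

{code}
// Sketch: unpack PLAIN-encoded booleans directly from the page bytes instead
// of calling readBoolean() once per row.
def readBooleansBatch(total: Int, packed: Array[Byte], out: Array[Boolean], rowId: Int): Unit = {
  var i = 0
  while (i < total) {
    val b = packed(i / 8)                       // byte holding the i-th value
    out(rowId + i) = ((b >> (i % 8)) & 1) != 0  // extract bit i (LSB first)
    i += 1
  }
}
{code}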
[jira] [Commented] (SPARK-37493) expose driver gc time and duration time
[ https://issues.apache.org/jira/browse/SPARK-37493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450822#comment-17450822 ] zhoubin commented on SPARK-37493: - Issue resolved by pull request 34749 https://github.com/apache/spark/pull/34749 > expose driver gc time and duration time > --- > > Key: SPARK-37493 > URL: https://issues.apache.org/jira/browse/SPARK-37493 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: zhoubin >Priority: Major > > When we browse the executor pages of the driver-side or history-server Spark UI, the driver's > GC statistics are not readily visible, which makes it hard to decide how to > configure the driver's resources. > > We can use the application time as the driver's task duration, and use the > "TotalGCTime" metric in addition to the JVMHeapMemory of ExecutorMetricType -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37486) an error occurred while using the udf jars located in the lakefs, a inner filesystem in Tencent Cloud DLC.
[ https://issues.apache.org/jira/browse/SPARK-37486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450820#comment-17450820 ] Apache Spark commented on SPARK-37486: -- User 'kevincmchen' has created a pull request for this issue: https://github.com/apache/spark/pull/34742 > an error occurred while using the udf jars located in the lakefs, a inner > filesystem in Tencent Cloud DLC. > -- > > Key: SPARK-37486 > URL: https://issues.apache.org/jira/browse/SPARK-37486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kevin Pis >Priority: Major > > when using livy to execute sql statements that will call the udf jars located > in lakefs, a inner filesystem in Tencent Cloud DLC. it will threw the > following exceptions: > > {code:java} > 21/11/25 21:12:43 ERROR Session: Exception when executing code > java.lang.LinkageError: loader constraint violation: loader (instance of > sun/misc/Launcher$AppClassLoader) previously initiated loading for a > different type with name "com/qcloud/cos/auth/COSCredentials" > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:756) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369) > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:362) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2306) > at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2271) > at > org.apache.hadoop.conf.Configuration.getClasses(Configuration.java:2344) > at > org.apache.hadoop.fs.CosNUtils.loadCosProviderClasses(CosNUtils.java:68) > at > org.apache.hadoop.fs.CosFileSystem.initRangerClientImpl(CosFileSystem.java:848) > at org.apache.hadoop.fs.CosFileSystem.initialize(CosFileSystem.java:95) > at > com.tencent.cloud.fs.CompatibleFileSystem.initialize(CompatibleFileSystem.java:20) > at > com.tencent.cloud.fs.LakeFileSystem.initialize(LakeFileSystem.java:56) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2812) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389) > at org.apache.hadoop.fs.FsUrlConnection.connect(FsUrlConnection.java:49) > at > org.apache.hadoop.fs.FsUrlConnection.getInputStream(FsUrlConnection.java:59) > at sun.net.www.protocol.jar.URLJarFile.retrieve(URLJarFile.java:214) > at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:71) > at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:84) > at > sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122) > at > sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89) > at 
sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:944) > at sun.misc.URLClassPath$JarLoader.access$800(URLClassPath.java:801) > at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:886) > at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:879) > at java.security.AccessController.doPrivileged(Native Method) > at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:878) > at sun.misc.URLClassPath$JarLoader.(URLClassPath.java:829) > at sun.misc.URLClassPath$3.run(URLClassPath.java:575) > at sun.misc.URLClassPath$3.run(URLClassPath.java:565) > at java.security.AccessController.doPrivileged(Native Method) > at sun.misc.URLClassPath.getLoader(URLClassPath.java:564) > at sun.misc.URLClassPath.getLoader(URLClassPath.java:529) > at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:494) > at sun.misc.URLClassPath.access$100(URLClassPath.java:66) > at sun.misc.URLClassPath$1.next(URLClassPath.
[jira] [Assigned] (SPARK-37493) expose driver gc time and duration time
[ https://issues.apache.org/jira/browse/SPARK-37493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37493: Assignee: (was: Apache Spark) > expose driver gc time and duration time > --- > > Key: SPARK-37493 > URL: https://issues.apache.org/jira/browse/SPARK-37493 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: zhoubin >Priority: Major > > When we browse the executor pages of the driver-side or history-server Spark UI, the driver's > GC statistics are not readily visible, which makes it hard to decide how to > configure the driver's resources. > > We can use the application time as the driver's task duration, and use the > "TotalGCTime" metric in addition to the JVMHeapMemory of ExecutorMetricType -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37493) expose driver gc time and duration time
[ https://issues.apache.org/jira/browse/SPARK-37493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37493: Assignee: Apache Spark > expose driver gc time and duration time > --- > > Key: SPARK-37493 > URL: https://issues.apache.org/jira/browse/SPARK-37493 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: zhoubin >Assignee: Apache Spark >Priority: Major > > When we browse the executor pages of the driver-side or history-server Spark UI, the driver's > GC statistics are not readily visible, which makes it hard to decide how to > configure the driver's resources. > > We can use the application time as the driver's task duration, and use the > "TotalGCTime" metric in addition to the JVMHeapMemory of ExecutorMetricType -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37493) expose driver gc time and duration time
[ https://issues.apache.org/jira/browse/SPARK-37493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450819#comment-17450819 ] Apache Spark commented on SPARK-37493: -- User 'summaryzb' has created a pull request for this issue: https://github.com/apache/spark/pull/34749 > expose driver gc time and duration time > --- > > Key: SPARK-37493 > URL: https://issues.apache.org/jira/browse/SPARK-37493 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: zhoubin >Priority: Major > > When we browse the executor pages of the driver-side or history-server Spark UI, the driver's > GC statistics are not readily visible, which makes it hard to decide how to > configure the driver's resources. > > We can use the application time as the driver's task duration, and use the > "TotalGCTime" metric in addition to the JVMHeapMemory of ExecutorMetricType -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37482) Skip check monotonic increasing for Series.asof with 'compute.eager_check'
[ https://issues.apache.org/jira/browse/SPARK-37482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37482: Assignee: dch nguyen > Skip check monotonic increasing for Series.asof with 'compute.eager_check' > -- > > Key: SPARK-37482 > URL: https://issues.apache.org/jira/browse/SPARK-37482 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37482) Skip check monotonic increasing for Series.asof with 'compute.eager_check'
[ https://issues.apache.org/jira/browse/SPARK-37482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37482. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34737 [https://github.com/apache/spark/pull/34737] > Skip check monotonic increasing for Series.asof with 'compute.eager_check' > -- > > Key: SPARK-37482 > URL: https://issues.apache.org/jira/browse/SPARK-37482 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37484) Replace Get and getOrElse with getOrElse
[ https://issues.apache.org/jira/browse/SPARK-37484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37484. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34739 [https://github.com/apache/spark/pull/34739] > Replace Get and getOrElse with getOrElse > > > Key: SPARK-37484 > URL: https://issues.apache.org/jira/browse/SPARK-37484 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > > There are some combined calls of get and getOrElse that can be directly > replaced by getOrElse > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37484) Replace Get and getOrElse with getOrElse
[ https://issues.apache.org/jira/browse/SPARK-37484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37484: Assignee: Yang Jie > Replace Get and getOrElse with getOrElse > > > Key: SPARK-37484 > URL: https://issues.apache.org/jira/browse/SPARK-37484 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > > There are some combined calls of get and getOrElse that can be directly > replaced by getOrElse > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
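The pattern being cleaned up is easiest to see on a small, made-up example; both lookups below return the same value, the second simply avoids chaining Option#getOrElse onto Map#get (illustrative snippet, not taken from the PR):

{code}
val conf = Map("spark.executor.cores" -> "4")

// Before: Map#get returns an Option, and getOrElse is chained onto it.
val memoryBefore = conf.get("spark.executor.memory").getOrElse("1g")

// After: Map#getOrElse expresses the lookup-with-default in one call.
val memoryAfter = conf.getOrElse("spark.executor.memory", "1g")
{code}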
[jira] [Resolved] (SPARK-37485) Replace map with expressions which produce no result with foreach
[ https://issues.apache.org/jira/browse/SPARK-37485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37485. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34740 [https://github.com/apache/spark/pull/34740] > Replace map with expressions which produce no result with foreach > -- > > Key: SPARK-37485 > URL: https://issues.apache.org/jira/browse/SPARK-37485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > > Use foreach instead of map with expressions which produce no result. > > Before > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).map(functionWithNoReturnValue) {code} > > > After > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).foreach(functionWithNoReturnValue) {code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37485) Replace map with expressions which produce no result with foreach
[ https://issues.apache.org/jira/browse/SPARK-37485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37485: Assignee: Yang Jie > Replace map with expressions which produce no result with foreach > -- > > Key: SPARK-37485 > URL: https://issues.apache.org/jira/browse/SPARK-37485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > > Use foreach instead of map with expressions which produce no result. > > Before > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).map(functionWithNoReturnValue) {code} > > > After > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).foreach(functionWithNoReturnValue) {code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37494) Unify v1 and v2 options output of `SHOW CREATE TABLE` command
[ https://issues.apache.org/jira/browse/SPARK-37494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PengLei updated SPARK-37494: Summary: Unify v1 and v2 options output of `SHOW CREATE TABLE` command (was: Unify v1 and v2 option output of `SHOW CREATE TABLE` command) > Unify v1 and v2 options output of `SHOW CREATE TABLE` command > - > > Key: SPARK-37494 > URL: https://issues.apache.org/jira/browse/SPARK-37494 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37483: --- Summary: Support push down top N to JDBC data source V2 (was: Support pushdown down top N to JDBC data source V2) > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37493) expose driver gc time and duration time
[ https://issues.apache.org/jira/browse/SPARK-37493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoubin updated SPARK-37493: Summary: expose driver gc time and duration time (was: expose driver gc time) > expose driver gc time and duration time > --- > > Key: SPARK-37493 > URL: https://issues.apache.org/jira/browse/SPARK-37493 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: zhoubin >Priority: Major > > When we browse the executor pages of the driver-side or history-server Spark UI, the driver's > GC statistics are not readily visible, which makes it hard to decide how to > configure the driver's resources. > > We can use the application time as the driver's task duration, and use the > "TotalGCTime" metric in addition to the JVMHeapMemory of ExecutorMetricType -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37494) Unify v1 and v2 option output of `SHOW CREATE TABLE` command
PengLei created SPARK-37494: --- Summary: Unify v1 and v2 option output of `SHOW CREATE TABLE` command Key: SPARK-37494 URL: https://issues.apache.org/jira/browse/SPARK-37494 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: PengLei Fix For: 3.3.0 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37493) expose driver gc time
zhoubin created SPARK-37493: --- Summary: expose driver gc time Key: SPARK-37493 URL: https://issues.apache.org/jira/browse/SPARK-37493 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.1 Reporter: zhoubin When we browse the executor pages of the driver-side or history-server Spark UI, the driver's GC statistics are not readily visible, which makes it hard to decide how to configure the driver's resources. We can use the application time as the driver's task duration, and use the "TotalGCTime" metric in addition to the JVMHeapMemory of ExecutorMetricType -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
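For reference, the driver's accumulated GC time is already obtainable from the JVM through JMX, and Spark's executor GC time is computed from the same beans. A small illustrative snippet for reading it on the driver (not the proposed patch):

{code}
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Sum the collection time reported by every garbage collector bean in this
// JVM; a negative value means the collector does not report timing.
val totalGcTimeMs = ManagementFactory.getGarbageCollectorMXBeans.asScala
  .map(_.getCollectionTime)
  .filter(_ >= 0L)
  .sum
println(s"Driver total GC time: ${totalGcTimeMs} ms")
{code}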
[jira] [Commented] (SPARK-37492) Optimize Orc test code with withAllNativeOrcReaders
[ https://issues.apache.org/jira/browse/SPARK-37492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450795#comment-17450795 ] Apache Spark commented on SPARK-37492: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34723 > Optimize Orc test code with withAllNativeOrcReaders > --- > > Key: SPARK-37492 > URL: https://issues.apache.org/jira/browse/SPARK-37492 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37492) Optimize Orc test code with withAllNativeOrcReaders
[ https://issues.apache.org/jira/browse/SPARK-37492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450794#comment-17450794 ] Apache Spark commented on SPARK-37492: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34723 > Optimize Orc test code with withAllNativeOrcReaders > --- > > Key: SPARK-37492 > URL: https://issues.apache.org/jira/browse/SPARK-37492 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37492) Optimize Orc test code with withAllNativeOrcReaders
[ https://issues.apache.org/jira/browse/SPARK-37492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37492: Assignee: Apache Spark > Optimize Orc test code with withAllNativeOrcReaders > --- > > Key: SPARK-37492 > URL: https://issues.apache.org/jira/browse/SPARK-37492 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37492) Optimize Orc test code with withAllNativeOrcReaders
[ https://issues.apache.org/jira/browse/SPARK-37492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37492: Assignee: (was: Apache Spark) > Optimize Orc test code with withAllNativeOrcReaders > --- > > Key: SPARK-37492 > URL: https://issues.apache.org/jira/browse/SPARK-37492 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37492) Optimize Orc test code with withAllNativeOrcReaders
jiaan.geng created SPARK-37492: -- Summary: Optimize Orc test code with withAllNativeOrcReaders Key: SPARK-37492 URL: https://issues.apache.org/jira/browse/SPARK-37492 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37491) Fix Series.asof when values of the series is not sorted
dch nguyen created SPARK-37491: -- Summary: Fix Series.asof when values of the series is not sorted Key: SPARK-37491 URL: https://issues.apache.org/jira/browse/SPARK-37491 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0 Reporter: dch nguyen https://github.com/apache/spark/pull/34737#discussion_r758223279 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
[ https://issues.apache.org/jira/browse/SPARK-37468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-37468. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34716 [https://github.com/apache/spark/pull/34716] > Support ANSI intervals and TimestampNTZ for UnionEstimation > --- > > Key: SPARK-37468 > URL: https://issues.apache.org/jira/browse/SPARK-37468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.3.0 > > > Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. > But I think it can support those types because their underlying types are > integer or long, which UnionEstimation can compute stats for. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
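For illustration, a minimal sketch (hypothetical data, not taken from the ticket) of the kind of union plan this change covers: both branches carry a TIMESTAMP_NTZ and an ANSI year-month interval column, whose int/long-backed representations allow min/max statistics to be merged across the children. Whether column statistics actually appear depends on CBO settings and the statistics available to the optimizer.

{code}
// Enable cost-based optimization so estimated statistics are computed and printed.
spark.sql("SET spark.sql.cbo.enabled=true")

val left  = spark.sql("SELECT timestamp_ntz '2021-06-01 00:00:00' AS ts, INTERVAL '1' YEAR AS ym")
val right = spark.sql("SELECT timestamp_ntz '2021-07-01 00:00:00' AS ts, INTERVAL '2' YEAR AS ym")

// 'cost' mode prints the statistics estimated for each node, including the Union.
left.union(right).explain("cost")
{code}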
[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Labels: correctness (was: ) > CollectMetrics is executed twice if it is followed by a sort > > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > Labels: correctness > > It is best examplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-X: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name = "my_event", > min($"id").as("min_val"), > max($"id").as("max_val"), > // Test unresolved alias > sum($"id"), > count(when($"id" % 2 === 0, 1)).as("num_even")) > .observe( > name = "other_event", > avg($"id").cast("int").as("avg_val")) > .sort($"id".desc) > validateObservedMetrics(df) > } > {code} > The count and sum aggregate report twice the number of rows: > {code} > [info] - SPARK-X: get observable metrics with sort by callback *** FAILED > *** (169 milliseconds) > [info] [0,99,9900,100] did not equal [0,99,4950,50] > (DataFrameCallbackSuite.scala:342) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > {code} > I could not figure out how this happes. Hopefully the UT can help with > debugging -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37490) Show hint if analyzer fails due to ANSI type coercion
[ https://issues.apache.org/jira/browse/SPARK-37490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37490: Assignee: Gengliang Wang (was: Apache Spark) > Show hint if analyzer fails due to ANSI type coercion > - > > Key: SPARK-37490 > URL: https://issues.apache.org/jira/browse/SPARK-37490 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Show hint in the error message if analysis failed only with ANSI type > coercion: > {code:java} > To fix the error, you might need to add explicit type casts. > To bypass the error with lenient type coercion rules, set > spark.sql.ansi.enabled as false. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37490) Show hint if analyzer fails due to ANSI type coercion
[ https://issues.apache.org/jira/browse/SPARK-37490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450593#comment-17450593 ] Apache Spark commented on SPARK-37490: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/34747 > Show hint if analyzer fails due to ANSI type coercion > - > > Key: SPARK-37490 > URL: https://issues.apache.org/jira/browse/SPARK-37490 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Show hint in the error message if analysis failed only with ANSI type > coercion: > {code:java} > To fix the error, you might need to add explicit type casts. > To bypass the error with lenient type coercion rules, set > spark.sql.ansi.enabled as false. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37490) Show hint if analyzer fails due to ANSI type coercion
[ https://issues.apache.org/jira/browse/SPARK-37490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37490: Assignee: Apache Spark (was: Gengliang Wang) > Show hint if analyzer fails due to ANSI type coercion > - > > Key: SPARK-37490 > URL: https://issues.apache.org/jira/browse/SPARK-37490 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Show hint in the error message if analysis failed only with ANSI type > coercion: > {code:java} > To fix the error, you might need to add explicit type casts. > To bypass the error with lenient type coercion rules, set > spark.sql.ansi.enabled as false. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37490) Show hint if analyzer fails due to ANSI type coercion
Gengliang Wang created SPARK-37490: -- Summary: Show hint if analyzer fails due to ANSI type coercion Key: SPARK-37490 URL: https://issues.apache.org/jira/browse/SPARK-37490 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Show hint in the error message if analysis failed only with ANSI type coercion: {code:java} To fix the error, you might need to add explicit type casts. To bypass the error with lenient type coercion rules, set spark.sql.ansi.enabled as false. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37489) Skip hasnans check in numops if eager_check disable
[ https://issues.apache.org/jira/browse/SPARK-37489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450569#comment-17450569 ] Apache Spark commented on SPARK-37489: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/34746 > Skip hasnans check in numops if eager_check disable > --- > > Key: SPARK-37489 > URL: https://issues.apache.org/jira/browse/SPARK-37489 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37489) Skip hasnans check in numops if eager_check disable
[ https://issues.apache.org/jira/browse/SPARK-37489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450567#comment-17450567 ] Apache Spark commented on SPARK-37489: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/34746 > Skip hasnans check in numops if eager_check disable > --- > > Key: SPARK-37489 > URL: https://issues.apache.org/jira/browse/SPARK-37489 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37489) Skip hasnans check in numops if eager_check disable
[ https://issues.apache.org/jira/browse/SPARK-37489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37489: Assignee: (was: Apache Spark) > Skip hasnans check in numops if eager_check disable > --- > > Key: SPARK-37489 > URL: https://issues.apache.org/jira/browse/SPARK-37489 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37489) Skip hasnans check in numops if eager_check disable
[ https://issues.apache.org/jira/browse/SPARK-37489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37489: Assignee: Apache Spark > Skip hasnans check in numops if eager_check disable > --- > > Key: SPARK-37489 > URL: https://issues.apache.org/jira/browse/SPARK-37489 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450560#comment-17450560 ] Apache Spark commented on SPARK-37391: -- User 'tdg5' has created a pull request for this issue: https://github.com/apache/spark/pull/34745 > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450559#comment-17450559 ] Apache Spark commented on SPARK-37391: -- User 'tdg5' has created a pull request for this issue: https://github.com/apache/spark/pull/34745 > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37391: Assignee: Apache Spark > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Assignee: Apache Spark >Priority: Major > Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37391: Assignee: (was: Apache Spark) > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37489) Skip hasnans check in numops if eager_check disable
Yikun Jiang created SPARK-37489: --- Summary: Skip hasnans check in numops if eager_check disable Key: SPARK-37489 URL: https://issues.apache.org/jira/browse/SPARK-37489 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Yikun Jiang -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37454) support expressions in time travel timestamp
[ https://issues.apache.org/jira/browse/SPARK-37454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450553#comment-17450553 ] Apache Spark commented on SPARK-37454: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/34744 > support expressions in time travel timestamp > > > Key: SPARK-37454 > URL: https://issues.apache.org/jira/browse/SPARK-37454 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37454) support expressions in time travel timestamp
[ https://issues.apache.org/jira/browse/SPARK-37454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450552#comment-17450552 ] Apache Spark commented on SPARK-37454: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/34744 > support expressions in time travel timestamp > > > Key: SPARK-37454 > URL: https://issues.apache.org/jira/browse/SPARK-37454 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37259) JDBC read is always going to wrap the query in a select statement
[ https://issues.apache.org/jira/browse/SPARK-37259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450525#comment-17450525 ] Kevin Appel commented on SPARK-37259: - [~petertoth] [~akhalymon] Thank you both for working on these patches, it took me a little bit to figure out how to test them but i got the Spark 3.3.0-SNAPSHOT compiled and then added both of your changes to different working copies and then recompile the spark-sql and then was able to test both of your changes. I added comments into the github pull request links with how the testing went so far > JDBC read is always going to wrap the query in a select statement > - > > Key: SPARK-37259 > URL: https://issues.apache.org/jira/browse/SPARK-37259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kevin Appel >Priority: Major > > The read jdbc is wrapping the query it sends to the database server inside a > select statement and there is no way to override this currently. > Initially I ran into this issue when trying to run a CTE query against SQL > server and it fails, the details of the failure is in these cases: > [https://github.com/microsoft/mssql-jdbc/issues/1340] > [https://github.com/microsoft/mssql-jdbc/issues/1657] > [https://github.com/microsoft/sql-spark-connector/issues/147] > https://issues.apache.org/jira/browse/SPARK-32825 > https://issues.apache.org/jira/browse/SPARK-34928 > I started to patch the code to get the query to run and ran into a few > different items, if there is a way to add these features to allow this code > path to run, this would be extremely helpful to running these type of edge > case queries. These are basic examples here the actual queries are much more > complex and would require significant time to rewrite. > Inside JDBCOptions.scala the query is being set to either, using the dbtable > this allows the query to be passed without modification > > {code:java} > name.trim > or > s"(${subquery}) SPARK_GEN_SUBQ_${curId.getAndIncrement()}" > {code} > > Inside JDBCRelation.scala this is going to try to get the schema for this > query, and this ends up running dialect.getSchemaQuery which is doing: > {code:java} > s"SELECT * FROM $table WHERE 1=0"{code} > Overriding the dialect here and initially just passing back the $table gets > passed here and to the next issue which is in the compute function in > JDBCRDD.scala > > {code:java} > val sqlText = s"SELECT $columnList FROM ${options.tableOrQuery} > $myTableSampleClause" + s" $myWhereClause $getGroupByClause $myLimitClause" > > {code} > > For these two queries, about a CTE query and using temp tables, finding out > the schema is difficult without actually running the query and for the temp > table if you run it in the schema check that will have the table now exist > and fail when it runs the actual query. 
> > The way I patched these is by doing these two items: > JDBCRDD.scala (compute) > > {code:java} > val runQueryAsIs = options.parameters.getOrElse("runQueryAsIs", > "false").toBoolean > val sqlText = if (runQueryAsIs) { > s"${options.tableOrQuery}" > } else { > s"SELECT $columnList FROM ${options.tableOrQuery} $myWhereClause" > } > {code} > JDBCRelation.scala (getSchema) > {code:java} > val useCustomSchema = jdbcOptions.parameters.getOrElse("useCustomSchema", > "false").toBoolean > if (useCustomSchema) { > val myCustomSchema = jdbcOptions.parameters.getOrElse("customSchema", > "").toString > val newSchema = CatalystSqlParser.parseTableSchema(myCustomSchema) > logInfo(s"Going to return the new $newSchema because useCustomSchema is > $useCustomSchema and passed in $myCustomSchema") > newSchema > } else { > val tableSchema = JDBCRDD.resolveTable(jdbcOptions) > jdbcOptions.customSchema match { > case Some(customSchema) => JdbcUtils.getCustomSchema( > tableSchema, customSchema, resolver) > case None => tableSchema > } > }{code} > > This is allowing the query to run as is, by using the dbtable option and then > provide a custom schema that will bypass the dialect schema check > > Test queries > > {code:java} > query1 = """ > SELECT 1 as DummyCOL > """ > query2 = """ > WITH DummyCTE AS > ( > SELECT 1 as DummyCOL > ) > SELECT * > FROM DummyCTE > """ > query3 = """ > (SELECT * > INTO #Temp1a > FROM > (SELECT @@VERSION as version) data > ) > (SELECT * > FROM > #Temp1a) > """ > {code} > > Test schema > > {code:java} > schema1 = """ > DummyXCOL INT > """ > schema2 = """ > DummyXCOL STRING > """ > {code} > > Test code > > {code:java} > jdbcDFWorking = ( > spark.read.format("jdbc") > .option("url", > f"jdbc:sqlserver://{server}:{port};dat
[jira] [Updated] (SPARK-37484) Replace Get and getOrElse with getOrElse
[ https://issues.apache.org/jira/browse/SPARK-37484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-37484: - Priority: Trivial (was: Minor) > Replace Get and getOrElse with getOrElse > > > Key: SPARK-37484 > URL: https://issues.apache.org/jira/browse/SPARK-37484 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > > There are some combined calls of get and getOrElse that can be directly > replaced by getOrElse > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
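A hypothetical example (the key and default value are made up) of the pattern this ticket targets: calling get on a Map and then getOrElse on the resulting Option can be collapsed into a single getOrElse on the Map itself.

{code}
val conf = Map("spark.executor.cores" -> "4")

// Before: Option returned by get, then getOrElse on that Option.
val coresBefore = conf.get("spark.executor.cores").getOrElse("1")

// After: a single getOrElse call with the same semantics.
val coresAfter = conf.getOrElse("spark.executor.cores", "1")

assert(coresBefore == coresAfter)
{code}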
[jira] [Updated] (SPARK-37485) Replace map with expressions which produce no result with foreach
[ https://issues.apache.org/jira/browse/SPARK-37485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-37485: - Priority: Trivial (was: Minor) > Replace map with expressions which produce no result with foreach > -- > > Key: SPARK-37485 > URL: https://issues.apache.org/jira/browse/SPARK-37485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > > Use foreach instead of map with expressions which produce no result. > > Before > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).map(functionWithNoReturnValue) {code} > > > After > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).foreach(functionWithNoReturnValue) {code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37488) With enough resources, the task may still be permanently pending
[ https://issues.apache.org/jira/browse/SPARK-37488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450492#comment-17450492 ] Apache Spark commented on SPARK-37488: -- User 'guiyanakuang' has created a pull request for this issue: https://github.com/apache/spark/pull/34743 > With enough resources, the task may still be permanently pending > > > Key: SPARK-37488 > URL: https://issues.apache.org/jira/browse/SPARK-37488 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0 > Environment: Spark 3.1.2,Default Configuration >Reporter: Yiqun Zhang >Priority: Major > > {code:java} > // The online environment is actually hive partition data imported to tidb, > the code logic can be simplified as follows > SparkSession testApp = SparkSession.builder() > .master("local[*]") > .appName("test app") > .enableHiveSupport() > .getOrCreate(); > Dataset dataset = testApp.sql("select * from default.test where dt = > '20211129'"); > dataset.persist(StorageLevel.MEMORY_AND_DISK()); > dataset.count(); > {code} > I have observed that tasks are permanently blocked and reruns can always be > reproduced. > Since it is only reproducible online, I use the arthas runtime to see the > status of the function entries and returns within the TaskSetManager. > https://gist.github.com/guiyanakuang/431584f191645513552a937d16ae8fbd > NODE_LOCAL level, because the persist function is called, the > pendingTasks.forHost has a collection of pending tasks, but it points to the > machine where the block of partitioned data is located, and since the only > resource spark gets is the driver. In this case, it cannot be scheduled. > getAllowedLocalityLevel gives the wrong runlevel, so it cannot be run with > TaskLocality.Any > The task pending permanently because the scheduling time is very short and it > is too late to raise the runlevel with a timeout. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37488) With enough resources, the task may still be permanently pending
[ https://issues.apache.org/jira/browse/SPARK-37488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37488: Assignee: (was: Apache Spark) > With enough resources, the task may still be permanently pending > > > Key: SPARK-37488 > URL: https://issues.apache.org/jira/browse/SPARK-37488 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0 > Environment: Spark 3.1.2,Default Configuration >Reporter: Yiqun Zhang >Priority: Major > > {code:java} > // The online environment is actually hive partition data imported to tidb, > the code logic can be simplified as follows > SparkSession testApp = SparkSession.builder() > .master("local[*]") > .appName("test app") > .enableHiveSupport() > .getOrCreate(); > Dataset dataset = testApp.sql("select * from default.test where dt = > '20211129'"); > dataset.persist(StorageLevel.MEMORY_AND_DISK()); > dataset.count(); > {code} > I have observed that tasks are permanently blocked and reruns can always be > reproduced. > Since it is only reproducible online, I use the arthas runtime to see the > status of the function entries and returns within the TaskSetManager. > https://gist.github.com/guiyanakuang/431584f191645513552a937d16ae8fbd > NODE_LOCAL level, because the persist function is called, the > pendingTasks.forHost has a collection of pending tasks, but it points to the > machine where the block of partitioned data is located, and since the only > resource spark gets is the driver. In this case, it cannot be scheduled. > getAllowedLocalityLevel gives the wrong runlevel, so it cannot be run with > TaskLocality.Any > The task pending permanently because the scheduling time is very short and it > is too late to raise the runlevel with a timeout. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37488) With enough resources, the task may still be permanently pending
[ https://issues.apache.org/jira/browse/SPARK-37488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37488: Assignee: Apache Spark > With enough resources, the task may still be permanently pending > > > Key: SPARK-37488 > URL: https://issues.apache.org/jira/browse/SPARK-37488 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0 > Environment: Spark 3.1.2,Default Configuration >Reporter: Yiqun Zhang >Assignee: Apache Spark >Priority: Major > > {code:java} > // The online environment is actually hive partition data imported to tidb, > the code logic can be simplified as follows > SparkSession testApp = SparkSession.builder() > .master("local[*]") > .appName("test app") > .enableHiveSupport() > .getOrCreate(); > Dataset dataset = testApp.sql("select * from default.test where dt = > '20211129'"); > dataset.persist(StorageLevel.MEMORY_AND_DISK()); > dataset.count(); > {code} > I have observed that tasks are permanently blocked and reruns can always be > reproduced. > Since it is only reproducible online, I use the arthas runtime to see the > status of the function entries and returns within the TaskSetManager. > https://gist.github.com/guiyanakuang/431584f191645513552a937d16ae8fbd > NODE_LOCAL level, because the persist function is called, the > pendingTasks.forHost has a collection of pending tasks, but it points to the > machine where the block of partitioned data is located, and since the only > resource spark gets is the driver. In this case, it cannot be scheduled. > getAllowedLocalityLevel gives the wrong runlevel, so it cannot be run with > TaskLocality.Any > The task pending permanently because the scheduling time is very short and it > is too late to raise the runlevel with a timeout. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37488) With enough resources, the task may still be permanently pending
Yiqun Zhang created SPARK-37488: --- Summary: With enough resources, the task may still be permanently pending Key: SPARK-37488 URL: https://issues.apache.org/jira/browse/SPARK-37488 Project: Spark Issue Type: Bug Components: Scheduler, Spark Core Affects Versions: 3.2.0, 3.1.2, 3.0.3 Environment: Spark 3.1.2, default configuration Reporter: Yiqun Zhang {code:java} // The online environment is actually hive partition data imported to tidb, the code logic can be simplified as follows SparkSession testApp = SparkSession.builder() .master("local[*]") .appName("test app") .enableHiveSupport() .getOrCreate(); Dataset dataset = testApp.sql("select * from default.test where dt = '20211129'"); dataset.persist(StorageLevel.MEMORY_AND_DISK()); dataset.count(); {code} I have observed tasks staying blocked permanently, and the issue reproduces on every rerun. Since it is only reproducible in the online environment, I used the Arthas runtime to inspect the function entries and return values inside TaskSetManager: https://gist.github.com/guiyanakuang/431584f191645513552a937d16ae8fbd At the NODE_LOCAL level, because persist is called, pendingTasks.forHost holds a collection of pending tasks, but it points to the machine where the cached partition blocks are located, while the only resource Spark has acquired is the driver, so the tasks cannot be scheduled there. getAllowedLocalityLevel returns the wrong locality level, so the tasks cannot run with TaskLocality.ANY. They stay pending permanently because each scheduling round is very short, and the locality level is never raised via the timeout. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
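A possible mitigation sketch for the reproduction above, stated as an assumption rather than the fix proposed in the linked PR: lowering spark.locality.wait makes the scheduler fall back from NODE_LOCAL to ANY sooner, so the driver-only resource can still pick up tasks whose preferred host is elsewhere.

{code}
import org.apache.spark.sql.SparkSession

// Same session as in the reproduction, but with the locality wait reduced so that
// tasks are not held back waiting for a NODE_LOCAL slot that never becomes available.
val testApp = SparkSession.builder()
  .master("local[*]")
  .appName("test app")
  .config("spark.locality.wait", "0s")
  .enableHiveSupport()
  .getOrCreate()
{code}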
[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Summary: CollectMetrics is executed twice if it is followed by a sort (was: CollectMetrics is executed twice if it is followed by an sort) > CollectMetrics is executed twice if it is followed by a sort > > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > > It is best examplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-X: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name = "my_event", > min($"id").as("min_val"), > max($"id").as("max_val"), > // Test unresolved alias > sum($"id"), > count(when($"id" % 2 === 0, 1)).as("num_even")) > .observe( > name = "other_event", > avg($"id").cast("int").as("avg_val")) > .sort($"id".desc) > validateObservedMetrics(df) > } > {code} > The count and sum aggregate report twice the number of rows: > {code} > [info] - SPARK-X: get observable metrics with sort by callback *** FAILED > *** (169 milliseconds) > [info] [0,99,9900,100] did not equal [0,99,4950,50] > (DataFrameCallbackSuite.scala:342) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > {code} > I could not figure out how this happes. Hopefully the UT can help with > debugging -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37487) CollectMetrics is executed twice if it is followed by an sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450452#comment-17450452 ] Tanel Kiis commented on SPARK-37487: [~cloud_fan] and [~sarutak], you helped with the last CollectMetrics bug. Perhaps you have some idea, why this is happening. > CollectMetrics is executed twice if it is followed by an sort > - > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > > It is best examplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-X: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name = "my_event", > min($"id").as("min_val"), > max($"id").as("max_val"), > // Test unresolved alias > sum($"id"), > count(when($"id" % 2 === 0, 1)).as("num_even")) > .observe( > name = "other_event", > avg($"id").cast("int").as("avg_val")) > .sort($"id".desc) > validateObservedMetrics(df) > } > {code} > The count and sum aggregate report twice the number of rows: > {code} > [info] - SPARK-X: get observable metrics with sort by callback *** FAILED > *** (169 milliseconds) > [info] [0,99,9900,100] did not equal [0,99,4950,50] > (DataFrameCallbackSuite.scala:342) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) > [info] at > org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > {code} > I could not figure out how this happes. Hopefully the UT can help with > debugging -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by an sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Description: It is best examplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count and sum aggregate reports twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happes. 
Hopefully the UT can help with debugging was: It is best examplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count aggregate reports twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happes. Hopefully the UT can help with debugging > CollectMetrics is executed twice if it is followed by an sort > - > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > > It is best examplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-X: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name = "my_e
[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by an sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Description: It is best examplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count aggregate reports twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happes. 
Hopefully the UT can help with debugging was: It is bets examplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count aggregate reports twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happes. Hopefully the UT can help with debugging > CollectMetrics is executed twice if it is followed by an sort > - > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > > It is best examplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-X: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name = "my_event", >
[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by an sort
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Description: It is best examplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count and sum aggregate report twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happes. 
Hopefully the UT can help with debugging was: It is best examplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count and sum aggregate reports twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happes. Hopefully the UT can help with debugging > CollectMetrics is executed twice if it is followed by an sort > - > > Key: SPARK-37487 > URL: https://issues.apache.org/jira/browse/SPARK-37487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > > It is best examplified by this new UT in DataFrameCallbackSuite: > {code} > test("SPARK-X: get observable metrics with sort by callback") { > val df = spark.range(100) > .observe( > name
[jira] [Created] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort
Tanel Kiis created SPARK-37487: -- Summary: CollectMetrics is executed twice if it is followed by a sort Key: SPARK-37487 URL: https://issues.apache.org/jira/browse/SPARK-37487 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Tanel Kiis It is best exemplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X: get observable metrics with sort by callback") { val df = spark.range(100) .observe( name = "my_event", min($"id").as("min_val"), max($"id").as("max_val"), // Test unresolved alias sum($"id"), count(when($"id" % 2 === 0, 1)).as("num_even")) .observe( name = "other_event", avg($"id").cast("int").as("avg_val")) .sort($"id".desc) validateObservedMetrics(df) } {code} The count aggregate reports twice the number of rows: {code} [info] - SPARK-X: get observable metrics with sort by callback *** FAILED *** (169 milliseconds) [info] [0,99,9900,100] did not equal [0,99,4950,50] (DataFrameCallbackSuite.scala:342) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.checkMetrics$1(DataFrameCallbackSuite.scala:342) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.validateObservedMetrics(DataFrameCallbackSuite.scala:350) [info] at org.apache.spark.sql.util.DataFrameCallbackSuite.$anonfun$new$21(DataFrameCallbackSuite.scala:324) [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) {code} I could not figure out how this happens. Hopefully the UT can help with debugging. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
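For a quick reproduction outside DataFrameCallbackSuite, a minimal sketch follows; it assumes a running SparkSession named `spark`, and the metric name and the `sum_val` alias are illustrative rather than taken from the suite.
{code}
// Minimal reproduction sketch (assumes a running SparkSession named `spark`).
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.functions._
import org.apache.spark.sql.util.QueryExecutionListener

spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // With 100 input rows the expected row is [0,99,4950,50]; per the report it arrives as [0,99,9900,100].
    qe.observedMetrics.get("my_event").foreach(row => println(s"my_event: $row"))
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
})

val df = spark.range(100)
  .observe("my_event",
    min(col("id")).as("min_val"),
    max(col("id")).as("max_val"),
    sum(col("id")).as("sum_val"),
    count(when(col("id") % 2 === 0, 1)).as("num_even"))
  .sort(col("id").desc)

df.collect()  // triggers execution; the listener prints the observed metrics
{code}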
[jira] [Commented] (SPARK-37463) Read/Write Timestamp ntz from/to Orc uses UTC time zone
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450396#comment-17450396 ] Apache Spark commented on SPARK-37463: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34741 > Read/Write Timestamp ntz from/to Orc uses UTC time zone > --- > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Here is some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output shown below looks strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37463) Read/Write Timestamp ntz from/to Orc uses UTC time zone
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450395#comment-17450395 ] Apache Spark commented on SPARK-37463: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34741 > Read/Write Timestamp ntz from/to Orc uses UTC time zone > --- > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Here is some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output shown below looks strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37463) Read/Write Timestamp ntz from/to Orc uses UTC timestamp
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37463: --- Summary: Read/Write Timestamp ntz from/to Orc uses UTC timestamp (was: Read/Write Timestamp ntz to Orc uses UTC timestamp) > Read/Write Timestamp ntz from/to Orc uses UTC timestamp > --- > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Here is some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output shown below looks strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37463) Read/Write Timestamp ntz from/to Orc uses UTC time zone
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37463: --- Summary: Read/Write Timestamp ntz from/to Orc uses UTC time zone (was: Read/Write Timestamp ntz from/to Orc uses UTC timestamp) > Read/Write Timestamp ntz from/to Orc uses UTC time zone > --- > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Here is some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output shown below looks strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37463) Read/Write Timestamp ntz to Orc uses UTC timestamp
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37463: --- Summary: Read/Write Timestamp ntz to Orc uses UTC timestamp (was: Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp) > Read/Write Timestamp ntz to Orc uses UTC timestamp > -- > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Here is some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output shown below looks strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
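For context, a TIMESTAMP_NTZ value carries no time zone, so re-reading the files above under different session time zones should show the same wall-clock value. A minimal sketch of that check, assuming the `ts_ntz_orc` directory written by the reproduction above and a running SparkSession named `spark` (not code from the ticket):
{code}
// Re-read the ORC output under two session time zones (illustrative check, not from the ticket).
// Expected: ts_ntz stays 2021-06-01 00:00:00 in both runs; per the report, the ORC value shifts.
import java.util.TimeZone

Seq("America/Los_Angeles", "UTC").foreach { tz =>
  TimeZone.setDefault(TimeZone.getTimeZone(tz))
  spark.sql(s"set spark.sql.session.timeZone=$tz")
  println(s"Session time zone: $tz")
  spark.read.orc("ts_ntz_orc").show(false)
}
{code}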
[jira] [Assigned] (SPARK-37485) Replace map with foreach for expressions which produce no result
[ https://issues.apache.org/jira/browse/SPARK-37485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37485: Assignee: (was: Apache Spark) > Replace map with foreach for expressions which produce no result > -- > > Key: SPARK-37485 > URL: https://issues.apache.org/jira/browse/SPARK-37485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > Use foreach instead of map for expressions which produce no result. > > Before > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).map(functionWithNoReturnValue) {code} > > > After > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).foreach(functionWithNoReturnValue) {code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37485) Replace map with foreach for expressions which produce no result
[ https://issues.apache.org/jira/browse/SPARK-37485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450390#comment-17450390 ] Apache Spark commented on SPARK-37485: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34740 > Replace map with foreach for expressions which produce no result > -- > > Key: SPARK-37485 > URL: https://issues.apache.org/jira/browse/SPARK-37485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > Use foreach instead of map for expressions which produce no result. > > Before > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).map(functionWithNoReturnValue) {code} > > > After > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).foreach(functionWithNoReturnValue) {code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37485) Replace map with foreach for expressions which produce no result
[ https://issues.apache.org/jira/browse/SPARK-37485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37485: Assignee: Apache Spark > Replace map with foreach for expressions which produce no result > -- > > Key: SPARK-37485 > URL: https://issues.apache.org/jira/browse/SPARK-37485 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Use foreach instead of map for expressions which produce no result. > > Before > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).map(functionWithNoReturnValue) {code} > > > After > > {code:java} > def functionWithNoReturnValue: Unit = {} > Seq(1, 2).foreach(functionWithNoReturnValue) {code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37485) Replace map with foreach for expressions which produce no result
Yang Jie created SPARK-37485: Summary: Replace map with foreach for expressions which produce no result Key: SPARK-37485 URL: https://issues.apache.org/jira/browse/SPARK-37485 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yang Jie Use foreach instead of map for expressions which produce no result. Before {code:java} def functionWithNoReturnValue: Unit = {} Seq(1, 2).map(functionWithNoReturnValue) {code} After {code:java} def functionWithNoReturnValue: Unit = {} Seq(1, 2).foreach(functionWithNoReturnValue) {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
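A compact illustration of the motivation (a hypothetical example, not a call site from the Spark code base): when the function returns Unit, `map` still materializes a collection of results that is immediately thrown away, while `foreach` has the same effect without the allocation.
{code}
// Hypothetical example: same side effect, but map builds a throwaway Seq[Unit].
def log(x: Int): Unit = println(x)

Seq(1, 2, 3).map(log)      // allocates and discards a Seq[Unit]
Seq(1, 2, 3).foreach(log)  // same output, no intermediate collection
{code}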
[jira] [Assigned] (SPARK-37484) Replace Get and getOrElse with getOrElse
[ https://issues.apache.org/jira/browse/SPARK-37484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37484: Assignee: (was: Apache Spark) > Replace Get and getOrElse with getOrElse > > > Key: SPARK-37484 > URL: https://issues.apache.org/jira/browse/SPARK-37484 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > There are some combined calls of get and getOrElse that can be directly > replaced by getOrElse > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37484) Replace Get and getOrElse with getOrElse
[ https://issues.apache.org/jira/browse/SPARK-37484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450375#comment-17450375 ] Apache Spark commented on SPARK-37484: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34739 > Replace Get and getOrElse with getOrElse > > > Key: SPARK-37484 > URL: https://issues.apache.org/jira/browse/SPARK-37484 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > There are some combined calls of get and getOrElse that can be directly > replaced by getOrElse > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37484) Replace Get and getOrElse with getOrElse
[ https://issues.apache.org/jira/browse/SPARK-37484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37484: Assignee: Apache Spark > Replace Get and getOrElse with getOrElse > > > Key: SPARK-37484 > URL: https://issues.apache.org/jira/browse/SPARK-37484 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > There are some combined calls of get and getOrElse that can be directly > replaced by getOrElse > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37484) Replace Get and getOrElse with getOrElse
Yang Jie created SPARK-37484: Summary: Replace Get and getOrElse with getOrElse Key: SPARK-37484 URL: https://issues.apache.org/jira/browse/SPARK-37484 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.3.0 Reporter: Yang Jie There are some combined calls of get and getOrElse that can be directly replaced by getOrElse -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
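A minimal sketch of the pattern being described (the key and default below are made up, not taken from a specific Spark call site): a `get` that produces an Option followed immediately by `getOrElse` collapses into a single `getOrElse` on the map.
{code}
// Hypothetical example of the rewrite; key and default are made up.
val conf = Map("spark.executor.cores" -> "4")

val before = conf.get("spark.executor.cores").getOrElse("1")  // get followed by getOrElse
val after  = conf.getOrElse("spark.executor.cores", "1")      // single call, same result
{code}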
[jira] [Commented] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450316#comment-17450316 ] Apache Spark commented on SPARK-37483: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34738 > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
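To make the feature concrete, here is a hedged sketch of the kind of query it targets (the JDBC URL and table name are placeholders, and whether this exact plan is pushed down depends on the linked pull request): with top-N pushdown, the ORDER BY plus LIMIT below could be evaluated by the remote database rather than by Spark after fetching the whole table.
{code}
// Hypothetical JDBC read; the url and dbtable options are placeholders.
import org.apache.spark.sql.functions.col

val top10 = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/sales")
  .option("dbtable", "orders")
  .load()
  .orderBy(col("amount").desc)
  .limit(10)

top10.show()  // with pushdown, the source could run e.g. SELECT ... ORDER BY amount DESC LIMIT 10
{code}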
[jira] [Commented] (SPARK-37482) Skip check monotonic increasing for Series.asof with 'compute.eager_check'
[ https://issues.apache.org/jira/browse/SPARK-37482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450315#comment-17450315 ] Apache Spark commented on SPARK-37482: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34737 > Skip check monotonic increasing for Series.asof with 'compute.eager_check' > -- > > Key: SPARK-37482 > URL: https://issues.apache.org/jira/browse/SPARK-37482 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org