[jira] [Assigned] (SPARK-39063) Remove `finalize()` from `LevelDB/RocksDBIterator`
[ https://issues.apache.org/jira/browse/SPARK-39063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39063: Assignee: Apache Spark > Remove `finalize()` from `LevelDB/RocksDBIterator` > > Key: SPARK-39063 > URL: https://issues.apache.org/jira/browse/SPARK-39063 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Yang Jie > Assignee: Apache Spark > Priority: Minor > > After SPARK-38896, all `LevelDB/RocksDBIterator` handles opened by the `LevelDB/RocksDB.view` method are already closed by `tryWithResource`.
[jira] [Commented] (SPARK-39063) Remove `finalize()` from `LevelDB/RocksDBIterator`
[ https://issues.apache.org/jira/browse/SPARK-39063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529779#comment-17529779 ] Apache Spark commented on SPARK-39063: User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36403 > Remove `finalize()` from `LevelDB/RocksDBIterator` > > Key: SPARK-39063 > URL: https://issues.apache.org/jira/browse/SPARK-39063 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Yang Jie > Priority: Minor > > After SPARK-38896, all `LevelDB/RocksDBIterator` handles opened by the `LevelDB/RocksDB.view` method are already closed by `tryWithResource`.
[jira] [Assigned] (SPARK-39063) Remove `finalize()` from `LevelDB/RocksDBIterator`
[ https://issues.apache.org/jira/browse/SPARK-39063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39063: Assignee: (was: Apache Spark) > Remove `finalize()` from `LevelDB/RocksDBIterator` > > Key: SPARK-39063 > URL: https://issues.apache.org/jira/browse/SPARK-39063 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Yang Jie > Priority: Minor > > After SPARK-38896, all `LevelDB/RocksDBIterator` handles opened by the `LevelDB/RocksDB.view` method are already closed by `tryWithResource`.
[jira] [Commented] (SPARK-39063) Remove `finalize()` from `LevelDB/RocksDBIterator`
[ https://issues.apache.org/jira/browse/SPARK-39063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529780#comment-17529780 ] Apache Spark commented on SPARK-39063: User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36403 > Remove `finalize()` from `LevelDB/RocksDBIterator` > > Key: SPARK-39063 > URL: https://issues.apache.org/jira/browse/SPARK-39063 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Yang Jie > Priority: Minor > > After SPARK-38896, all `LevelDB/RocksDBIterator` handles opened by the `LevelDB/RocksDB.view` method are already closed by `tryWithResource`.
[jira] [Created] (SPARK-39063) Remove `finalize()` from `LevelDB/RocksDBIterator`
Yang Jie created SPARK-39063: Summary: Remove `finalize()` from `LevelDB/RocksDBIterator` Key: SPARK-39063 URL: https://issues.apache.org/jira/browse/SPARK-39063 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Yang Jie After SPARK-38896, all `LevelDB/RocksDBIterator` handles opened by the `LevelDB/RocksDB.view` method are already closed by `tryWithResource`.
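For context, the pattern that makes `finalize()` redundant looks roughly like this — a minimal sketch, where `DBIterator` and `withIterator` are hypothetical stand-ins for the real LevelDB/RocksDB iterator types and the `view` accessor, not Spark's actual internals:

{code:scala}
import java.io.Closeable

// Hypothetical stand-in for a LevelDB/RocksDB iterator handle.
class DBIterator extends Closeable {
  def next(): String = "row"
  override def close(): Unit = println("iterator closed")
}

// Scoped accessor in the spirit of `LevelDB/RocksDB.view`: the iterator is
// always closed deterministically on exit, so no finalize() safety net is needed.
def withIterator[T](f: DBIterator => T): T = {
  val it = new DBIterator
  try f(it) finally it.close()
}

val first = withIterator(it => it.next())
{code}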
[jira] [Commented] (SPARK-38085) DataSource V2: Handle DELETE commands for group-based sources
[ https://issues.apache.org/jira/browse/SPARK-38085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529768#comment-17529768 ] Apache Spark commented on SPARK-38085: User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/36402 > DataSource V2: Handle DELETE commands for group-based sources > > Key: SPARK-38085 > URL: https://issues.apache.org/jira/browse/SPARK-38085 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.3.0 > Reporter: Anton Okolnychyi > Assignee: Anton Okolnychyi > Priority: Major > Fix For: 3.3.0 > > As per SPARK-35801, we should handle DELETE statements for sources that can replace groups of data (e.g. partitions, files).
[jira] [Assigned] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination
[ https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39040: Assignee: XiDuo You > Respect NaNvl in EquivalentExpressions for expression elimination > > Key: SPARK-39040 > URL: https://issues.apache.org/jira/browse/SPARK-39040 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: XiDuo You > Assignee: XiDuo You > Priority: Major > Fix For: 3.3.0 > > For example, the following query will fail: > {code:java} > set spark.sql.ansi.enabled=true; > set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding; > SELECT nanvl(1, 1/0 + 1/0); {code} > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4) (10.221.98.68 executor driver): org.apache.spark.SparkArithmeticException: divide by zero. To return NULL instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this error. > == SQL(line 1, position 17) == > select nanvl(1 , 1/0 + 1/0) > ^^^ > at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151) {code} > We should respect the evaluation order of conditional expressions, which always evaluate the predicate branch first, so the query above should not fail.
[jira] [Resolved] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination
[ https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39040. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36376 [https://github.com/apache/spark/pull/36376] > Respect NaNvl in EquivalentExpressions for expression elimination > > Key: SPARK-39040 > URL: https://issues.apache.org/jira/browse/SPARK-39040 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: XiDuo You > Priority: Major > Fix For: 3.3.0 > > For example, the following query will fail: > {code:java} > set spark.sql.ansi.enabled=true; > set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding; > SELECT nanvl(1, 1/0 + 1/0); {code} > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4) (10.221.98.68 executor driver): org.apache.spark.SparkArithmeticException: divide by zero. To return NULL instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this error. > == SQL(line 1, position 17) == > select nanvl(1 , 1/0 + 1/0) > ^^^ > at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151) {code} > We should respect the evaluation order of conditional expressions, which always evaluate the predicate branch first, so the query above should not fail.
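The reasoning behind the fix can be illustrated outside Spark: `nanvl` only evaluates its second argument when the first is NaN, so common-subexpression elimination must not force that argument to be evaluated eagerly. A toy model (not Spark's actual codegen):

{code:scala}
// Toy model of a conditional expression: the second argument must stay lazy.
def nanvl(a: Double, b: => Double): Double =
  if (!a.isNaN) a else b

// Stand-in for 1/0 under ANSI mode, which throws instead of returning null.
def divByZero(): Double = throw new ArithmeticException("divide by zero")

// Returns 1.0 because b is never evaluated; eagerly evaluating b as a
// "common subexpression" would throw, which is exactly the reported failure.
val result = nanvl(1.0, divByZero() + divByZero())
{code}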
[jira] [Resolved] (SPARK-38993) Impl DataFrame.boxplot and DataFrame.plot.box
[ https://issues.apache.org/jira/browse/SPARK-38993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38993. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36317 [https://github.com/apache/spark/pull/36317] > Impl DataFrame.boxplot and DataFrame.plot.box > > Key: SPARK-38993 > URL: https://issues.apache.org/jira/browse/SPARK-38993 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: zhengruifeng > Assignee: zhengruifeng > Priority: Major > Fix For: 3.4.0
[jira] [Assigned] (SPARK-38993) Impl DataFrame.boxplot and DataFrame.plot.box
[ https://issues.apache.org/jira/browse/SPARK-38993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38993: Assignee: zhengruifeng > Impl DataFrame.boxplot and DataFrame.plot.box > > Key: SPARK-38993 > URL: https://issues.apache.org/jira/browse/SPARK-38993 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: zhengruifeng > Assignee: zhengruifeng > Priority: Major
[jira] [Commented] (SPARK-39054) GroupByTest failed due to axis Length mismatch
[ https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529751#comment-17529751 ] Yikun Jiang commented on SPARK-39054: [https://github.com/apache/spark/blob/973283c33ad908d071550e9be92a4fca76a8a9df/python/pyspark/pandas/groupby.py#L1377] The behavior changed here. > GroupByTest failed due to axis Length mismatch > > Key: SPARK-39054 > URL: https://issues.apache.org/jira/browse/SPARK-39054 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Yikun Jiang > Priority: Major > > {code:java} > An error occurred while calling o27083.getResult. > : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) > at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) > at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) > at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:282) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) > at py4j.ClientServerConnection.run(ClientServerConnection.java:106) > at java.lang.Thread.run(Thread.java:750) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in stage 808.0 (TID 650) (localhost executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main > process() > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process > serializer.dump_stream(out_iter, outfile) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 343, in dump_stream > return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 84, in dump_stream > for batch in iterator: > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 336, in init_stream_yield_batches > for series in iterator: > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, in mapper > return f(keys, vals) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, in <lambda> > return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, in wrapped > result = f(pd.concat(value_series, axis=1)) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper > return f(*args, **kwargs) > File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in rename_output > pdf.columns = return_schema.names > File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 5588, in __setattr__ > return object.__setattr__(self, name, value) > File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__ > File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 769, in _set_axis > self._mgr.set_axis(axis, labels) > File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", line 214, in set_axis > self._validate_set_axis(axis, new_labels) > File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis > raise ValueError( > ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements {code} > > GroupByTest.test_apply_with_new_dataframe_without_shortcut
[jira] [Commented] (SPARK-39054) GroupByTest failed due to axis Length mismatch
[ https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529750#comment-17529750 ] Yikun Jiang commented on SPARK-39054: https://github.com/pandas-dev/pandas/issues/46893 > GroupByTest failed due to axis Length mismatch > > Key: SPARK-39054 > URL: https://issues.apache.org/jira/browse/SPARK-39054 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Yikun Jiang > Priority: Major > > {code:java} > An error occurred while calling o27083.getResult. > : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) > at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) > at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) > at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:282) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) > at py4j.ClientServerConnection.run(ClientServerConnection.java:106) > at java.lang.Thread.run(Thread.java:750) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in stage 808.0 (TID 650) (localhost executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main > process() > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process > serializer.dump_stream(out_iter, outfile) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 343, in dump_stream > return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 84, in dump_stream > for batch in iterator: > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 336, in init_stream_yield_batches > for series in iterator: > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, in mapper > return f(keys, vals) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, in <lambda> > return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, in wrapped > result = f(pd.concat(value_series, axis=1)) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper > return f(*args, **kwargs) > File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in rename_output > pdf.columns = return_schema.names > File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 5588, in __setattr__ > return object.__setattr__(self, name, value) > File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__ > File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 769, in _set_axis > self._mgr.set_axis(axis, labels) > File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", line 214, in set_axis > self._validate_set_axis(axis, new_labels) > File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis > raise ValueError( > ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements {code} > > GroupByTest.test_apply_with_new_dataframe_without_shortcut
[jira] [Commented] (SPARK-39035) Add tests for options from `to_csv` and `from_csv`.
[ https://issues.apache.org/jira/browse/SPARK-39035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529737#comment-17529737 ] Apache Spark commented on SPARK-39035: User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36401 > Add tests for options from `to_csv` and `from_csv`. > > Key: SPARK-39035 > URL: https://issues.apache.org/jira/browse/SPARK-39035 > Project: Spark > Issue Type: Test > Components: SQL, Tests > Affects Versions: 3.4.0 > Reporter: Haejoon Lee > Priority: Major > > There are many supported options for to_csv and from_csv (https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option), but they are currently not tested. > We should add tests for these options to `sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala`.
[jira] [Assigned] (SPARK-39035) Add tests for options from `to_csv` and `from_csv`.
[ https://issues.apache.org/jira/browse/SPARK-39035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39035: Assignee: (was: Apache Spark) > Add tests for options from `to_csv` and `from_csv`. > > Key: SPARK-39035 > URL: https://issues.apache.org/jira/browse/SPARK-39035 > Project: Spark > Issue Type: Test > Components: SQL, Tests > Affects Versions: 3.4.0 > Reporter: Haejoon Lee > Priority: Major > > There are many supported options for to_csv and from_csv (https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option), but they are currently not tested. > We should add tests for these options to `sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala`.
[jira] [Commented] (SPARK-39035) Add tests for options from `to_csv` and `from_csv`.
[ https://issues.apache.org/jira/browse/SPARK-39035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529736#comment-17529736 ] Apache Spark commented on SPARK-39035: User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36401 > Add tests for options from `to_csv` and `from_csv`. > > Key: SPARK-39035 > URL: https://issues.apache.org/jira/browse/SPARK-39035 > Project: Spark > Issue Type: Test > Components: SQL, Tests > Affects Versions: 3.4.0 > Reporter: Haejoon Lee > Priority: Major > > There are many supported options for to_csv and from_csv (https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option), but they are currently not tested. > We should add tests for these options to `sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala`.
[jira] [Assigned] (SPARK-39035) Add tests for options from `to_csv` and `from_csv`.
[ https://issues.apache.org/jira/browse/SPARK-39035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39035: Assignee: Apache Spark > Add tests for options from `to_csv` and `from_csv`. > > Key: SPARK-39035 > URL: https://issues.apache.org/jira/browse/SPARK-39035 > Project: Spark > Issue Type: Test > Components: SQL, Tests > Affects Versions: 3.4.0 > Reporter: Haejoon Lee > Assignee: Apache Spark > Priority: Major > > There are many supported options for to_csv and from_csv (https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option), but they are currently not tested. > We should add tests for these options to `sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala`.
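A sketch of the kind of coverage the ticket asks for, using the public `from_csv`/`to_csv` functions. The `sep` option is a documented CSV data source option; the exact test shape used in CsvFunctionsSuite is an assumption:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{from_csv, to_csv}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import scala.collection.JavaConverters._

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val schema = new StructType().add("a", IntegerType).add("b", StringType)

// Parse with a non-default separator...
val parsed = Seq("1;x").toDF("csv")
  .select(from_csv($"csv", schema, Map("sep" -> ";")).as("parsed"))

// ...and render back with the same option to round-trip the value.
val rendered = parsed.select(to_csv($"parsed", Map("sep" -> ";").asJava).as("csv"))
rendered.show() // expected: 1;x
{code}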
[jira] [Assigned] (SPARK-38748) Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH
[ https://issues.apache.org/jira/browse/SPARK-38748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38748: Assignee: Apache Spark > Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH > > Key: SPARK-38748 > URL: https://issues.apache.org/jira/browse/SPARK-38748 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Max Gekk > Assignee: Apache Spark > Priority: Minor > Labels: starter > > Add a test for the error class *PIVOT_VALUE_DATA_TYPE_MISMATCH* to QueryCompilationErrorsSuite. The test should cover the exception thrown in QueryCompilationErrors: > {code:scala} > def pivotValDataTypeMismatchError(pivotVal: Expression, pivotCol: Expression): Throwable = { > new AnalysisException( > errorClass = "PIVOT_VALUE_DATA_TYPE_MISMATCH", > messageParameters = Array( > pivotVal.toString, pivotVal.dataType.simpleString, > pivotCol.dataType.catalogString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # the sqlState, if it is defined in the error-classes.json file > # the error class
[jira] [Assigned] (SPARK-38748) Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH
[ https://issues.apache.org/jira/browse/SPARK-38748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38748: Assignee: (was: Apache Spark) > Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH > > Key: SPARK-38748 > URL: https://issues.apache.org/jira/browse/SPARK-38748 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Max Gekk > Priority: Minor > Labels: starter > > Add a test for the error class *PIVOT_VALUE_DATA_TYPE_MISMATCH* to QueryCompilationErrorsSuite. The test should cover the exception thrown in QueryCompilationErrors: > {code:scala} > def pivotValDataTypeMismatchError(pivotVal: Expression, pivotCol: Expression): Throwable = { > new AnalysisException( > errorClass = "PIVOT_VALUE_DATA_TYPE_MISMATCH", > messageParameters = Array( > pivotVal.toString, pivotVal.dataType.simpleString, > pivotCol.dataType.catalogString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # the sqlState, if it is defined in the error-classes.json file > # the error class
[jira] [Commented] (SPARK-38748) Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH
[ https://issues.apache.org/jira/browse/SPARK-38748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529726#comment-17529726 ] Apache Spark commented on SPARK-38748: User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/36400 > Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH > > Key: SPARK-38748 > URL: https://issues.apache.org/jira/browse/SPARK-38748 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Max Gekk > Priority: Minor > Labels: starter > > Add a test for the error class *PIVOT_VALUE_DATA_TYPE_MISMATCH* to QueryCompilationErrorsSuite. The test should cover the exception thrown in QueryCompilationErrors: > {code:scala} > def pivotValDataTypeMismatchError(pivotVal: Expression, pivotCol: Expression): Throwable = { > new AnalysisException( > errorClass = "PIVOT_VALUE_DATA_TYPE_MISMATCH", > messageParameters = Array( > pivotVal.toString, pivotVal.dataType.simpleString, > pivotCol.dataType.catalogString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # the sqlState, if it is defined in the error-classes.json file > # the error class
[jira] [Commented] (SPARK-38748) Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH
[ https://issues.apache.org/jira/browse/SPARK-38748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529725#comment-17529725 ] Apache Spark commented on SPARK-38748: User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/36400 > Test the error class: PIVOT_VALUE_DATA_TYPE_MISMATCH > > Key: SPARK-38748 > URL: https://issues.apache.org/jira/browse/SPARK-38748 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Max Gekk > Priority: Minor > Labels: starter > > Add a test for the error class *PIVOT_VALUE_DATA_TYPE_MISMATCH* to QueryCompilationErrorsSuite. The test should cover the exception thrown in QueryCompilationErrors: > {code:scala} > def pivotValDataTypeMismatchError(pivotVal: Expression, pivotCol: Expression): Throwable = { > new AnalysisException( > errorClass = "PIVOT_VALUE_DATA_TYPE_MISMATCH", > messageParameters = Array( > pivotVal.toString, pivotVal.dataType.simpleString, > pivotCol.dataType.catalogString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # the sqlState, if it is defined in the error-classes.json file > # the error class
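A hedged sketch of what such a test might look like in QueryCompilationErrorsSuite. The triggering query and the `courseSales` table are assumptions (any pivot value whose type cannot be cast to the pivot column's type should trigger the error), and error-assertion helpers vary across Spark versions:

{code:scala}
test("PIVOT_VALUE_DATA_TYPE_MISMATCH: pivot value type does not match pivot column") {
  val e = intercept[AnalysisException] {
    sql(
      """SELECT * FROM (SELECT course, year, earnings FROM courseSales)
        |PIVOT (sum(earnings) FOR year IN (array(1, 2)))""".stripMargin).collect()
  }
  // Check the error class, then the message text as the ticket requires.
  assert(e.getErrorClass === "PIVOT_VALUE_DATA_TYPE_MISMATCH")
  assert(e.getMessage.contains("does not match pivot column data type"))
}
{code}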
[jira] [Commented] (SPARK-39062) Add Standalone backend support for Stage Level Scheduling
[ https://issues.apache.org/jira/browse/SPARK-39062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529717#comment-17529717 ] huangtengfei commented on SPARK-39062: I am working on this. Thanks [~jiangxb1987] > Add Standalone backend support for Stage Level Scheduling > > Key: SPARK-39062 > URL: https://issues.apache.org/jira/browse/SPARK-39062 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.2.0, 3.3.0 > Reporter: Xingbo Jiang > Priority: Major > > We should add Standalone backend support for Stage Level Scheduling: > * The Master should be able to generate executors for multiple ResourceProfiles; currently it only considers available CPUs. > * The Worker needs to let the executor know about its ResourceProfile.
[jira] [Created] (SPARK-39062) Add Standalone backend support for Stage Level Scheduling
Xingbo Jiang created SPARK-39062: Summary: Add Standalone backend support for Stage Level Scheduling Key: SPARK-39062 URL: https://issues.apache.org/jira/browse/SPARK-39062 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.0, 3.3.0 Reporter: Xingbo Jiang We should add Standalone backend support for Stage Level Scheduling: * The Master should be able to generate executors for multiple ResourceProfiles; currently it only considers available CPUs. * The Worker needs to let the executor know about its ResourceProfile.
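For reference, the stage-level scheduling API that the Standalone backend would need to honor already exists in Spark (it is currently effective on YARN/Kubernetes). A usage sketch, assuming an active SparkContext `sc`:

{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Describe what executors and tasks for one stage need.
val execReqs = new ExecutorResourceRequests().cores(4).memory("6g")
val taskReqs = new TaskResourceRequests().cpus(2)
val profile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build()

// Attach the profile to an RDD: the cluster manager must provide executors
// matching it, which is what the Standalone Master cannot do yet.
val doubled = sc.parallelize(1 to 100, 10)
  .withResources(profile)
  .map(_ * 2)
  .collect()
{code}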
[jira] [Updated] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-39061: Description: The following query returns incorrect results: {noformat} spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); 1 2 -1 -1 Time taken: 4.053 seconds, Fetched 2 row(s) spark-sql> {noformat} In Hive, the last row is {{NULL, NULL}}: {noformat} Beeline version 2.3.9 by Apache Hive 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, 'b', 2), null)); +---+---+ | a | b | +---+---+ | 1 | 2 | | NULL | NULL | +---+---+ 2 rows selected (1.355 seconds) 0: jdbc:hive2://localhost:1> {noformat} If the struct has string fields, you get a {{NullPointerException}}: {noformat} spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) java.lang.NullPointerException: null at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source) ~[?:?] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] {noformat} You can work around the issue by casting the null entry of the array: {noformat} spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as struct<a:int,b:int>))); 1 2 NULL NULL Time taken: 0.068 seconds, Fetched 2 row(s) spark-sql> {noformat} As far as I can tell, this issue only happens with arrays of structs where the structs are created in an inline table or in a projection. The fields of the struct are not getting set to {{nullable = true}} when there is no example in the array where the field is set to {{null}}. As a result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no code to create a row of null columns, so it just creates a row from variables set with default values. was: The following query returns incorrect results: {noformat} spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); 1 2 -1 -1 Time taken: 4.053 seconds, Fetched 2 row(s) spark-sql> {noformat} In Hive, the last row is {{NULL, NULL}}: {noformat} Beeline version 2.3.9 by Apache Hive 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, 'b', 2), null)); +---+---+ | a | b | +---+---+ | 1 | 2 | | NULL | NULL | +---+---+ 2 rows selected (1.355 seconds) 0: jdbc:hive2://localhost:1> {noformat} If the struct has string fields, you get a {{NullPointerException}}: {noformat} spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) java.lang.NullPointerException: null at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source) ~[?:?] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] {noformat} (Note: In Spark 3.1.3, both examples result in NPE). You can work around the issue by casting the null entry of the array: {noformat} spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as struct<a:int,b:int>))); 1 2 NULL NULL Time taken: 0.068 seconds, Fetched 2 row(s) spark-sql> {noformat} (Note: In Spark 3.1.3, the above workaround does not work). As far as I can tell, this issue only happens with arrays of structs where the structs are created in an inline table or in a projection. The fields of the struct are not getting set to {{nullable = true}} when there is no example in the array where the field is set to {{null}}. As a result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no code to create a row of null columns, so it just creates a row from variables set with default values. > Incorrect results or NPE when using Inline function against an array of dynamically created structs > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061
[jira] [Updated] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-39061: Affects Version/s: (was: 3.1.3) > Incorrect results or NPE when using Inline function against an array of dynamically created structs > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.2.1, 3.3.0, 3.4.0 > Reporter: Bruce Robbins > Priority: Major > Labels: correctness > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1 -1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source) ~[?:?] > at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] > at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > (Note: In Spark 3.1.3, both examples result in NPE). > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as struct<a:int,b:int>))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > (Note: In Spark 3.1.3, the above workaround does not work). > As far as I can tell, this issue only happens with arrays of structs where the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when there is no example in the array where the field is set to {{null}}. As a result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no code to create a row of null columns, so it just creates a row from variables set with default values.
[jira] [Commented] (SPARK-39058) Add `getInputSignature` and `getOutputSignature` APIs for spark ML models/transformers
[ https://issues.apache.org/jira/browse/SPARK-39058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529703#comment-17529703 ] Weichen Xu commented on SPARK-39058: [~jasbali] volunteers to contribute this feature, so I would assign this ticket to him. [~ruifengz] Would you help review when you have time? Thank you. > Add `getInputSignature` and `getOutputSignature` APIs for spark ML models/transformers > > Key: SPARK-39058 > URL: https://issues.apache.org/jira/browse/SPARK-39058 > Project: Spark > Issue Type: New Feature > Components: ML > Affects Versions: 3.4.0 > Reporter: Weichen Xu > Priority: Major > > Add `getInputSignature` and `getOutputSignature` APIs for Spark ML models/transformers: > The `getInputSignature` API returns a list of column names and types representing the ML model/transformer's input columns. > The `getOutputSignature` API returns a list of column names and types representing the ML model/transformer's output columns. > These two APIs are useful in third-party libraries such as MLflow, which require the input/output signature information of a model.
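Since these APIs do not exist yet, the following is only a hypothetical sketch of the proposed shape — the trait name and return type are assumptions based on the ticket text, not an existing Spark interface:

{code:scala}
import org.apache.spark.sql.types.StructField

// Hypothetical interface a model/transformer could expose.
trait HasSignature {
  // (column name, type) pairs the model expects as input.
  def getInputSignature: Seq[StructField]
  // (column name, type) pairs the model produces as output.
  def getOutputSignature: Seq[StructField]
}
{code}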
[jira] [Comment Edited] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529702#comment-17529702 ] Bruce Robbins edited comment on SPARK-39061 at 4/29/22 12:33 AM: Btw, dataframe example: {noformat} scala> val df = Seq((1)).toDF.withColumn("c1", array(struct(lit(1).alias("a"), lit(2).alias("b")), lit(null))) df: org.apache.spark.sql.DataFrame = [value: int, c1: array<struct<a:int,b:int>>] scala> df.selectExpr("inline(c1)").collect res3: Array[org.apache.spark.sql.Row] = Array([1,2], [-1,-1]) {noformat} was (Author: bersprockets): Btw, dataframe example: {noformat} scala> val df = Seq((1)).toDF.withColumn("b", array(struct(lit(1).alias("a"), lit(2).alias("a")), lit(null))) df: org.apache.spark.sql.DataFrame = [value: int, b: array<struct<a:int,a:int>>] scala> df.selectExpr("inline(b)").collect res2: Array[org.apache.spark.sql.Row] = Array([1,2], [-1,-1]) {noformat} > Incorrect results or NPE when using Inline function against an array of dynamically created structs > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.1.3, 3.2.1, 3.3.0, 3.4.0 > Reporter: Bruce Robbins > Priority: Major > Labels: correctness > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1 -1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source) ~[?:?] > at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] > at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > (Note: In Spark 3.1.3, both examples result in NPE). > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as struct<a:int,b:int>))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > (Note: In Spark 3.1.3, the above workaround does not work). > As far as I can tell, this issue only happens with arrays of structs where the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when there is no example in the array where the field is set to {{null}}. As a result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no code to create a row of null columns, so it just creates a row from variables set with default values.
[jira] [Commented] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529702#comment-17529702 ] Bruce Robbins commented on SPARK-39061: Btw, dataframe example: {noformat} scala> val df = Seq((1)).toDF.withColumn("b", array(struct(lit(1).alias("a"), lit(2).alias("a")), lit(null))) df: org.apache.spark.sql.DataFrame = [value: int, b: array<struct<a:int,a:int>>] scala> df.selectExpr("inline(b)").collect res2: Array[org.apache.spark.sql.Row] = Array([1,2], [-1,-1]) {noformat} > Incorrect results or NPE when using Inline function against an array of dynamically created structs > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.1.3, 3.2.1, 3.3.0, 3.4.0 > Reporter: Bruce Robbins > Priority: Major > Labels: correctness > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1 -1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source) ~[?:?] > at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] > at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > (Note: In Spark 3.1.3, both examples result in NPE). > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as struct<a:int,b:int>))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > (Note: In Spark 3.1.3, the above workaround does not work). > As far as I can tell, this issue only happens with arrays of structs where the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when there is no example in the array where the field is set to {{null}}. As a result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no code to create a row of null columns, so it just creates a row from variables set with default values.
[jira] [Created] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
Bruce Robbins created SPARK-39061: Summary: Incorrect results or NPE when using Inline function against an array of dynamically created structs Key: SPARK-39061 URL: https://issues.apache.org/jira/browse/SPARK-39061 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1, 3.1.3, 3.3.0, 3.4.0 Reporter: Bruce Robbins The following query returns incorrect results: {noformat} spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); 1 2 -1 -1 Time taken: 4.053 seconds, Fetched 2 row(s) spark-sql> {noformat} In Hive, the last row is {{NULL, NULL}}: {noformat} Beeline version 2.3.9 by Apache Hive 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, 'b', 2), null)); +---+---+ | a | b | +---+---+ | 1 | 2 | | NULL | NULL | +---+---+ 2 rows selected (1.355 seconds) 0: jdbc:hive2://localhost:1> {noformat} If the struct has string fields, you get a {{NullPointerException}}: {noformat} spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) java.lang.NullPointerException: null at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source) ~[?:?] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] {noformat} (Note: In Spark 3.1.3, both examples result in NPE). You can work around the issue by casting the null entry of the array: {noformat} spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as struct<a:int,b:int>))); 1 2 NULL NULL Time taken: 0.068 seconds, Fetched 2 row(s) spark-sql> {noformat} (Note: In Spark 3.1.3, the above workaround does not work). As far as I can tell, this issue only happens with arrays of structs where the structs are created in an inline table or in a projection. The fields of the struct are not getting set to {{nullable = true}} when there is no example in the array where the field is set to {{null}}. As a result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no code to create a row of null columns, so it just creates a row from variables set with default values.
[jira] [Resolved] (SPARK-35739) [Spark Sql] Add Java-compatible Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35739. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36343 [https://github.com/apache/spark/pull/36343] > [Spark Sql] Add Java-compatible Dataset.join overloads > > Key: SPARK-35739 > URL: https://issues.apache.org/jira/browse/SPARK-35739 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL > Affects Versions: 2.0.0, 3.0.0 > Reporter: Brandon Dahler > Assignee: Brandon Dahler > Priority: Minor > Fix For: 3.4.0 > > h2. Problem > When using Spark SQL with Java, the required syntax to utilize the following two overloads is unnatural and not obvious to developers who haven't had to interoperate with Scala before: > {code:java} > def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame > def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame > {code} > Examples: > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > // Overload with multiple usingColumns, no join type > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2"))) > .show(); > // Overload with multiple usingColumns and a join type > dataset1 > .join( > dataset2, > JavaConverters.asScalaBuffer(List.of("column", "column2")), > "left") > .show(); > {code} > > Additionally, there is no overload that takes a single usingColumn and a joinType, forcing the developer to use the Seq[String] overload regardless of language. > Examples: > Scala > {code:java} > val dataset1: DataFrame = ...; > val dataset2: DataFrame = ...; > dataset1 > .join(dataset2, Seq("column"), "left") > .show(); > {code} > > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left") > .show(); > {code} > h2. Proposed Improvement > Add 3 additional overloads to Dataset: > > {code:java} > def join(right: Dataset[_], usingColumns: List[String]): DataFrame > def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame > def join(right: Dataset[_], usingColumns: List[String], joinType: String): DataFrame > {code}
[jira] [Assigned] (SPARK-35739) [Spark Sql] Add Java-compatible Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-35739: Assignee: Brandon Dahler > [Spark Sql] Add Java-compatible Dataset.join overloads > > Key: SPARK-35739 > URL: https://issues.apache.org/jira/browse/SPARK-35739 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL > Affects Versions: 2.0.0, 3.0.0 > Reporter: Brandon Dahler > Assignee: Brandon Dahler > Priority: Minor > > h2. Problem > When using Spark SQL with Java, the required syntax to utilize the following two overloads is unnatural and not obvious to developers who haven't had to interoperate with Scala before: > {code:java} > def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame > def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame > {code} > Examples: > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > // Overload with multiple usingColumns, no join type > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2"))) > .show(); > // Overload with multiple usingColumns and a join type > dataset1 > .join( > dataset2, > JavaConverters.asScalaBuffer(List.of("column", "column2")), > "left") > .show(); > {code} > > Additionally, there is no overload that takes a single usingColumn and a joinType, forcing the developer to use the Seq[String] overload regardless of language. > Examples: > Scala > {code:java} > val dataset1: DataFrame = ...; > val dataset2: DataFrame = ...; > dataset1 > .join(dataset2, Seq("column"), "left") > .show(); > {code} > > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left") > .show(); > {code} > h2. Proposed Improvement > Add 3 additional overloads to Dataset: > > {code:java} > def join(right: Dataset[_], usingColumns: List[String]): DataFrame > def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame > def join(right: Dataset[_], usingColumns: List[String], joinType: String): DataFrame > {code}
[jira] [Resolved] (SPARK-38896) Use tryWithResource to recycle KVStoreIterator
[ https://issues.apache.org/jira/browse/SPARK-38896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-38896. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36237 [https://github.com/apache/spark/pull/36237] > Use tryWithResource to recycle KVStoreIterator > > Key: SPARK-38896 > URL: https://issues.apache.org/jira/browse/SPARK-38896 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL > Affects Versions: 3.4.0 > Reporter: Yang Jie > Assignee: Yang Jie > Priority: Minor > Fix For: 3.4.0 > > Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances opened by RocksDB/LevelDB.
[jira] [Assigned] (SPARK-38896) Use tryWithResource to recycle KVStoreIterator
[ https://issues.apache.org/jira/browse/SPARK-38896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-38896: Assignee: Yang Jie > Use tryWithResource to recycle KVStoreIterator > > > Key: SPARK-38896 > URL: https://issues.apache.org/jira/browse/SPARK-38896 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances opened by > RocksDB/LevelDB -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
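For reference, the pattern SPARK-38896 applies, sketched under the assumption of a `KVStore` handle (`Utils.tryWithResource` is Spark-internal, in `org.apache.spark.util.Utils`):
{code:scala}
import scala.collection.JavaConverters._
import org.apache.spark.util.Utils
import org.apache.spark.util.kvstore.KVStore

// Open a KVStoreIterator, consume it, and let tryWithResource close it
// even if the body throws.
def firstTen(db: KVStore): Seq[String] =
  Utils.tryWithResource(db.view(classOf[String]).closeableIterator()) { iter =>
    iter.asScala.take(10).toList
  }
{code}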
[jira] [Assigned] (SPARK-39042) Use `Map.values()` instead of `Map.entrySet()` in scenarios that do not use `keys`
[ https://issues.apache.org/jira/browse/SPARK-39042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-39042: Assignee: Yang Jie > Use `Map.values()` instead of `Map.entrySet()` in scenarios that do not use > `keys` > -- > > Key: SPARK-39042 > URL: https://issues.apache.org/jira/browse/SPARK-39042 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > Some code in Spark uses `Map.entrySet()` but does not use the keys; such code > can use `Map.values()` instead. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39042) Use `Map.values()` instead of `Map.entrySet()` in scenarios that do not use `keys`
[ https://issues.apache.org/jira/browse/SPARK-39042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-39042: - Priority: Trivial (was: Minor) > Use `Map.values()` instead of `Map.entrySet()` in scenarios that do not use > `keys` > -- > > Key: SPARK-39042 > URL: https://issues.apache.org/jira/browse/SPARK-39042 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > > Some code in Spark uses `Map.entrySet()` but does not use the keys; such code > can use `Map.values()` instead. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39042) Use `Map.values()` instead of `Map.entrySet()` in scenarios that do not use `keys`
[ https://issues.apache.org/jira/browse/SPARK-39042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-39042. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36372 [https://github.com/apache/spark/pull/36372] > Use `Map.values()` instead of `Map.entrySet()` in scenarios that do not use > `keys` > -- > > Key: SPARK-39042 > URL: https://issues.apache.org/jira/browse/SPARK-39042 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.4.0 > > > Some code in Spark uses `Map.entrySet()` but does not use the keys; such code > can use `Map.values()` instead. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
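The change is mechanical; a small before/after sketch over a java.util.Map:
{code:scala}
import scala.collection.JavaConverters._
import java.util.{HashMap => JHashMap}

val map = new JHashMap[String, Long]()
map.put("a", 1L)
map.put("b", 2L)

// Before: iterates entries even though only the values are used.
var total1 = 0L
for (entry <- map.entrySet().asScala) total1 += entry.getValue

// After: iterates the values directly, dropping the unused keys.
var total2 = 0L
for (value <- map.values().asScala) total2 += value
{code}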
[jira] [Assigned] (SPARK-39034) Add tests for options from `to_json` and `from_json`.
[ https://issues.apache.org/jira/browse/SPARK-39034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39034: Assignee: (was: Apache Spark) > Add tests for options from `to_json` and `from_json`. > - > > Key: SPARK-39034 > URL: https://issues.apache.org/jira/browse/SPARK-39034 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > There are many supported options for to_json and from_json > (https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option), > but they are currently not tested. > We should add tests for these options to > `sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala`. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39034) Add tests for options from `to_json` and `from_json`.
[ https://issues.apache.org/jira/browse/SPARK-39034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529690#comment-17529690 ] Apache Spark commented on SPARK-39034: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36399 > Add tests for options from `to_json` and `from_json`. > - > > Key: SPARK-39034 > URL: https://issues.apache.org/jira/browse/SPARK-39034 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > There are many supported options for to_json and from_json > (https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option), > but they are currently not tested. > We should add tests for these options to > `sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala`. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39034) Add tests for options from `to_json` and `from_json`.
[ https://issues.apache.org/jira/browse/SPARK-39034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39034: Assignee: Apache Spark > Add tests for options from `to_json` and `from_json`. > - > > Key: SPARK-39034 > URL: https://issues.apache.org/jira/browse/SPARK-39034 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > There are many supported options for to_json and from_json > (https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option), > but they are currently not tested. > We should add tests for these options to > `sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala`. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39034) Add tests for options from `to_json` and `from_json`.
[ https://issues.apache.org/jira/browse/SPARK-39034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529689#comment-17529689 ] Apache Spark commented on SPARK-39034: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36399 > Add tests for options from `to_json` and `from_json`. > - > > Key: SPARK-39034 > URL: https://issues.apache.org/jira/browse/SPARK-39034 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > There are many supported options for to_json and from_json > (https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option), > but they are currently not tested. > We should add tests for these options to > `sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala`. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
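A sketch of the kind of coverage being asked for, exercising one documented option (assuming a local SparkSession; the option name is from the JSON data source docs linked above):
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{DateType, StructType}

val spark = SparkSession.builder.master("local[1]").getOrCreate()
import spark.implicits._

// Parse a non-default date pattern via the dateFormat option of from_json.
val schema = new StructType().add("d", DateType)
val parsed = Seq("""{"d": "28/04/2022"}""").toDF("json")
  .select(from_json($"json", schema, Map("dateFormat" -> "dd/MM/yyyy")).as("parsed"))
parsed.select("parsed.d").show() // expected: 2022-04-28
{code}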
[jira] [Commented] (SPARK-39035) Add tests for options from `to_csv` and `from_csv`.
[ https://issues.apache.org/jira/browse/SPARK-39035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529688#comment-17529688 ] Haejoon Lee commented on SPARK-39035: - I'm working on this > Add tests for options from `to_csv` and `from_csv`. > --- > > Key: SPARK-39035 > URL: https://issues.apache.org/jira/browse/SPARK-39035 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > There are many supported options for to_csv and from_csv > (https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option), > but they are currently not tested. > We should add tests for these options to > `sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala`. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
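The CSV analogue would look similar; a sketch using the delimiter option (assuming a local SparkSession):
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder.master("local[1]").getOrCreate()
import spark.implicits._

// Parse a non-default separator via the delimiter option of from_csv.
val schema = new StructType().add("a", IntegerType).add("b", StringType)
val parsed = Seq("1;x").toDF("csv")
  .select(from_csv($"csv", schema, Map("delimiter" -> ";")).as("parsed"))
parsed.select("parsed.a", "parsed.b").show() // expected: 1, x
{code}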
[jira] [Commented] (SPARK-38838) Support ALTER TABLE ALTER COLUMN commands with DEFAULT values
[ https://issues.apache.org/jira/browse/SPARK-38838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529661#comment-17529661 ] Apache Spark commented on SPARK-38838: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/36398 > Support ALTER TABLE ALTER COLUMN commands with DEFAULT values > - > > Key: SPARK-38838 > URL: https://issues.apache.org/jira/browse/SPARK-38838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
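For orientation, a hedged sketch of the command shape this sub-task targets, driven from Scala (assuming a SparkSession `spark`; the exact DEFAULT grammar is defined by the feature work, not confirmed here):
{code:scala}
// Hypothetical end-to-end usage of column DEFAULT values.
spark.sql("CREATE TABLE t (id INT, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP()) USING parquet")
spark.sql("ALTER TABLE t ALTER COLUMN ts SET DEFAULT CURRENT_TIMESTAMP()")
spark.sql("INSERT INTO t (id) VALUES (1)") // ts should be filled with the default
{code}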
[jira] [Resolved] (SPARK-38718) Test the error class: AMBIGUOUS_FIELD_NAME
[ https://issues.apache.org/jira/browse/SPARK-38718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-38718. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36395 [https://github.com/apache/spark/pull/36395] > Test the error class: AMBIGUOUS_FIELD_NAME > -- > > Key: SPARK-38718 > URL: https://issues.apache.org/jira/browse/SPARK-38718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: panbingkun >Priority: Minor > Labels: starter > Fix For: 3.4.0 > > > Add at least one test for the error class *AMBIGUOUS_FIELD_NAME* to > QueryCompilationErrorsSuite. The test should cover the exception thrown in > QueryCompilationErrors: > {code:scala} > def ambiguousFieldNameError( > fieldName: Seq[String], numMatches: Int, context: Origin): Throwable = { > new AnalysisException( > errorClass = "AMBIGUOUS_FIELD_NAME", > messageParameters = Array(fieldName.quoted, numMatches.toString), > origin = context) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38718) Test the error class: AMBIGUOUS_FIELD_NAME
[ https://issues.apache.org/jira/browse/SPARK-38718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-38718: Assignee: panbingkun > Test the error class: AMBIGUOUS_FIELD_NAME > -- > > Key: SPARK-38718 > URL: https://issues.apache.org/jira/browse/SPARK-38718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: panbingkun >Priority: Minor > Labels: starter > > Add at least one test for the error class *AMBIGUOUS_FIELD_NAME* to > QueryCompilationErrorsSuite. The test should cover the exception thrown in > QueryCompilationErrors: > {code:scala} > def ambiguousFieldNameError( > fieldName: Seq[String], numMatches: Int, context: Origin): Throwable = { > new AnalysisException( > errorClass = "AMBIGUOUS_FIELD_NAME", > messageParameters = Array(fieldName.quoted, numMatches.toString), > origin = context) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
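A minimal sketch of such a test, in the style of the linked UNSUPPORTED_FEATURE example (the triggering DDL here is a hypothetical case-ambiguous nested field, not necessarily the suite's exact test):
{code:scala}
import org.apache.spark.sql.AnalysisException

withTable("t") {
  sql("CREATE TABLE t (c STRUCT<aa: INT, AA: INT>) USING parquet")
  // With spark.sql.caseSensitive=false, c.aa matches both fields.
  val e = intercept[AnalysisException] {
    sql("ALTER TABLE t ALTER COLUMN c.aa COMMENT 'ambiguous'")
  }
  assert(e.getErrorClass === "AMBIGUOUS_FIELD_NAME")
  assert(e.getMessage.contains("AMBIGUOUS_FIELD_NAME"))
}
{code}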
[jira] [Updated] (SPARK-39045) INTERNAL_ERROR for "all" internal errors
[ https://issues.apache.org/jira/browse/SPARK-39045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-39045: - Description: We should be able to inject the [INTERNAL_ERROR] class for most cases without waiting to label the long tail of user-facing error classes (was: We should be able to inject the [SYSTEM_ERROR] class for most cases without waiting to label the long tail on user facing error classes ) > INTERNAL_ERROR for "all" internal errors > > > Key: SPARK-39045 > URL: https://issues.apache.org/jira/browse/SPARK-39045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > We should be able to inject the [INTERNAL_ERROR] class for most cases > without waiting to label the long tail of user-facing error classes -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
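Roughly, the idea is a catch-all wrapper; a sketch assuming SparkException's error-class constructor (the helper and the wrapping point are illustrative, not Spark's actual code):
{code:scala}
import org.apache.spark.SparkException

// Wrap an arbitrary unclassified failure so it surfaces with the
// INTERNAL_ERROR class instead of an unlabeled exception.
def toInternalError(e: Throwable): SparkException =
  new SparkException(
    errorClass = "INTERNAL_ERROR",
    messageParameters = Array(e.getMessage),
    cause = e)
{code}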
[jira] [Commented] (SPARK-39060) Typo in error messages of decimal overflow
[ https://issues.apache.org/jira/browse/SPARK-39060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529620#comment-17529620 ] Apache Spark commented on SPARK-39060: -- User 'vli-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36397 > Typo in error messages of decimal overflow > -- > > Key: SPARK-39060 > URL: https://issues.apache.org/jira/browse/SPARK-39060 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Vitalii Li >Priority: Major > > org.apache.spark.SparkArithmeticException > Decimal(expanded,10.1,39,1}) cannot be > represented as Decimal(38, 1). If necessary set spark.sql.ansi.enabled to > false to bypass this error. > > As shown in {{decimalArithmeticOperations.sql.out}} > Notice the extra {{}}} before ‘cannot’ > > > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39060) Typo in error messages of decimal overflow
[ https://issues.apache.org/jira/browse/SPARK-39060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39060: Assignee: Apache Spark > Typo in error messages of decimal overflow > -- > > Key: SPARK-39060 > URL: https://issues.apache.org/jira/browse/SPARK-39060 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Vitalii Li >Assignee: Apache Spark >Priority: Major > > org.apache.spark.SparkArithmeticException > Decimal(expanded,10.1,39,1}) cannot be > represented as Decimal(38, 1). If necessary set spark.sql.ansi.enabled to > false to bypass this error. > > As shown in {{decimalArithmeticOperations.sql.out}} > Notice the extra {{}}} before ‘cannot’ > > > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39060) Typo in error messages of decimal overflow
[ https://issues.apache.org/jira/browse/SPARK-39060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529619#comment-17529619 ] Apache Spark commented on SPARK-39060: -- User 'vli-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36397 > Typo in error messages of decimal overflow > -- > > Key: SPARK-39060 > URL: https://issues.apache.org/jira/browse/SPARK-39060 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Vitalii Li >Priority: Major > > org.apache.spark.SparkArithmeticException > Decimal(expanded,10.1,39,1}) cannot be > represented as Decimal(38, 1). If necessary set spark.sql.ansi.enabled to > false to bypass this error. > > As shown in {{decimalArithmeticOperations.sql.out}} > Notice the extra {{}}} before ‘cannot’ > > > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39060) Typo in error messages of decimal overflow
[ https://issues.apache.org/jira/browse/SPARK-39060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39060: Assignee: (was: Apache Spark) > Typo in error messages of decimal overflow > -- > > Key: SPARK-39060 > URL: https://issues.apache.org/jira/browse/SPARK-39060 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Vitalii Li >Priority: Major > > org.apache.spark.SparkArithmeticException > Decimal(expanded,10.1,39,1}) cannot be > represented as Decimal(38, 1). If necessary set spark.sql.ansi.enabled to > false to bypass this error. > > As shown in {{decimalArithmeticOperations.sql.out}} > Notice the extra {{}}} before ‘cannot’ > > > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39060) Typo in error messages of decimal overflow
[ https://issues.apache.org/jira/browse/SPARK-39060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Li updated SPARK-39060: --- Description: org.apache.spark.SparkArithmeticException Decimal(expanded,10.1,39,1}) cannot be represented as Decimal(38, 1). If necessary set spark.sql.ansi.enabled to false to bypass this error. As shown in {{decimalArithmeticOperations.sql.out}} Notice the extra {{}}} before ‘cannot’ was: ``` -- !query select (5e36BD + 0.1) + 5e36BD -- !query schema struct<> -- !query output org.apache.spark.SparkArithmeticException [CANNOT_CHANGE_DECIMAL_PRECISION] Decimal(expanded,10.1,39,1}) cannot be represented as Decimal(38, 1). If necessary set "spark.sql.ansi.enabled" to false to bypass this error. == SQL(line 1, position 7) == select (5e36BD + 0.1) + 5e36BD ^^^ ``` > Typo in error messages of decimal overflow > -- > > Key: SPARK-39060 > URL: https://issues.apache.org/jira/browse/SPARK-39060 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Vitalii Li >Priority: Major > > org.apache.spark.SparkArithmeticException > Decimal(expanded,10.1,39,1}) cannot be > represented as Decimal(38, 1). If necessary set spark.sql.ansi.enabled to > false to bypass this error. > > As shown in {{decimalArithmeticOperations.sql.out}} > Notice the extra {{}}} before ‘cannot’ > > > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39060) Typo in error messages of decimal overflow
Vitalii Li created SPARK-39060: -- Summary: Typo in error messages of decimal overflow Key: SPARK-39060 URL: https://issues.apache.org/jira/browse/SPARK-39060 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Vitalii Li ``` -- !query select (5e36BD + 0.1) + 5e36BD -- !query schema struct<> -- !query output org.apache.spark.SparkArithmeticException [CANNOT_CHANGE_DECIMAL_PRECISION] Decimal(expanded,10.1,39,1}) cannot be represented as Decimal(38, 1). If necessary set "spark.sql.ansi.enabled" to false to bypass this error. == SQL(line 1, position 7) == select (5e36BD + 0.1) + 5e36BD ^^^ ``` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
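The message can be reproduced outside the golden file with the quoted query; a sketch assuming a SparkSession `spark`:
{code:scala}
// The intermediate result needs precision 39 > 38, so ANSI mode overflows.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("select (5e36BD + 0.1) + 5e36BD").collect()
// Throws SparkArithmeticException whose text contains the stray '}' reported here.
{code}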
[jira] [Updated] (SPARK-39059) When using multiple SparkSessions, DataFrame.resolve uses configuration from the wrong session
[ https://issues.apache.org/jira/browse/SPARK-39059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furcy Pin updated SPARK-39059: -- Description: We encountered an unexpected error when using SparkSession.newSession and the "spark.sql.caseSensitive" option. I wrote a handful of examples below to illustrate the problem, but from the examples below it looks like when you use _SparkSession.newSession()_ and change the configuration of that new session, _DataFrame.apply(col_name)_ seems to use the configuration from the initial session instead of the new one. *Example 1.A* This fails because "spark.sql.caseSensitive" has not been set at all *(OK)* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference > 'a' is ambiguous, could be: a, a. ``` *Example 1.B* This fails because "spark.sql.caseSensitive" has not been set on s2, even though it has been set on s1 *(OK)* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference > 'a' is ambiguous, could be: a, a. ``` *Example 1.C* This works because "spark.sql.caseSensitive" has been set on s2 *[OK]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() ``` *Example 2.A* This fails because "spark.sql.caseSensitive" has not been set at all *[NORMAL]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df("a") // > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a. ``` *Example 2.B* This should fail because "spark.sql.caseSensitive" has not been set on s2, but it works *[NOT NORMAL]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df("a") ``` *Example 2.C* This should work because "spark.sql.caseSensitive" has been set on s2, but it fails instead *[NOT NORMAL]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df("a") // > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a. ``` was: We encountered unexpected error when using SparkSession.newSession and the "spark.sql.caseSensitive" option.
I wrote a handful of examples below to illustrate the problem, but from the examples below it looks like when you use _SparkSession.newSession()_ and change the configuration of that new session, _DataFrame.apply(col_name)_ seems to use the configuration from the initial session instead of the new one. *Example 1.A* This fails because "spark.sql.caseSensitive" has not been set at all *(OK)* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference > 'a' is ambiguous, could be: a, a. ``` *Example 1.B* This fails because "spark.sql.caseSensitive" has not been set on s2 *(OK)* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference > 'a' is ambiguous, could be: a, a. ``` *Example 1.C* This works because "spark.sql.caseSensitive" has been set on s2 *[OK]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() ``` *Example 2.A* This fails because
[jira] [Created] (SPARK-39059) When using multiple SparkSessions, DataFrame.resolve uses configuration from the wrong session
Furcy Pin created SPARK-39059: - Summary: When using multiple SparkSessions, DataFrame.resolve uses configuration from the wrong session Key: SPARK-39059 URL: https://issues.apache.org/jira/browse/SPARK-39059 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Furcy Pin We encountered an unexpected error when using SparkSession.newSession and the "spark.sql.caseSensitive" option. I wrote a handful of examples below to illustrate the problem, but from the examples below it looks like when you use _SparkSession.newSession()_ and change the configuration of that new session, _DataFrame.apply(col_name)_ seems to use the configuration from the initial session instead of the new one. *Example 1.A* This fails because "spark.sql.caseSensitive" has not been set at all *(OK)* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference > 'a' is ambiguous, could be: a, a. ``` *Example 1.B* This fails because "spark.sql.caseSensitive" has not been set on s2 *(OK)* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference > 'a' is ambiguous, could be: a, a. ``` *Example 1.C* This works because "spark.sql.caseSensitive" has been set on s2 *[OK]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df.select("a").show() ``` *Example 2.A* This fails because "spark.sql.caseSensitive" has not been set at all *[NORMAL]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df("a") // > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a. ``` *Example 2.B* This should fail because "spark.sql.caseSensitive" has not been set on s2, but it works *[NOT NORMAL]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() // s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df("a") ``` *Example 2.C* This should work because "spark.sql.caseSensitive" has been set on s2, but it fails instead *[NOT NORMAL]* ``` val s1 = SparkSession.builder.master("local[2]").getOrCreate() // s1.conf.set("spark.sql.caseSensitive", "true") val s2 = s1.newSession() s2.conf.set("spark.sql.caseSensitive", "true") val df = s2.sql("select 'a' as A, 'a' as a") df("a") // > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a. ``` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38741) Test the error class: MAP_KEY_DOES_NOT_EXIST*
[ https://issues.apache.org/jira/browse/SPARK-38741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-38741. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36232 [https://github.com/apache/spark/pull/36232] > Test the error class: MAP_KEY_DOES_NOT_EXIST* > - > > Key: SPARK-38741 > URL: https://issues.apache.org/jira/browse/SPARK-38741 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: panbingkun >Priority: Minor > Labels: starter > Fix For: 3.4.0 > > > Add tests for the error classes *MAP_KEY_DOES_NOT_EXIST** to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def mapKeyNotExistError(key: Any, isElementAtFunction: Boolean): > NoSuchElementException = { > if (isElementAtFunction) { > new SparkNoSuchElementException(errorClass = > "MAP_KEY_DOES_NOT_EXIST_IN_ELEMENT_AT", > messageParameters = Array(key.toString, SQLConf.ANSI_ENABLED.key)) > } else { > new SparkNoSuchElementException(errorClass = "MAP_KEY_DOES_NOT_EXIST", > messageParameters = Array(key.toString, > SQLConf.ANSI_STRICT_INDEX_OPERATOR.key)) > } > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38741) Test the error class: MAP_KEY_DOES_NOT_EXIST*
[ https://issues.apache.org/jira/browse/SPARK-38741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-38741: Assignee: panbingkun > Test the error class: MAP_KEY_DOES_NOT_EXIST* > - > > Key: SPARK-38741 > URL: https://issues.apache.org/jira/browse/SPARK-38741 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: panbingkun >Priority: Minor > Labels: starter > > Add tests for the error classes *MAP_KEY_DOES_NOT_EXIST** to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def mapKeyNotExistError(key: Any, isElementAtFunction: Boolean): > NoSuchElementException = { > if (isElementAtFunction) { > new SparkNoSuchElementException(errorClass = > "MAP_KEY_DOES_NOT_EXIST_IN_ELEMENT_AT", > messageParameters = Array(key.toString, SQLConf.ANSI_ENABLED.key)) > } else { > new SparkNoSuchElementException(errorClass = "MAP_KEY_DOES_NOT_EXIST", > messageParameters = Array(key.toString, > SQLConf.ANSI_STRICT_INDEX_OPERATOR.key)) > } > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39058) Add `getInputSignature` and `getOutputSignature` APIs for spark ML models/transformers
[ https://issues.apache.org/jira/browse/SPARK-39058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529462#comment-17529462 ] Weichen Xu commented on SPARK-39058: CC [~ruifengz] Are you interested in contributing this feature? :) > Add `getInputSignature` and `getOutputSignature` APIs for spark ML > models/transformers > -- > > Key: SPARK-39058 > URL: https://issues.apache.org/jira/browse/SPARK-39058 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 3.4.0 >Reporter: Weichen Xu >Priority: Major > > Add `getInputSignature` and `getOutputSignature` APIs for spark ML > models/transformers: > The `getInputSignature` API returns a list of column names and types which > represent the ML model / transformer's input columns. > The `getOutputSignature` API returns a list of column names and types > which represent the ML model / transformer's output columns. > These 2 APIs are useful in third-party libraries such as mlflow, which > require the information of the input / output signature of the model. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39058) Add `getInputSignature` and `getOutputSignature` APIs for spark ML models/transformers
Weichen Xu created SPARK-39058: -- Summary: Add `getInputSignature` and `getOutputSignature` APIs for spark ML models/transformers Key: SPARK-39058 URL: https://issues.apache.org/jira/browse/SPARK-39058 Project: Spark Issue Type: New Feature Components: ML Affects Versions: 3.4.0 Reporter: Weichen Xu Add `getInputSignature` and `getOutputSignature` APIs for spark ML models/transformers: The `getInputSignature` API returns a list of column names and types which represent the ML model / transformer's input columns. The `getOutputSignature` API returns a list of column names and types which represent the ML model / transformer's output columns. These 2 APIs are useful in third-party libraries such as mlflow, which require the information of the input / output signature of the model. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
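The proposal amounts to something like the following shape (entirely hypothetical; these methods do not exist in Spark ML today):
{code:scala}
import org.apache.spark.sql.types.DataType

// Hypothetical trait sketch for the proposed APIs.
trait HasSignature {
  /** Column name and type for each input column the model/transformer consumes. */
  def getInputSignature: Seq[(String, DataType)]

  /** Column name and type for each output column the model/transformer produces. */
  def getOutputSignature: Seq[(String, DataType)]
}
{code}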
[jira] [Resolved] (SPARK-39056) Use `Collections.singletonList` instead of `Arrays.asList` when there is only one argument
[ https://issues.apache.org/jira/browse/SPARK-39056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-39056. -- Resolution: Won't Fix > Use `Collections.singletonList` instead of `Arrays.asList` when there is > only one argument > > > Key: SPARK-39056 > URL: https://issues.apache.org/jira/browse/SPARK-39056 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Use `Collections.singletonList` instead of `Arrays.asList` when there is only > one argument. > > before > {code:java} > List<String> one = Arrays.asList("one"); {code} > after > {code:java} > List<String> one = Collections.singletonList("one"); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38718) Test the error class: AMBIGUOUS_FIELD_NAME
[ https://issues.apache.org/jira/browse/SPARK-38718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529385#comment-17529385 ] Apache Spark commented on SPARK-38718: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/36395 > Test the error class: AMBIGUOUS_FIELD_NAME > -- > > Key: SPARK-38718 > URL: https://issues.apache.org/jira/browse/SPARK-38718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add at least one test for the error class *AMBIGUOUS_FIELD_NAME* to > QueryCompilationErrorsSuite. The test should cover the exception thrown in > QueryCompilationErrors: > {code:scala} > def ambiguousFieldNameError( > fieldName: Seq[String], numMatches: Int, context: Origin): Throwable = { > new AnalysisException( > errorClass = "AMBIGUOUS_FIELD_NAME", > messageParameters = Array(fieldName.quoted, numMatches.toString), > origin = context) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38718) Test the error class: AMBIGUOUS_FIELD_NAME
[ https://issues.apache.org/jira/browse/SPARK-38718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38718: Assignee: (was: Apache Spark) > Test the error class: AMBIGUOUS_FIELD_NAME > -- > > Key: SPARK-38718 > URL: https://issues.apache.org/jira/browse/SPARK-38718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add at least one test for the error class *AMBIGUOUS_FIELD_NAME* to > QueryCompilationErrorsSuite. The test should cover the exception thrown in > QueryCompilationErrors: > {code:scala} > def ambiguousFieldNameError( > fieldName: Seq[String], numMatches: Int, context: Origin): Throwable = { > new AnalysisException( > errorClass = "AMBIGUOUS_FIELD_NAME", > messageParameters = Array(fieldName.quoted, numMatches.toString), > origin = context) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38718) Test the error class: AMBIGUOUS_FIELD_NAME
[ https://issues.apache.org/jira/browse/SPARK-38718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529384#comment-17529384 ] Apache Spark commented on SPARK-38718: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/36395 > Test the error class: AMBIGUOUS_FIELD_NAME > -- > > Key: SPARK-38718 > URL: https://issues.apache.org/jira/browse/SPARK-38718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add at least one test for the error class *AMBIGUOUS_FIELD_NAME* to > QueryCompilationErrorsSuite. The test should cover the exception thrown in > QueryCompilationErrors: > {code:scala} > def ambiguousFieldNameError( > fieldName: Seq[String], numMatches: Int, context: Origin): Throwable = { > new AnalysisException( > errorClass = "AMBIGUOUS_FIELD_NAME", > messageParameters = Array(fieldName.quoted, numMatches.toString), > origin = context) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38718) Test the error class: AMBIGUOUS_FIELD_NAME
[ https://issues.apache.org/jira/browse/SPARK-38718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38718: Assignee: Apache Spark > Test the error class: AMBIGUOUS_FIELD_NAME > -- > > Key: SPARK-38718 > URL: https://issues.apache.org/jira/browse/SPARK-38718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > > Add at least one test for the error class *AMBIGUOUS_FIELD_NAME* to > QueryCompilationErrorsSuite. The test should cover the exception thrown in > QueryCompilationErrors: > {code:scala} > def ambiguousFieldNameError( > fieldName: Seq[String], numMatches: Int, context: Origin): Throwable = { > new AnalysisException( > errorClass = "AMBIGUOUS_FIELD_NAME", > messageParameters = Array(fieldName.quoted, numMatches.toString), > origin = context) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must check:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-39044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529381#comment-17529381 ] Willi Raschkowski edited comment on SPARK-39044 at 4/28/22 11:29 AM: - [~hyukjin.kwon], yes I know. But I wasn't able to get a self-contained reproducer. This reliably fails in prod. But using that same TypedImperativeAggregate with {{observe()}} in local tests works fine. If you have ideas on what to try, I will. (Also happy to share the aggregate, but from the stacktrace I understood the implementation isn't relevant - it's the {{AggregatingAccumulator}} buffer that is {{null}}. Anyway, I attached [^aggregate.scala].) I understand if you close this ticket because you cannot root-cause without a repro. was (Author: raschkowski): [~hyukjin.kwon], yes I know. But I wasn't able to get a self-contained reproducer. This reliably fails in prod. But using that same TypedImperativeAggregate with {{observe()}} in local tests works fine. If you have ideas on what to try, I will. (Also happy to share the aggregate, but from the stacktrace I understood the implementation isn't relevant - it's the {{AggregatingAccumulator}} buffer that is {{{}null{}}}.) I understand if you close this ticket because you cannot root-cause without a repro. > AggregatingAccumulator with TypedImperativeAggregate throwing > NullPointerException > -- > > Key: SPARK-39044 > URL: https://issues.apache.org/jira/browse/SPARK-39044 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Willi Raschkowski >Priority: Major > Attachments: aggregate.scala > > > We're using a custom TypedImperativeAggregate inside an > AggregatingAccumulator (via {{observe()}} and get the error below. It looks > like we're trying to serialize an aggregation buffer that hasn't been > initialized yet. > {code} > Caused by: org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:251) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186) > at > org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:540) > ... 
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in > stage 1.0 (TID 32) (10.0.134.136 executor 3): java.io.IOException: > java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) > at > org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) > at > java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) > at > java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) > at > org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) > at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at >
[jira] [Updated] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-39044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-39044: -- Attachment: aggregate.scala > AggregatingAccumulator with TypedImperativeAggregate throwing > NullPointerException > -- > > Key: SPARK-39044 > URL: https://issues.apache.org/jira/browse/SPARK-39044 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Willi Raschkowski >Priority: Major > Attachments: aggregate.scala > > > We're using a custom TypedImperativeAggregate inside an > AggregatingAccumulator (via {{observe()}} and get the error below. It looks > like we're trying to serialize an aggregation buffer that hasn't been > initialized yet. > {code} > Caused by: org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:251) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186) > at > org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:540) > ... > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in > stage 1.0 (TID 32) (10.0.134.136 executor 3): java.io.IOException: > java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) > at > org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) > at > java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) > at > java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) > at > org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) > at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > 
java.base/java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1235) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1137) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$1(TaskResult.scala:55) > at >
[jira] [Commented] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-39044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529381#comment-17529381 ] Willi Raschkowski commented on SPARK-39044: --- [~hyukjin.kwon], yes I know. But I wasn't able to get a self-contained reproducer. This reliably fails in prod. But using that same TypedImperativeAggregate with {{observe()}} in local tests works fine. If you have ideas on what to try, I will. (Also happy to share the aggregate, but from the stacktrace I understood the implementation isn't relevant - it's the {{AggregatingAccumulator}} buffer that is {{{}null{}}}.) I understand if you close this ticket because you cannot root-cause without a repro. > AggregatingAccumulator with TypedImperativeAggregate throwing > NullPointerException > -- > > Key: SPARK-39044 > URL: https://issues.apache.org/jira/browse/SPARK-39044 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Willi Raschkowski >Priority: Major > > We're using a custom TypedImperativeAggregate inside an > AggregatingAccumulator (via {{observe()}} and get the error below. It looks > like we're trying to serialize an aggregation buffer that hasn't been > initialized yet. > {code} > Caused by: org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:251) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186) > at > org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:540) > ... > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in > stage 1.0 (TID 32) (10.0.134.136 executor 3): java.io.IOException: > java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) > at > org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) > at > java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) > at > java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) > at > 
org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) > at > org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) > at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > java.base/java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1235) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1137) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at
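For readers unfamiliar with the setup described above: attaching an aggregate to a query via Dataset.observe looks roughly like this sketch (the Column wrapping the custom TypedImperativeAggregate is assumed, as in the attached aggregate.scala):
{code:scala}
import org.apache.spark.sql.{Column, DataFrame}

// Spark collects the observed expression in an AggregatingAccumulator,
// which is where the reported NullPointerException is thrown.
def withMetrics(df: DataFrame, myAgg: Column): DataFrame =
  df.observe("custom_metrics", myAgg.as("agg_result"))
{code}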
[jira] [Commented] (SPARK-39057) Offset could work without Limit
[ https://issues.apache.org/jira/browse/SPARK-39057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529374#comment-17529374 ] Apache Spark commented on SPARK-39057: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36394 > Offset could work without Limit > --- > > Key: SPARK-39057 > URL: https://issues.apache.org/jira/browse/SPARK-39057 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Offset must work with Limit. This behavior limits adding an offset > API to DataFrame. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
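To make the limitation concrete, here is a hedged PySpark sketch; the table and data are placeholders, the LIMIT+OFFSET form restates the ticket's premise, and the standalone OFFSET / df.offset() forms are the proposal, not existing behavior:
{code:python}
# Sketch of the limitation: per this ticket, OFFSET is only usable together
# with LIMIT; a standalone OFFSET (and a DataFrame-level offset() API) is
# what is being proposed. Table name and data are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()
spark.range(10).createOrReplaceTempView("t")

# Accepted: OFFSET paired with LIMIT.
spark.sql("SELECT * FROM t ORDER BY id LIMIT 5 OFFSET 2").show()

# Proposed: OFFSET standing alone, e.g.
#   SELECT * FROM t ORDER BY id OFFSET 2
# and, on the DataFrame side, something like df.offset(2) (hypothetical).
{code}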
[jira] [Commented] (SPARK-39057) Offset could work without Limit
[ https://issues.apache.org/jira/browse/SPARK-39057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529373#comment-17529373 ] Apache Spark commented on SPARK-39057: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36394 > Offset could work without Limit > --- > > Key: SPARK-39057 > URL: https://issues.apache.org/jira/browse/SPARK-39057 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Offset must work with Limit. This behavior limits adding an offset > API to DataFrame. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39057) Offset could work without Limit
[ https://issues.apache.org/jira/browse/SPARK-39057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39057: Assignee: Apache Spark > Offset could work without Limit > --- > > Key: SPARK-39057 > URL: https://issues.apache.org/jira/browse/SPARK-39057 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Currently, Offset must work with Limit. This behavior limits adding an offset > API to DataFrame. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39057) Offset could work without Limit
[ https://issues.apache.org/jira/browse/SPARK-39057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39057: Assignee: (was: Apache Spark) > Offset could work without Limit > --- > > Key: SPARK-39057 > URL: https://issues.apache.org/jira/browse/SPARK-39057 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Offset must work with Limit. This behavior limits adding an offset > API to DataFrame. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39055) Fix documentation 404 page
[ https://issues.apache.org/jira/browse/SPARK-39055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39055. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36392 [https://github.com/apache/spark/pull/36392] > Fix documentation 404 page > -- > > Key: SPARK-39055 > URL: https://issues.apache.org/jira/browse/SPARK-39055 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.3.0 > > > 404 page is currently not working -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39055) Fix documentation 404 page
[ https://issues.apache.org/jira/browse/SPARK-39055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39055: Assignee: Kent Yao > Fix documentation 404 page > -- > > Key: SPARK-39055 > URL: https://issues.apache.org/jira/browse/SPARK-39055 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > 404 page is currently not working -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38870) SparkSession.builder returns a new builder in Scala, but not in Python
[ https://issues.apache.org/jira/browse/SPARK-38870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38870: Assignee: Furcy Pin > SparkSession.builder returns a new builder in Scala, but not in Python > -- > > Key: SPARK-38870 > URL: https://issues.apache.org/jira/browse/SPARK-38870 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.2.1 >Reporter: Furcy Pin >Assignee: Furcy Pin >Priority: Major > > In pyspark, _SparkSession.builder_ always returns the same static builder, > while the expected behaviour should be the same as in Scala, where it returns > a new builder each time. > *How to reproduce* > When we run the following code in Scala: > {code:java} > import org.apache.spark.sql.SparkSession > val s1 = SparkSession.builder.master("local[2]").config("key", > "value").getOrCreate() > println("A : " + s1.conf.get("key")) // value > s1.conf.set("key", "new_value") > println("B : " + s1.conf.get("key")) // new_value > val s2 = SparkSession.builder.getOrCreate() > println("C : " + s1.conf.get("key")) // new_value{code} > The output is: > {code:java} > A : value > B : new_value > C : new_value <<<{code} > > But when we run the following (supposedly equivalent) code in Python: > {code:java} > from pyspark.sql import SparkSession > s1 = SparkSession.builder.master("local[2]").config("key", > "value").getOrCreate() > print("A : " + s1.conf.get("key")) > s1.conf.set("key", "new_value") > print("B : " + s1.conf.get("key")) > s2 = SparkSession.builder.getOrCreate() > print("C : " + s1.conf.get("key")){code} > The output is: > {code:java} > A : value > B : new_value > C : value <<< > {code} > > > *Root cause analysis* > This comes from the fact that _SparkSession.builder_ behaves differently in > Python than in Scala. In Scala, it returns a *new builder* each time; in > Python it returns *the same builder* every time, and the > SparkSession.Builder._options are static, too. > Because of this, whenever _SparkSession.builder.getOrCreate()_ is called, the > options passed to the very first builder are re-applied every time, and > override the options that were set afterwards. 
> This leads to very awkward behavior in every Spark version up to and including > 3.2.1 > *Example*: > This example crashes, but was fixed by SPARK-37638 > > {code:java} > from pyspark.sql import SparkSession > spark = > SparkSession.builder.config("spark.sql.sources.partitionOverwriteMode", > "DYNAMIC").getOrCreate() > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == > "DYNAMIC" # OK > spark.conf.set("spark.sql.sources.partitionOverwriteMode", "STATIC") > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # OK > from pyspark.sql import functions as f > from pyspark.sql.types import StringType > f.col("a").cast(StringType()) > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # This fails in all versions until the SPARK-37638 fix > # because before that fix, Column.cast() called > SparkSession.builder.getOrCreate(){code} > > But this example still crashes in the current version on the master branch: > {code:java} > from pyspark.sql import SparkSession > spark = > SparkSession.builder.config("spark.sql.sources.partitionOverwriteMode", > "DYNAMIC").getOrCreate() > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == > "DYNAMIC" # OK > spark.conf.set("spark.sql.sources.partitionOverwriteMode", "STATIC") > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # OK > SparkSession.builder.getOrCreate() > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # This assert fails in master{code} > > I made a Pull Request to fix this bug: > https://github.com/apache/spark/pull/36161 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
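The fix direction can be sketched in isolation: expose builder as a computed class-level attribute that returns a new Builder on each access, mirroring Scala. This is a simplified stand-in with trimmed class bodies, not the actual patch from PR 36161:
{code:python}
# Simplified sketch: `builder` is computed per access instead of being a
# shared static Builder, so options from the first builder stop leaking
# into builders obtained later.
class classproperty:
    """Like @property, but usable on the class itself."""
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        return self.fget(objtype)

class SparkSession:
    class Builder:
        def __init__(self):
            self._options = {}          # per-instance, not class-level

        def config(self, key, value):
            self._options[key] = value
            return self

    @classproperty
    def builder(cls):
        return SparkSession.Builder()   # fresh builder on every access

assert SparkSession.builder is not SparkSession.builder
{code}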
[jira] [Resolved] (SPARK-38870) SparkSession.builder returns a new builder in Scala, but not in Python
[ https://issues.apache.org/jira/browse/SPARK-38870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38870. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36161 [https://github.com/apache/spark/pull/36161] > SparkSession.builder returns a new builder in Scala, but not in Python > -- > > Key: SPARK-38870 > URL: https://issues.apache.org/jira/browse/SPARK-38870 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.2.1 >Reporter: Furcy Pin >Assignee: Furcy Pin >Priority: Major > Fix For: 3.4.0 > > > In pyspark, _SparkSession.builder_ always returns the same static builder, > while the expected behaviour should be the same as in Scala, where it returns > a new builder each time. > *How to reproduce* > When we run the following code in Scala: > {code:java} > import org.apache.spark.sql.SparkSession > val s1 = SparkSession.builder.master("local[2]").config("key", > "value").getOrCreate() > println("A : " + s1.conf.get("key")) // value > s1.conf.set("key", "new_value") > println("B : " + s1.conf.get("key")) // new_value > val s2 = SparkSession.builder.getOrCreate() > println("C : " + s1.conf.get("key")) // new_value{code} > The output is: > {code:java} > A : value > B : new_value > C : new_value <<<{code} > > But when we run the following (supposedly equivalent) code in Python: > {code:java} > from pyspark.sql import SparkSession > s1 = SparkSession.builder.master("local[2]").config("key", > "value").getOrCreate() > print("A : " + s1.conf.get("key")) > s1.conf.set("key", "new_value") > print("B : " + s1.conf.get("key")) > s2 = SparkSession.builder.getOrCreate() > print("C : " + s1.conf.get("key")){code} > The output is: > {code:java} > A : value > B : new_value > C : value <<< > {code} > > > *Root cause analysis* > This comes from the fact that _SparkSession.builder_ behaves differently in > Python than in Scala. In Scala, it returns a *new builder* each time; in > Python it returns *the same builder* every time, and the > SparkSession.Builder._options are static, too. > Because of this, whenever _SparkSession.builder.getOrCreate()_ is called, the > options passed to the very first builder are re-applied every time, and > override the options that were set afterwards. 
> This leads to very awkward behavior in every Spark version up to and including > 3.2.1 > *Example*: > This example crashes, but was fixed by SPARK-37638 > > {code:java} > from pyspark.sql import SparkSession > spark = > SparkSession.builder.config("spark.sql.sources.partitionOverwriteMode", > "DYNAMIC").getOrCreate() > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == > "DYNAMIC" # OK > spark.conf.set("spark.sql.sources.partitionOverwriteMode", "STATIC") > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # OK > from pyspark.sql import functions as f > from pyspark.sql.types import StringType > f.col("a").cast(StringType()) > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # This fails in all versions until the SPARK-37638 fix > # because before that fix, Column.cast() called > SparkSession.builder.getOrCreate(){code} > > But this example still crashes in the current version on the master branch: > {code:java} > from pyspark.sql import SparkSession > spark = > SparkSession.builder.config("spark.sql.sources.partitionOverwriteMode", > "DYNAMIC").getOrCreate() > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == > "DYNAMIC" # OK > spark.conf.set("spark.sql.sources.partitionOverwriteMode", "STATIC") > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # OK > SparkSession.builder.getOrCreate() > assert spark.conf.get("spark.sql.sources.partitionOverwriteMode") == "STATIC" > # This assert fails in master{code} > > I made a Pull Request to fix this bug: > https://github.com/apache/spark/pull/36161 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39057) Offset could work without Limit
jiaan.geng created SPARK-39057: -- Summary: Offset could work without Limit Key: SPARK-39057 URL: https://issues.apache.org/jira/browse/SPARK-39057 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.0 Reporter: jiaan.geng Currently, Offset must work with Limit. This behavior limits adding an offset API to DataFrame. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39056) Use `Collections.singletonList` instead of `Arrays.asList` when there is only one argument
[ https://issues.apache.org/jira/browse/SPARK-39056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39056: Assignee: (was: Apache Spark) > Use `Collections.singletonList` instead of `Arrays.asList` when there is > only one argument > > > Key: SPARK-39056 > URL: https://issues.apache.org/jira/browse/SPARK-39056 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Use `Collections.singletonList` instead of `Arrays.asList` when there is only > one argument. > > before > {code:java} > List<String> one = Arrays.asList("one"); {code} > after > {code:java} > List<String> one = Collections.singletonList("one"); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39056) Use `Collections.singletonList` instead of `Arrays.asList` when there is only one argument
[ https://issues.apache.org/jira/browse/SPARK-39056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529365#comment-17529365 ] Apache Spark commented on SPARK-39056: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36393 > Use `Collections.singletonList` instead of `Arrays.asList` when there is > only one argument > > > Key: SPARK-39056 > URL: https://issues.apache.org/jira/browse/SPARK-39056 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Use `Collections.singletonList` instead of `Arrays.asList` when there is only > one argument. > > before > {code:java} > List<String> one = Arrays.asList("one"); {code} > after > {code:java} > List<String> one = Collections.singletonList("one"); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39056) Use `Collections.singletonList` instead of `Arrays.asList` when there is only one argument
[ https://issues.apache.org/jira/browse/SPARK-39056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39056: Assignee: Apache Spark > Use `Collections.singletonList` instead of `Arrays.asList` when there is > only one argument > > > Key: SPARK-39056 > URL: https://issues.apache.org/jira/browse/SPARK-39056 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Use `Collections.singletonList` instead of `Arrays.asList` when there is only > one argument. > > before > {code:java} > List<String> one = Arrays.asList("one"); {code} > after > {code:java} > List<String> one = Collections.singletonList("one"); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39056) Use `Collections.singletonList` instead of `Arrays.asList` when there is only one argument
Yang Jie created SPARK-39056: Summary: Use `Collections.singletonList` instead of `Arrays.asList` when there is only one argument Key: SPARK-39056 URL: https://issues.apache.org/jira/browse/SPARK-39056 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: Yang Jie Use `Collections.singletonList` instead of `Arrays.asList` when there is only one argument. before {code:java} List<String> one = Arrays.asList("one"); {code} after {code:java} List<String> one = Collections.singletonList("one"); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39054) GroupByTest failed due to axis Length mismatch
[ https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529305#comment-17529305 ] Yikun Jiang commented on SPARK-39054: - Related: https://github.com/pandas-dev/pandas/commit/d037ff6a4757bf8af2ca2431ba7d4b22b1959075 > GroupByTest failed due to axis Length mismatch > -- > > Key: SPARK-39054 > URL: https://issues.apache.org/jira/browse/SPARK-39054 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > {code:java} > An error occurred while calling o27083.getResult. > : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) > at > org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) > at > org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) > at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:282) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at > py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) > at py4j.ClientServerConnection.run(ClientServerConnection.java:106) > at java.lang.Thread.run(Thread.java:750) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in > stage 808.0 (TID 650) (localhost executor driver): > org.apache.spark.api.python.PythonException: Traceback (most recent call > last): > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, > in main > process() > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, > in process > serializer.dump_stream(out_iter, outfile) > File > "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 343, in dump_stream > return ArrowStreamSerializer.dump_stream(self, > init_stream_yield_batches(), stream) > File > "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 84, in dump_stream > for batch in iterator: > File > "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 336, in init_stream_yield_batches > for series in iterator: > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, > in mapper > return f(keys, vals) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, > in > return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, > in wrapped > result = f(pd.concat(value_series, axis=1)) > File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in > wrapper > return f(*args, **kwargs) > File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in > rename_output > pdf.columns = return_schema.names > File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line > 5588, in __setattr__ > return object.__setattr__(self, name, value) > File "pandas/_libs/properties.pyx", line 70, in > 
pandas._libs.properties.AxisProperty.__set__ > File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line > 769, in _set_axis > self._mgr.set_axis(axis, labels) > File > "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", > line 214, in set_axis > self._validate_set_axis(axis, new_labels) > File > "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line > 69, in _validate_set_axis > raise ValueError( > ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 > elements {code} > > GroupByTest.test_apply_with_new_dataframe_without_shortcut -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
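The pandas-level error at the bottom of that trace reproduces standalone; the snippet below only demonstrates the ValueError itself (the three-column frame is made up), while the extra column in the failing test comes from the pandas 1.4 groupby-apply change linked above:
{code:python}
# Standalone repro of the final error in the traceback: assigning `columns`
# with the wrong number of labels raises the same ValueError.
import pandas as pd

pdf = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
try:
    pdf.columns = ["x", "y"]   # 2 labels for a 3-column axis
except ValueError as e:
    # Length mismatch: Expected axis has 3 elements, new values have 2 elements
    print(e)
{code}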
[jira] [Updated] (SPARK-39054) GroupByTest failed due to axis Length mismatch
[ https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-39054: Description: {code:java} An error occurred while calling o27083.getResult. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.lang.Thread.run(Thread.java:750) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in stage 808.0 (TID 650) (localhost executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main process() File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process serializer.dump_stream(out_iter, outfile) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 343, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 84, in dump_stream for batch in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 336, in init_stream_yield_batches for series in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, in mapper return f(keys, vals) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, in return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, in wrapped result = f(pd.concat(value_series, axis=1)) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper return f(*args, **kwargs) File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in rename_output pdf.columns = return_schema.names File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 5588, in __setattr__ return object.__setattr__(self, name, value) File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__ File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 769, in _set_axis self._mgr.set_axis(axis, labels) File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", line 214, in set_axis self._validate_set_axis(axis, new_labels) File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis raise ValueError( ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements {code} was: 
{code:java} An error occurred while calling o27083.getResult. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at
[jira] [Updated] (SPARK-39054) GroupByTest failed due to axis Length mismatch
[ https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-39054: Description: {code:java} An error occurred while calling o27083.getResult. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.lang.Thread.run(Thread.java:750) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in stage 808.0 (TID 650) (localhost executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main process() File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process serializer.dump_stream(out_iter, outfile) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 343, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 84, in dump_stream for batch in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 336, in init_stream_yield_batches for series in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, in mapper return f(keys, vals) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, in return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, in wrapped result = f(pd.concat(value_series, axis=1)) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper return f(*args, **kwargs) File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in rename_output pdf.columns = return_schema.names File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 5588, in __setattr__ return object.__setattr__(self, name, value) File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__ File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 769, in _set_axis self._mgr.set_axis(axis, labels) File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", line 214, in set_axis self._validate_set_axis(axis, new_labels) File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis raise ValueError( ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements {code} 
GroupByTest.test_apply_with_new_dataframe_without_shortcut was: {code:java} An error occurred while calling o27083.getResult. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at
[jira] [Assigned] (SPARK-39055) Fix documentation 404 page
[ https://issues.apache.org/jira/browse/SPARK-39055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39055: Assignee: (was: Apache Spark) > Fix documentation 404 page > -- > > Key: SPARK-39055 > URL: https://issues.apache.org/jira/browse/SPARK-39055 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > 404 page is currently not working -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39055) Fix documentation 404 page
[ https://issues.apache.org/jira/browse/SPARK-39055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529291#comment-17529291 ] Apache Spark commented on SPARK-39055: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36392 > Fix documentation 404 page > -- > > Key: SPARK-39055 > URL: https://issues.apache.org/jira/browse/SPARK-39055 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > 404 page is currently not working -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39055) Fix documentation 404 page
[ https://issues.apache.org/jira/browse/SPARK-39055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39055: Assignee: Apache Spark > Fix documentation 404 page > -- > > Key: SPARK-39055 > URL: https://issues.apache.org/jira/browse/SPARK-39055 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > 404 page is currently not working -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39045) INTERNAL_ERROR for "all" internal errors
[ https://issues.apache.org/jira/browse/SPARK-39045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-39045: Affects Version/s: 3.4.0 (was: 3.3.0) > INTERNAL_ERROR for "all" internal errors > > > Key: SPARK-39045 > URL: https://issues.apache.org/jira/browse/SPARK-39045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > We should be able to inject the [SYSTEM_ERROR] class for most cases without > waiting to label the long tail of user-facing error classes -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38819) Run Pandas on Spark with Pandas 1.4.x
[ https://issues.apache.org/jira/browse/SPARK-38819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529283#comment-17529283 ] Yikun Jiang commented on SPARK-38819: - FYI, all issues with upgrading to pandas 1.4.x have been listed above. SPARK-38946 may take some more time; the others are okay. > Run Pandas on Spark with Pandas 1.4.x > - > > Key: SPARK-38819 > URL: https://issues.apache.org/jira/browse/SPARK-38819 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > This is an umbrella to track issues when upgrading pandas to 1.4.x > > I disabled fast-fail in the tests; 19 failed: > [https://github.com/Yikun/spark/pull/88/checks?check_run_id=5873627048] > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39037) DS V2 Top N push-down supports order by expressions
[ https://issues.apache.org/jira/browse/SPARK-39037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-39037: --- Summary: DS V2 Top N push-down supports order by expressions (was: DS V2 aggregate push-down supports order by expressions) > DS V2 Top N push-down supports order by expressions > --- > > Key: SPARK-39037 > URL: https://issues.apache.org/jira/browse/SPARK-39037 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > Currently, Spark DS V2 aggregate push-down only supports order by column. > But the SQL shown below is very useful and common. > SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" > ELSE 0.00 END AS key, dept, name FROM "test"."employee" ORDER BY key -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39037) DS V2 Top N push-down supports order by expressions
[ https://issues.apache.org/jira/browse/SPARK-39037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-39037: --- Description: Currently, Spark DS V2 Top N push-down only supports order by column. But the SQL shown below is very useful and common. SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" ELSE 0.00 END AS key, dept, name FROM "test"."employee" ORDER BY key was: Currently, Spark DS V2 aggregate push-down only supports order by column. But the SQL shown below is very useful and common. SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" ELSE 0.00 END AS key, dept, name FROM "test"."employee" ORDER BY key > DS V2 Top N push-down supports order by expressions > --- > > Key: SPARK-39037 > URL: https://issues.apache.org/jira/browse/SPARK-39037 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > Currently, Spark DS V2 Top N push-down only supports order by column. > But the SQL shown below is very useful and common. > SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" > ELSE 0.00 END AS key, dept, name FROM "test"."employee" ORDER BY key -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
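For illustration, the DataFrame shape of such a Top N query is sketched below; the JDBC connection details and salary bounds are placeholders, and whether the sort is actually pushed down depends on the connector and on the change tracked here:
{code:python}
# Hedged sketch: a Top N query whose ORDER BY key is an expression rather
# than a plain column. With this change, DS V2 can push the whole
# ORDER BY <expression> ... LIMIT n down to the database. All connection
# details and numeric bounds below are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[2]").getOrCreate()
employee = (spark.read.format("jdbc")
            .option("url", "jdbc:h2:mem:testdb")   # placeholder URL
            .option("dbtable", "test.employee")
            .load())

key = (F.when((F.col("SALARY") > 8000.00) & (F.col("SALARY") < 10000.00),
              F.col("SALARY"))
       .otherwise(0.00)
       .alias("key"))

top = employee.select(key, "dept", "name").orderBy("key").limit(3)
{code}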
[jira] [Created] (SPARK-39055) Fix documentation 404 page
Kent Yao created SPARK-39055: Summary: Fix documentation 404 page Key: SPARK-39055 URL: https://issues.apache.org/jira/browse/SPARK-39055 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.4.0 Reporter: Kent Yao 404 page is currently not working -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39054) GroupByTest failed due to axis Length mismatch
Yikun Jiang created SPARK-39054: --- Summary: GroupByTest failed due to axis Length mismatch Key: SPARK-39054 URL: https://issues.apache.org/jira/browse/SPARK-39054 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Yikun Jiang {code:java} An error occurred while calling o27083.getResult. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93) at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.lang.Thread.run(Thread.java:750) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in stage 808.0 (TID 650) (localhost executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main process() File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process serializer.dump_stream(out_iter, outfile) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 343, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 84, in dump_stream for batch in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 336, in init_stream_yield_batches for series in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, in mapper return f(keys, vals) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, in return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, in wrapped result = f(pd.concat(value_series, axis=1)) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper return f(*args, **kwargs) File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in rename_output pdf.columns = return_schema.names File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 5588, in __setattr__ return object.__setattr__(self, name, value) File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__ File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 769, in _set_axis self._mgr.set_axis(axis, labels) File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", line 214, in set_axis self._validate_set_axis(axis, new_labels) File "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis raise ValueError( 
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39053) test_multi_index_dtypes failed due to index mismatch
[ https://issues.apache.org/jira/browse/SPARK-39053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529274#comment-17529274 ] Apache Spark commented on SPARK-39053: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/36391 > test_multi_index_dtypes failed due to index mismatch > > > Key: SPARK-39053 > URL: https://issues.apache.org/jira/browse/SPARK-39053 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > {code:java} > DataFrameTest.test_multi_index_dtypes > Series.index are different > Series.index classes are different > [left]: MultiIndex([('zero', 'first'), > ( 'one', 'second')], >) > [right]: Index([('zero', 'first'), ('one', 'second')], dtype='object') > Left: > zero first int64 > one second object > dtype: object > object > Right: > (zero, first) int64 > (one, second) object > dtype: object > object {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
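The two index shapes in that diff can be shown directly in pandas; this hedged snippet illustrates only the type difference, not the pandas-on-Spark code path that produces it:
{code:python}
# The shapes from the diff above: a true MultiIndex versus a plain
# object-dtype Index that merely holds tuples. They print similarly but are
# different types, which is exactly why the assertion fails.
import pandas as pd

tuples = [("zero", "first"), ("one", "second")]
midx = pd.MultiIndex.from_tuples(tuples)        # [left] in the diff
flat = pd.Index(tuples, tupleize_cols=False)    # [right] in the diff
print(type(midx).__name__)   # MultiIndex
print(type(flat).__name__)   # Index
{code}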
[jira] [Commented] (SPARK-39053) test_multi_index_dtypes failed due to index mismatch
[ https://issues.apache.org/jira/browse/SPARK-39053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529273#comment-17529273 ] Apache Spark commented on SPARK-39053: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/36391 > test_multi_index_dtypes failed due to index mismatch > > > Key: SPARK-39053 > URL: https://issues.apache.org/jira/browse/SPARK-39053 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > {code:java} > DataFrameTest.test_multi_index_dtypes > Series.index are different > Series.index classes are different > [left]: MultiIndex([('zero', 'first'), > ( 'one', 'second')], >) > [right]: Index([('zero', 'first'), ('one', 'second')], dtype='object') > Left: > zero first int64 > one second object > dtype: object > object > Right: > (zero, first) int64 > (one, second) object > dtype: object > object {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39053) test_multi_index_dtypes failed due to index mismatch
[ https://issues.apache.org/jira/browse/SPARK-39053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39053: Assignee: (was: Apache Spark) > test_multi_index_dtypes failed due to index mismatch > > > Key: SPARK-39053 > URL: https://issues.apache.org/jira/browse/SPARK-39053 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > {code:java} > DataFrameTest.test_multi_index_dtypes > Series.index are different > Series.index classes are different > [left]: MultiIndex([('zero', 'first'), > ( 'one', 'second')], >) > [right]: Index([('zero', 'first'), ('one', 'second')], dtype='object') > Left: > zero first int64 > one second object > dtype: object > object > Right: > (zero, first) int64 > (one, second) object > dtype: object > object {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39053) test_multi_index_dtypes failed due to index mismatch
[ https://issues.apache.org/jira/browse/SPARK-39053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39053: Assignee: Apache Spark > test_multi_index_dtypes failed due to index mismatch > > > Key: SPARK-39053 > URL: https://issues.apache.org/jira/browse/SPARK-39053 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > > {code:java} > DataFrameTest.test_multi_index_dtypes > Series.index are different > Series.index classes are different > [left]: MultiIndex([('zero', 'first'), > ( 'one', 'second')], >) > [right]: Index([('zero', 'first'), ('one', 'second')], dtype='object') > Left: > zero first int64 > one second object > dtype: object > object > Right: > (zero, first) int64 > (one, second) object > dtype: object > object {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39037) DS V2 aggregate push-down supports order by expressions
[ https://issues.apache.org/jira/browse/SPARK-39037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39037. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36370 [https://github.com/apache/spark/pull/36370] > DS V2 aggregate push-down supports order by expressions > --- > > Key: SPARK-39037 > URL: https://issues.apache.org/jira/browse/SPARK-39037 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > Currently, Spark DS V2 aggregate push-down only supports order by column. > But the SQL shown below is very useful and common. > SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" > ELSE 0.00 END AS key, dept, name FROM "test"."employee" ORDER BY key -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39037) DS V2 aggregate push-down supports order by expressions
[ https://issues.apache.org/jira/browse/SPARK-39037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39037: --- Assignee: jiaan.geng > DS V2 aggregate push-down supports order by expressions > --- > > Key: SPARK-39037 > URL: https://issues.apache.org/jira/browse/SPARK-39037 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Currently, Spark DS V2 aggregate push-down only supports order by column. > But the SQL shown below is very useful and common. > SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" > ELSE 0.00 END AS key, dept, name FROM "test"."employee" ORDER BY key -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39053) test_multi_index_dtypes failed due to index mismatch
[ https://issues.apache.org/jira/browse/SPARK-39053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529259#comment-17529259 ] Yikun Jiang commented on SPARK-39053: - https://github.com/pandas-dev/pandas/commit/d06fb912782834125f1c9b0baaea1d60f2151c69 > test_multi_index_dtypes failed due to index mismatch > > > Key: SPARK-39053 > URL: https://issues.apache.org/jira/browse/SPARK-39053 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > {code:java} > DataFrameTest.test_multi_index_dtypes > Series.index are different > Series.index classes are different > [left]: MultiIndex([('zero', 'first'), > ( 'one', 'second')], >) > [right]: Index([('zero', 'first'), ('one', 'second')], dtype='object') > Left: > zero first int64 > one second object > dtype: object > object > Right: > (zero, first) int64 > (one, second) object > dtype: object > object {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39053) test_multi_index_dtypes failed due to index mismatch
[ https://issues.apache.org/jira/browse/SPARK-39053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-39053: Parent: SPARK-38819 Issue Type: Sub-task (was: Bug) > test_multi_index_dtypes failed due to index mismatch > > > Key: SPARK-39053 > URL: https://issues.apache.org/jira/browse/SPARK-39053 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > {code:java} > DataFrameTest.test_multi_index_dtypes > Series.index are different > Series.index classes are different > [left]: MultiIndex([('zero', 'first'), > ( 'one', 'second')], >) > [right]: Index([('zero', 'first'), ('one', 'second')], dtype='object') > Left: > zero first int64 > one second object > dtype: object > object > Right: > (zero, first) int64 > (one, second) object > dtype: object > object {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39053) test_multi_index_dtypes failed due to index mismatch
Yikun Jiang created SPARK-39053: --- Summary: test_multi_index_dtypes failed due to index mismatch Key: SPARK-39053 URL: https://issues.apache.org/jira/browse/SPARK-39053 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.4.0 Reporter: Yikun Jiang {code:java} DataFrameTest.test_multi_index_dtypes Series.index are different Series.index classes are different [left]: MultiIndex([('zero', 'first'), ( 'one', 'second')], ) [right]: Index([('zero', 'first'), ('one', 'second')], dtype='object') Left: zero first int64 one second object dtype: object object Right: (zero, first) int64 (one, second) object dtype: object object {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529233#comment-17529233 ] Apache Spark commented on SPARK-39047: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/36390 > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529232#comment-17529232 ] Apache Spark commented on SPARK-39047: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/36390 > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org