[jira] [Assigned] (SPARK-42179) Upgrade ORC to 1.7.8
[ https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42179: Assignee: (was: Apache Spark) > Upgrade ORC to 1.7.8 > > > Key: SPARK-42179 > URL: https://issues.apache.org/jira/browse/SPARK-42179 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.3.2 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42179) Upgrade ORC to 1.7.8
[ https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42179: Assignee: Apache Spark > Upgrade ORC to 1.7.8 > > > Key: SPARK-42179 > URL: https://issues.apache.org/jira/browse/SPARK-42179 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.3.2 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42179) Upgrade ORC to 1.7.8
[ https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680476#comment-17680476 ] Apache Spark commented on SPARK-42179: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39735 > Upgrade ORC to 1.7.8 > > > Key: SPARK-42179 > URL: https://issues.apache.org/jira/browse/SPARK-42179 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.3.2 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Updated] (SPARK-42179) Upgrade ORC to 1.7.8
[ https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42179: -- Component/s: Build > Upgrade ORC to 1.7.8 > > > Key: SPARK-42179 > URL: https://issues.apache.org/jira/browse/SPARK-42179 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.3.2 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Created] (SPARK-42179) Upgrade ORC to 1.7.8
Dongjoon Hyun created SPARK-42179: - Summary: Upgrade ORC to 1.7.8 Key: SPARK-42179 URL: https://issues.apache.org/jira/browse/SPARK-42179 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.2 Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-41812) DataFrame.join: ambiguous column
[ https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680467#comment-17680467 ] Apache Spark commented on SPARK-41812: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39734 > DataFrame.join: ambiguous column > > > Key: SPARK-41812 > URL: https://issues.apache.org/jira/browse/SPARK-41812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in > pyspark.sql.connect.column.Column.eqNullSafe > Failed example: > df1.join(df2, df1["value"] == df2["value"]).count() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df1.join(df2, df1["value"] == df2["value"]).count() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in > count > pdd = self.agg(_invoke_function("count", lit(1))).toPandas() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, > in toPandas > return self._session.client.to_pandas(query) > File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in > to_pandas > return self._execute_and_fetch(req) > File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in > _execute_and_fetch > self._handle_error(rpc_error) > File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in > _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, > `value`]. 
> {code}
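The AMBIGUOUS_REFERENCE failure above happens because, after `df1.join(df2, ...)`, the joined relation exposes a `value` column from each input, and the Spark Connect client was sending only the bare column name, leaving the analyzer no way to pick a side. A toy Python sketch of that resolution check (not Spark's actual analyzer code; `resolve_column` and the qualified-name convention are illustrative assumptions):

```python
# Toy sketch of analyzer-style column resolution (NOT Spark's code):
# a bare name that matches columns from both join inputs must fail.
def resolve_column(name, input_columns):
    """Return the single qualified column matching `name`; raise an
    AMBIGUOUS_REFERENCE-style error when more than one column matches."""
    matches = [c for c in input_columns if c.split(".")[-1] == name]
    if len(matches) > 1:
        raise ValueError(
            f"[AMBIGUOUS_REFERENCE] Reference `{name}` is ambiguous, "
            f"could be: {matches}."
        )
    if not matches:
        raise ValueError(f"[UNRESOLVED_COLUMN] Reference `{name}`")
    return matches[0]

# After df1.join(df2, ...), both sides expose a column named `value`:
joined_columns = ["df1.value", "df2.value"]
```

In classic PySpark the expression `df1["value"]` carries the plan it came from, which is what lets the analyzer disambiguate; the sub-tasks above track giving Spark Connect column references equivalent plan information.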
[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680468#comment-17680468 ] Apache Spark commented on SPARK-41823: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39734 > DataFrame.join creating ambiguous column names > -- > > Key: SPARK-41823 > URL: https://issues.apache.org/jira/browse/SPARK-41823 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 254, in pyspark.sql.connect.dataframe.DataFrame.drop > Failed example: > df.join(df2, df.name == df2.name, 'inner').drop('name').show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.join(df2, df.name == df2.name, 'inner').drop('name').show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in 
_handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, > `name`]. > Plan: {code}
[jira] [Assigned] (SPARK-41812) DataFrame.join: ambiguous column
[ https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41812: Assignee: (was: Apache Spark) > DataFrame.join: ambiguous column > > > Key: SPARK-41812 > URL: https://issues.apache.org/jira/browse/SPARK-41812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in > pyspark.sql.connect.column.Column.eqNullSafe > Failed example: > df1.join(df2, df1["value"] == df2["value"]).count() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df1.join(df2, df1["value"] == df2["value"]).count() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in > count > pdd = self.agg(_invoke_function("count", lit(1))).toPandas() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, > in toPandas > return self._session.client.to_pandas(query) > File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in > to_pandas > return self._execute_and_fetch(req) > File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in > _execute_and_fetch > self._handle_error(rpc_error) > File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in > _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, > `value`]. > {code}
[jira] [Assigned] (SPARK-41812) DataFrame.join: ambiguous column
[ https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41812: Assignee: Apache Spark > DataFrame.join: ambiguous column > > > Key: SPARK-41812 > URL: https://issues.apache.org/jira/browse/SPARK-41812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > {code} > File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in > pyspark.sql.connect.column.Column.eqNullSafe > Failed example: > df1.join(df2, df1["value"] == df2["value"]).count() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df1.join(df2, df1["value"] == df2["value"]).count() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in > count > pdd = self.agg(_invoke_function("count", lit(1))).toPandas() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, > in toPandas > return self._session.client.to_pandas(query) > File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in > to_pandas > return self._execute_and_fetch(req) > File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in > _execute_and_fetch > self._handle_error(rpc_error) > File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in > _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, > `value`]. > {code}
[jira] [Commented] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests
[ https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680455#comment-17680455 ] Apache Spark commented on SPARK-42178: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39732 > Handle remaining null string values in ui protobuf serializer and add tests > --- > > Key: SPARK-42178 > URL: https://issues.apache.org/jira/browse/SPARK-42178 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major >
[jira] [Commented] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests
[ https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680454#comment-17680454 ] Apache Spark commented on SPARK-42178: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39732 > Handle remaining null string values in ui protobuf serializer and add tests > --- > > Key: SPARK-42178 > URL: https://issues.apache.org/jira/browse/SPARK-42178 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major >
[jira] [Assigned] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests
[ https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42178: Assignee: Gengliang Wang (was: Apache Spark) > Handle remaining null string values in ui protobuf serializer and add tests > --- > > Key: SPARK-42178 > URL: https://issues.apache.org/jira/browse/SPARK-42178 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major >
[jira] [Assigned] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests
[ https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42178: Assignee: Apache Spark (was: Gengliang Wang) > Handle remaining null string values in ui protobuf serializer and add tests > --- > > Key: SPARK-42178 > URL: https://issues.apache.org/jira/browse/SPARK-42178 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major >
[jira] [Created] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests
Gengliang Wang created SPARK-42178: -- Summary: Handle remaining null string values in ui protobuf serializer and add tests Key: SPARK-42178 URL: https://issues.apache.org/jira/browse/SPARK-42178 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang
[jira] [Commented] (SPARK-42177) Change master to branch-3.4 in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680438#comment-17680438 ] Apache Spark commented on SPARK-42177: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39731 > Change master to branch-3.4 in GitHub Actions > > > Key: SPARK-42177 > URL: https://issues.apache.org/jira/browse/SPARK-42177 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029
[jira] [Commented] (SPARK-42177) Change master to branch-3.4 in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680437#comment-17680437 ] Apache Spark commented on SPARK-42177: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39731 > Change master to branch-3.4 in GitHub Actions > > > Key: SPARK-42177 > URL: https://issues.apache.org/jira/browse/SPARK-42177 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029
[jira] [Commented] (SPARK-42177) Change master to branch-3.4 in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680435#comment-17680435 ] Apache Spark commented on SPARK-42177: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39730 > Change master to branch-3.4 in GitHub Actions > > > Key: SPARK-42177 > URL: https://issues.apache.org/jira/browse/SPARK-42177 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029
[jira] [Assigned] (SPARK-42177) Change master to branch-3.4 in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42177: Assignee: Hyukjin Kwon > Change master to branch-3.4 in GitHub Actions > > > Key: SPARK-42177 > URL: https://issues.apache.org/jira/browse/SPARK-42177 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029
[jira] [Resolved] (SPARK-42177) Change master to branch-3.4 in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42177. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39731 [https://github.com/apache/spark/pull/39731] > Change master to branch-3.4 in GitHub Actions > > > Key: SPARK-42177 > URL: https://issues.apache.org/jira/browse/SPARK-42177 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029
[jira] [Updated] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42176: - Affects Version/s: 3.5.0 > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0, 3.5.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Fix For: 3.3.2, 3.4.0, 3.5.0 > > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
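The stack trace above points at the failure mode: Spark stores a TimestampType value physically as a 64-bit microsecond count, and the writer obtained via InternalRow.getWriter unboxes exactly that type (the unboxToLong frame), so a cast rule that yields a boxed Integer instead of a Long blows up at write time. A toy Python model of that mismatch (not Spark internals; the class names, `timestamp_writer`, and the true-maps-to-one-second convention are illustrative assumptions):

```python
# Toy model of the type mismatch (NOT Spark's code): the row writer for
# TimestampType insists on the 64-bit physical type, like unboxToLong.

class BoxedInt:   # stands in for java.lang.Integer
    def __init__(self, v): self.v = v

class BoxedLong:  # stands in for java.lang.Long
    def __init__(self, v): self.v = v

def timestamp_writer(value):
    # Mimics BoxesRunTime.unboxToLong: accepts only the 64-bit box.
    if not isinstance(value, BoxedLong):
        raise TypeError(f"{type(value).__name__} cannot be cast to BoxedLong")
    return value.v  # microseconds since the epoch

def buggy_cast(b):
    # Wrong physical type, analogous to the bug: produces an Integer.
    return BoxedInt(1 if b else 0)

def fixed_cast(b):
    # Correct physical type: true -> 1 second, expressed in microseconds.
    return BoxedLong(1_000_000 if b else 0)
```

Writing a row through `buggy_cast` raises the same shape of error as the `{code:java}` block above, while `fixed_cast` goes through cleanly.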
[jira] [Updated] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42176: - Fix Version/s: 3.5.0 > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Fix For: 3.3.2, 3.4.0, 3.5.0 > > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
[jira] [Assigned] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42176: Assignee: Ivan Sadikov > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
[jira] [Resolved] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42176. -- Fix Version/s: 3.3.2 3.4.0 Resolution: Fixed Issue resolved by pull request 39729 [https://github.com/apache/spark/pull/39729] > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Fix For: 3.3.2, 3.4.0 > > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
[jira] [Created] (SPARK-42177) Change master to branch-3.4 in GitHub Actions
Hyukjin Kwon created SPARK-42177: Summary: Change master to branch-3.4 in GitHub Actions Key: SPARK-42177 URL: https://issues.apache.org/jira/browse/SPARK-42177 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.4.0 Reporter: Hyukjin Kwon See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029
[jira] [Assigned] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42176: Assignee: (was: Apache Spark) > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
[jira] [Assigned] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42176: Assignee: Apache Spark > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Ivan Sadikov >Assignee: Apache Spark >Priority: Major > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
[jira] [Commented] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680427#comment-17680427 ] Apache Spark commented on SPARK-42176: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/39729 > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
[jira] [Commented] (SPARK-42107) Spark 3.3.0 binary breaking change missing from release notes
[ https://issues.apache.org/jira/browse/SPARK-42107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680420#comment-17680420 ] Hyukjin Kwon commented on SPARK-42107: -- cc [~cloud_fan] and [~dchvn]. I think we should at least add them into release notes. > Spark 3.3.0 binary breaking change missing from release notes > - > > Key: SPARK-42107 > URL: https://issues.apache.org/jira/browse/SPARK-42107 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 3.3.0 >Reporter: Ross Lawley >Priority: Major > > SPARK-37929 contains a binary breaking change in the SupportsNamespaces API > See: [https://github.com/apache/spark/pull/35246/files#r792289685] > > There is no mention in the [release > notes|https://spark.apache.org/releases/spark-release-3-3-0.html]
[jira] [Commented] (SPARK-42118) Wrong result when parsing a multiline JSON file with differing types for same column
[ https://issues.apache.org/jira/browse/SPARK-42118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680419#comment-17680419 ] Hyukjin Kwon commented on SPARK-42118: -- As a workaround you can do: {code} val df = spark.read.format("json").option("multiLine", true).load("/tmp/json") val newDF = spark.createDataFrame(df.rdd, df.schema) {code} then {code} df.show(false) df.count {code} will show the consistent output. > Wrong result when parsing a multiline JSON file with differing types for same > column > > > Key: SPARK-42118 > URL: https://issues.apache.org/jira/browse/SPARK-42118 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: Dilip Biswal >Priority: Major > > Here is a simple reproduction of the problem. We have a JSON file whose > content looks like following and is in multiLine format. > {code} > [{"name":""},{"name":123.34}] > {code} > Here is the result of spark query when we read the above content. > scala> val df = spark.read.format("json").option("multiLine", > true).load("/tmp/json") > df: org.apache.spark.sql.DataFrame = [name: double] > scala> df.show(false) > ++ > |name| > ++ > |null| > ++ > scala> df.count > res5: Long = 2 > This is quite a serious problem for us as it's causing us to master corrupt > data in lake. If there is some issue with parsing the input, we expect spark > set the "_corrupt_record" so that we can act on it. Please note that df.count > is reporting 2 rows where as df.show only reports 1 row with null value.
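The root of the SPARK-42118 report is visible in the input itself: the two records disagree on the type of `name` (an empty string versus a number), so schema inference settles on `double` (as the `[name: double]` schema in the reproduction shows) and the empty string cannot be parsed as a double. A stdlib-only sketch of the type conflict (plain Python `json`, not Spark's inference code):

```python
import json

# The two records give `name` incompatible types; a single column type
# cannot represent both losslessly, which is what trips up inference.
records = json.loads('[{"name":""},{"name":123.34}]')
inferred = sorted(type(r["name"]).__name__ for r in records)  # per-record types
```

Once the column is typed `double`, the record with `""` has no valid value for it, which is consistent with the null row shown in the reproduction and with the reporter's expectation that such records should instead land in `_corrupt_record`.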
[jira] [Commented] (SPARK-42175) Implement more methods in the Scala Client Dataset API
[ https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680418#comment-17680418 ] Apache Spark commented on SPARK-42175: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/39729 > Implement more methods in the Scala Client Dataset API > -- > > Key: SPARK-42175 > URL: https://issues.apache.org/jira/browse/SPARK-42175 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Also fix the TODOs in the MiMa compatibility test. > https://github.com/apache/spark/pull/39712
[jira] [Assigned] (SPARK-42175) Implement more methods in the Scala Client Dataset API
[ https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42175: Assignee: (was: Apache Spark) > Implement more methods in the Scala Client Dataset API > -- > > Key: SPARK-42175 > URL: https://issues.apache.org/jira/browse/SPARK-42175 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Also fix the TODOs in the MiMa compatibility test. > https://github.com/apache/spark/pull/39712
[jira] [Commented] (SPARK-42175) Implement more methods in the Scala Client Dataset API
[ https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680417#comment-17680417 ] Apache Spark commented on SPARK-42175: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/39729 > Implement more methods in the Scala Client Dataset API > -- > > Key: SPARK-42175 > URL: https://issues.apache.org/jira/browse/SPARK-42175 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Also fix the TODOs in the MiMa compatibility test. > https://github.com/apache/spark/pull/39712
[jira] [Assigned] (SPARK-42175) Implement more methods in the Scala Client Dataset API
[ https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42175: Assignee: Apache Spark > Implement more methods in the Scala Client Dataset API > -- > > Key: SPARK-42175 > URL: https://issues.apache.org/jira/browse/SPARK-42175 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Apache Spark >Priority: Major > > Also fix the TODOs in the MiMa compatibility test. > https://github.com/apache/spark/pull/39712
[jira] [Comment Edited] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
[ https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680414#comment-17680414 ] Hyukjin Kwon edited comment on SPARK-42127 at 1/25/23 12:55 AM: [~shamim_er123] How did you face this error? It would be great if there are steps to reproduce this. was (Author: gurwls223): [~shamim_er123] How did you reproduce this? > Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file > - > > Key: SPARK-42127 > URL: https://issues.apache.org/jira/browse/SPARK-42127 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: shamim >Priority: Major > > 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) > (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create > file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0 > (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081) > at > org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113) > at > org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238) > at > org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126) > at > org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750)
[jira] [Commented] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
[ https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680414#comment-17680414 ] Hyukjin Kwon commented on SPARK-42127: -- [~shamim_er123] How did you reproduce this? > Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file > - > > Key: SPARK-42127 > URL: https://issues.apache.org/jira/browse/SPARK-42127 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: shamim >Priority: Major > > 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) > (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create > file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0 > (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081) > at > org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113) > at > org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238) > at > org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126) > at > org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750)
[jira] [Commented] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680413#comment-17680413 ] Hyukjin Kwon commented on SPARK-42033: -- How is this a Spark issue? > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > > > Key: SPARK-42033 > URL: https://issues.apache.org/jira/browse/SPARK-42033 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Pankaj Nagla >Priority: Major > > I'm going through the "Scalable FastAPI Application on AWS" course. My > gitlab-ci.yml file is below. > stages: > - docker > variables: > DOCKER_DRIVER: overlay2 > DOCKER_TLS_CERTDIR: "/certs" > cache: > key: ${CI_JOB_NAME} > paths: > - ${CI_PROJECT_DIR}/services/talk_booking/.venv/ > build-python-ci-image: > image: docker:19.03.0 > services: > - docker:19.03.0-dind > stage: docker > before_script: > - cd ci_cd/python/ > script: > - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" > $CI_REGISTRY > - docker build -t > registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim . > - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > My Pipeline fails with this error: > See > [https://docs.docker.com/engine/reference/commandline/login/#credentials-store] > Login Succeeded > $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > . > invalid argument > "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" > flag: invalid reference format > See 'docker build --help'. > Cleaning up project directory and file based variables > ERROR: Job failed: exit code 125 > It may or may not be relevant but the Container Registry for the GitLab > project says there's a Docker connection error. All these problems have been > discussed in this [Aws Sysops Training > |https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]follow > the page. 
> Thanks >
[jira] [Commented] (SPARK-42034) QueryExecutionListener and Observation API, df.observe do not work with `foreach` action.
[ https://issues.apache.org/jira/browse/SPARK-42034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680412#comment-17680412 ] Hyukjin Kwon commented on SPARK-42034: -- Please go ahead and open a PR if you're interested in doing that! > QueryExecutionListener and Observation API, df.observe do not work with > `foreach` action. > - > > Key: SPARK-42034 > URL: https://issues.apache.org/jira/browse/SPARK-42034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.2, 3.3.1 > Environment: I test it locally and on YARN in cluster mode. > Spark 3.3.1 and 3.2.2 and 3.1.1. > Yarn 2.9.2 and 3.2.1. >Reporter: Nick Hryhoriev >Priority: Major > Labels: sql-api > > The Observation API, the {{observe}} dataframe transformation, and custom > QueryExecutionListeners do not work with the {{foreach}} or {{foreachPartition}} actions. > This is because QueryExecutionListener functions do not trigger on queries > whose action is {{foreach}} or {{foreachPartition}}. > But the Spark GUI SQL tab sees this query as a SQL query and shows its query > plans, etc. > Here is the code to reproduce it: > https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3
[jira] [Commented] (SPARK-42034) QueryExecutionListener and Observation API, df.observe do not work with `foreach` action.
[ https://issues.apache.org/jira/browse/SPARK-42034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680411#comment-17680411 ] Hyukjin Kwon commented on SPARK-42034: -- I think the action has to be triggered by using `withAction` in the code. > QueryExecutionListener and Observation API, df.observe do not work with > `foreach` action. > - > > Key: SPARK-42034 > URL: https://issues.apache.org/jira/browse/SPARK-42034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.2, 3.3.1 > Environment: I test it locally and on YARN in cluster mode. > Spark 3.3.1 and 3.2.2 and 3.1.1. > Yarn 2.9.2 and 3.2.1. >Reporter: Nick Hryhoriev >Priority: Major > Labels: sql-api > > The Observation API, the {{observe}} dataframe transformation, and custom > QueryExecutionListeners do not work with the {{foreach}} or {{foreachPartition}} actions. > This is because QueryExecutionListener functions do not trigger on queries > whose action is {{foreach}} or {{foreachPartition}}. > But the Spark GUI SQL tab sees this query as a SQL query and shows its query > plans, etc. > Here is the code to reproduce it: > https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3
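The comment above points at Spark's `Dataset.withAction` wrapper. The shape of the bug can be sketched with a minimal observer pattern (illustrative names, not Spark's API): listener callbacks fire only for actions routed through the common wrapper, so an action that runs its body directly leaves no trace.

```python
# Minimal sketch of why a query listener fires for count but not foreach:
# only actions routed through a shared wrapper (Spark's `withAction`)
# notify the listener. Names here are illustrative, not Spark's API.
class QueryListener:
    def __init__(self):
        self.events = []

class Dataset:
    def __init__(self, rows, listener):
        self.rows = rows
        self.listener = listener

    def _with_action(self, name, body):
        # Shared wrapper: run the action body, then notify the listener.
        result = body()
        self.listener.events.append(name)
        return result

    def count(self):
        return self._with_action("count", lambda: len(self.rows))

    def foreach(self, f):
        # The failure mode being discussed: the action body runs directly,
        # bypassing the wrapper, so the listener never hears about it.
        for row in self.rows:
            f(row)

listener = QueryListener()
df = Dataset([1, 2, 3], listener)
df.count()
df.foreach(lambda row: None)
print(listener.events)  # ['count'] -- foreach left no trace
```

Routing `foreach` through the same wrapper is the kind of fix the comment suggests.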
[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42033: - Target Version/s: (was: 1.5.2) > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > > > Key: SPARK-42033 > URL: https://issues.apache.org/jira/browse/SPARK-42033 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Pankaj Nagla >Priority: Major > Fix For: 1.6.2 > > > I'm going through the "Scalable FastAPI Application on AWS" course. My > gitlab-ci.yml file is below. > stages: > - docker > variables: > DOCKER_DRIVER: overlay2 > DOCKER_TLS_CERTDIR: "/certs" > cache: > key: ${CI_JOB_NAME} > paths: > - ${CI_PROJECT_DIR}/services/talk_booking/.venv/ > build-python-ci-image: > image: docker:19.03.0 > services: > - docker:19.03.0-dind > stage: docker > before_script: > - cd ci_cd/python/ > script: > - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" > $CI_REGISTRY > - docker build -t > registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim . > - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > My Pipeline fails with this error: > See > [https://docs.docker.com/engine/reference/commandline/login/#credentials-store] > Login Succeeded > $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > . > invalid argument > "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" > flag: invalid reference format > See 'docker build --help'. > Cleaning up project directory and file based variables > ERROR: Job failed: exit code 125 > It may or may not be relevant but the Container Registry for the GitLab > project says there's a Docker connection error. All these problems have been > discussed in this [Aws Sysops Training > |https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]follow > the page. 
> Thanks >
[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42033: - Fix Version/s: (was: 1.6.2) > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > > > Key: SPARK-42033 > URL: https://issues.apache.org/jira/browse/SPARK-42033 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Pankaj Nagla >Priority: Major > > I'm going through the "Scalable FastAPI Application on AWS" course. My > gitlab-ci.yml file is below. > stages: > - docker > variables: > DOCKER_DRIVER: overlay2 > DOCKER_TLS_CERTDIR: "/certs" > cache: > key: ${CI_JOB_NAME} > paths: > - ${CI_PROJECT_DIR}/services/talk_booking/.venv/ > build-python-ci-image: > image: docker:19.03.0 > services: > - docker:19.03.0-dind > stage: docker > before_script: > - cd ci_cd/python/ > script: > - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" > $CI_REGISTRY > - docker build -t > registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim . > - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > My Pipeline fails with this error: > See > [https://docs.docker.com/engine/reference/commandline/login/#credentials-store] > Login Succeeded > $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > . > invalid argument > "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" > flag: invalid reference format > See 'docker build --help'. > Cleaning up project directory and file based variables > ERROR: Job failed: exit code 125 > It may or may not be relevant but the Container Registry for the GitLab > project says there's a Docker connection error. All these problems have been > discussed in this [Aws Sysops Training > |https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]follow > the page. 
> Thanks >
[jira] [Created] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
Ivan Sadikov created SPARK-42176: Summary: Cast boolean to timestamp fails with ClassCastException Key: SPARK-42176 URL: https://issues.apache.org/jira/browse/SPARK-42176 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.1, 3.4.0 Reporter: Ivan Sadikov
[jira] [Updated] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-42176: - Description: When casting a boolean value to timestamp, the following error is thrown: {code:java} [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) [info] at org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) [info] at org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) {code} > Cast boolean to timestamp fails with ClassCastException > --- > > Key: SPARK-42176 > URL: https://issues.apache.org/jira/browse/SPARK-42176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > > When casting a boolean value to timestamp, the following error is thrown: > {code:java} > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Long > [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) > [info] at > org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) > {code}
[jira] [Updated] (SPARK-42175) Implement more methods in the Scala Client Dataset API
[ https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42175: Description: Also fix the TODOs in the MiMa compatibility test. https://github.com/apache/spark/pull/39712 > Implement more methods in the Scala Client Dataset API > -- > > Key: SPARK-42175 > URL: https://issues.apache.org/jira/browse/SPARK-42175 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Also fix the TODOs in the MiMa compatibility test. > https://github.com/apache/spark/pull/39712
[jira] [Created] (SPARK-42175) Implement more methods in the Scala Client Dataset API
Zhen Li created SPARK-42175: --- Summary: Implement more methods in the Scala Client Dataset API Key: SPARK-42175 URL: https://issues.apache.org/jira/browse/SPARK-42175 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li
[jira] [Commented] (SPARK-42173) IPv6 address mapping can fail with sparse addresses
[ https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680410#comment-17680410 ] Apache Spark commented on SPARK-42173: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/39728 > IPv6 address mapping can fail with sparse addresses > --- > > Key: SPARK-42173 > URL: https://issues.apache.org/jira/browse/SPARK-42173 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 > Environment: I've only observed this on Kube but it might be able to > happen on YARN since the mapping is in core. >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when > it comes time to handle the `onDisconnect` all of the implicit zeros are > expanded out. > >
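The mismatch described in SPARK-42173 (compressed textual form stored, expanded form looked up) can be demonstrated with Python's standard ipaddress module. This is a sketch of the failure mode and one way around it, not the actual Spark fix:

```python
import ipaddress

# A map keyed by the compressed textual form of an IPv6 address misses
# lookups made with the expanded form: the strings differ even though
# the address is the same.
address_to_executor = {"2602:fcb1::1337:12": "executor-0"}

incoming = "2602:fcb1:0:0:0:0:1337:12"  # same address, implicit zeros expanded
print(incoming in address_to_executor)   # False: raw string lookup misses

def normalize(addr: str) -> str:
    # Parse and re-render: ipaddress emits the canonical compressed form.
    return str(ipaddress.ip_address(addr))

# Normalizing keys on both insert and lookup makes the two forms collide.
normalized_map = {normalize(k): v for k, v in address_to_executor.items()}
print(normalized_map[normalize(incoming)])  # executor-0
```

The key design point is to normalize at every boundary where an address string enters the map, since peers may report either textual form.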
[jira] [Assigned] (SPARK-42173) IPv6 address mapping can fail with sparse addresses
[ https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42173: Assignee: Holden Karau (was: Apache Spark) > IPv6 address mapping can fail with sparse addresses > --- > > Key: SPARK-42173 > URL: https://issues.apache.org/jira/browse/SPARK-42173 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 > Environment: I've only observed this on Kube but it might be able to > happen on YARN since the mapping is in core. >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when > it comes time to handle the `onDisconnect` all of the implicit zeros are > expanded out. > >
[jira] [Commented] (SPARK-42173) IPv6 address mapping can fail with sparse addresses
[ https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680409#comment-17680409 ] Apache Spark commented on SPARK-42173: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/39728 > IPv6 address mapping can fail with sparse addresses > --- > > Key: SPARK-42173 > URL: https://issues.apache.org/jira/browse/SPARK-42173 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 > Environment: I've only observed this on Kube but it might be able to > happen on YARN since the mapping is in core. >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when > it comes time to handle the `onDisconnect` all of the implicit zeros are > expanded out. > >
[jira] [Assigned] (SPARK-42173) IPv6 address mapping can fail with sparse addresses
[ https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42173: Assignee: Apache Spark (was: Holden Karau) > IPv6 address mapping can fail with sparse addresses > --- > > Key: SPARK-42173 > URL: https://issues.apache.org/jira/browse/SPARK-42173 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 > Environment: I've only observed this on Kube but it might be able to > happen on YARN since the mapping is in core. >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Major > > e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when > it comes time to handle the `onDisconnect` all of the implicit zeros are > expanded out. > >
[jira] [Assigned] (SPARK-42119) Add built-in table-valued functions inline and inline_outer
[ https://issues.apache.org/jira/browse/SPARK-42119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42119: Assignee: Allison Wang > Add built-in table-valued functions inline and inline_outer > --- > > Key: SPARK-42119 > URL: https://issues.apache.org/jira/browse/SPARK-42119 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Add `inline` and `inline_outer` to the built-in table function registry. > Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Resolved] (SPARK-42119) Add built-in table-valued functions inline and inline_outer
[ https://issues.apache.org/jira/browse/SPARK-42119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42119. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39656 [https://github.com/apache/spark/pull/39656] > Add built-in table-valued functions inline and inline_outer > --- > > Key: SPARK-42119 > URL: https://issues.apache.org/jira/browse/SPARK-42119 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.4.0 > > > Add `inline` and `inline_outer` to the built-in table function registry. > Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
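For readers unfamiliar with the two functions SPARK-42119 registers, their semantics can be sketched in plain Python (an illustration of the behavior, not Spark's implementation): inline explodes an array of structs into one row per struct with one column per field, and inline_outer additionally yields a single all-null row when the array is null or empty, instead of no rows.

```python
# Plain-Python sketch of the inline / inline_outer table-valued functions.
def inline(array_of_structs, outer=False, num_fields=2):
    # num_fields stands in for the struct schema, which Spark knows
    # statically; it is an assumption of this sketch.
    rows = [tuple(s.values()) for s in (array_of_structs or [])]
    if not rows and outer:
        rows = [(None,) * num_fields]  # inline_outer keeps one null row
    return rows

print(inline([{"a": 1, "b": 2}, {"a": 3, "b": 4}]))  # [(1, 2), (3, 4)]
print(inline(None))              # []: inline drops the row entirely
print(inline(None, outer=True))  # [(None, None)]: inline_outer keeps it
```

The outer variant matters in lateral joins, where dropping the row would also drop the joining row from the left side.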
[jira] [Assigned] (SPARK-42174) Use scikit-learn instead of sklearn
[ https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42174: Assignee: Dongjoon Hyun > Use scikit-learn instead of sklearn > --- > > Key: SPARK-42174 > URL: https://issues.apache.org/jira/browse/SPARK-42174 > Project: Spark > Issue Type: Bug > Components: Project Infra, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major >
[jira] [Resolved] (SPARK-42174) Use scikit-learn instead of sklearn
[ https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42174. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39727 [https://github.com/apache/spark/pull/39727] > Use scikit-learn instead of sklearn > --- > > Key: SPARK-42174 > URL: https://issues.apache.org/jira/browse/SPARK-42174 > Project: Spark > Issue Type: Bug > Components: Project Infra, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-36124) Support set operators to be on correlation paths
[ https://issues.apache.org/jira/browse/SPARK-36124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36124: Assignee: Allison Wang > Support set operators to be on correlation paths > > > Key: SPARK-36124 > URL: https://issues.apache.org/jira/browse/SPARK-36124 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > A correlation path is defined as the sub-tree of all the operators that are > on the path from the operator hosting the correlated expressions up to the > operator producing the correlated values. > We want to support set operators such as union and intersect to be on > correlation paths by adding them in DecorrelateInnerQuery. Please see page > 391 for more details: > [https://dl.gi.de/bitstream/handle/20.500.12116/2418/383.pdf]
[jira] [Resolved] (SPARK-36124) Support set operators to be on correlation paths
[ https://issues.apache.org/jira/browse/SPARK-36124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36124. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39375 [https://github.com/apache/spark/pull/39375] > Support set operators to be on correlation paths > > > Key: SPARK-36124 > URL: https://issues.apache.org/jira/browse/SPARK-36124 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.4.0 > > > A correlation path is defined as the sub-tree of all the operators that are > on the path from the operator hosting the correlated expressions up to the > operator producing the correlated values. > We want to support set operators such as union and intersect to be on > correlation paths by adding them in DecorrelateInnerQuery. Please see page > 391 for more details: > [https://dl.gi.de/bitstream/handle/20.500.12116/2418/383.pdf]
[jira] [Commented] (SPARK-42174) Use scikit-learn instead of sklearn
[ https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680379#comment-17680379 ] Apache Spark commented on SPARK-42174: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39727 > Use scikit-learn instead of sklearn > --- > > Key: SPARK-42174 > URL: https://issues.apache.org/jira/browse/SPARK-42174 > Project: Spark > Issue Type: Bug > Components: Project Infra, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42174) Use scikit-learn instead of sklearn
[ https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42174: Assignee: Apache Spark > Use scikit-learn instead of sklearn > --- > > Key: SPARK-42174 > URL: https://issues.apache.org/jira/browse/SPARK-42174 > Project: Spark > Issue Type: Bug > Components: Project Infra, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42174) Use scikit-learn instead of sklearn
[ https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42174: Assignee: (was: Apache Spark) > Use scikit-learn instead of sklearn > --- > > Key: SPARK-42174 > URL: https://issues.apache.org/jira/browse/SPARK-42174 > Project: Spark > Issue Type: Bug > Components: Project Infra, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42174) Use scikit-learn instead of sklearn
Dongjoon Hyun created SPARK-42174: - Summary: Use scikit-learn instead of sklearn Key: SPARK-42174 URL: https://issues.apache.org/jira/browse/SPARK-42174 Project: Spark Issue Type: Bug Components: Project Infra, PySpark Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
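The rename above concerns dependency declarations: on PyPI the project is distributed as `scikit-learn`, while `sklearn` is only the import name (the `sklearn` entry on PyPI is a deprecated placeholder package). A small sketch distinguishing the two names; `sklearn_available` is an illustrative helper, not part of Spark:

```python
import importlib.util

# PyPI distribution name: "scikit-learn" (what requirements files and
# "pip install" should reference).
# Import name: "sklearn" (what Python code writes in import statements).

def sklearn_available() -> bool:
    # True if the "sklearn" import name resolves in this environment,
    # regardless of which distribution provided it.
    return importlib.util.find_spec("sklearn") is not None

print(sklearn_available())
```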
[jira] [Commented] (SPARK-42123) Include column default values in DESCRIBE output
[ https://issues.apache.org/jira/browse/SPARK-42123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680362#comment-17680362 ] Apache Spark commented on SPARK-42123: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/39726 > Include column default values in DESCRIBE output > > > Key: SPARK-42123 > URL: https://issues.apache.org/jira/browse/SPARK-42123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42171) Fix `pyspark-errors` module and enable it in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42171. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39721 [https://github.com/apache/spark/pull/39721] > Fix `pyspark-errors` module and enable it in GitHub Action > -- > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Labels: 3.4.0 > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42171) Fix `pyspark-errors` module and enable it in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42171: - Assignee: Dongjoon Hyun > Fix `pyspark-errors` module and enable it in GitHub Action > -- > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Labels: 3.4.0 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33573) Server side metrics related to push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-33573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680350#comment-17680350 ] Apache Spark commented on SPARK-33573: -- User 'rmcyang' has created a pull request for this issue: https://github.com/apache/spark/pull/39725 > Server side metrics related to push-based shuffle > - > > Key: SPARK-33573 > URL: https://issues.apache.org/jira/browse/SPARK-33573 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Assignee: Minchu Yang >Priority: Major > Fix For: 3.4.0 > > > Shuffle Server side metrics for push based shuffle. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42173) IPv6 address mapping can fail with sparse addresses
Holden Karau created SPARK-42173: Summary: IPv6 address mapping can fail with sparse addresses Key: SPARK-42173 URL: https://issues.apache.org/jira/browse/SPARK-42173 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Environment: I've only observed this on Kube, but it could also happen on YARN since the mapping is in core. Reporter: Holden Karau Assignee: Holden Karau e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12`, but when it comes time to handle the `onDisconnect` all of the implicit zeros are expanded out. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
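The mismatch described above can be reproduced with the stdlib `ipaddress` module: a map keyed on the compressed textual form misses a lookup with the fully expanded form, and normalizing both sides to a canonical string fixes it. A minimal sketch; the dict name stands in for Spark's `addressToExecutorId` (the real code is Scala), and the executor id is made up:

```python
import ipaddress

# Hypothetical stand-in for addressToExecutorId, keyed on the raw string
# form as it was first observed (compressed, with implicit zeros elided).
address_to_executor = {"2602:fcb1::1337:12": "exec-1"}

# The same address as it may arrive on disconnect, with zeros expanded.
expanded = "2602:fcb1:0:0:0:0:1337:12"

# A raw string lookup misses even though both strings name one address.
assert expanded not in address_to_executor

def canonical(addr: str) -> str:
    # str() of an IPv6Address is the RFC 5952 canonical compressed form.
    return str(ipaddress.ip_address(addr))

# Keying on the canonical form makes the lookup stable across spellings.
canonical_map = {canonical(k): v for k, v in address_to_executor.items()}
assert canonical_map[canonical(expanded)] == "exec-1"
```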
[jira] [Resolved] (SPARK-42103) Add Instrumentation
[ https://issues.apache.org/jira/browse/SPARK-42103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani resolved SPARK-42103. -- Resolution: Not A Problem > Add Instrumentation > --- > > Key: SPARK-42103 > URL: https://issues.apache.org/jira/browse/SPARK-42103 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > Adding instrumentation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41590) Implement Baseline API Code
[ https://issues.apache.org/jira/browse/SPARK-41590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani resolved SPARK-41590. -- Resolution: Fixed > Implement Baseline API Code > --- > > Key: SPARK-41590 > URL: https://issues.apache.org/jira/browse/SPARK-41590 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > Creating a baseline API so that we can agree on how the users will interact > with the code. This was determined in this [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and can be updated as necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680337#comment-17680337 ] Apache Spark commented on SPARK-41775: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/39724 > Implement training functions as input > - > > Key: SPARK-41775 > URL: https://issues.apache.org/jira/browse/SPARK-41775 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > > Sidenote: make formatting updates described in > https://github.com/apache/spark/pull/39188 > > Currently, `Distributor().run(...)` takes only files as input. Now we will > add in additional functionality to take in functions as well. This will > require us to go through the following process on each task in the executor > nodes: > 1. take the input function and args and pickle them > 2. Create a temp train.py file that looks like > {code:python} > import cloudpickle > import os > if __name__ == "__main__": > train, args = cloudpickle.load(open(f"{tempdir}/train_input.pkl", "rb")) > output = train(*args) > if output and os.environ.get("RANK", "") == "0": # this is for > partitionId == 0 > cloudpickle.dump(output, open(f"{tempdir}/train_output.pkl", "wb")) {code} > 3. Run that train.py file with `torchrun` > 4. Check if `train_output.pkl` has been created by the process with partitionId == > 0; if it has, deserialize it and return that output through `.collect()` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680336#comment-17680336 ] Apache Spark commented on SPARK-41775: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/39724 > Implement training functions as input > - > > Key: SPARK-41775 > URL: https://issues.apache.org/jira/browse/SPARK-41775 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > > Sidenote: make formatting updates described in > https://github.com/apache/spark/pull/39188 > > Currently, `Distributor().run(...)` takes only files as input. Now we will > add in additional functionality to take in functions as well. This will > require us to go through the following process on each task in the executor > nodes: > 1. take the input function and args and pickle them > 2. Create a temp train.py file that looks like > {code:python} > import cloudpickle > import os > if __name__ == "__main__": > train, args = cloudpickle.load(open(f"{tempdir}/train_input.pkl", "rb")) > output = train(*args) > if output and os.environ.get("RANK", "") == "0": # this is for > partitionId == 0 > cloudpickle.dump(output, open(f"{tempdir}/train_output.pkl", "wb")) {code} > 3. Run that train.py file with `torchrun` > 4. Check if `train_output.pkl` has been created by the process with partitionId == > 0; if it has, deserialize it and return that output through `.collect()` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
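The four steps described above amount to serializing a function plus its arguments to disk, running them in a child process, and reading the result back. A self-contained sketch of that round trip using stdlib `pickle` (Spark uses `cloudpickle`, which can additionally serialize lambdas and closures); the file names follow the description, everything else is illustrative, and the "child process" logic runs in-process here:

```python
import os
import pickle
import tempfile

def train(lr, epochs):
    # Stand-in for a real training function.
    return {"lr": lr, "epochs": epochs, "loss": 0.1}

tempdir = tempfile.mkdtemp()

# Step 1: pickle the input function and its arguments.
with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
    pickle.dump((train, (0.01, 3)), f)

# Steps 2-3 happen inside the generated train.py under torchrun;
# here the equivalent logic runs directly.
with open(os.path.join(tempdir, "train_input.pkl"), "rb") as f:
    fn, args = pickle.load(f)
output = fn(*args)
if output and os.environ.get("RANK", "0") == "0":  # partitionId == 0
    with open(os.path.join(tempdir, "train_output.pkl"), "wb") as f:
        pickle.dump(output, f)

# Step 4: deserialize the result written by the rank-0 process.
with open(os.path.join(tempdir, "train_output.pkl"), "rb") as f:
    result = pickle.load(f)
```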
[jira] [Assigned] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185
[ https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41302: Assignee: Apache Spark > Assign a name to the error class _LEGACY_ERROR_TEMP_1185 > > > Key: SPARK-41302 > URL: https://issues.apache.org/jira/browse/SPARK-41302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve > error message and tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185
[ https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680324#comment-17680324 ] Apache Spark commented on SPARK-41302: -- User 'NarekDW' has created a pull request for this issue: https://github.com/apache/spark/pull/39723 > Assign a name to the error class _LEGACY_ERROR_TEMP_1185 > > > Key: SPARK-41302 > URL: https://issues.apache.org/jira/browse/SPARK-41302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve > error message and tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185
[ https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41302: Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_1185 > > > Key: SPARK-41302 > URL: https://issues.apache.org/jira/browse/SPARK-41302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > > Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve > error message and tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42172) Compatibility check for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680321#comment-17680321 ] Apache Spark commented on SPARK-42172: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/39712 > Compatibility check for Scala Client > > > Key: SPARK-42172 > URL: https://issues.apache.org/jira/browse/SPARK-42172 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Adding compatibility checks for Scala client to ensure the Scala Client API > is binary compatible with Existing Spark SQL API (Dataset, SparkSession etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42172) Compatibility check for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42172: Assignee: (was: Apache Spark) > Compatibility check for Scala Client > > > Key: SPARK-42172 > URL: https://issues.apache.org/jira/browse/SPARK-42172 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Adding compatibility checks for Scala client to ensure the Scala Client API > is binary compatible with Existing Spark SQL API (Dataset, SparkSession etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42172) Compatibility check for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42172: Assignee: Apache Spark > Compatibility check for Scala Client > > > Key: SPARK-42172 > URL: https://issues.apache.org/jira/browse/SPARK-42172 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Apache Spark >Priority: Major > > Adding compatibility checks for Scala client to ensure the Scala Client API > is binary compatible with Existing Spark SQL API (Dataset, SparkSession etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42172) Compatibility check for Scala Client
Zhen Li created SPARK-42172: --- Summary: Compatibility check for Scala Client Key: SPARK-42172 URL: https://issues.apache.org/jira/browse/SPARK-42172 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li Adding compatibility checks for Scala client to ensure the Scala Client API is binary compatible with Existing Spark SQL API (Dataset, SparkSession etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations
[ https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680317#comment-17680317 ] Apache Spark commented on SPARK-42162: -- User 'db-scnakandala' has created a pull request for this issue: https://github.com/apache/spark/pull/39722 > Memory usage on executors increased drastically for a complex query with > large number of addition operations > > > Key: SPARK-42162 > URL: https://issues.apache.org/jira/browse/SPARK-42162 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Supun Nakandala >Priority: Major > > With the [recent changes|https://github.com/apache/spark/pull/37851] in the > expression canonicalization, a complex query with a large number of Add > operations ends up consuming 10x more memory on the executors. > The reason for this issue is that with the new changes the canonicalization > process ends up generating a lot of intermediate objects, especially for > complex queries with a large number of commutative operators. In this > specific case, a heap histogram analysis shows that a large number of Add > objects use the extra memory. > This issue does not happen before PR > [#37851.|https://github.com/apache/spark/pull/37851] > The high memory usage causes the executors to lose heartbeat signals and > results in task failures. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations
[ https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680316#comment-17680316 ] Apache Spark commented on SPARK-42162: -- User 'db-scnakandala' has created a pull request for this issue: https://github.com/apache/spark/pull/39722 > Memory usage on executors increased drastically for a complex query with > large number of addition operations > > > Key: SPARK-42162 > URL: https://issues.apache.org/jira/browse/SPARK-42162 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Supun Nakandala >Priority: Major > > With the [recent changes|https://github.com/apache/spark/pull/37851] in the > expression canonicalization, a complex query with a large number of Add > operations ends up consuming 10x more memory on the executors. > The reason for this issue is that with the new changes the canonicalization > process ends up generating a lot of intermediate objects, especially for > complex queries with a large number of commutative operators. In this > specific case, a heap histogram analysis shows that a large number of Add > objects use the extra memory. > This issue does not happen before PR > [#37851.|https://github.com/apache/spark/pull/37851] > The high memory usage causes the executors to lose heartbeat signals and > results in task failures. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations
[ https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42162: Assignee: (was: Apache Spark) > Memory usage on executors increased drastically for a complex query with > large number of addition operations > > > Key: SPARK-42162 > URL: https://issues.apache.org/jira/browse/SPARK-42162 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Supun Nakandala >Priority: Major > > With the [recent changes|https://github.com/apache/spark/pull/37851] in the > expression canonicalization, a complex query with a large number of Add > operations ends up consuming 10x more memory on the executors. > The reason for this issue is that with the new changes the canonicalization > process ends up generating a lot of intermediate objects, especially for > complex queries with a large number of commutative operators. In this > specific case, a heap histogram analysis shows that a large number of Add > objects use the extra memory. > This issue does not happen before PR > [#37851.|https://github.com/apache/spark/pull/37851] > The high memory usage causes the executors to lose heartbeat signals and > results in task failures. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations
[ https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42162: Assignee: Apache Spark > Memory usage on executors increased drastically for a complex query with > large number of addition operations > > > Key: SPARK-42162 > URL: https://issues.apache.org/jira/browse/SPARK-42162 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Supun Nakandala >Assignee: Apache Spark >Priority: Major > > With the [recent changes|https://github.com/apache/spark/pull/37851] in the > expression canonicalization, a complex query with a large number of Add > operations ends up consuming 10x more memory on the executors. > The reason for this issue is that with the new changes the canonicalization > process ends up generating a lot of intermediate objects, especially for > complex queries with a large number of commutative operators. In this > specific case, a heap histogram analysis shows that a large number of Add > objects use the extra memory. > This issue does not happen before PR > [#37851.|https://github.com/apache/spark/pull/37851] > The high memory usage causes the executors to lose heartbeat signals and > results in task failures. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
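The blow-up reported above can be illustrated with a toy model: if every operator in a deep chain of commutative Adds independently recomputes (and therefore rebuilds) the canonical form of its whole subtree, the number of short-lived nodes grows quadratically in the chain length. This is only a sketch of the allocation pattern, not Spark's actual Canonicalize rule (which is Scala):

```python
# Global counter of every Add node ever constructed.
allocated = 0

class Add:
    def __init__(self, left, right):
        global allocated
        allocated += 1
        self.left, self.right = left, right

def chain(n):
    # a + 1 + 2 + ... + n as a left-deep chain of Add nodes.
    expr = "a"
    for i in range(1, n + 1):
        expr = Add(expr, i)
    return expr

def canonicalized(e):
    # A node's canonical form rebuilds its entire subtree.
    if not isinstance(e, Add):
        return e
    return Add(canonicalized(e.left), canonicalized(e.right))

def canonicalize_every_node(e):
    # Each operator on the left spine computes its own canonical form
    # without caching, so subtrees are rebuilt once per ancestor.
    while isinstance(e, Add):
        canonicalized(e)
        e = e.left

expr = chain(100)
built = allocated              # 100 nodes for the original tree
canonicalize_every_node(expr)
extra = allocated - built      # 100 + 99 + ... + 1 = 5050 intermediates
print(built, extra)
```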
[jira] [Updated] (SPARK-42171) Fix `pyspark-errors` module and enable it in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42171: -- Summary: Fix `pyspark-errors` module and enable it in GitHub Action (was: Enable `pyspark-errors` module test in GitHub Action) > Fix `pyspark-errors` module and enable it in GitHub Action > -- > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Blocker > Labels: 3.4.0 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42171: -- Labels: 3.4.0 (was: ) > Enable `pyspark-errors` module test in GitHub Action > > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Blocker > Labels: 3.4.0 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42171: -- Priority: Blocker (was: Major) > Enable `pyspark-errors` module test in GitHub Action > > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42171: -- Component/s: Tests > Enable `pyspark-errors` module test in GitHub Action > > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Blocker > Labels: 3.4.0 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-42090: Fix Version/s: 3.2.4 3.3.2 > Introduce sasl retry count in RetryingBlockTransferor > - > > Key: SPARK-42090 > URL: https://issues.apache.org/jira/browse/SPARK-42090 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.2.4, 3.3.2, 3.4.0 > > > Previously a boolean variable, saslTimeoutSeen, was used in > RetryingBlockTransferor. However, the boolean variable wouldn't cover the > following scenario: > 1. SaslTimeoutException > 2. IOException > 3. SaslTimeoutException > 4. IOException > Even though IOException at #2 is retried (resulting in increment of > retryCount), the retryCount would be cleared at step #4. > Since the intention of saslTimeoutSeen is to undo the increment due to > retrying SaslTimeoutException, we should keep a counter for > SaslTimeoutException retries and subtract the value of this counter from > retryCount. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
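The fix described above replaces a single boolean with a counter, so that SASL-timeout retries can be subtracted from the total retry count even when they are interleaved with IOExceptions. A toy Python model of that bookkeeping for the four-step scenario in the description; the class and method names are illustrative, not the actual Java `RetryingBlockTransferor` API:

```python
class RetryState:
    """Toy model of the retry bookkeeping, not the real transferor."""

    def __init__(self):
        self.retry_count = 0       # total retries performed
        self.sasl_retry_count = 0  # retries caused by SaslTimeoutException

    def record(self, exc_kind):
        self.retry_count += 1
        if exc_kind == "sasl_timeout":
            self.sasl_retry_count += 1

    def effective_io_retries(self):
        # Undo the increments that were due to SASL timeouts, so SASL
        # retries do not consume the IOException retry budget.
        return self.retry_count - self.sasl_retry_count

s = RetryState()
# The interleaving from the description: SASL timeout, IOException,
# SASL timeout, IOException.
for kind in ["sasl_timeout", "io", "sasl_timeout", "io"]:
    s.record(kind)
# Only the two IOExceptions count against the retry budget.
assert s.effective_io_retries() == 2
```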
[jira] [Assigned] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42171: Assignee: Apache Spark > Enable `pyspark-errors` module test in GitHub Action > > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42171: Assignee: (was: Apache Spark) > Enable `pyspark-errors` module test in GitHub Action > > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680307#comment-17680307 ] Apache Spark commented on SPARK-42171: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39721 > Enable `pyspark-errors` module test in GitHub Action > > > Key: SPARK-42171 > URL: https://issues.apache.org/jira/browse/SPARK-42171 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action
Dongjoon Hyun created SPARK-42171: - Summary: Enable `pyspark-errors` module test in GitHub Action Key: SPARK-42171 URL: https://issues.apache.org/jira/browse/SPARK-42171 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42167) Improve GitHub Action `lint` job to stop on failures earlier
[ https://issues.apache.org/jira/browse/SPARK-42167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42167: - Assignee: Dongjoon Hyun > Improve GitHub Action `lint` job to stop on failures earlier > > > Key: SPARK-42167 > URL: https://issues.apache.org/jira/browse/SPARK-42167 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42167) Improve GitHub Action `lint` job to stop on failures earlier
[ https://issues.apache.org/jira/browse/SPARK-42167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42167. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39716 [https://github.com/apache/spark/pull/39716] > Improve GitHub Action `lint` job to stop on failures earlier > > > Key: SPARK-42167 > URL: https://issues.apache.org/jira/browse/SPARK-42167 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
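[Editor's note] SPARK-42167 above is about making the GitHub Action `lint` job stop on failures earlier. The usual way to get that behaviour is to order the cheapest checks first so a failing step aborts the remaining ones. The snippet below is only an illustrative sketch of that idea, not Spark's actual workflow file; the job name, runner, and script paths are assumptions.

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Steps run in order and the job stops at the first failing step,
      # so putting fast checks first surfaces failures earlier.
      - name: License and dependency checks (fast)
        run: ./dev/check-license
      - name: Scala linter
        run: ./dev/lint-scala
      - name: Python linter (slowest, runs last)
        run: ./dev/lint-python
```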
[jira] [Commented] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680279#comment-17680279 ] Apache Spark commented on SPARK-41500: -- User 'NarekDW' has created a pull request for this issue: https://github.com/apache/spark/pull/39720 > auto generate concat as Double when string minus an INTERVAL type > - > > Key: SPARK-41500 > URL: https://issues.apache.org/jira/browse/SPARK-41500 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.2.2 >Reporter: JacobZheng >Priority: Major > > h2. *Describe the bug* > Here is a sql. > {code:sql} > select '2022-02-01'- INTERVAL 1 year > {code} > spark generate cast('2022-02-01' as double) - INTERVAL 1 year automatically > and type mismatch happened. > h2. *To Reproduce* > On Spark 3.0.1 using spark-shell > {code:java} > scala> spark.sql("select '2022-02-01'- interval 1 year").show > +--+ > > |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)| > +--+ > | 2021-02-01 00:00:00| > +--+ > {code} > On Spark 3.2.1 using spark-shell > {code:java} > scala> spark.sql("select '2022-02-01'- interval 1 year").show > org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS > DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in > '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval > year).; line 1 pos 7; > 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), > None)] > +- OneRowRelation > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.immutable.List.foreach(List.scala:431) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.immutable.List.map(List.scala:305) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.s
[jira] [Assigned] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41500: Assignee: (was: Apache Spark) > auto generate concat as Double when string minus an INTERVAL type > - > > Key: SPARK-41500 > URL: https://issues.apache.org/jira/browse/SPARK-41500 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.2.2 >Reporter: JacobZheng >Priority: Major > > h2. *Describe the bug* > Here is a sql. > {code:sql} > select '2022-02-01'- INTERVAL 1 year > {code} > spark generate cast('2022-02-01' as double) - INTERVAL 1 year automatically > and type mismatch happened. > h2. *To Reproduce* > On Spark 3.0.1 using spark-shell > {code:java} > scala> spark.sql("select '2022-02-01'- interval 1 year").show > +--+ > > |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)| > +--+ > | 2021-02-01 00:00:00| > +--+ > {code} > On Spark 3.2.1 using spark-shell > {code:java} > scala> spark.sql("select '2022-02-01'- interval 1 year").show > org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS > DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in > '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval > year).; line 1 pos 7; > 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), > None)] > +- OneRowRelation > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > 
at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.immutable.List.foreach(List.scala:431) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.immutable.List.map(List.scala:305) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263) > at > org.apache.spark.sql.catalyst.analysis.Ch
[jira] [Assigned] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41500: Assignee: Apache Spark > auto generate concat as Double when string minus an INTERVAL type > - > > Key: SPARK-41500 > URL: https://issues.apache.org/jira/browse/SPARK-41500 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.2.2 >Reporter: JacobZheng >Assignee: Apache Spark >Priority: Major > > h2. *Describe the bug* > Here is a sql. > {code:sql} > select '2022-02-01'- INTERVAL 1 year > {code} > spark generate cast('2022-02-01' as double) - INTERVAL 1 year automatically > and type mismatch happened. > h2. *To Reproduce* > On Spark 3.0.1 using spark-shell > {code:java} > scala> spark.sql("select '2022-02-01'- interval 1 year").show > +--+ > > |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)| > +--+ > | 2021-02-01 00:00:00| > +--+ > {code} > On Spark 3.2.1 using spark-shell > {code:java} > scala> spark.sql("select '2022-02-01'- interval 1 year").show > org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS > DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in > '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval > year).; line 1 pos 7; > 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), > None)] > +- OneRowRelation > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535) > at > 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.immutable.List.foreach(List.scala:431) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.immutable.List.map(List.scala:305) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263) > at > org.apache.spark
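[Editor's note] The SPARK-41500 report above shows that Spark 3.0.1 coerced the string operand to a timestamp before subtracting the interval, while 3.2.x coerces it to a double and fails analysis. The intended 3.0.1-style result can be sketched in plain Python with `datetime`; this is a standalone illustration of the expected semantics, not Spark code, and it ignores edge cases such as Feb 29:

```python
from datetime import datetime

def minus_one_year(date_str: str) -> str:
    # Coerce the string to a timestamp, subtract one calendar year,
    # and render it the way spark-shell displayed it in 3.0.1.
    dt = datetime.strptime(date_str, "%Y-%m-%d")
    return dt.replace(year=dt.year - 1).strftime("%Y-%m-%d %H:%M:%S")

print(minus_one_year("2022-02-01"))  # 2021-02-01 00:00:00
```

This matches the `2021-02-01 00:00:00` row shown in the 3.0.1 reproduction above.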
[jira] [Commented] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)
[ https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680254#comment-17680254 ] Apache Spark commented on SPARK-42169: -- User 'NarekDW' has created a pull request for this issue: https://github.com/apache/spark/pull/39719 > Implement code generation for `to_csv` function (StructsToCsv) > -- > > Key: SPARK-42169 > URL: https://issues.apache.org/jira/browse/SPARK-42169 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Narek Karapetian >Priority: Minor > Labels: csv, sql > Fix For: 3.4.0 > > > Implement code generation for `to_csv` function instead of extending it from > CodegenFallback trait. > {code:java} > org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code} > > This is good to have from performance point of view. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)
[ https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42169: Assignee: Apache Spark > Implement code generation for `to_csv` function (StructsToCsv) > -- > > Key: SPARK-42169 > URL: https://issues.apache.org/jira/browse/SPARK-42169 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Narek Karapetian >Assignee: Apache Spark >Priority: Minor > Labels: csv, sql > Fix For: 3.4.0 > > > Implement code generation for `to_csv` function instead of extending it from > CodegenFallback trait. > {code:java} > org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code} > > This is good to have from performance point of view. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)
[ https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42169: Assignee: (was: Apache Spark) > Implement code generation for `to_csv` function (StructsToCsv) > -- > > Key: SPARK-42169 > URL: https://issues.apache.org/jira/browse/SPARK-42169 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Narek Karapetian >Priority: Minor > Labels: csv, sql > Fix For: 3.4.0 > > > Implement code generation for `to_csv` function instead of extending it from > CodegenFallback trait. > {code:java} > org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code} > > This is good to have from performance point of view. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)
[ https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680251#comment-17680251 ] Apache Spark commented on SPARK-42169: -- User 'NarekDW' has created a pull request for this issue: https://github.com/apache/spark/pull/39097 > Implement code generation for `to_csv` function (StructsToCsv) > -- > > Key: SPARK-42169 > URL: https://issues.apache.org/jira/browse/SPARK-42169 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Narek Karapetian >Priority: Minor > Labels: csv, sql > Fix For: 3.4.0 > > > Implement code generation for `to_csv` function instead of extending it from > CodegenFallback trait. > {code:java} > org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code} > > This is good to have from performance point of view. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
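[Editor's note] SPARK-42169 above asks to replace the `CodegenFallback` path of `StructsToCsv` with generated code. Independently of codegen, what the expression computes is simple: one struct's field values serialized as a single CSV line. A minimal standalone sketch of that behaviour in Python (the function name and dict-as-struct representation are illustrative, not Spark APIs):

```python
import csv
import io

def struct_to_csv(row: dict) -> str:
    # Serialize one struct's field values as a single CSV line,
    # quoting fields that contain the delimiter.
    buf = io.StringIO()
    csv.writer(buf).writerow(row.values())
    return buf.getvalue().strip("\r\n")

print(struct_to_csv({"a": 1, "b": "x,y"}))  # 1,"x,y"
```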
[jira] [Commented] (SPARK-42163) Schema pruning fails on non-foldable array index or map key
[ https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680248#comment-17680248 ] Apache Spark commented on SPARK-42163: -- User 'cashmand' has created a pull request for this issue: https://github.com/apache/spark/pull/39718 > Schema pruning fails on non-foldable array index or map key > --- > > Key: SPARK-42163 > URL: https://issues.apache.org/jira/browse/SPARK-42163 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.2.3 >Reporter: David Cashman >Priority: Major > > Schema pruning tries to extract selected fields from struct extractors. It > looks through GetArrayItem/GetMapItem, but when doing so, it ignores the > index/key, which may itself be a struct field. If it is a struct field that > is not otherwise selected, and some other field of the same attribute is > selected, then pruning will drop the field, resulting in an optimizer error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42163) Schema pruning fails on non-foldable array index or map key
[ https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42163: Assignee: Apache Spark > Schema pruning fails on non-foldable array index or map key > --- > > Key: SPARK-42163 > URL: https://issues.apache.org/jira/browse/SPARK-42163 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.2.3 >Reporter: David Cashman >Assignee: Apache Spark >Priority: Major > > Schema pruning tries to extract selected fields from struct extractors. It > looks through GetArrayItem/GetMapItem, but when doing so, it ignores the > index/key, which may itself be a struct field. If it is a struct field that > is not otherwise selected, and some other field of the same attribute is > selected, then pruning will drop the field, resulting in an optimizer error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42163) Schema pruning fails on non-foldable array index or map key
[ https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42163: Assignee: (was: Apache Spark) > Schema pruning fails on non-foldable array index or map key > --- > > Key: SPARK-42163 > URL: https://issues.apache.org/jira/browse/SPARK-42163 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.2.3 >Reporter: David Cashman >Priority: Major > > Schema pruning tries to extract selected fields from struct extractors. It > looks through GetArrayItem/GetMapItem, but when doing so, it ignores the > index/key, which may itself be a struct field. If it is a struct field that > is not otherwise selected, and some other field of the same attribute is > selected, then pruning will drop the field, resulting in an optimizer error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
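[Editor's note] The SPARK-42163 description above says schema pruning looks through `GetArrayItem`/`GetMapItem` but ignores the index/key expression, so a struct field referenced only by the key gets pruned away. The shape of the fix can be sketched as: when collecting the fields an expression needs, a get-item node must contribute the fields used by its key as well as by its child. The toy expression encoding below is purely illustrative and not Spark's representation:

```python
def needed_fields(expr: tuple) -> set:
    # Expressions: ("field", name) | ("lit", value) | ("get_item", child, key)
    kind = expr[0]
    if kind == "field":
        return {expr[1]}
    if kind == "lit":
        return set()
    if kind == "get_item":
        # The fix: include fields referenced by the key expression too,
        # not only those referenced by the child being indexed.
        return needed_fields(expr[1]) | needed_fields(expr[2])
    raise ValueError(f"unknown node: {kind}")

# m[k]: pruning must keep both 'm' and 'k', even if 'k' appears nowhere else.
expr = ("get_item", ("field", "m"), ("field", "k"))
print(sorted(needed_fields(expr)))  # ['k', 'm']
```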