[jira] [Updated] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source for filtering
[ https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiahong.li updated SPARK-46392: --- Summary: In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source for filtering (was: In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter) > In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer > the cast expression to the data source for filtering > - > > Key: SPARK-46392 > URL: https://issues.apache.org/jira/browse/SPARK-46392 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiahong.li >Priority: Minor > Labels: pull-request-available > > Consider this situation: > we create a partitioned table using a source that extends > TableProvider. If we select data from specific partitions and the partition > value's data type differs from the table's partition type, the partition > filter cannot be pushed down. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
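For illustration, a hedged sketch of the reported behavior. The table name and partition column below are hypothetical; the point is that a literal whose type differs from the partition column's type makes the analyzer wrap the column in a Cast, which translateFilterWithMapping currently cannot translate, so the predicate is not pushed to the source.

{code:scala}
// Hypothetical DSv2 table `myds_table` with a STRING partition column `dt`
// (assumes a running SparkSession named `spark`).
spark.sql("SELECT * FROM myds_table WHERE dt = 20231213")   // int literal: the column gets wrapped in a Cast, filter is not pushed down
spark.sql("SELECT * FROM myds_table WHERE dt = '20231213'") // matching type: filter is pushed down
{code}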
[jira] [Updated] (SPARK-46402) Add getMessageParameters and getQueryContext support
[ https://issues.apache.org/jira/browse/SPARK-46402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46402: --- Labels: pull-request-available (was: ) > Add getMessageParameters and getQueryContext support > > > Key: SPARK-46402 > URL: https://issues.apache.org/jira/browse/SPARK-46402 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46402) Add getMessageParameters and getQueryContext support
Hyukjin Kwon created SPARK-46402: Summary: Add getMessageParameters and getQueryContext support Key: SPARK-46402 URL: https://issues.apache.org/jira/browse/SPARK-46402 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
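For context, a minimal sketch of what this error API looks like from the classic (non-Connect) Scala client; this ticket is about exposing the same accessors through the Spark Connect client. A running SparkSession named `spark` is assumed, and the query and the error class named in the comment are illustrative assumptions.

{code:scala}
import org.apache.spark.SparkThrowable

// The bad column name simply forces an analysis error that carries
// structured error metadata.
try {
  spark.sql("SELECT nonexistent_col FROM range(1)").collect()
} catch {
  case e: SparkThrowable =>
    println(e.getErrorClass)          // e.g. UNRESOLVED_COLUMN.WITH_SUGGESTION
    println(e.getMessageParameters)   // java.util.Map of message parameters
    e.getQueryContext.foreach(c => println(c.fragment()))
}
{code}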
[jira] [Resolved] (SPARK-46394) spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true
[ https://issues.apache.org/jira/browse/SPARK-46394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46394. - Fix Version/s: 4.0.0 Assignee: Xinyi Yu Resolution: Fixed > spark.catalog.listDatabases() fails if containing schemas with special > characters when spark.sql.legacy.keepCommandOutputSchema set to true > --- > > Key: SPARK-46394 > URL: https://issues.apache.org/jira/browse/SPARK-46394 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Xinyi Yu >Assignee: Xinyi Yu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When the SQL conf {{spark.sql.legacy.keepCommandOutputSchema}} is set to true: > Before: > // suppose there is a xyyu-db-with-hyphen schema in the catalog > spark.catalog.listDatabases() > [INVALID_IDENTIFIER] The identifier xyyu-db-with-hyphen is invalid. Please, > consider quoting it with back-quotes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796561#comment-17796561 ] Yang Jie commented on SPARK-42307: -- Sorry, I mistakenly responded to this Jira, please ignore the previous message. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-44172) Use Jackson API Instead of Json4s
[ https://issues.apache.org/jira/browse/SPARK-44172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796560#comment-17796560 ] Yang Jie edited comment on SPARK-44172 at 12/14/23 5:30 AM: [~hannahkamundson] Thank you very much for your interest in this work, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. was (Author: luciferyang): [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Use Jackson API Instead of Json4s > - > > Key: SPARK-44172 > URL: https://issues.apache.org/jira/browse/SPARK-44172 > Project: Spark > Issue Type: Sub-task > Components: Connect, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307 ] Yang Jie deleted comment on SPARK-42307: -- was (Author: luciferyang): [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44172) Use Jackson API Instead of Json4s
[ https://issues.apache.org/jira/browse/SPARK-44172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796560#comment-17796560 ] Yang Jie commented on SPARK-44172: -- [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Use Jackson API Instead of Json4s > - > > Key: SPARK-44172 > URL: https://issues.apache.org/jira/browse/SPARK-44172 > Project: Spark > Issue Type: Sub-task > Components: Connect, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796559#comment-17796559 ] Yang Jie edited comment on SPARK-42307 at 12/14/23 5:30 AM: [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. was (Author: luciferyang): [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796559#comment-17796559 ] Yang Jie commented on SPARK-42307: -- [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
[ https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46401: --- Labels: pull-request-available (was: ) > Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0` > -- > > Key: SPARK-46401 > URL: https://issues.apache.org/jira/browse/SPARK-46401 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
Yang Jie created SPARK-46401: Summary: Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0` Key: SPARK-46401 URL: https://issues.apache.org/jira/browse/SPARK-46401 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
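An illustrative sketch of the proposed pattern: `getCardinality()` has to tally set bits across the bitmap's containers, while `isEmpty()` only has to check that no container exists, so `!isEmpty()` is the cheaper emptiness test.

{code:scala}
import org.roaringbitmap.RoaringBitmap

val bitmap = new RoaringBitmap()
bitmap.add(42)
val nonEmptyBefore = bitmap.getCardinality() > 0 // pattern being replaced
val nonEmptyAfter  = !bitmap.isEmpty()           // preferred: avoids counting bits
assert(nonEmptyBefore == nonEmptyAfter)
{code}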
[jira] [Updated] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46032: --- Labels: pull-request-available (was: ) > connect: cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f > - > > Key: SPARK-46032 > URL: https://issues.apache.org/jira/browse/SPARK-46032 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Bobby Wang >Priority: Major > Labels: pull-request-available > > I downloaded Spark 3.5 from the official Spark website, and then I started a > Spark Standalone cluster in which both the master and the only worker are on the > same node. > > Then I started the connect server by > {code:java} > start-connect-server.sh \ > --master spark://10.19.183.93:7077 \ > --packages org.apache.spark:spark-connect_2.12:3.5.0 \ > --conf spark.executor.cores=12 \ > --conf spark.task.cpus=1 \ > --executor-memory 30G \ > --conf spark.executor.resource.gpu.amount=1 \ > --conf spark.task.resource.gpu.amount=0.08 \ > --driver-memory 1G{code} > > I can 100% confirm from the web UI that the Spark standalone cluster, the connect server and the Spark > driver are started. > > Finally, I tried to run a very simple Spark job > (spark.range(100).filter("id>2").collect()) from the spark-connect client using > pyspark, but I got the error below. > > pyspark --remote sc://localhost > Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > Welcome to > [Spark ASCII-art banner] version 3.5.0 > > Using Python version 3.10.0 (default, Mar 3 2022 09:58:08) > Client connected to the Spark Connect server at localhost > SparkSession available as 'spark'. > >>> spark.range(100).filter("id > 3").collect() > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect > table, schema = self._session.client.to_table(query) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 858, in to_table > table, schema, _, _, _ = self._execute_and_fetch(req) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1282, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1263, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1502, in _handle_error > self._handle_rpc_error(error) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1538, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readOb
[jira] [Updated] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46384: --- Labels: pull-request-available (was: ) > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > The Streaming UI is currently broken at Spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "Operation Duration": > [screenshot: empty "Operation Duration" graph] > Here is the error: > [screenshot: JavaScript error shown in the browser console] > > I verified the same query runs fine on Spark 3.5, as in the following graph. > [screenshot: "Operation Duration" graph rendering correctly on Spark 3.5] > > This is likely a problem from the library updates; a potential source of the error: [https://github.com/apache/spark/pull/42879] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46357) Replace use of setConf with conf.set in docs
[ https://issues.apache.org/jira/browse/SPARK-46357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46357. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44290 [https://github.com/apache/spark/pull/44290] > Replace use of setConf with conf.set in docs > > > Key: SPARK-46357 > URL: https://issues.apache.org/jira/browse/SPARK-46357 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46357) Replace use of setConf with conf.set in docs
[ https://issues.apache.org/jira/browse/SPARK-46357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46357: Assignee: Nicholas Chammas > Replace use of setConf with conf.set in docs > > > Key: SPARK-46357 > URL: https://issues.apache.org/jira/browse/SPARK-46357 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46400) When there are corrupted files in the local maven repo, retry to skip this cache
[ https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-46400: Summary: When there are corrupted files in the local maven repo, retry to skip this cache (was: When there are corrupted files in the local maven repo, try to skip this cache) > When there are corrupted files in the local maven repo, retry to skip this > cache > > > Key: SPARK-46400 > URL: https://issues.apache.org/jira/browse/SPARK-46400 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46400) When there are corrupted files in the local maven repo, try to skip this cache
[ https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46400: --- Labels: pull-request-available (was: ) > When there are corrupted files in the local maven repo, try to skip this cache > -- > > Key: SPARK-46400 > URL: https://issues.apache.org/jira/browse/SPARK-46400 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46400) When there are corrupted files in the local maven repo, try to skip this cache
BingKun Pan created SPARK-46400: --- Summary: When there are corrupted files in the local maven repo, try to skip this cache Key: SPARK-46400 URL: https://issues.apache.org/jira/browse/SPARK-46400 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
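A conceptual sketch of the retry idea only, not Spark's actual resolution code; the helper and all names in it are invented for illustration.

{code:scala}
import java.nio.file.{Files, Path}

// If an artifact cached in the local Maven repo turns out to be corrupted,
// drop the bad cache entry and retry the resolution once, forcing a fresh
// download instead of failing the whole submission.
def resolveWithRetry(cachedJar: Path)(resolve: () => Array[Byte]): Array[Byte] =
  try resolve()
  catch {
    case _: RuntimeException =>
      Files.deleteIfExists(cachedJar) // discard the corrupted cached file
      resolve()                       // second attempt bypasses the stale cache
  }
{code}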
[jira] [Updated] (SPARK-46399) Add exit status to the Application End event for the use of Spark Listener
[ https://issues.apache.org/jira/browse/SPARK-46399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46399: --- Labels: pull-request-available (was: ) > Add exit status to the Application End event for the use of Spark Listener > -- > > Key: SPARK-46399 > URL: https://issues.apache.org/jira/browse/SPARK-46399 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Reza Safi >Priority: Minor > Labels: pull-request-available > > Currently SparkListenerApplicationEnd only has a timestamp value and there is > no exit status recorded with it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46399) Add exit status to the Application End event for the use of Spark Listener
Reza Safi created SPARK-46399: - Summary: Add exit status to the Application End event for the use of Spark Listener Key: SPARK-46399 URL: https://issues.apache.org/jira/browse/SPARK-46399 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.0 Reporter: Reza Safi Currently SparkListenerApplicationEnd only has a timestamp value and there is no exit status recorded with it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
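A minimal sketch of a listener consuming the event as it exists today, which shows the gap: SparkListenerApplicationEnd carries only a timestamp, so a listener cannot tell whether the application succeeded or failed.

{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

class AppEndListener extends SparkListener {
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
    // Only the end time is available; no exit status field exists yet.
    println(s"Application ended at ${applicationEnd.time}")
  }
}
// Registration, assuming an active SparkContext named `sc`:
// sc.addSparkListener(new AppEndListener())
{code}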
[jira] [Updated] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)
[ https://issues.apache.org/jira/browse/SPARK-46398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46398: --- Labels: pull-request-available (was: ) > Test rangeBetween window function (pyspark.sql.window) > -- > > Key: SPARK-46398 > URL: https://issues.apache.org/jira/browse/SPARK-46398 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)
Xinrong Meng created SPARK-46398: Summary: Test rangeBetween window function (pyspark.sql.window) Key: SPARK-46398 URL: https://issues.apache.org/jira/browse/SPARK-46398 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
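For context, a small example of the behavior this sub-task adds tests for (assuming a SparkSession named `spark`): a range frame covering each row's `id` value up to that value plus 2.

{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

val w = Window.orderBy("id").rangeBetween(Window.currentRow, 2L)
spark.range(5).withColumn("rangeSum", sum("id").over(w)).show()
// id = 0 sums ids in [0, 2], id = 1 sums ids in [1, 3], and so on.
{code}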
[jira] [Created] (SPARK-46397) sha2(df.a, 1024) throws a different exception in Spark Connect
Hyukjin Kwon created SPARK-46397: Summary: sha2(df.a, 1024) throws a different exception in Spark Connect Key: SPARK-46397 URL: https://issues.apache.org/jira/browse/SPARK-46397 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} from pyspark.sql import functions as sf spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect() {code} Non-connect: {code} ... pyspark.errors.exceptions.captured.IllegalArgumentException: requirement failed: numBits 1024 is not in the permitted values (0, 224, 256, 384, 512) {code} Connect: {code} ... pyspark.errors.exceptions.connect.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "sha2(id, 1024)" due to data type mismatch: Parameter 1 requires the "BINARY" type, however "id" has the type "BIGINT". SQLSTATE: 42K09; 'Project [unresolvedalias(sha2(id#1L, 1024))] +- Range (0, 1, step=1, splits=Some(1)) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46396) LegacyFastTimestampFormatter.parseOptional should not throw exception
[ https://issues.apache.org/jira/browse/SPARK-46396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46396: --- Labels: pull-request-available (was: ) > LegacyFastTimestampFormatter.parseOptional should not throw exception > - > > Key: SPARK-46396 > URL: https://issues.apache.org/jira/browse/SPARK-46396 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > When setting spark.sql.legacy.timeParserPolicy=LEGACY, Spark will use the > LegacyFastTimestampFormatter to infer potential timestamp columns. The > inference shouldn't throw an exception. > However, when the input is 23012150952, there is an exception: > ``` > For input string: "23012150952" > java.lang.NumberFormatException: For input string: "23012150952" > at > java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) > at java.base/java.lang.Integer.parseInt(Integer.java:668) > at java.base/java.lang.Integer.parseInt(Integer.java:786) > at > org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304) > at > org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045) > at org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651) > at > org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418) > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46396) LegacyFastTimestampFormatter.parseOptional should not throw exception
Gengliang Wang created SPARK-46396: -- Summary: LegacyFastTimestampFormatter.parseOptional should not throw exception Key: SPARK-46396 URL: https://issues.apache.org/jira/browse/SPARK-46396 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Gengliang Wang Assignee: Gengliang Wang When setting spark.sql.legacy.timeParserPolicy=LEGACY, Spark will use the LegacyFastTimestampFormatter to infer potential timestamp columns. The inference shouldn't throw an exception. However, when the input is 23012150952, there is an exception: ``` For input string: "23012150952" java.lang.NumberFormatException: For input string: "23012150952" at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) at java.base/java.lang.Integer.parseInt(Integer.java:668) at java.base/java.lang.Integer.parseInt(Integer.java:786) at org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304) at org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045) at org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651) at org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
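A hedged repro sketch assembled from the description (assuming a SparkSession named `spark`): with the LEGACY policy, CSV schema inference probes candidate values through parseOptional, and for an input like "23012150952" it should return None rather than raise NumberFormatException.

{code:scala}
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
import spark.implicits._
val df = spark.read.option("inferSchema", "true").csv(Seq("23012150952").toDS())
df.printSchema() // expected: the column is inferred as a non-timestamp type, with no exception
{code}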
[jira] [Created] (SPARK-46395) Automatically generate SQL configuration tables for documentation
Nicholas Chammas created SPARK-46395: Summary: Automatically generate SQL configuration tables for documentation Key: SPARK-46395 URL: https://issues.apache.org/jira/browse/SPARK-46395 Project: Spark Issue Type: Improvement Components: Documentation, SQL Affects Versions: 3.5.0 Reporter: Nicholas Chammas -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42307: --- Labels: pull-request-available (was: ) > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796453#comment-17796453 ] Hannah Amundson commented on SPARK-42307: - I'll work on this > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46393) Classify exceptions in the JDBC table catalog
[ https://issues.apache.org/jira/browse/SPARK-46393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46393: --- Labels: pull-request-available (was: ) > Classify exceptions in the JDBC table catalog > - > > Key: SPARK-46393 > URL: https://issues.apache.org/jira/browse/SPARK-46393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Handle exceptions from JDBC drivers and convert them to AnalysisException > with error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)
[ https://issues.apache.org/jira/browse/SPARK-46385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46385: Assignee: Xinrong Meng > Test aggregate functions for groups (pyspark.sql.group) > --- > > Key: SPARK-46385 > URL: https://issues.apache.org/jira/browse/SPARK-46385 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)
[ https://issues.apache.org/jira/browse/SPARK-46385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46385. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44322 [https://github.com/apache/spark/pull/44322] > Test aggregate functions for groups (pyspark.sql.group) > --- > > Key: SPARK-46385 > URL: https://issues.apache.org/jira/browse/SPARK-46385 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46394) spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true
[ https://issues.apache.org/jira/browse/SPARK-46394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46394: --- Labels: pull-request-available (was: ) > spark.catalog.listDatabases() fails if containing schemas with special > characters when spark.sql.legacy.keepCommandOutputSchema set to true > --- > > Key: SPARK-46394 > URL: https://issues.apache.org/jira/browse/SPARK-46394 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Xinyi Yu >Priority: Major > Labels: pull-request-available > > When the SQL conf {{spark.sql.legacy.keepCommandOutputSchema}} is set to true: > Before: > // suppose there is a xyyu-db-with-hyphen schema in the catalog > spark.catalog.listDatabases() > [INVALID_IDENTIFIER] The identifier xyyu-db-with-hyphen is invalid. Please, > consider quoting it with back-quotes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46153) XML: Add TimestampNTZType support
[ https://issues.apache.org/jira/browse/SPARK-46153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46153. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44329 [https://github.com/apache/spark/pull/44329] > XML: Add TimestampNTZType support > - > > Key: SPARK-46153 > URL: https://issues.apache.org/jira/browse/SPARK-46153 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46153) XML: Add TimestampNTZType support
[ https://issues.apache.org/jira/browse/SPARK-46153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46153: Assignee: Sandip Agarwala > XML: Add TimestampNTZType support > - > > Key: SPARK-46153 > URL: https://issues.apache.org/jira/browse/SPARK-46153 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-38465) Use error classes in org.apache.spark.launcher
[ https://issues.apache.org/jira/browse/SPARK-38465 ] Hannah Amundson deleted comment on SPARK-38465: - was (Author: hannahkamundson): I will work on this since it has been a while since the last comment > Use error classes in org.apache.spark.launcher > -- > > Key: SPARK-38465 > URL: https://issues.apache.org/jira/browse/SPARK-38465 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46394) spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true
Xinyi Yu created SPARK-46394: Summary: spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true Key: SPARK-46394 URL: https://issues.apache.org/jira/browse/SPARK-46394 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Xinyi Yu When the SQL conf {{spark.sql.legacy.keepCommandOutputSchema}} is set to true: Before: // suppose there is a xyyu-db-with-hyphen schema in the catalog spark.catalog.listDatabases() [INVALID_IDENTIFIER] The identifier xyyu-db-with-hyphen is invalid. Please, consider quoting it with back-quotes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
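A hedged repro sketch assembled from the description (assuming a SparkSession named `spark`):

{code:scala}
spark.conf.set("spark.sql.legacy.keepCommandOutputSchema", "true")
spark.sql("CREATE DATABASE IF NOT EXISTS `xyyu-db-with-hyphen`")
// Before the fix this throws [INVALID_IDENTIFIER] instead of listing the schema:
spark.catalog.listDatabases()
{code}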
[jira] [Commented] (SPARK-38465) Use error classes in org.apache.spark.launcher
[ https://issues.apache.org/jira/browse/SPARK-38465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796412#comment-17796412 ] Hannah Amundson commented on SPARK-38465: - I will work on this since it has been a while since the last comment > Use error classes in org.apache.spark.launcher > -- > > Key: SPARK-38465 > URL: https://issues.apache.org/jira/browse/SPARK-38465 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46373) Create DataFrame Bug
[ https://issues.apache.org/jira/browse/SPARK-46373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796385#comment-17796385 ] Bruce Robbins commented on SPARK-46373: --- Maybe due to this (from [the docs|https://spark.apache.org/docs/3.5.0/]): {quote}Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.8+, and R 3.5+.{quote} Scala 3 is not listed as a supported version. > Create DataFrame Bug > > > Key: SPARK-46373 > URL: https://issues.apache.org/jira/browse/SPARK-46373 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bleibtreu >Priority: Major > > Scala version is 3.3.1 > Spark version is 3.5.0 > I am using spark-core 3.5.1. I am trying to create a DataFrame through the > reflection API, but "No TypeTag available for Person" appears. I have > tried for a long time, but I still don't quite understand why TypeTag cannot > recognize my Person case class. > {code:java} > import sparkSession.implicits._ > import scala.reflect.runtime.universe._ > case class Person(name: String) > val a = List(Person("A"), Person("B"), Person("C")) > val df = sparkSession.createDataFrame(a) > df.show(){code} > [screenshot: "No TypeTag available for Person" error] > I tested it, and it is indeed a problem unique to Scala 3. > There is no problem on Scala 2.13. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42332) Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error class
[ https://issues.apache.org/jira/browse/SPARK-42332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42332: --- Labels: pull-request-available (was: ) > Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error > class > --- > > Key: SPARK-42332 > URL: https://issues.apache.org/jira/browse/SPARK-42332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/spark/pull/39851#pullrequestreview-1280621488 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46054) SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-46054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796381#comment-17796381 ] Soam Acharya commented on SPARK-46054: -- +1. As our recent [AWS re:Invent talk|https://www.youtube.com/watch?v=G9aNXEu_a8k] indicates, Pinterest relies heavily on the Spark Operator for our Spark on EKS platform. We would love to see continued investment in the Spark Operator and are happy to contribute potential changes we make to the codebase back to the community. We believe the proposed setup is a good vehicle for us to keep moving forward with the Spark Operator. > SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator > -- > > Key: SPARK-46054 > URL: https://issues.apache.org/jira/browse/SPARK-46054 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Vara Bonthu >Priority: Minor > > *Description:* > This proposal aims to recommend the adoption of [Google's Spark K8s > Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the > official Spark Operator for the Apache Spark community. The operator has > gained significant traction among many users and organizations and is used > heavily in production environments, but challenges related to maintenance and > governance necessitate this recommendation. > *Background:* > * Google's Spark K8s Operator is currently in use by hundreds of users and > organizations. However, due to maintenance issues, many of these users and > organizations have resorted to forking the repository and implementing their > own fixes. > * The project boasts an impressive user base with 167 contributors, 2.5k > likes, and endorsements from 45 organizations, as documented in the "Who is > using" document. Notably, there are many more organizations using it than the > initially reported 45. > * The primary issue at hand is that this project resides under the > GoogleCloudPlatform GitHub organization and is exclusively moderated by a > Google employee. Concerns have been raised by numerous users and customers > regarding the maintenance of the repository. > * The existing Google maintainers are constrained by limitations in terms of > time and support, which negatively impacts both the project and its user > community. > > *Recent Developments:* > * During KubeCon Chicago 2023, AWS OSS Architects (Vara Bonthu) and the > Apple infrastructure team engaged in discussions with Google's team, > specifically with Marcin Wielgus. They expressed their interest in > contributing the project to either the Kubeflow or Apache Spark community. > * *{color:#00875a}Marcin from Google confirmed their willingness to donate > the project to either of these communities.{color}* > * An adoption process has been initiated by the Kubeflow project under CNCF, > as documented in the following thread: [Link to the > thread|https://github.com/kubeflow/community/issues/648]. > > *Primary Goal:* > * The primary goal is to ensure the collaborative support and adoption of > Google's Spark Operator by the Apache Spark community, thereby avoiding the > development of redundant tools and reducing confusion among users. > *Next Steps:* > * *Meeting with Apache Spark Working Group Maintainers:* We propose > arranging a meeting with the Apache Spark working group maintainers to delve > deeper into this matter, address any questions or concerns they may have, and > collectively work towards a decision. 
> * *Establish a New Working Group:* Upon reaching an agreement, we intend to > create a new working group comprising members from diverse organizations who > are willing to contribute and collaborate on this initiative. > * *Repository Transfer:* Our plan involves transferring the project > repository from Google's organization to either the Apache or Kubeflow > organization, aligning with the chosen community. > * *Roadmap Development:* We will formulate a new roadmap that encompasses > immediate issue resolution and a long-term design strategy aimed at enhancing > performance, scalability, and security for this tool. > > We believe that working towards one Spark Operator will benefit the Apache > Spark community and address the current maintenance challenges. Your feedback > and support in this matter are highly valued. Let's collaborate to ensure a > robust and well-maintained Spark Operator for the Apache Spark community's > benefit. > *Community members are encouraged to leave their comments or give a thumbs-up > to express their support for adopting Google's Spark Operator as the official > Apache Spark operator.* > > *Proposed Authors* > Vara Bonthu (AWS) > Marcin Wielgus (Google) > --
[jira] [Created] (SPARK-46393) Classify exceptions in the JDBC table catalog
Max Gekk created SPARK-46393: Summary: Classify exceptions in the JDBC table catalog Key: SPARK-46393 URL: https://issues.apache.org/jira/browse/SPARK-46393 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Handle exceptions from JDBC drivers and convert them to AnalysisException with error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
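A conceptual sketch of the classification idea; the exception class and helper below are invented for illustration and are not Spark's actual API.

{code:scala}
import java.sql.SQLException

// Catch raw driver exceptions at the catalog boundary and rethrow them as a
// Spark-side exception that carries an error class, instead of leaking the
// driver-specific message format to users.
class CatalogException(errorClass: String, cause: Throwable)
  extends RuntimeException(s"[$errorClass] ${cause.getMessage}", cause)

def classifyException[T](errorClass: String)(f: => T): T =
  try f
  catch {
    case e: SQLException => throw new CatalogException(errorClass, e)
  }

// Usage sketch: classifyException("FAILED_JDBC.CREATE_TABLE") { stmt.executeUpdate(ddl) }
{code}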
[jira] [Commented] (SPARK-42332) Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error class
[ https://issues.apache.org/jira/browse/SPARK-42332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796360#comment-17796360 ] Hannah Amundson commented on SPARK-42332: - I'll work on this! > Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error > class > --- > > Key: SPARK-42332 > URL: https://issues.apache.org/jira/browse/SPARK-42332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Major > > See https://github.com/apache/spark/pull/39851#pullrequestreview-1280621488 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-45190) XML: StructType schema issue in pyspark connect
[ https://issues.apache.org/jira/browse/SPARK-45190 ] Hannah Amundson deleted comment on SPARK-45190: - was (Author: hannahkamundson): Hello everyone (and [~sandip.agarwala] ), I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah > XML: StructType schema issue in pyspark connect > --- > > Key: SPARK-45190 > URL: https://issues.apache.org/jira/browse/SPARK-45190 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > > The following PR added support for from_xml to pyspark. > https://github.com/apache/spark/pull/42938 > > However, StructType schema is resulting in schema parse error for pyspark > connect. > Filing a Jira to track this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-45190) XML: StructType schema issue in pyspark connect
[ https://issues.apache.org/jira/browse/SPARK-45190 ] Hannah Amundson deleted comment on SPARK-45190: - was (Author: hannahkamundson): I have started on this ticket > XML: StructType schema issue in pyspark connect > --- > > Key: SPARK-45190 > URL: https://issues.apache.org/jira/browse/SPARK-45190 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > > The following PR added support for from_xml to pyspark. > https://github.com/apache/spark/pull/42938 > > However, StructType schema is resulting in schema parse error for pyspark > connect. > Filing a Jira to track this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter
[ https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46392: --- Labels: pull-request-available (was: ) > In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer > the cast expression to the data source filter > -- > > Key: SPARK-46392 > URL: https://issues.apache.org/jira/browse/SPARK-46392 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiahong.li >Priority: Minor > Labels: pull-request-available > > Consider this situation: > we create a partitioned table using a source that extends > TableProvider. If we select data from specific partitions and the partition > value's data type differs from the table's partition type, the partition > filter cannot be pushed down. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter
[ https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiahong.li updated SPARK-46392: --- Description: Consider this situation: we create a partitioned table using a source that extends TableProvider. If we select data from specific partitions and the partition value's data type differs from the table's partition type, the partition filter cannot be pushed down. was: Considering this Situation: We create a partition table that created by source which is extends TableProvider, if we select data from some specific partitions, choose partition dataType differ from table partition type leads partition can not be pushed down > In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer > the cast expression to the data source filter > -- > > Key: SPARK-46392 > URL: https://issues.apache.org/jira/browse/SPARK-46392 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiahong.li >Priority: Minor > > Consider this situation: > we create a partitioned table using a source that extends > TableProvider. If we select data from specific partitions and the partition > value's data type differs from the table's partition type, the partition > filter cannot be pushed down. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter
jiahong.li created SPARK-46392: -- Summary: In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter Key: SPARK-46392 URL: https://issues.apache.org/jira/browse/SPARK-46392 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: jiahong.li Consider this situation: we create a partitioned table using a source that extends TableProvider. If we select data from specific partitions and the partition value's data type differs from the table's partition type, the partition filter cannot be pushed down -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46391) Reorganize `ExpandingParityTests`
[ https://issues.apache.org/jira/browse/SPARK-46391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46391: --- Labels: pull-request-available (was: ) > Reorganize `ExpandingParityTests` > - > > Key: SPARK-46391 > URL: https://issues.apache.org/jira/browse/SPARK-46391 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46391) Reorganize `ExpandingParityTests`
Ruifeng Zheng created SPARK-46391: - Summary: Reorganize `ExpandingParityTests` Key: SPARK-46391 URL: https://issues.apache.org/jira/browse/SPARK-46391 Project: Spark Issue Type: Test Components: PS, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46388) HiveAnalysis misses pattern guard `query.resolved`
[ https://issues.apache.org/jira/browse/SPARK-46388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46388. -- Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 44326 [https://github.com/apache/spark/pull/44326] > HiveAnalysis misses pattern guard `query.resolved` > -- > > Key: SPARK-46388 > URL: https://issues.apache.org/jira/browse/SPARK-46388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
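To illustrate what this fix is about, here is a toy sketch — not the actual HiveAnalysis code; the plan nodes are stand-ins. An analyzer-style rule should only fire once the inner query is resolved, otherwise it can operate on a half-analyzed plan:

{code:scala}
// Toy sketch only; real Catalyst nodes and HiveAnalysis are more involved.
sealed trait Plan { def resolved: Boolean }
case object ResolvedQuery extends Plan { val resolved = true }
case object UnresolvedQuery extends Plan { val resolved = false }
case class Insert(query: Plan) extends Plan { def resolved: Boolean = query.resolved }

def hiveAnalysisLike(plan: Plan): Plan = plan match {
  // The `if query.resolved` pattern guard is what the ticket adds: without
  // it, the rule would also fire while the inner query is still unresolved,
  // and the conversion could fail with a confusing error.
  case i @ Insert(query) if query.resolved => i // convert to a Hive insert here
  case other                               => other
}
{code}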
[jira] [Assigned] (SPARK-46388) HiveAnalysis misses pattern guard `query.resolved`
[ https://issues.apache.org/jira/browse/SPARK-46388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46388: Assignee: Kent Yao > HiveAnalysis misses pattern guard `query.resolved` > -- > > Key: SPARK-46388 > URL: https://issues.apache.org/jira/browse/SPARK-46388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796103#comment-17796103 ] Kent Yao commented on SPARK-46384: -- I will take a look asap > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > The Streaming UI is currently broken on Spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "Operation duration": > [screenshot: empty "Operation duration" graph] > Here is the error: > [screenshot: JavaScript error shown in the browser] > > I verified the same query runs fine on Spark 3.5, as in the following graph. > [screenshot: Spark 3.5 UI rendering the graph correctly]
> > This is likely a problem introduced by the library updates; a potential > source of the error: [https://github.com/apache/spark/pull/42879] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46381) Migrate sub-classes of AnalysisException to error classes
[ https://issues.apache.org/jira/browse/SPARK-46381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-46381. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44314 [https://github.com/apache/spark/pull/44314] > Migrate sub-classes of AnalysisException to error classes > - > > Key: SPARK-46381 > URL: https://issues.apache.org/jira/browse/SPARK-46381 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, there are a few sub-classes of AnalysisException that haven't been ported to > error classes yet: > - NonEmptyNamespaceException > - UnresolvedException > - ExtendedAnalysisException > This ticket aims to migrate two of them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
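As a rough illustration of what "porting to error classes" means in general — a sketch under assumed names; the actual constructors and fields of Spark's exception classes differ — the exception carries a stable error-class identifier plus structured message parameters instead of a free-form message string:

{code:scala}
// Sketch of the error-class pattern; class and parameter names are
// illustrative, not Spark's actual API.
class SketchAnalysisException(
    val errorClass: String,
    val messageParameters: Map[String, String])
  extends Exception(
    s"[$errorClass] " +
      messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", "))

object SketchUsage {
  def failNonEmptyNamespace(ns: String): Nothing =
    // Structured and machine-readable, instead of an ad-hoc message string.
    throw new SketchAnalysisException(
      errorClass = "NON_EMPTY_NAMESPACE",
      messageParameters = Map("namespace" -> ns))
}
{code}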
[jira] [Assigned] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
[ https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45807: -- Assignee: Eduard Tudenhoefner (was: Apache Spark) > DataSourceV2: Improve ViewCatalog API > - > > Key: SPARK-45807 > URL: https://issues.apache.org/jira/browse/SPARK-45807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eduard Tudenhoefner >Assignee: Eduard Tudenhoefner >Priority: Major > Labels: pull-request-available > > The goal is to add createOrReplaceView(..) and replaceView(..) methods to the > ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
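A tentative sketch of the shape of the proposed additions — the signatures are guesses from the ticket text, not the merged API; the real ViewCatalog methods take richer arguments such as schema and properties:

{code:scala}
// Hypothetical simplification of the proposal; parameter lists are guesses.
trait ViewCatalogSketch {
  def createView(name: String, sql: String): Unit
  // Proposed: replace an existing view with a new definition, failing if it
  // does not exist.
  def replaceView(name: String, sql: String): Unit
  // Proposed: create the view, or replace it if it already exists, so
  // callers need not implement drop-then-create themselves.
  def createOrReplaceView(name: String, sql: String): Unit
}
{code}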
[jira] [Assigned] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
[ https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45807: -- Assignee: Apache Spark (was: Eduard Tudenhoefner) > DataSourceV2: Improve ViewCatalog API > - > > Key: SPARK-45807 > URL: https://issues.apache.org/jira/browse/SPARK-45807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eduard Tudenhoefner >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The goal is to add createOrReplaceView(..) and replaceView(..) methods to the > ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46390) Update the Example page
Ruifeng Zheng created SPARK-46390: - Summary: Update the Example page Key: SPARK-46390 URL: https://issues.apache.org/jira/browse/SPARK-46390 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46381) Migrate sub-classes of AnalysisException to error classes
[ https://issues.apache.org/jira/browse/SPARK-46381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46381: -- Assignee: Apache Spark (was: Max Gekk) > Migrate sub-classes of AnalysisException to error classes > - > > Key: SPARK-46381 > URL: https://issues.apache.org/jira/browse/SPARK-46381 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Currently, there are a few sub-classes of AnalysisException that haven't been ported to > error classes yet: > - NonEmptyNamespaceException > - UnresolvedException > - ExtendedAnalysisException > This ticket aims to migrate two of them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
[ https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45807: -- Assignee: Apache Spark (was: Eduard Tudenhoefner) > DataSourceV2: Improve ViewCatalog API > - > > Key: SPARK-45807 > URL: https://issues.apache.org/jira/browse/SPARK-45807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eduard Tudenhoefner >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The goal is to add createOrReplaceView(..) and replaceView(..) methods to the > ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45959: -- Assignee: Apache Spark > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > Though the documentation clearly recommends adding all columns in a single shot, > in reality it is difficult to expect customers to modify their code: in Spark 2 > the analyzer rules did not do deep tree traversal, while in Spark 3 the plans are > cloned before being handed to the analyzer, optimizer, etc., which was not the > case in Spark 2. > All these things have increased query time from 5 min to 2-3 hrs. > Many times the columns are added to the plan via some for-loop logic which just > keeps adding new computation based on some rule. > So, my suggestion is to do an initial check in the withColumn API before > creating a new projection: if all the existing columns are still being > projected, and the new column's expression depends not on the output of the > top node but on its child, then instead of adding a new Project, the column can > be added to the existing node. > For a start, maybe we can just handle the Project node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
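A short sketch of the anti-pattern described in SPARK-45959 and the batched alternative the documentation recommends — column names and counts here are illustrative:

{code:scala}
import org.apache.spark.sql.{functions => F, DataFrame}

// Anti-pattern: every withColumn call adds a Project node, so a loop over
// many columns builds a very deep plan that the analyzer re-traverses (and,
// since Spark 3, re-clones) on each iteration.
def addColumnsOneByOne(df: DataFrame): DataFrame =
  (0 until 500).foldLeft(df)((acc, i) => acc.withColumn(s"c$i", F.lit(i)))

// Recommended shape: build all new columns first and add them with a single
// projection, keeping the plan shallow.
def addColumnsInOneShot(df: DataFrame): DataFrame = {
  val newCols = (0 until 500).map(i => F.lit(i).as(s"c$i"))
  df.select(df.columns.map(F.col).toSeq ++ newCols: _*)
}
{code}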
[jira] [Assigned] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-46246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46246: -- Assignee: Apache Spark > Support EXECUTE IMMEDIATE syntax in Spark SQL > > > Key: SPARK-46246 > URL: https://issues.apache.org/jira/browse/SPARK-46246 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Milan Stefanovic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries > from within SQL. > This API executes a query passed as a string, with arguments. > Other DBs that support this: > * > [Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm] > * > [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate] > * > [PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html#:~:text=Description,statement%2C%20without%20retrieving%20result%20rows.] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
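A tentative illustration of the proposed syntax, modeled on the Snowflake/Oracle forms the ticket links to — the grammar Spark finally adopts may differ, and the table name is made up:

{code:scala}
// Assumes an existing SparkSession; the exact EXECUTE IMMEDIATE grammar is a
// guess based on the databases referenced in the ticket.
val spark: org.apache.spark.sql.SparkSession = ???

// Run a query held in a string, binding a positional parameter.
spark.sql("EXECUTE IMMEDIATE 'SELECT * FROM tbl WHERE id = ?' USING 42").show()
{code}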
[jira] [Assigned] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-46246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46246: -- Assignee: (was: Apache Spark) > Support EXECUTE IMMEDIATE syntax in Spark SQL > > > Key: SPARK-46246 > URL: https://issues.apache.org/jira/browse/SPARK-46246 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > > Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries > from within SQL. > This API executes a query passed as a string, with arguments. > Other DBs that support this: > * > [Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm] > * > [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate] > * > [PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html#:~:text=Description,statement%2C%20without%20retrieving%20result%20rows.] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45959: -- Assignee: (was: Apache Spark) > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Minor > Labels: pull-request-available > > Though the documentation clearly recommends adding all columns in a single shot, > in reality it is difficult to expect customers to modify their code: in Spark 2 > the analyzer rules did not do deep tree traversal, while in Spark 3 the plans are > cloned before being handed to the analyzer, optimizer, etc., which was not the > case in Spark 2. > All these things have increased query time from 5 min to 2-3 hrs. > Many times the columns are added to the plan via some for-loop logic which just > keeps adding new computation based on some rule. > So, my suggestion is to do an initial check in the withColumn API before > creating a new projection: if all the existing columns are still being > projected, and the new column's expression depends not on the output of the > top node but on its child, then instead of adding a new Project, the column can > be added to the existing node. > For a start, maybe we can just handle the Project node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46365) Spark 3.5.0 Regression: Window Function Combination Yields Null Values
[ https://issues.apache.org/jira/browse/SPARK-46365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796054#comment-17796054 ] Josh Rosen commented on SPARK-46365: I think that this is a duplicate of SPARK-45543, which is fixed in the forthcoming Spark 3.5.1. I figured this out by running `git bisect` in `branch-3.5` with the above reproduction. > Spark 3.5.0 Regression: Window Function Combination Yields Null Values > --- > > Key: SPARK-46365 > URL: https://issues.apache.org/jira/browse/SPARK-46365 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Boris PEREVALOV >Priority: Major > > When combining two window functions (the first one to get the previous non-null > value, the second one to get the latest rows only), the result is not correct > since version 3.5.0. > > Here is a simple Scala example: > > {code:java} > import org.apache.spark.sql.expressions.Window > import org.apache.spark.sql.functions._ > > case class Event(timestamp: Long, id: Long, value: String) > > val events = Seq( > Event(timestamp = 1702289001, id = 1, value = "non-null value"), > Event(timestamp = 1702289002, id = 1, value = "new non-null value"), > Event(timestamp = 1702289003, id = 1, value = null), > Event(timestamp = 1702289004, id = 2, value = "non-null value"), > Event(timestamp = 1702289005, id = 2, value = null), > ).toDF > > val window = Window.partitionBy("id").orderBy($"timestamp".desc) > > val eventsWithLatestNonNullValue = events > .withColumn( > "value", > first("value", ignoreNulls = true) over > window.rangeBetween(Window.currentRow, Window.unboundedFollowing) > ) > > eventsWithLatestNonNullValue.show > > val latestEvents = eventsWithLatestNonNullValue > .withColumn("n", row_number over window) > .where("n = 1") > .drop("n") > > latestEvents.show > {code} > > > Current result (Spark 3.5.0): > > {code:java} > +----------+---+-----+ > | timestamp| id|value| > +----------+---+-----+ > |1702289003|  1| NULL| > |1702289005|  2| NULL| > +----------+---+-----+ > {code} > > > Expected result (all versions < 3.5.0): > > {code:java} > +----------+---+------------------+ > | timestamp| id|             value| > +----------+---+------------------+ > |1702289003|  1|new non-null value| > |1702289005|  2|    non-null value| > +----------+---+------------------+ > {code} > > Execution plans are different. 
> > Spark 3.5.0: > > {code:java} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [timestamp#1856L, id#1857L, value#1867] > +- Filter (n#1887 = 1) > +- Window [first(value#1858, true) windowspecdefinition(id#1857L, > timestamp#1856L DESC NULLS LAST, specifiedwindowframe(RangeFrame, > currentrow$(), unboundedfollowing$())) AS value#1867, row_number() > windowspecdefinition(id#1857L, timestamp#1856L DESC NULLS LAST, > specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS > n#1887], [id#1857L], [timestamp#1856L DESC NULLS LAST] > +- WindowGroupLimit [id#1857L], [timestamp#1856L DESC NULLS LAST], > row_number(), 1, Final > +- Sort [id#1857L ASC NULLS FIRST, timestamp#1856L DESC NULLS > LAST], false, 0 > +- Exchange hashpartitioning(id#1857L, 200), > ENSURE_REQUIREMENTS, [plan_id=326] > +- WindowGroupLimit [id#1857L], [timestamp#1856L DESC NULLS > LAST], row_number(), 1, Partial > +- Sort [id#1857L ASC NULLS FIRST, timestamp#1856L DESC > NULLS LAST], false, 0 > +- LocalTableScan [timestamp#1856L, id#1857L, > value#1858] > {code} > > Spark 3.4.0: > > {code:java} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [timestamp#6L, id#7L, value#17] > +- Filter (n#37 = 1) > +- Window [first(value#8, true) windowspecdefinition(id#7L, > timestamp#6L DESC NULLS LAST, specifiedwindowframe(RangeFrame, currentrow$(), > unboundedfollowing$())) AS value#17, row_number() windowspecdefinition(id#7L, > timestamp#6L DESC NULLS LAST, specifiedwindowframe(RowFrame, > unboundedpreceding$(), currentrow$())) AS n#37], [id#7L], [timestamp#6L DESC > NULLS LAST] > +- Sort [id#7L ASC NULLS FIRST, timestamp#6L DESC NULLS LAST], > false, 0 > +- Exchange hashpartitioning(id#7L, 200), ENSURE_REQUIREMENTS, > [plan_id=60] > +- LocalTableScan [timestamp#6L, id#7L, value#8] > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org