[jira] [Created] (SPARK-48982) [GO] Extract Spark Exceptions from GRPC response
Martin Grund created SPARK-48982: Summary: [GO] Extract Spark Exceptions from GRPC response Key: SPARK-48982 URL: https://issues.apache.org/jira/browse/SPARK-48982 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.1 Reporter: Martin Grund Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
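As background on the mechanism involved, Spark Connect reports server-side failures through the `google.rpc.Status` attached to the failing RPC; a rough sketch of extracting that detail is shown below in Python for brevity (the issue itself targets the Go client), and the assumption that the server attaches an `ErrorInfo` detail carrying the Spark exception class is exactly that — an assumption, not a guarantee:

{code:python}
import grpc
from google.rpc import error_details_pb2
from grpc_status import rpc_status  # provided by the grpcio-status package

def extract_spark_error(rpc_error: grpc.RpcError):
    """Unpack the google.rpc.Status trailer of a failed RPC and return the
    ErrorInfo detail (reason plus metadata) if one is attached."""
    status = rpc_status.from_call(rpc_error)
    if status is None:
        return None
    for detail in status.details:
        if detail.Is(error_details_pb2.ErrorInfo.DESCRIPTOR):
            info = error_details_pb2.ErrorInfo()
            detail.Unpack(info)
            # `reason` typically names the original exception class; `metadata`
            # carries whatever extra fields the server chose to attach.
            return info.reason, dict(info.metadata)
    return None
{code}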
[jira] [Assigned] (SPARK-48951) Column and Function Support for Go
[ https://issues.apache.org/jira/browse/SPARK-48951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grund reassigned SPARK-48951: Assignee: Martin Grund > Column and Function Support for Go > -- > > Key: SPARK-48951 > URL: https://issues.apache.org/jira/browse/SPARK-48951 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > > Support Column & Function feature parity in Go client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48951) Column and Function Support for Go
[ https://issues.apache.org/jira/browse/SPARK-48951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grund resolved SPARK-48951. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 35 [https://github.com/apache/spark-connect-go/pull/35] > Column and Function Support for Go > -- > > Key: SPARK-48951 > URL: https://issues.apache.org/jira/browse/SPARK-48951 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Support Column & Function feature parity in Go client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48951) Column and Function Support for Go
Martin Grund created SPARK-48951: Summary: Column and Function Support for Go Key: SPARK-48951 URL: https://issues.apache.org/jira/browse/SPARK-48951 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.1 Reporter: Martin Grund Support Column & Function feature parity in Go client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48756) Add support for `df.debug()`
Martin Grund created SPARK-48756: Summary: Add support for `df.debug()` Key: SPARK-48756 URL: https://issues.apache.org/jira/browse/SPARK-48756 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.1 Reporter: Martin Grund Following the work on execution info, we want to add basic data debugging capabilities to the DataFrame API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48638) Native QueryExecution information for the dataframe
Martin Grund created SPARK-48638: Summary: Native QueryExecution information for the dataframe Key: SPARK-48638 URL: https://issues.apache.org/jira/browse/SPARK-48638 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Martin Grund Adding a new property to `DataFrame` called `queryExecution` that returns a class that contains information about the query execution and its metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
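For context, a hypothetical usage sketch of what such a property could enable; the `queryExecution` name comes from the issue text above, while the `metrics` attribute and everything else shown are illustrative assumptions rather than an existing API:

{code:python}
from pyspark.sql.functions import col

# Run a small aggregation so that there is a finished execution to inspect.
df = spark.range(100).groupBy((col("id") % 10).alias("bucket")).count()
df.collect()

info = df.queryExecution   # proposed property returning an execution-info object
print(info.metrics)        # per-operator metrics of the last execution (assumed attribute)
{code}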
[jira] [Created] (SPARK-47862) Connect generated protos can't be pickled
Martin Grund created SPARK-47862: Summary: Connect generated protos can't be pickled Key: SPARK-47862 URL: https://issues.apache.org/jira/browse/SPARK-47862 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.1 Reporter: Martin Grund Fix For: 4.0.0 When Spark Connect generates the protobuf files, they're manually adjusted and moved to the right folder. However, we did not fix the package for the descriptor. This breaks serializing them to proto. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47812) Support Serializing Spark Sessions in ForEachBatch
Martin Grund created SPARK-47812: Summary: Support Serializing Spark Sessions in ForEachBatch Key: SPARK-47812 URL: https://issues.apache.org/jira/browse/SPARK-47812 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.1 Reporter: Martin Grund Fix For: 4.0.0 SparkSessions using Connect should be serialized when used in ForEachBatch and friends. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47336) Provide to PySpark a functionality to get estimated size of DataFrame in bytes
[ https://issues.apache.org/jira/browse/SPARK-47336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827077#comment-17827077 ] Martin Grund commented on SPARK-47336: -- I think the general idea is great! I would like to propose changing the name to reflect that this is most likely a size estimation though. > Provide to PySpark a functionality to get estimated size of DataFrame in bytes > -- > > Key: SPARK-47336 > URL: https://issues.apache.org/jira/browse/SPARK-47336 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Semyon Sinchenko >Priority: Minor > > Something equal to > sessionState().executePlan(...).optimizedPlan().stats().sizeInBytes() in > JVM-Spark. It may be done via a simple call of `_jsparkSession` in a regular > PySpark and via a plugin for Spark Connect. > > This functionality is useful when one needs to check the possibility of a > broadcast join without modifying the global broadcast threshold. > > The function in the PySpark API may look like: > `DataFrame.estimate_size_in_bytes() -> float` or > `DataFrame.estimateSizeInBytes() -> float`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
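As an illustration of the workaround mentioned in the issue, here is a minimal sketch for classic (non-Connect) PySpark that reads the optimizer's size estimate through the private `_jdf` handle; the helper name is made up, it relies on internal APIs, and Spark Connect would still need a server-side plugin as the description notes:

{code:python}
def estimated_size_in_bytes(df) -> float:
    """Mirror sessionState().executePlan(...).optimizedPlan().stats().sizeInBytes()
    by asking the DataFrame's own QueryExecution for its optimized-plan statistics."""
    qe = df._jdf.queryExecution()            # private JVM handle of the DataFrame
    stats = qe.optimizedPlan().stats()
    return float(str(stats.sizeInBytes()))   # BigInt on the JVM side; stringify, then convert

size = estimated_size_in_bytes(spark.range(1000).selectExpr("id", "id * 2 AS twice"))
print(size)
{code}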
[jira] [Created] (SPARK-47227) Spark Connect Documentation
Martin Grund created SPARK-47227: Summary: Spark Connect Documentation Key: SPARK-47227 URL: https://issues.apache.org/jira/browse/SPARK-47227 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.1 Reporter: Martin Grund Improve the documentation of Spark Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47081) Support Query Execution Progress Messages
Martin Grund created SPARK-47081: Summary: Support Query Execution Progress Messages Key: SPARK-47081 URL: https://issues.apache.org/jira/browse/SPARK-47081 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund Fix For: 4.0.0 Spark Connect should support reporting basic query progress to the client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45852) Gracefully deal with recursion exception during Spark Connect logging
Martin Grund created SPARK-45852: Summary: Gracefully deal with recursion exception during Spark Connect logging Key: SPARK-45852 URL: https://issues.apache.org/jira/browse/SPARK-45852 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund
```
from google.protobuf.text_format import MessageToString
from pyspark.sql.functions import col, lit

df = spark.range(10)
for x in range(800):
    df = df.withColumn(f"next{x}", lit(1))

MessageToString(df._plan.to_proto(spark._client), as_one_line=True)
```
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
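The fix direction implied by the title could look roughly like the guard below — a sketch only, not the actual change — where the proto rendering used for logging falls back to a placeholder instead of propagating the RecursionError:

{code:python}
from google.protobuf.message import Message
from google.protobuf.text_format import MessageToString

def safe_plan_repr(plan: Message) -> str:
    """Render a proto plan for logging, but degrade gracefully when the plan is
    so deeply nested that text formatting exceeds the recursion limit."""
    try:
        return MessageToString(plan, as_one_line=True)
    except RecursionError:
        return "<Spark Connect plan too deeply nested to stringify>"
{code}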
[jira] [Created] (SPARK-45808) Improve error details for Spark Connect Client in Python
Martin Grund created SPARK-45808: Summary: Improve error details for Spark Connect Client in Python Key: SPARK-45808 URL: https://issues.apache.org/jira/browse/SPARK-45808 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund Improve the error handling in Spark Connect Python Client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45798) Assert server-side session ID in Spark Connect
Martin Grund created SPARK-45798: Summary: Assert server-side session ID in Spark Connect Key: SPARK-45798 URL: https://issues.apache.org/jira/browse/SPARK-45798 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund When accessing the Spark Session remotely, it is possible that the server has silently restarted and we lose temporary state such as views or function definitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45167) Python Spark Connect client does not call `releaseAll`
[ https://issues.apache.org/jira/browse/SPARK-45167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grund updated SPARK-45167: - Issue Type: Bug (was: Improvement) > Python Spark Connect client does not call `releaseAll` > -- > > Key: SPARK-45167 > URL: https://issues.apache.org/jira/browse/SPARK-45167 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Priority: Major > > The Python client does not call `releaseAll` to release previous responses on the server > and thus does not properly close the queries. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45167) Python Spark Connect client does not call `releaseAll`
Martin Grund created SPARK-45167: Summary: Python Spark Connect client does not call `releaseAll` Key: SPARK-45167 URL: https://issues.apache.org/jira/browse/SPARK-45167 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund The Python client does not call `releaseAll` to release previous responses on the server and thus does not properly close the queries. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45048) Add additional tests for Python client
Martin Grund created SPARK-45048: Summary: Add additional tests for Python client Key: SPARK-45048 URL: https://issues.apache.org/jira/browse/SPARK-45048 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44931) Fix JSON Serialization for Spark Connect Event Listener
Martin Grund created SPARK-44931: Summary: Fix JSON Serialization for Spark Connect Event Listener Key: SPARK-44931 URL: https://issues.apache.org/jira/browse/SPARK-44931 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44815) Cache Schema of DF
Martin Grund created SPARK-44815: Summary: Cache Schema of DF Key: SPARK-44815 URL: https://issues.apache.org/jira/browse/SPARK-44815 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44814) Test to trigger protobuf 4.23.3 crash
Martin Grund created SPARK-44814: Summary: Test to trigger protobuf 4.23.3 crash Key: SPARK-44814 URL: https://issues.apache.org/jira/browse/SPARK-44814 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44740) Allow configuring the session ID for a spark connect client in the remote string
Martin Grund created SPARK-44740: Summary: Allow configuring the session ID for a spark connect client in the remote string Key: SPARK-44740 URL: https://issues.apache.org/jira/browse/SPARK-44740 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44738) Spark Connect Reattach misses metadata propagation
Martin Grund created SPARK-44738: Summary: Spark Connect Reattach misses metadata propagation Key: SPARK-44738 URL: https://issues.apache.org/jira/browse/SPARK-44738 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund Fix For: 3.5.0 Currently, client metadata is not propagated in the Spark Connect Reattach handler. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes and check for it
[ https://issues.apache.org/jira/browse/SPARK-44528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grund updated SPARK-44528: - Summary: Spark Connect DataFrame does not allow to add custom instance attributes and check for it (was: Spark Connect DataFrame does not allow to add custom instance attributes) > Spark Connect DataFrame does not allow to add custom instance attributes and > check for it > - > > Key: SPARK-44528 > URL: https://issues.apache.org/jira/browse/SPARK-44528 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.1 >Reporter: Martin Grund >Priority: Major > > ``` > df = spark.range(10) > df._test = 10 > ``` > Treats `df._test` like a column -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes and check for it
[ https://issues.apache.org/jira/browse/SPARK-44528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grund updated SPARK-44528: - Description: ``` df = spark.range(10) df._test = 10 assert(hasattr(df, "_test")) assert(not hasattr(df, "_test_no")) ``` Treats `df._test` like a column was: ``` df = spark.range(10) df._test = 10 ``` Treats `df._test` like a column > Spark Connect DataFrame does not allow to add custom instance attributes and > check for it > - > > Key: SPARK-44528 > URL: https://issues.apache.org/jira/browse/SPARK-44528 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.1 >Reporter: Martin Grund >Priority: Major > > ``` > df = spark.range(10) > df._test = 10 > assert(hasattr(df, "_test")) > assert(not hasattr(df, "_test_no")) > ``` > Treats `df._test` like a column -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes
Martin Grund created SPARK-44528: Summary: Spark Connect DataFrame does not allow to add custom instance attributes Key: SPARK-44528 URL: https://issues.apache.org/jira/browse/SPARK-44528 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.1 Reporter: Martin Grund ``` ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes
[ https://issues.apache.org/jira/browse/SPARK-44528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grund updated SPARK-44528: - Description: ``` df = spark.range(10) df._test = 10 ``` Treats `df._test` like a column was: ``` ``` > Spark Connect DataFrame does not allow to add custom instance attributes > > > Key: SPARK-44528 > URL: https://issues.apache.org/jira/browse/SPARK-44528 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.1 >Reporter: Martin Grund >Priority: Major > > ``` > df = spark.range(10) > df._test = 10 > ``` > Treats `df._test` like a column -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain
Martin Grund created SPARK-44505: Summary: DataSource v2 Scans should not require planning the input partitions on explain Key: SPARK-44505 URL: https://issues.apache.org/jira/browse/SPARK-44505 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Martin Grund Right now, we will always call `planInputPartitions()` for a DSv2 implementation even if no Spark job is run and the plan is only explained. We should provide a way to avoid scanning all input partitions just to determine whether the input is columnar or not. The scan should provide an override. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43958) Channel Builder for Golang client
Martin Grund created SPARK-43958: Summary: Channel Builder for Golang client Key: SPARK-43958 URL: https://issues.apache.org/jira/browse/SPARK-43958 Project: Spark Issue Type: Improvement Components: Connect Contrib Affects Versions: 3.4.0 Reporter: Martin Grund Support Channel builder with default URL parsing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43909) Golang Repository Workflows
Martin Grund created SPARK-43909: Summary: Golang Repository Workflows Key: SPARK-43909 URL: https://issues.apache.org/jira/browse/SPARK-43909 Project: Spark Issue Type: Improvement Components: Connect Contrib Affects Versions: 3.4.0 Reporter: Martin Grund Umbrella Jira for the github setup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43895) Skeleton Golang Repository
Martin Grund created SPARK-43895: Summary: Skeleton Golang Repository Key: SPARK-43895 URL: https://issues.apache.org/jira/browse/SPARK-43895 Project: Spark Issue Type: Improvement Components: Connect Contrib Affects Versions: 3.4.0 Reporter: Martin Grund Prepare the build for the Spark Connect go client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43894) df.cache() not working
Martin Grund created SPARK-43894: Summary: df.cache() not working Key: SPARK-43894 URL: https://issues.apache.org/jira/browse/SPARK-43894 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Calling `df.cache()` will throw an exception ``` (org.apache.spark.sql.connect.common.InvalidPlanInput) Unknown Analyze Method ANALYZE_NOT_SET! ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43509) Support creating multiple sessions for Spark Connect in PySpark
Martin Grund created SPARK-43509: Summary: Support creating multiple sessions for Spark Connect in PySpark Key: SPARK-43509 URL: https://issues.apache.org/jira/browse/SPARK-43509 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43430) ExecutePlanRequest should have the ability to set request options.
Martin Grund created SPARK-43430: Summary: ExecutePlanRequest should have the ability to set request options. Key: SPARK-43430 URL: https://issues.apache.org/jira/browse/SPARK-43430 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43351) Support Golang in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720864#comment-17720864 ] Martin Grund commented on SPARK-43351: -- 1.1 We had the same questions with Python and Scala, and right now the best way is to keep the same directory, as we can depend better on the proto artifacts. 1.5 We can take care of this in the initial PR. 2. We should cover the basic DataFrame functionality and, if we want, streaming as well. I would exclude UDFs for now. > Support Golang in Spark Connect > --- > > Key: SPARK-43351 > URL: https://issues.apache.org/jira/browse/SPARK-43351 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: BoYang >Priority: Major > > Support Spark Connect client side in Go programming language -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43332) Allow ChannelBuilder extensions
Martin Grund created SPARK-43332: Summary: Allow ChannelBuilder extensions Key: SPARK-43332 URL: https://issues.apache.org/jira/browse/SPARK-43332 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Allow making the ChannelBuilder extensible. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43249) df.sql() should send metrics back
Martin Grund created SPARK-43249: Summary: df.sql() should send metrics back Key: SPARK-43249 URL: https://issues.apache.org/jira/browse/SPARK-43249 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund df.sql() does not return the metrics to the client when executed as a command. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41628) Support async query execution
[ https://issues.apache.org/jira/browse/SPARK-41628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707580#comment-17707580 ] Martin Grund commented on SPARK-41628: -- I think this needs a bit more discussion. Generally, I think it would be possible to model the asynchronous execution using the existing model. A couple of things would be needed up front: 1. `ExecutePlanRequest` needs an execution mode to indicate whether the client wants a blocking or non-blocking request; this should probably be an enum 2. `ExecutePlanResponse` needs to indicate the query ID so that the client can resume the query to block and fetch results 3. We need a way to check the status, but this could probably be modeled as a `Command` (e.g. QueryStatusCommand) Once this API is specced out, the next step is to identify how to perform the query execution in the background so that the results can be fetched when available. My suggestion would be to prepare a small doc on what exactly you're doing so that we can have a discussion. Feel free to do the design for this in a README in a pull request if this is preferred. > Support async query execution > - > > Key: SPARK-41628 > URL: https://issues.apache.org/jira/browse/SPARK-41628 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > Today the query execution is completely synchronous; add an > asynchronous API that allows submitting and polling for the result. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
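To make the shape of step 3 concrete, a purely hypothetical polling sketch for the client side; the query ID, the status check, and the result fetch are all illustrative stand-ins (e.g. for the proposed QueryStatusCommand) and not part of any existing API:

{code:python}
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def wait_for_result(
    query_id: str,
    check_finished: Callable[[str], bool],  # stand-in for the hypothetical QueryStatusCommand round trip
    fetch_result: Callable[[str], T],       # stand-in for fetching results once the background run finishes
    poll_interval: float = 1.0,
) -> T:
    """Poll the (hypothetical) status command until the background execution
    has finished, then fetch and return the result."""
    while not check_finished(query_id):
        time.sleep(poll_interval)
    return fetch_result(query_id)
{code}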
[jira] [Created] (SPARK-42853) Update the Spark Doc to match the new website style
Martin Grund created SPARK-42853: Summary: Update the Spark Doc to match the new website style Key: SPARK-42853 URL: https://issues.apache.org/jira/browse/SPARK-42853 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42816) Increase max message size to 128MB
Martin Grund created SPARK-42816: Summary: Increase max message size to 128MB Key: SPARK-42816 URL: https://issues.apache.org/jira/browse/SPARK-42816 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Support messages up to 128MB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
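For reference, the ceiling on a plain gRPC channel is controlled by standard channel options; below is a hedged sketch of raising them to the 128MB proposed here (the endpoint shown and how Spark Connect actually wires these options internally are assumptions):

{code:python}
import grpc

MAX_MESSAGE_SIZE = 128 * 1024 * 1024  # 128MB, the limit proposed in this issue

# Standard gRPC channel options for send/receive message limits.
channel = grpc.insecure_channel(
    "localhost:15002",  # default Spark Connect port, assumed here
    options=[
        ("grpc.max_send_message_length", MAX_MESSAGE_SIZE),
        ("grpc.max_receive_message_length", MAX_MESSAGE_SIZE),
    ],
)
{code}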
[jira] [Created] (SPARK-42733) df.write.format().save() should support calling with no path or table name
Martin Grund created SPARK-42733: Summary: df.write.format().save() should support calling with no path or table name Key: SPARK-42733 URL: https://issues.apache.org/jira/browse/SPARK-42733 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund When calling `session.range(5).write.format("xxx").options().save()` Spark Connect currently throws an assertion error because it expects that either path or tableName is present. According to the current PySpark implementation that is not necessary:
{code:python}
if format is not None:
    self.format(format)
if path is None:
    self._jwrite.save()
else:
    self._jwrite.save(path)
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42374) User-facing documentation
[ https://issues.apache.org/jira/browse/SPARK-42374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688394#comment-17688394 ] Martin Grund commented on SPARK-42374: -- Yes, that is correct. There is no built-in authentication. The benefit of the GRPC / HTTP2 interface is that it's very easy to put a capable authenticating proxy in front of it so that we don't need to implement the logic in Spark directly, but can simply use existing infrastructure. > User-facing documentation > - > > Key: SPARK-42374 > URL: https://issues.apache.org/jira/browse/SPARK-42374 > Project: Spark > Issue Type: Documentation > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Haejoon Lee >Priority: Major > > Should provide the user-facing documentation so end users know how to use Spark > Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688393#comment-17688393 ] Martin Grund commented on SPARK-39375: -- [~tgraves] Currently the Python UDFs are implemented exactly the same way as they are today. In today's world, they are serialized bytes that are sent from the Python process via Py4J to the driver and then to the executors where they're deserialized and executed. The primary difference to Spark Connect is that we don't use Py4J anymore but leverage the protocol directly. This is backward compatible and allows us to make sure that we can build upon the existing architecture going forward. Please keep in mind that today, the Python process for the UDF execution is started by the executor as part of query execution. Depending on the setup the Python process is kept around or destroyed at the end of the processing. None of this behavior changed. This means that all of the existing applications using PySpark will simply continue to work. Similarly, this means we're not changing the assumptions around the requirements of which Python version has to be present where. In the same way, the Python version on the client has to be the same as on the executor. The reason we did not create a design for it is that we did not change the semantics, the logic or the implementation. This is very similar to the way we're translating the Spark Connect proto API into Catalyst plans. > SPIP: Spark Connect - A client and server interface for Apache Spark > > > Key: SPARK-39375 > URL: https://issues.apache.org/jira/browse/SPARK-39375 > Project: Spark > Issue Type: Epic > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Critical > Labels: SPIP > > Please find the full document for discussion here: [Spark Connect > SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj] > Below, we have just referenced the introduction. > h2. What are you trying to do? > While Spark is used extensively, it was designed nearly a decade ago, which, > in the age of serverless computing and ubiquitous programming language use, > poses a number of limitations. Most of the limitations stem from the tightly > coupled Spark driver architecture and fact that clusters are typically shared > across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark > driver runs both the client application and scheduler, which results in a > heavyweight architecture that requires proximity to the cluster. There is no > built-in capability to remotely connect to a Spark cluster in languages > other than SQL and users therefore rely on external solutions such as the > inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich > developer experience{*}: The current architecture and APIs do not cater for > interactive data exploration (as done with Notebooks), or allow for building > out rich developer experience common in modern code editors. (3) > {*}Stability{*}: with the current shared driver architecture, users causing > critical exceptions (e.g. OOM) bring the whole cluster down for all users. > (4) {*}Upgradability{*}: the current entangling of platform and client APIs > (e.g. first and third-party dependencies in the classpath) does not allow for > seamless upgrades between Spark versions (and with that, hinders new feature > adoption). 
> > We propose to overcome these challenges by building on the DataFrame API and > the underlying unresolved logical plans. The DataFrame API is widely used and > makes it very easy to iteratively express complex logic. We will introduce > {_}Spark Connect{_}, a remote option of the DataFrame API that separates the > client from the Spark server. With Spark Connect, Spark will become > decoupled, allowing for built-in remote connectivity: The decoupled client > SDK can be used to run interactive data exploration and connect to the server > for DataFrame operations. > > Spark Connect will benefit Spark developers in different ways: The decoupled > architecture will result in improved stability, as clients are separated from > the driver. From the Spark Connect client perspective, Spark will be (almost) > versionless, and thus enable seamless upgradability, as server APIs can > evolve without affecting the client API. The decoupled client-server > architecture can be leveraged to build close integrations with local > developer tooling. Finally, separating the client process from the Spark > server process will improve Spark’s overall security posture by avoiding the > tight coupling of the client inside the Spark runtime
[jira] [Created] (SPARK-42156) Support client-side retries in Spark Connect Python client
Martin Grund created SPARK-42156: Summary: Support client-side retries in Spark Connect Python client Key: SPARK-42156 URL: https://issues.apache.org/jira/browse/SPARK-42156 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell
Martin Grund created SPARK-42029: Summary: Distribution build for Spark Connect does not work with Spark Shell Key: SPARK-42029 URL: https://issues.apache.org/jira/browse/SPARK-42029 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps
Martin Grund created SPARK-42028: Summary: Support Pandas DF to Spark DF with Nanosecond Timestamps Key: SPARK-42028 URL: https://issues.apache.org/jira/browse/SPARK-42028 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42027) CreateDataframe from Pandas with Struct and Timestamp
Martin Grund created SPARK-42027: Summary: CreateDataframe from Pandas with Struct and Timestamp Key: SPARK-42027 URL: https://issues.apache.org/jira/browse/SPARK-42027 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund The following should be supported and correctly truncate the nanosecond timestamps.
{code:python}
from datetime import datetime, timezone, timedelta

import pandas as pd
from pandas import Timestamp

ts = Timestamp(year=2019, month=1, day=1, nanosecond=500, tz=timezone(timedelta(hours=-8)))
d = pd.DataFrame({"col1": [1], "col2": [{"a": 1, "b": 2.32, "c": ts}]})
spark.createDataFrame(d).collect()
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41919) Unify the schema or datatype in protos
[ https://issues.apache.org/jira/browse/SPARK-41919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655346#comment-17655346 ] Martin Grund commented on SPARK-41919: -- There is a compatible way of doing this by adding another field to the oneof that represents the string schema and marking the other fields as reserved. > Unify the schema or datatype in protos > -- > > Key: SPARK-41919 > URL: https://issues.apache.org/jira/browse/SPARK-41919 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > this ticket only focus on the protos sent from client to server. > we normally use > {code:java} > oneof schema { > DataType datatype = 2; > // Server will use Catalyst parser to parse this string to DataType. > string datatype_str = 3; > } > {code} > to represent a schema or datatype. > actually, we can simplify it with just a string. In the server, we can easily > parse a DDL-formatted schema or a JSON formatted one. > {code:java} > // (Optional) The schema of local data. > // It should be either a DDL-formatted type string or a JSON string. > // > // The server side will update the column names and data types according to > this schema. > // If the 'data' is not provided, then this schema will be required. > optional string schema = 2; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41918) Refine the naming in proto messages
[ https://issues.apache.org/jira/browse/SPARK-41918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655345#comment-17655345 ] Martin Grund commented on SPARK-41918: -- Renaming fields is WIRE compatible and most likely this is going to be the preferred way of compatibility for the protos. > Refine the naming in proto messages > --- > > Key: SPARK-41918 > URL: https://issues.apache.org/jira/browse/SPARK-41918 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > normally, we name the fields after the corresponding LogiclalPlan or > DataFrame API, but they are not consistent in protos, for example, the column > name: > {code:java} > message UnresolvedRegex { > // (Required) The column name used to extract column with regex. > string col_name = 1; > } > {code} > {code:java} > message Alias { > // (Required) The expression that alias will be added on. > Expression expr = 1; > // (Required) a list of name parts for the alias. > // > // Scalar columns only has one name that presents. > repeated string name = 2; > // (Optional) Alias metadata expressed as a JSON map. > optional string metadata = 3; > } > {code} > {code:java} > // Relation of type [[Deduplicate]] which have duplicate rows removed, could > consider either only > // the subset of columns or all the columns. > message Deduplicate { > // (Required) Input relation for a Deduplicate. > Relation input = 1; > // (Optional) Deduplicate based on a list of column names. > // > // This field does not co-use with `all_columns_as_keys`. > repeated string column_names = 2; > // (Optional) Deduplicate based on all the columns of the input relation. > // > // This field does not co-use with `column_names`. > optional bool all_columns_as_keys = 3; > } > {code} > {code:java} > // Computes basic statistics for numeric and string columns, including count, > mean, stddev, min, > // and max. If no columns are given, this function computes statistics for > all numerical or > // string columns. > message StatDescribe { > // (Required) The input relation. > Relation input = 1; > // (Optional) Columns to compute statistics on. > repeated string cols = 2; > } > {code} > we probably should unify the naming: > single column -> `column` > multi columns -> `columns` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41911) Add version fields to Connect proto
[ https://issues.apache.org/jira/browse/SPARK-41911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655343#comment-17655343 ] Martin Grund commented on SPARK-41911: -- I think the first part would be to identify which messages would need a version, but the version does not need to be the first field in the proto. > Add version fields to Connect proto > --- > > Key: SPARK-41911 > URL: https://issues.apache.org/jira/browse/SPARK-41911 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > > We may need this to help maintain compatibility. Depending on the concrete > protocol design, we may use field number 1 for version fields thus may cause > breaking changes on existing proto messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41910) Remove `optional` notation in proto
[ https://issues.apache.org/jira/browse/SPARK-41910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655336#comment-17655336 ] Martin Grund commented on SPARK-41910: -- Didn't we just have the discussion on why we wanted to use optional? In particular for scalar values that have a default value? > Remove `optional` notation in proto > --- > > Key: SPARK-41910 > URL: https://issues.apache.org/jira/browse/SPARK-41910 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > > Every field in proto3 has a default value. We should revisit existing proto > fields to understand if the default value can be used without telling whether the field > is set or not, and remove `optional` as much as possible from the Spark > Connect proto surface. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41755) Reorder fields to use consecutive field numbers
[ https://issues.apache.org/jira/browse/SPARK-41755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655334#comment-17655334 ] Martin Grund commented on SPARK-41755: -- While not obvious, this is not needed. We can just add a manual ignore for 28-89 and keep moving. > Reorder fields to use consecutive field numbers > --- > > Key: SPARK-41755 > URL: https://issues.apache.org/jira/browse/SPARK-41755 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > make IDs consecutive > ``` > RepartitionByExpression repartition_by_expression = 27; > // NA functions > NAFill fill_na = 90; > NADrop drop_na = 91; > NAReplace replace = 92; > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41812) DataFrame.join: ambiguous column
[ https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653649#comment-17653649 ] Martin Grund commented on SPARK-41812: -- On the Spark side we eagerly resolve the column and return a full named expression.
{code:scala}
Column(addDataFrameIdToCol(resolve(colName)))

private[sql] def resolve(colName: String): NamedExpression = {
  val resolver = sparkSession.sessionState.analyzer.resolver
  queryExecution.analyzed.resolveQuoted(colName, resolver)
    .getOrElse(throw resolveException(colName, schema.fieldNames))
}
{code}
To avoid too many round-trips we should probably inject the DataFrame ID and column position properties into the metadata to perform the resolution later on the server. > DataFrame.join: ambiguous column > > > Key: SPARK-41812 > URL: https://issues.apache.org/jira/browse/SPARK-41812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in > pyspark.sql.connect.column.Column.eqNullSafe > Failed example: > df1.join(df2, df1["value"] == df2["value"]).count() > Exception raised: > Traceback (most recent call last): > File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line > 1336, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df1.join(df2, df1["value"] == df2["value"]).count() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in > count > pdd = self.agg(_invoke_function("count", lit(1))).toPandas() > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, > in toPandas > return self._session.client.to_pandas(query) > File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in > to_pandas > return self._execute_and_fetch(req) > File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in > _execute_and_fetch > self._handle_error(rpc_error) > File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in > _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, > `value`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41815) Column.isNull returns nan instead of None
[ https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653643#comment-17653643 ] Martin Grund edited comment on SPARK-41815 at 1/2/23 3:56 PM: -- The reason seems to be that when Spark Connect serializes the data to Arrow it's serialized as `null` and when converted to pandas it will convert this to the `np.nan` instead of `None`. It seems that we should manually convert `nan` to `None` for the `Row` type. was (Author: JIRAUSER290467): The reason seems to be that when Spark Connect serializes the data to Arrow it's serialized as `null` and when converted to pandas it will convert this to the `np.nan` instead of `None`. It seems that we should manually convert `nan` to `None` for the `Row` type. > Column.isNull returns nan instead of None > - > > Key: SPARK-41815 > URL: https://issues.apache.org/jira/browse/SPARK-41815 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > File "/.../spark/python/pyspark/sql/connect/column.py", line 99, in > pyspark.sql.connect.column.Column.isNull > Failed example: > df.filter(df.height.isNull()).collect() > Expected: > [Row(name='Alice', height=None)] > Got: > [Row(name='Alice', height=nan)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41815) Column.isNull returns nan instead of None
[ https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653643#comment-17653643 ] Martin Grund commented on SPARK-41815: -- The reason seems to be that when Spark Connect serializes the data to Arrow it's serialized as `null` and when converted to pandas it will convert this to the `np.nan` instead of `None`. It seems that we should manually convert `nan` to `None` for the `Row` type. > Column.isNull returns nan instead of None > - > > Key: SPARK-41815 > URL: https://issues.apache.org/jira/browse/SPARK-41815 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > File "/.../spark/python/pyspark/sql/connect/column.py", line 99, in > pyspark.sql.connect.column.Column.isNull > Failed example: > df.filter(df.height.isNull()).collect() > Expected: > [Row(name='Alice', height=None)] > Got: > [Row(name='Alice', height=nan)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
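A minimal sketch of the suggested post-processing — not the actual client code — mapping float NaN back to None when materializing Rows from the Arrow-backed pandas result; the `pdf` below is just a stand-in for the fetched pandas DataFrame:

{code:python}
import math

import pandas as pd

def nan_to_none(value):
    """Map float NaN (what pandas uses for missing values) back to Python None."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

pdf = pd.DataFrame({"name": ["Alice"], "height": [float("nan")]})  # stand-in for the fetched result
rows = [tuple(nan_to_none(v) for v in record) for record in pdf.itertuples(index=False, name=None)]
print(rows)  # [('Alice', None)] instead of [('Alice', nan)]
{code}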
[jira] [Created] (SPARK-41803) log() function variations are missing
Martin Grund created SPARK-41803: Summary: log() function variations are missing Key: SPARK-41803 URL: https://issues.apache.org/jira/browse/SPARK-41803 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41743) groupBy(...).agg(...).sort does not actually sort the output
[ https://issues.apache.org/jira/browse/SPARK-41743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652936#comment-17652936 ] Martin Grund commented on SPARK-41743: -- Running the doc tests actually passes for me. > groupBy(...).agg(...).sort does not actually sort the output > > > Key: SPARK-41743 > URL: https://issues.apache.org/jira/browse/SPARK-41743 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > ** > File "/.../spark/python/pyspark/sql/connect/group.py", line 211, in > pyspark.sql.connect.group.GroupedData.agg > Failed example: > df.groupBy(df.name).agg(F.min(df.age)).sort("name").show() > Differences (ndiff with -expected +actual): > +-++ > | name|min(age)| > +-++ > + | Bob| 5| > |Alice| 2| > - | Bob| 5| > +-++ > + > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41743) groupBy(...).agg(...).sort does not actually sort the output
[ https://issues.apache.org/jira/browse/SPARK-41743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652934#comment-17652934 ] Martin Grund edited comment on SPARK-41743 at 12/29/22 8:24 PM: In the following example, I cannot reproduce this: {code:python} df = spark.createDataFrame([{"age":10, "name": "Martin"},{"age":11, "name": "Anton"}]) df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).sort("name").show() {code} produces {noformat} +--++ | name|min(age)| +--++ | Anton| 11| |Martin| 10| +--++ {noformat} vs {code:python} df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).show() {code} produces {noformat} +--++ | name|min(age)| +--++ |Martin| 10| | Anton| 11| +--++ {noformat} was (Author: JIRAUSER290467): In the following example, I cannot reproduce this: ``` df = spark.createDataFrame([{"age":10, "name": "Martin"},{"age":11, "name": "Anton"}]) df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).sort("name").show() ``` produces ``` +--++ | name|min(age)| +--++ | Anton| 11| |Martin| 10| +--++ ``` vs ``` df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).show() ``` produces ``` +--++ | name|min(age)| +--++ |Martin| 10| | Anton| 11| +--++ ``` > groupBy(...).agg(...).sort does not actually sort the output > > > Key: SPARK-41743 > URL: https://issues.apache.org/jira/browse/SPARK-41743 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > ** > File "/.../spark/python/pyspark/sql/connect/group.py", line 211, in > pyspark.sql.connect.group.GroupedData.agg > Failed example: > df.groupBy(df.name).agg(F.min(df.age)).sort("name").show() > Differences (ndiff with -expected +actual): > +-++ > | name|min(age)| > +-++ > + | Bob| 5| > |Alice| 2| > - | Bob| 5| > +-++ > + > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41743) groupBy(...).agg(...).sort does not actually sort the output
[ https://issues.apache.org/jira/browse/SPARK-41743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652934#comment-17652934 ] Martin Grund commented on SPARK-41743: -- In the following example, I cannot reproduce this: ``` df = spark.createDataFrame([{"age":10, "name": "Martin"},{"age":11, "name": "Anton"}]) df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).sort("name").show() ``` produces ``` +--++ | name|min(age)| +--++ | Anton| 11| |Martin| 10| +--++ ``` vs ``` df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).show() ``` produces ``` +--++ | name|min(age)| +--++ |Martin| 10| | Anton| 11| +--++ ``` > groupBy(...).agg(...).sort does not actually sort the output > > > Key: SPARK-41743 > URL: https://issues.apache.org/jira/browse/SPARK-41743 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > ** > File "/.../spark/python/pyspark/sql/connect/group.py", line 211, in > pyspark.sql.connect.group.GroupedData.agg > Failed example: > df.groupBy(df.name).agg(F.min(df.age)).sort("name").show() > Differences (ndiff with -expected +actual): > +-++ > | name|min(age)| > +-++ > + | Bob| 5| > |Alice| 2| > - | Bob| 5| > +-++ > + > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41738) Client ID should be mixed into SparkSession cache
Martin Grund created SPARK-41738: Summary: Client ID should be mixed into SparkSession cache Key: SPARK-41738 URL: https://issues.apache.org/jira/browse/SPARK-41738 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41664) Support streaming client data to create large DataFrames
Martin Grund created SPARK-41664: Summary: Support streaming client data to create large DataFrames Key: SPARK-41664 URL: https://issues.apache.org/jira/browse/SPARK-41664 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Support client-side streaming to allow creating large DataFrames from the client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41662) Minimal support for pickled Python UDFs
Martin Grund created SPARK-41662: Summary: Minimal support for pickled Python UDFs Key: SPARK-41662 URL: https://issues.apache.org/jira/browse/SPARK-41662 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Minimal support for UDFs as part of queries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41661) Support for Python UDFs
Martin Grund created SPARK-41661: Summary: Support for Python UDFs Key: SPARK-41661 URL: https://issues.apache.org/jira/browse/SPARK-41661 Project: Spark Issue Type: Umbrella Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Spark Connect should support Python UDFs -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41629) Support for protocol extensions
Martin Grund created SPARK-41629: Summary: Support for protocol extensions Key: SPARK-41629 URL: https://issues.apache.org/jira/browse/SPARK-41629 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Spark comes with many different extension points. Many of those simply become available through the shared classpath between Spark and the user application. To be able to support arbitrary plugins e.g. for Delta or Iceberg, we need a way to make the Spark Connect protocol extensible and let users register their own handlers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
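One possible shape of such an extension mechanism is packing a plugin-defined protobuf message into a `google.protobuf.Any` slot of the plan, so a handler registered on the server for that type URL can translate it into a logical plan. This is a hedged sketch: the function and the idea of an Any-typed extension slot are assumptions for illustration, not the actual Spark Connect protocol.
{code:python}
from google.protobuf import any_pb2

def pack_extension(plugin_message):
    """Pack an arbitrary plugin-defined relation (e.g. a Delta- or Iceberg-specific
    message) so the server can dispatch it to the handler registered for its type URL."""
    packed = any_pb2.Any()
    packed.Pack(plugin_message)
    return packed
{code}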
[jira] [Updated] (SPARK-41625) Feature parity: Streaming support
[ https://issues.apache.org/jira/browse/SPARK-41625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grund updated SPARK-41625: - Description: We need to design what support for structured streaming will look like in Spark Connect. > Feature parity: Streaming support > - > > Key: SPARK-41625 > URL: https://issues.apache.org/jira/browse/SPARK-41625 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > We need to design what support for structured streaming will look like in > Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41628) Support async query execution
Martin Grund created SPARK-41628: Summary: Support async query execution Key: SPARK-41628 URL: https://issues.apache.org/jira/browse/SPARK-41628 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Today, query execution is completely synchronous; add an additional asynchronous API that allows clients to submit a query and poll for the result. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
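A hypothetical sketch of what a submit-and-poll client API could look like; `submit`, `fetch_result`, and `OperationHandle` are illustrative names, not an existing Spark Connect API.
{code:python}
import time

class OperationHandle:
    """Handle returned by a non-blocking submit; identifies a running server-side operation."""

    def __init__(self, operation_id, client):
        self.operation_id = operation_id
        self._client = client

    def poll(self):
        """Return the result if the operation finished, otherwise None."""
        return self._client.fetch_result(self.operation_id)


def run_async(client, plan, poll_interval_s=1.0):
    handle = client.submit(plan)              # returns immediately with an operation id
    while (result := handle.poll()) is None:  # caller polls instead of blocking on the RPC
        time.sleep(poll_interval_s)
    return result
{code}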
[jira] [Created] (SPARK-41627) Spark Connect Server Development
Martin Grund created SPARK-41627: Summary: Spark Connect Server Development Key: SPARK-41627 URL: https://issues.apache.org/jira/browse/SPARK-41627 Project: Spark Issue Type: Umbrella Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41626) Document validation choices for local and remote input validation
Martin Grund created SPARK-41626: Summary: Document validation choices for local and remote input validation Key: SPARK-41626 URL: https://issues.apache.org/jira/browse/SPARK-41626 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41625) Feature parity: Streaming support
Martin Grund created SPARK-41625: Summary: Feature parity: Streaming support Key: SPARK-41625 URL: https://issues.apache.org/jira/browse/SPARK-41625 Project: Spark Issue Type: Umbrella Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41624) Support Python logging
Martin Grund created SPARK-41624: Summary: Support Python logging Key: SPARK-41624 URL: https://issues.apache.org/jira/browse/SPARK-41624 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Since the Spark Connect client cannot leverage the JVM-based logging, we need to add additional instrumentation to make sure we provide enough insight for users to understand and debug errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
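A minimal sketch of how a user could surface such client-side instrumentation, assuming the client emits records through the standard `logging` module; the logger name used here is an assumption for illustration.
{code:python}
import logging

# Route client-side log records (RPC calls, plan translation, retries, ...) to stderr.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
# The exact logger name is an assumption, not a documented identifier.
logging.getLogger("pyspark.sql.connect").setLevel(logging.DEBUG)
{code}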
[jira] [Created] (SPARK-41623) Support Catalog.uncacheTable
Martin Grund created SPARK-41623: Summary: Support Catalog.uncacheTable Key: SPARK-41623 URL: https://issues.apache.org/jira/browse/SPARK-41623 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41620) Support Catalog.registerFunction
Martin Grund created SPARK-41620: Summary: Support Catalog.registerFunction Key: SPARK-41620 URL: https://issues.apache.org/jira/browse/SPARK-41620 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41619) Support Catalog.refreshTable
Martin Grund created SPARK-41619: Summary: Support Catalog.refreshTable Key: SPARK-41619 URL: https://issues.apache.org/jira/browse/SPARK-41619 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41622) Support Catalog.setCurrentDatabase
Martin Grund created SPARK-41622: Summary: Support Catalog.setCurrentDatabase Key: SPARK-41622 URL: https://issues.apache.org/jira/browse/SPARK-41622 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41621) Support Catalog.setCurrentCatalog
Martin Grund created SPARK-41621: Summary: Support Catalog.setCurrentCatalog Key: SPARK-41621 URL: https://issues.apache.org/jira/browse/SPARK-41621 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41618) Support Catalog.recoverPartitions
Martin Grund created SPARK-41618: Summary: Support Catalog.recoverPartitions Key: SPARK-41618 URL: https://issues.apache.org/jira/browse/SPARK-41618 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41617) Support Catalog.listTables
Martin Grund created SPARK-41617: Summary: Support Catalog.listTables Key: SPARK-41617 URL: https://issues.apache.org/jira/browse/SPARK-41617 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41616) Support Catalog.listFunctions
Martin Grund created SPARK-41616: Summary: Support Catalog.listFunctions Key: SPARK-41616 URL: https://issues.apache.org/jira/browse/SPARK-41616 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41614) Support Catalog.listColumns
Martin Grund created SPARK-41614: Summary: Support Catalog.listColumns Key: SPARK-41614 URL: https://issues.apache.org/jira/browse/SPARK-41614 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41615) Support Catalog.listDatabases
Martin Grund created SPARK-41615: Summary: Support Catalog.listDatabases Key: SPARK-41615 URL: https://issues.apache.org/jira/browse/SPARK-41615 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41610) Support Catalog.getFunction
Martin Grund created SPARK-41610: Summary: Support Catalog.getFunction Key: SPARK-41610 URL: https://issues.apache.org/jira/browse/SPARK-41610 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41612) Support Catalog.isCached
Martin Grund created SPARK-41612: Summary: Support Catalog.isCached Key: SPARK-41612 URL: https://issues.apache.org/jira/browse/SPARK-41612 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41613) Support Catalog.listCatalogs
Martin Grund created SPARK-41613: Summary: Support Catalog.listCatalogs Key: SPARK-41613 URL: https://issues.apache.org/jira/browse/SPARK-41613 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41611) Support Catalog.getTable
Martin Grund created SPARK-41611: Summary: Support Catalog.getTable Key: SPARK-41611 URL: https://issues.apache.org/jira/browse/SPARK-41611 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41608) Support Catalog.functionExists
Martin Grund created SPARK-41608: Summary: Support Catalog.functionExists Key: SPARK-41608 URL: https://issues.apache.org/jira/browse/SPARK-41608 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41609) Support Catalog.getDatabase
Martin Grund created SPARK-41609: Summary: Support Catalog.getDatabase Key: SPARK-41609 URL: https://issues.apache.org/jira/browse/SPARK-41609 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41607) Support Catalog.dropTempView
Martin Grund created SPARK-41607: Summary: Support Catalog.dropTempView Key: SPARK-41607 URL: https://issues.apache.org/jira/browse/SPARK-41607 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41604) Support Catalog.currentCatalog
Martin Grund created SPARK-41604: Summary: Support Catalog.currentCatalog Key: SPARK-41604 URL: https://issues.apache.org/jira/browse/SPARK-41604 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41606) Support Catalog dropGlobalTempView
Martin Grund created SPARK-41606: Summary: Support Catalog dropGlobalTempView Key: SPARK-41606 URL: https://issues.apache.org/jira/browse/SPARK-41606 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41605) Support Catalog.currentDatabase
Martin Grund created SPARK-41605: Summary: Support Catalog.currentDatabase Key: SPARK-41605 URL: https://issues.apache.org/jira/browse/SPARK-41605 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41602) Support Catalog.createExternalTable
Martin Grund created SPARK-41602: Summary: Support Catalog.createExternalTable Key: SPARK-41602 URL: https://issues.apache.org/jira/browse/SPARK-41602 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41603) Support Catalog.createTable
Martin Grund created SPARK-41603: Summary: Support Catalog.createTable Key: SPARK-41603 URL: https://issues.apache.org/jira/browse/SPARK-41603 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41601) Support Catalog.clearCache
Martin Grund created SPARK-41601: Summary: Support Catalog.clearCache Key: SPARK-41601 URL: https://issues.apache.org/jira/browse/SPARK-41601 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41600) Support Catalog.cacheTable
Martin Grund created SPARK-41600: Summary: Support Catalog.cacheTable Key: SPARK-41600 URL: https://issues.apache.org/jira/browse/SPARK-41600 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41560) Document how to add new functions
Martin Grund created SPARK-41560: Summary: Document how to add new functions Key: SPARK-41560 URL: https://issues.apache.org/jira/browse/SPARK-41560 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Please add documentation that outlines how to add support for a new function based on the concept of unresolved functions, and documentation for supporting functions based on the approach used for case/when and lambda. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
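A simplified, hypothetical sketch of the "unresolved function" pattern such documentation would explain: the client records only the function name and argument expressions, and the server resolves the name against its registry. All names below are illustrative, not the actual client code.
{code:python}
from dataclasses import dataclass, field
from typing import List


@dataclass
class UnresolvedFunction:
    name: str                                           # resolved on the server against its registry
    args: List[object] = field(default_factory=list)    # child expressions


def invoke_function(name, *args):
    """Adding a new client-side function then boils down to a one-line wrapper."""
    return UnresolvedFunction(name, list(args))


def sqrt(col):
    return invoke_function("sqrt", col)
{code}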
[jira] [Created] (SPARK-41537) Protobuf backwards compatibility testing
Martin Grund created SPARK-41537: Summary: Protobuf backwards compatibility testing Key: SPARK-41537 URL: https://issues.apache.org/jira/browse/SPARK-41537 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41533) GRPC Errors on the client should be cleaned up
Martin Grund created SPARK-41533: Summary: GRPC Errors on the client should be cleaned up Key: SPARK-41533 URL: https://issues.apache.org/jira/browse/SPARK-41533 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund When the server throws an exception, we report a very deep stack trace that is not helpful for the user. We need to separate the cause from the user-visible exception and wrap the error in a custom exception instead of surfacing the RPCError from GRPC. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
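A minimal sketch of the separation described above, assuming the client uses grpcio; `SparkConnectException` is an illustrative name, not an existing class.
{code:python}
import grpc

class SparkConnectException(Exception):
    """User-facing error carrying the server-side message without the transport stack trace."""

def execute_rpc(stub_call, request):
    try:
        return stub_call(request)
    except grpc.RpcError as e:
        # Keep the RPC error attached as the cause for debugging, but surface a clean message.
        raise SparkConnectException(e.details()) from e
{code}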
[jira] [Created] (SPARK-41532) DF operations that involve multiple data frames should fail if sessions don't match
Martin Grund created SPARK-41532: Summary: DF operations that involve multiple data frames should fail if sessions don't match Key: SPARK-41532 URL: https://issues.apache.org/jira/browse/SPARK-41532 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund We do not support, for example, joining two DataFrames from different Spark Connect sessions. To avoid exceptions, the client should fail clearly when it tries to construct such a composition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
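A hypothetical sketch of the client-side guard; the class and attribute names are illustrative, not the actual DataFrame implementation.
{code:python}
class DataFrame:
    def __init__(self, plan, session):
        self._plan = plan
        self._session = session

    def join(self, other, on=None, how="inner"):
        if other._session is not self._session:
            # Fail fast on the client instead of sending an invalid plan to the server.
            raise ValueError(
                "Cannot join DataFrames created from different Spark Connect sessions."
            )
        # ... build the join plan as usual ...
{code}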
[jira] [Created] (SPARK-41531) Debugging and Stability
Martin Grund created SPARK-41531: Summary: Debugging and Stability Key: SPARK-41531 URL: https://issues.apache.org/jira/browse/SPARK-41531 Project: Spark Issue Type: Umbrella Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Umbrella JIRA for items on debugging, logging and stability. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41366) DF.groupby.agg() API should be compatible
Martin Grund created SPARK-41366: Summary: DF.groupby.agg() API should be compatible Key: SPARK-41366 URL: https://issues.apache.org/jira/browse/SPARK-41366 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
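For reference, these are standard PySpark `agg` forms the Connect client should accept interchangeably (the list is illustrative, not exhaustive; an existing DataFrame `df` with columns `name` and `age` is assumed).
{code:python}
from pyspark.sql import functions as F

df.groupBy("name").agg({"age": "min"})                   # dict of column -> aggregate name
df.groupBy("name").agg(F.min("age"))                     # Column expressions
df.groupBy("name").agg(F.min(df.age).alias("min_age"))   # aliased Column expressions
{code}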
[jira] [Created] (SPARK-41362) Better type errors when passing wrong parameters
Martin Grund created SPARK-41362: Summary: Better type errors when passing wrong parameters Key: SPARK-41362 URL: https://issues.apache.org/jira/browse/SPARK-41362 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Martin Grund Raise clearer error messages when the wrong types are passed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
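A hypothetical sketch of the kind of validation meant here; the helper name and message wording are illustrative.
{code:python}
def _require_column_name(value):
    """Validate a user-supplied column name and raise a readable error otherwise."""
    if not isinstance(value, str):
        raise TypeError(
            f"Column name must be a str, got {type(value).__name__}: {value!r}"
        )
    return value
{code}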