[jira] [Created] (SPARK-48982) [GO] Extract Spark Exceptions from GRPC response

2024-07-23 Thread Martin Grund (Jira)
Martin Grund created SPARK-48982:


 Summary: [GO] Extract Spark Exceptions from GRPC response
 Key: SPARK-48982
 URL: https://issues.apache.org/jira/browse/SPARK-48982
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.1
Reporter: Martin Grund
 Fix For: 4.0.0









[jira] [Assigned] (SPARK-48951) Column and Function Support for Go

2024-07-19 Thread Martin Grund (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grund reassigned SPARK-48951:


Assignee: Martin Grund

> Column and Function Support for Go
> --
>
> Key: SPARK-48951
> URL: https://issues.apache.org/jira/browse/SPARK-48951
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.1
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
>
> Support Column & Function feature parity in Go client.






[jira] [Resolved] (SPARK-48951) Column and Function Support for Go

2024-07-19 Thread Martin Grund (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grund resolved SPARK-48951.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 35
[https://github.com/apache/spark-connect-go/pull/35]

> Column and Function Support for Go
> --
>
> Key: SPARK-48951
> URL: https://issues.apache.org/jira/browse/SPARK-48951
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.1
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support Column & Function feature parity in Go client.






[jira] [Created] (SPARK-48951) Column and Function Support for Go

2024-07-19 Thread Martin Grund (Jira)
Martin Grund created SPARK-48951:


 Summary: Column and Function Support for Go
 Key: SPARK-48951
 URL: https://issues.apache.org/jira/browse/SPARK-48951
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.1
Reporter: Martin Grund


Support Column & Function feature parity in Go client.






[jira] [Created] (SPARK-48756) Add support for `df.debug()`

2024-06-28 Thread Martin Grund (Jira)
Martin Grund created SPARK-48756:


 Summary: Add support for `df.debug()`
 Key: SPARK-48756
 URL: https://issues.apache.org/jira/browse/SPARK-48756
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.1
Reporter: Martin Grund


Following the work on execution info, we want to add some basic data debug 
capabilities to the DF API.
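
A purely hypothetical sketch of what such an entry point could look like; the `debug()` name comes from the summary above, and everything it returns is an assumption at this point:

{code:python}
# Hypothetical usage only: neither `debug()` nor its output format exist yet.
df = spark.range(100).filter("id % 2 == 0")
df.collect()

info = df.debug()   # proposed debug entry point
print(info)         # e.g. the plan plus basic execution information for the last run
{code}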






[jira] [Created] (SPARK-48638) Native QueryExecution information for the dataframe

2024-06-16 Thread Martin Grund (Jira)
Martin Grund created SPARK-48638:


 Summary: Native QueryExecution information for the dataframe
 Key: SPARK-48638
 URL: https://issues.apache.org/jira/browse/SPARK-48638
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Martin Grund


Adding a new property to `DataFrame` called `queryExecution` that returns a 
class that contains information about the query execution and its metrics.
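
A hedged sketch of the proposed usage; `queryExecution` is the property name suggested above, while the attributes accessed on the returned object are assumptions for illustration:

{code:python}
df = spark.range(1000).groupBy().count()
df.collect()

qe = df.queryExecution   # proposed property
print(qe.metrics)        # assumed attribute: metrics reported for the last execution
{code}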






[jira] [Created] (SPARK-47862) Connect generated protos can't be pickled

2024-04-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-47862:


 Summary: Connect generated protos can't be pickled
 Key: SPARK-47862
 URL: https://issues.apache.org/jira/browse/SPARK-47862
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.1
Reporter: Martin Grund
 Fix For: 4.0.0


When Spark Connect generates the protobuf files, they're manually adjusted and 
moved to the right folder. However, we did not fix the package for the 
descriptor. This breaks pickling them.
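
A minimal sketch of the failure mode, assuming the generated messages are importable from `pyspark.sql.connect.proto`:

{code:python}
import pickle

# Assumption: Relation is one of the generated Spark Connect messages.
from pyspark.sql.connect.proto import Relation

rel = Relation()
# Pickle relies on the descriptor's module/package path; if that path is not
# fixed up after moving the generated files, this round-trip fails.
restored = pickle.loads(pickle.dumps(rel))
{code}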






[jira] [Created] (SPARK-47812) Support Serializing Spark Sessions in ForEachBatch

2024-04-11 Thread Martin Grund (Jira)
Martin Grund created SPARK-47812:


 Summary: Support Serializing Spark Sessions in ForEachBatch
 Key: SPARK-47812
 URL: https://issues.apache.org/jira/browse/SPARK-47812
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.1
Reporter: Martin Grund
 Fix For: 4.0.0


SparkSessions using Connect should support serialization when used in ForEachBatch 
and friends.
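
A small sketch of the scenario, assuming a streaming DataFrame `stream_df` and an existing sink table; the point is that the Connect `SparkSession` captured by the function has to survive serialization of the closure:

{code:python}
def handle_batch(batch_df, batch_id):
    # The closure captures `spark` (a Connect session). foreachBatch ships this
    # function off for execution, so the captured session must be serializable.
    spark.sql("SELECT 1").collect()
    batch_df.write.mode("append").saveAsTable("events_sink")

query = stream_df.writeStream.foreachBatch(handle_batch).start()
{code}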






[jira] [Commented] (SPARK-47336) Provide to PySpark a functionality to get estimated size of DataFrame in bytes

2024-03-14 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827077#comment-17827077
 ] 

Martin Grund commented on SPARK-47336:
--

I think the general idea is great! I would like to propose changing the name 
to reflect that this is most likely a size estimation, though.

> Provide to PySpark a functionality to get estimated size of DataFrame in bytes
> --
>
> Key: SPARK-47336
> URL: https://issues.apache.org/jira/browse/SPARK-47336
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Semyon Sinchenko
>Priority: Minor
>
> Something equal to 
> sessionState().executePlan(...).optimizedPlan().stats().sizeInBytes() in 
> JVM-Spark. It may be done via a simple call through `_jsparkSession` in regular 
> PySpark and via a plugin for Spark Connect.
>  
> This functionality is useful when one needs to check the feasibility of a 
> broadcast join without modifying the global broadcast threshold.
>  
> The function in the PySpark API may look like: 
> `DataFrame.estimate_size_in_bytes() -> float` or 
> `DataFrame.estimateSizeInBytes() -> float`.






[jira] [Created] (SPARK-47227) Spark Connect Documentation

2024-02-29 Thread Martin Grund (Jira)
Martin Grund created SPARK-47227:


 Summary: Spark Connect Documentation
 Key: SPARK-47227
 URL: https://issues.apache.org/jira/browse/SPARK-47227
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.1
Reporter: Martin Grund


Improve the documentation of Spark Connect






[jira] [Created] (SPARK-47081) Support Query Execution Progress Messages

2024-02-17 Thread Martin Grund (Jira)
Martin Grund created SPARK-47081:


 Summary: Support Query Execution Progress Messages
 Key: SPARK-47081
 URL: https://issues.apache.org/jira/browse/SPARK-47081
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund
 Fix For: 4.0.0


Spark Connect should support reporting basic query progress to the client.






[jira] [Created] (SPARK-45852) Gracefully deal with recursion exception during Spark Connect logging

2023-11-09 Thread Martin Grund (Jira)
Martin Grund created SPARK-45852:


 Summary: Gracefully deal with recursion exception during Spark 
Connect logging
 Key: SPARK-45852
 URL: https://issues.apache.org/jira/browse/SPARK-45852
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund


{code:python}
from google.protobuf.text_format import MessageToString
from pyspark.sql.functions import col, lit

df = spark.range(10)

for x in range(800):
  df = df.withColumn(f"next{x}", lit(1))
  MessageToString(df._plan.to_proto(spark._client), as_one_line=True)
{code}






[jira] [Created] (SPARK-45808) Improve error details for Spark Connect Client in Python

2023-11-06 Thread Martin Grund (Jira)
Martin Grund created SPARK-45808:


 Summary: Improve error details for Spark Connect Client in Python
 Key: SPARK-45808
 URL: https://issues.apache.org/jira/browse/SPARK-45808
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund


Improve the error handling in Spark Connect Python Client.






[jira] [Created] (SPARK-45798) Assert server-side session ID in Spark Connect

2023-11-05 Thread Martin Grund (Jira)
Martin Grund created SPARK-45798:


 Summary: Assert server-side session ID in Spark Connect
 Key: SPARK-45798
 URL: https://issues.apache.org/jira/browse/SPARK-45798
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund


When accessing the Spark Session remotely, it is possible that the server has 
silently restarted and we lose temporary state such as views or 
function definitions.
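
A short sketch of the failure mode this is meant to catch (names purely illustrative):

{code:python}
# Temporary state lives in the server-side session:
spark.range(10).createOrReplaceTempView("my_temp_view")

# ... the server restarts silently and a fresh server-side session is created ...

# Without asserting the server-side session ID, this either fails or silently runs
# against a session that no longer has the view:
spark.sql("SELECT COUNT(*) FROM my_temp_view").show()
{code}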






[jira] [Updated] (SPARK-45167) Python Spark Connect client does not call `releaseAll`

2023-09-14 Thread Martin Grund (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grund updated SPARK-45167:
-
Issue Type: Bug  (was: Improvement)

> Python Spark Connect client does not call `releaseAll`
> --
>
> Key: SPARK-45167
> URL: https://issues.apache.org/jira/browse/SPARK-45167
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Priority: Major
>
> The Python client does not call `releaseAll` to release all previous responses on 
> the server and thus does not properly close the queries.






[jira] [Created] (SPARK-45167) Python Spark Connect client does not call `releaseAll`

2023-09-14 Thread Martin Grund (Jira)
Martin Grund created SPARK-45167:


 Summary: Python Spark Connect client does not call `releaseAll`
 Key: SPARK-45167
 URL: https://issues.apache.org/jira/browse/SPARK-45167
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund


The Python client does not call `releaseAll` to release all previous responses on 
the server and thus does not properly close the queries.






[jira] [Created] (SPARK-45048) Add additional tests for Python client

2023-09-01 Thread Martin Grund (Jira)
Martin Grund created SPARK-45048:


 Summary: Add additional tests for Python client
 Key: SPARK-45048
 URL: https://issues.apache.org/jira/browse/SPARK-45048
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Created] (SPARK-44931) Fix JSON Serialization for Spark Connect Event Listener

2023-08-23 Thread Martin Grund (Jira)
Martin Grund created SPARK-44931:


 Summary: Fix JSON Serialization for Spark Connect Event Listener
 Key: SPARK-44931
 URL: https://issues.apache.org/jira/browse/SPARK-44931
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Created] (SPARK-44815) Cache Schema of DF

2023-08-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-44815:


 Summary: Cache Schema of DF
 Key: SPARK-44815
 URL: https://issues.apache.org/jira/browse/SPARK-44815
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Created] (SPARK-44814) Test to trigger protobuf 4.23.3 crash

2023-08-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-44814:


 Summary: Test to trigger protobuf 4.23.3 crash
 Key: SPARK-44814
 URL: https://issues.apache.org/jira/browse/SPARK-44814
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Created] (SPARK-44740) Allow configuring the session ID for a spark connect client in the remote string

2023-08-09 Thread Martin Grund (Jira)
Martin Grund created SPARK-44740:


 Summary: Allow configuring the session ID for a spark connect 
client in the remote string
 Key: SPARK-44740
 URL: https://issues.apache.org/jira/browse/SPARK-44740
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Created] (SPARK-44738) Spark Connect Reattach misses metadata propagation

2023-08-09 Thread Martin Grund (Jira)
Martin Grund created SPARK-44738:


 Summary: Spark Connect Reattach misses metadata propagation
 Key: SPARK-44738
 URL: https://issues.apache.org/jira/browse/SPARK-44738
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund
 Fix For: 3.5.0


Currently, in the Spark Connect Reattach handler, client metadata is not 
propagated.






[jira] [Updated] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes and check for it

2023-07-24 Thread Martin Grund (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grund updated SPARK-44528:
-
Summary: Spark Connect DataFrame does not allow to add custom instance 
attributes and check for it  (was: Spark Connect DataFrame does not allow to 
add custom instance attributes)

> Spark Connect DataFrame does not allow to add custom instance attributes and 
> check for it
> -
>
> Key: SPARK-44528
> URL: https://issues.apache.org/jira/browse/SPARK-44528
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Martin Grund
>Priority: Major
>
> ```
> df = spark.range(10)
> df._test = 10
> ```
> Treats `df._test` like a column






[jira] [Updated] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes and check for it

2023-07-24 Thread Martin Grund (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grund updated SPARK-44528:
-
Description: 
```
df = spark.range(10)
df._test = 10

assert hasattr(df, "_test")
assert not hasattr(df, "_test_no")
```

Treats `df._test` like a column

  was:
```
df = spark.range(10)
df._test = 10
```

Treats `df._test` like a column


> Spark Connect DataFrame does not allow to add custom instance attributes and 
> check for it
> -
>
> Key: SPARK-44528
> URL: https://issues.apache.org/jira/browse/SPARK-44528
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Martin Grund
>Priority: Major
>
> ```
> df = spark.range(10)
> df._test = 10
> assert hasattr(df, "_test")
> assert not hasattr(df, "_test_no")
> ```
> Treats `df._test` like a column






[jira] [Created] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes

2023-07-24 Thread Martin Grund (Jira)
Martin Grund created SPARK-44528:


 Summary: Spark Connect DataFrame does not allow to add custom 
instance attributes
 Key: SPARK-44528
 URL: https://issues.apache.org/jira/browse/SPARK-44528
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.1
Reporter: Martin Grund


```
```






[jira] [Updated] (SPARK-44528) Spark Connect DataFrame does not allow to add custom instance attributes

2023-07-24 Thread Martin Grund (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grund updated SPARK-44528:
-
Description: 
```
df = spark.range(10)
df._test = 10
```

Treats `df._test` like a column

  was:
```
```


> Spark Connect DataFrame does not allow to add custom instance attributes
> 
>
> Key: SPARK-44528
> URL: https://issues.apache.org/jira/browse/SPARK-44528
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Martin Grund
>Priority: Major
>
> ```
> df = spark.range(10)
> df._test = 10
> ```
> Treats `df._test` like a column






[jira] [Created] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain

2023-07-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-44505:


 Summary: DataSource v2 Scans should not require planning the input 
partitions on explain
 Key: SPARK-44505
 URL: https://issues.apache.org/jira/browse/SPARK-44505
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Martin Grund


Right now, we always call `planInputPartitions()` for a DSv2 implementation even 
if no Spark job is run and only an explain is requested.

We should provide a way to avoid scanning all input partitions just to determine 
whether the input is columnar or not. The scan should provide an override.






[jira] [Created] (SPARK-43958) Channel Builder for Golang client

2023-06-03 Thread Martin Grund (Jira)
Martin Grund created SPARK-43958:


 Summary: Channel Builder for Golang client
 Key: SPARK-43958
 URL: https://issues.apache.org/jira/browse/SPARK-43958
 Project: Spark
  Issue Type: Improvement
  Components: Connect Contrib
Affects Versions: 3.4.0
Reporter: Martin Grund


Support Channel builder with default URL parsing.






[jira] [Created] (SPARK-43909) Golang Repository Workflows

2023-06-01 Thread Martin Grund (Jira)
Martin Grund created SPARK-43909:


 Summary: Golang Repository Workflows
 Key: SPARK-43909
 URL: https://issues.apache.org/jira/browse/SPARK-43909
 Project: Spark
  Issue Type: Improvement
  Components: Connect Contrib
Affects Versions: 3.4.0
Reporter: Martin Grund


Umbrella Jira for the GitHub setup






[jira] [Created] (SPARK-43895) Skeleton Golang Repository

2023-05-31 Thread Martin Grund (Jira)
Martin Grund created SPARK-43895:


 Summary: Skeleton Golang Repository
 Key: SPARK-43895
 URL: https://issues.apache.org/jira/browse/SPARK-43895
 Project: Spark
  Issue Type: Improvement
  Components: Connect Contrib
Affects Versions: 3.4.0
Reporter: Martin Grund


Prepare the build for the Spark Connect Go client






[jira] [Created] (SPARK-43894) df.cache() not working

2023-05-31 Thread Martin Grund (Jira)
Martin Grund created SPARK-43894:


 Summary: df.cache() not working
 Key: SPARK-43894
 URL: https://issues.apache.org/jira/browse/SPARK-43894
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Calling `df.cache()` throws an exception:

{noformat}
(org.apache.spark.sql.connect.common.InvalidPlanInput) Unknown Analyze Method 
ANALYZE_NOT_SET!
{noformat}
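
Minimal reproduction (sketch):

{code:python}
df = spark.range(10)
df.cache()   # raises InvalidPlanInput: Unknown Analyze Method ANALYZE_NOT_SET!
{code}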






[jira] [Created] (SPARK-43509) Support creating multiple sessions for Spark Connect in PySpark

2023-05-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-43509:


 Summary: Support creating multiple sessions for Spark Connect in 
PySpark
 Key: SPARK-43509
 URL: https://issues.apache.org/jira/browse/SPARK-43509
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Created] (SPARK-43430) ExecutePlanRequest should have the ability to set request options.

2023-05-09 Thread Martin Grund (Jira)
Martin Grund created SPARK-43430:


 Summary: ExecutePlanRequest should have the ability to set request 
options.
 Key: SPARK-43430
 URL: https://issues.apache.org/jira/browse/SPARK-43430
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Commented] (SPARK-43351) Support Golang in Spark Connect

2023-05-09 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720864#comment-17720864
 ] 

Martin Grund commented on SPARK-43351:
--

1.1 We had the same questions with Python and Scala; right now the best approach 
is to keep the same directory, as it lets us depend on the proto artifacts more easily.
1.5 We can take care of this in the initial PR.

2. We should cover the basic DataFrame functionality and, if we want, streaming as 
well. I would exclude UDFs for now.

> Support Golang in Spark Connect
> ---
>
> Key: SPARK-43351
> URL: https://issues.apache.org/jira/browse/SPARK-43351
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: BoYang
>Priority: Major
>
> Support Spark Connect client side in Go programming language 






[jira] [Created] (SPARK-43332) Allow ChannelBuilder extensions

2023-05-01 Thread Martin Grund (Jira)
Martin Grund created SPARK-43332:


 Summary: Allow ChannelBuilder extensions
 Key: SPARK-43332
 URL: https://issues.apache.org/jira/browse/SPARK-43332
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Make the ChannelBuilder extensible.






[jira] [Created] (SPARK-43249) df.sql() should send metrics back

2023-04-24 Thread Martin Grund (Jira)
Martin Grund created SPARK-43249:


 Summary: df.sql() should send metrics back
 Key: SPARK-43249
 URL: https://issues.apache.org/jira/browse/SPARK-43249
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


df.sql() does not return the metrics to the client when executed as a command.






[jira] [Commented] (SPARK-41628) Support async query execution

2023-04-01 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707580#comment-17707580
 ] 

Martin Grund commented on SPARK-41628:
--

I think this needs a bit more discussion. Generally, I think it would be 
possible to model the asynchronous execution using the existing model. A couple 
of things would be needed up front:

1. `ExecutePlanRequest` needs an execution mode to indicate whether the client wants 
a blocking or a non-blocking request; this should probably be an enum.
2. `ExecutePlanResponse` needs to carry the query ID so that the client can later 
resume the query to block and fetch results.
3. We need a way to check the status, but this could probably be modeled as a 
`Command` (e.g. QueryStatusCommand).

Once this API is specced out, the next step is to identify how to perform the 
query execution in the background so that the results can be fetched when 
available.

My suggestion would be to prepare a small doc on what exactly you're doing so 
that we can have a discussion. Feel free to do the design for this in a README 
in a pull request if this is preferred.

> Support async query execution
> -
>
> Key: SPARK-41628
> URL: https://issues.apache.org/jira/browse/SPARK-41628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Today the query execution is completely synchronous; add an additional 
> asynchronous API that allows submitting and polling for the result.






[jira] [Created] (SPARK-42853) Update the Spark Doc to match the new website style

2023-03-19 Thread Martin Grund (Jira)
Martin Grund created SPARK-42853:


 Summary: Update the Spark Doc to match the new website style
 Key: SPARK-42853
 URL: https://issues.apache.org/jira/browse/SPARK-42853
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Created] (SPARK-42816) Increase max message size to 128MB

2023-03-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-42816:


 Summary: Increase max message size to 128MB
 Key: SPARK-42816
 URL: https://issues.apache.org/jira/browse/SPARK-42816
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Support messages up to 128MB






[jira] [Created] (SPARK-42733) df.write.format().save() should support calling with no path or table name

2023-03-09 Thread Martin Grund (Jira)
Martin Grund created SPARK-42733:


 Summary: df.write.format().save() should support calling with no 
path or table name
 Key: SPARK-42733
 URL: https://issues.apache.org/jira/browse/SPARK-42733
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


When calling `session.range(5).write.format("xxx").options().save()`, Spark 
Connect currently throws an assertion error because it expects that either a path 
or a table name is present. According to our current PySpark implementation that 
is not necessary, though.
   
{code:python}
if format is not None:
    self.format(format)
if path is None:
    self._jwrite.save()
else:
    self._jwrite.save(path)
{code}
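
A hedged reproduction sketch; the built-in `noop` format is used here only because it accepts `save()` without a path:

{code:python}
# Works in classic PySpark, but currently fails on Spark Connect with an assertion
# error because neither a path nor a table name is provided.
spark.range(5).write.format("noop").mode("overwrite").save()
{code}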








[jira] [Commented] (SPARK-42374) User-facing documentation

2023-02-14 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688394#comment-17688394
 ] 

Martin Grund commented on SPARK-42374:
--

Yes, that is correct. There is no built-in authentication. The benefit of the 
gRPC / HTTP2 interface is that it's very easy to put a capable authenticating 
proxy in front of it, so that we don't need to implement the logic in Spark 
directly but can simply use existing infrastructure.
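
A sketch of that setup from the client side; the host and token are placeholders, and `use_ssl`/`token` are the standard Spark Connect remote-string parameters:

{code:python}
from pyspark.sql import SparkSession

# The authenticating proxy terminates TLS and validates the bearer token before
# forwarding the gRPC traffic to the Spark Connect endpoint behind it.
spark = (
    SparkSession.builder
    .remote("sc://connect-proxy.example.com:443/;use_ssl=true;token=REPLACE_ME")
    .getOrCreate()
)
{code}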

> User-facing documentation
> -
>
> Key: SPARK-42374
> URL: https://issues.apache.org/jira/browse/SPARK-42374
> Project: Spark
>  Issue Type: Documentation
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Haejoon Lee
>Priority: Major
>
> Should provide the user-facing documentation so end users know how to use Spark 
> Connect.






[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2023-02-14 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688393#comment-17688393
 ] 

Martin Grund commented on SPARK-39375:
--

[~tgraves] Currently the Python UDFs are implemented exactly the same way as 
they are today. In today's world, they are serialized bytes that are sent from 
the Python process via Py4J to the driver and then to the executors where 
they're deserialized and executed. The primary difference to Spark Connect is 
that we don't use Py4J anymore but leverage the protocol directly. This is 
backward compatible and allows us to make sure that we can build upon the 
existing architecture going forward. Please keep in mind that today, the Python 
process for the UDF execution is started by the executor as part of query 
execution. Depending on the setup the Python process is kept around or 
destroyed at the end of the processing. None of this behavior changed. This 
means that all of the existing applications using PySpark will simply continue 
to work.

Similarly, this means we're not changing the assumptions around the 
requirements of which Python version has to be present where. In the same way, 
the Python version on the client has to be the same as on the executor. 

The reason we did not create a design for it is that we did not change the 
semantics, the logic or the implementation. This is very similar to the way 
we're translating the Spark Connect proto API into Catalyst plans.



> SPIP: Spark Connect - A client and server interface for Apache Spark
> 
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Critical
>  Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect 
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
>  Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which, 
> in the age of serverless computing and ubiquitous programming language use, 
> poses a number of limitations. Most of the limitations stem from the tightly 
> coupled Spark driver architecture and fact that clusters are typically shared 
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark 
> driver runs both the client application and scheduler, which results in a 
> heavyweight architecture that requires proximity to the cluster. There is no 
> built-in capability to  remotely connect to a Spark cluster in languages 
> other than SQL and users therefore rely on external solutions such as the 
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich 
> developer experience{*}: The current architecture and APIs do not cater for 
> interactive data exploration (as done with Notebooks), or allow for building 
> out rich developer experience common in modern code editors. (3) 
> {*}Stability{*}: with the current shared driver architecture, users causing 
> critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs 
> (e.g. first and third-party dependencies in the classpath) does not allow for 
> seamless upgrades between Spark versions (and with that, hinders new feature 
> adoption).
>  
> We propose to overcome these challenges by building on the DataFrame API and 
> the underlying unresolved logical plans. The DataFrame API is widely used and 
> makes it very easy to iteratively express complex logic. We will introduce 
> {_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
> client from the Spark server. With Spark Connect, Spark will become 
> decoupled, allowing for built-in remote connectivity: The decoupled client 
> SDK can be used to run interactive data exploration and connect to the server 
> for DataFrame operations. 
>  
> Spark Connect will benefit Spark developers in different ways: The decoupled 
> architecture will result in improved stability, as clients are separated from 
> the driver. From the Spark Connect client perspective, Spark will be (almost) 
> versionless, and thus enable seamless upgradability, as server APIs can 
> evolve without affecting the client API. The decoupled client-server 
> architecture can be leveraged to build close integrations with local 
> developer tooling. Finally, separating the client process from the Spark 
> server process will improve Spark’s overall security posture by avoiding the 
> tight coupling of the client inside the Spark runtime 

[jira] [Created] (SPARK-42156) Support client-side retries in Spark Connect Python client

2023-01-22 Thread Martin Grund (Jira)
Martin Grund created SPARK-42156:


 Summary: Support client-side retries in Spark Connect Python client
 Key: SPARK-42156
 URL: https://issues.apache.org/jira/browse/SPARK-42156
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Created] (SPARK-42029) Distribution build for Spark Connect does not work with Spark Shell

2023-01-12 Thread Martin Grund (Jira)
Martin Grund created SPARK-42029:


 Summary: Distribution build for Spark Connect does not work with 
Spark Shell
 Key: SPARK-42029
 URL: https://issues.apache.org/jira/browse/SPARK-42029
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Created] (SPARK-42028) Support Pandas DF to Spark DF with Nanosecond Timestamps

2023-01-12 Thread Martin Grund (Jira)
Martin Grund created SPARK-42028:


 Summary: Support Pandas DF to Spark DF with Nanosecond Timestamps
 Key: SPARK-42028
 URL: https://issues.apache.org/jira/browse/SPARK-42028
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Created] (SPARK-42027) CreateDataframe from Pandas with Struct and Timestamp

2023-01-12 Thread Martin Grund (Jira)
Martin Grund created SPARK-42027:


 Summary: CreateDataframe from Pandas with Struct and Timestamp
 Key: SPARK-42027
 URL: https://issues.apache.org/jira/browse/SPARK-42027
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


The following should be supported and correctly truncate the nanosecond 
timestamps.

{code:python}
from datetime import timezone, timedelta

import pandas as pd
from pandas import Timestamp

ts = Timestamp(year=2019, month=1, day=1, nanosecond=500,
               tz=timezone(timedelta(hours=-8)))

d = pd.DataFrame({"col1": [1], "col2": [{"a": 1, "b": 2.32, "c": ts}]})
spark.createDataFrame(d).collect()
{code}







[jira] [Commented] (SPARK-41919) Unify the schema or datatype in protos

2023-01-06 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655346#comment-17655346
 ] 

Martin Grund commented on SPARK-41919:
--

There is a compatible way of doing this by adding another field to the oneof 
that represents the string schema and marking the other fields as reserved.

> Unify the schema or datatype in protos
> --
>
> Key: SPARK-41919
> URL: https://issues.apache.org/jira/browse/SPARK-41919
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> this ticket only focus on the protos sent from client to server.
> we normally use 
> {code:java}
>   oneof schema {
> DataType datatype = 2;
> // Server will use Catalyst parser to parse this string to DataType.
> string datatype_str = 3;
>   }
> {code}
> to represent a schema or datatype.
> actually, we can simplify it with just a string. In the server, we can easily 
> parse a DDL-formatted schema or a JSON formatted one.
> {code:java}
>   // (Optional) The schema of local data.
>   // It should be either a DDL-formatted type string or a JSON string.
>   //
>   // The server side will update the column names and data types according to 
> this schema.
>   // If the 'data' is not provided, then this schema will be required.
>   optional string schema = 2;
> {code}






[jira] [Commented] (SPARK-41918) Refine the naming in proto messages

2023-01-06 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655345#comment-17655345
 ] 

Martin Grund commented on SPARK-41918:
--

Renaming fields is WIRE-compatible, and most likely this is going to be the 
preferred way of maintaining compatibility for the protos.

> Refine the naming in proto messages
> ---
>
> Key: SPARK-41918
> URL: https://issues.apache.org/jira/browse/SPARK-41918
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Normally, we name the fields after the corresponding LogicalPlan or 
> DataFrame API, but they are not consistent in the protos; for example, the column 
> name:
> {code:java}
>   message UnresolvedRegex {
> // (Required) The column name used to extract column with regex.
> string col_name = 1;
>   }
> {code}
> {code:java}
>   message Alias {
> // (Required) The expression that alias will be added on.
> Expression expr = 1;
> // (Required) a list of name parts for the alias.
> //
> // Scalar columns only has one name that presents.
> repeated string name = 2;
> // (Optional) Alias metadata expressed as a JSON map.
> optional string metadata = 3;
>   }
> {code}
> {code:java}
> // Relation of type [[Deduplicate]] which have duplicate rows removed, could 
> consider either only
> // the subset of columns or all the columns.
> message Deduplicate {
>   // (Required) Input relation for a Deduplicate.
>   Relation input = 1;
>   // (Optional) Deduplicate based on a list of column names.
>   //
>   // This field does not co-use with `all_columns_as_keys`.
>   repeated string column_names = 2;
>   // (Optional) Deduplicate based on all the columns of the input relation.
>   //
>   // This field does not co-use with `column_names`.
>   optional bool all_columns_as_keys = 3;
> }
> {code}
> {code:java}
> // Computes basic statistics for numeric and string columns, including count, 
> mean, stddev, min,
> // and max. If no columns are given, this function computes statistics for 
> all numerical or
> // string columns.
> message StatDescribe {
>   // (Required) The input relation.
>   Relation input = 1;
>   // (Optional) Columns to compute statistics on.
>   repeated string cols = 2;
> }
> {code}
> we probably should unify the naming:
> single column -> `column`
> multi columns -> `columns`






[jira] [Commented] (SPARK-41911) Add version fields to Connect proto

2023-01-06 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655343#comment-17655343
 ] 

Martin Grund commented on SPARK-41911:
--

I think the first part would be to identify which messages would need a 
version, but the version does not need to be the first field in the proto.

> Add version fields to Connect proto
> ---
>
> Key: SPARK-41911
> URL: https://issues.apache.org/jira/browse/SPARK-41911
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> We may need this to help maintain compatibility. Depending on the concrete 
> protocol design, we may use field number 1 for version fields thus may cause 
> breaking changes on existing proto messages. 






[jira] [Commented] (SPARK-41910) Remove `optional` notation in proto

2023-01-06 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655336#comment-17655336
 ] 

Martin Grund commented on SPARK-41910:
--

Didn't we just have the discussion on why we wanted to use optional? In 
particular for scalar values that have a default value?

> Remove `optional` notation in proto
> ---
>
> Key: SPARK-41910
> URL: https://issues.apache.org/jira/browse/SPARK-41910
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> Every field in proto3 has a default value. We should revisit the existing proto 
> fields to understand whether the default value can be used without having to tell 
> whether the field is set or not, and remove `optional` as much as possible from 
> the Spark Connect proto surface.






[jira] [Commented] (SPARK-41755) Reorder fields to use consecutive field numbers

2023-01-06 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655334#comment-17655334
 ] 

Martin Grund commented on SPARK-41755:
--

While not obvious, this is not needed. We can just add a manual ignore for 
28-89 and keep moving.

> Reorder fields to use consecutive field numbers
> ---
>
> Key: SPARK-41755
> URL: https://issues.apache.org/jira/browse/SPARK-41755
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> make IDs consecutive
> ```
> RepartitionByExpression repartition_by_expression = 27;
> // NA functions
> NAFill fill_na = 90;
> NADrop drop_na = 91;
> NAReplace replace = 92;
> ```






[jira] [Commented] (SPARK-41812) DataFrame.join: ambiguous column

2023-01-02 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653649#comment-17653649
 ] 

Martin Grund commented on SPARK-41812:
--

On the Spark side we eagerly resolve the column and return a full named 
expression.


{code:scala}
Column(addDataFrameIdToCol(resolve(colName)))

private[sql] def resolve(colName: String): NamedExpression = {
val resolver = sparkSession.sessionState.analyzer.resolver
queryExecution.analyzed.resolveQuoted(colName, resolver)
  .getOrElse(throw resolveException(colName, schema.fieldNames))
  }
{code}

To avoid too many round-trips we should probably inject the DataFrame ID and 
column position properties into the metadata to perform the resolution later on 
the server.


> DataFrame.join: ambiguous column
> 
>
> Key: SPARK-41812
> URL: https://issues.apache.org/jira/browse/SPARK-41812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df1.join(df2, df1["value"] == df2["value"]).count()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
> count
> pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, 
> in toPandas
> return self._session.client.to_pandas(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
> to_pandas
> return self._execute_and_fetch(req)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
> _execute_and_fetch
> self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
> _handle_error
> raise SparkConnectAnalysisException(
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
> `value`].
> {code}






[jira] [Comment Edited] (SPARK-41815) Column.isNull returns nan instead of None

2023-01-02 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653643#comment-17653643
 ] 

Martin Grund edited comment on SPARK-41815 at 1/2/23 3:56 PM:
--

The reason seems to be that when Spark Connect serializes the data to Arrow 
the value is serialized as `null`, and the conversion to pandas then turns this 
into `np.nan` instead of `None`. It seems that we should manually convert `nan` 
to `None` for the `Row` type.
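
A minimal sketch of that conversion (illustrative only, not the actual client code):

{code:python}
import math

def nan_to_none(value):
    # Map the NaN that pandas uses for missing values back to Python's None
    # when building Row objects.
    if isinstance(value, float) and math.isnan(value):
        return None
    return value
{code}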


was (Author: JIRAUSER290467):
The reason seems to be that when Spark Connect serializes the data to Arrow 
it's serialized as `null` and when converted to pandas it will convert this to 
the `np.nan` instead of `None`. It seems that we should manually convert `nan` 
to `None` for the `Row` type.

> Column.isNull returns nan instead of None
> -
>
> Key: SPARK-41815
> URL: https://issues.apache.org/jira/browse/SPARK-41815
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 99, in 
> pyspark.sql.connect.column.Column.isNull
> Failed example:
> df.filter(df.height.isNull()).collect()
> Expected:
> [Row(name='Alice', height=None)]
> Got:
> [Row(name='Alice', height=nan)]
> {code}






[jira] [Commented] (SPARK-41815) Column.isNull returns nan instead of None

2023-01-02 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653643#comment-17653643
 ] 

Martin Grund commented on SPARK-41815:
--

The reason seems to be that when Spark Connect serializes the data to Arrow 
the value is serialized as `null`, and the conversion to pandas then turns this 
into `np.nan` instead of `None`. It seems that we should manually convert `nan` 
to `None` for the `Row` type.

> Column.isNull returns nan instead of None
> -
>
> Key: SPARK-41815
> URL: https://issues.apache.org/jira/browse/SPARK-41815
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 99, in 
> pyspark.sql.connect.column.Column.isNull
> Failed example:
> df.filter(df.height.isNull()).collect()
> Expected:
> [Row(name='Alice', height=None)]
> Got:
> [Row(name='Alice', height=nan)]
> {code}






[jira] [Created] (SPARK-41803) log() function variations are missing

2022-12-31 Thread Martin Grund (Jira)
Martin Grund created SPARK-41803:


 Summary: log() function variations are missing
 Key: SPARK-41803
 URL: https://issues.apache.org/jira/browse/SPARK-41803
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund









[jira] [Commented] (SPARK-41743) groupBy(...).agg(...).sort does not actually sort the output

2022-12-29 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652936#comment-17652936
 ] 

Martin Grund commented on SPARK-41743:
--

Running the doc tests actually passes for me.

> groupBy(...).agg(...).sort does not actually sort the output
> 
>
> Key: SPARK-41743
> URL: https://issues.apache.org/jira/browse/SPARK-41743
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> **
> File "/.../spark/python/pyspark/sql/connect/group.py", line 211, in 
> pyspark.sql.connect.group.GroupedData.agg
> Failed example:
> df.groupBy(df.name).agg(F.min(df.age)).sort("name").show()
> Differences (ndiff with -expected +actual):
>   +-----+--------+
>   | name|min(age)|
>   +-----+--------+
> + |  Bob|       5|
>   |Alice|       2|
> - |  Bob|       5|
>   +-----+--------+
> + 
> {code}






[jira] [Comment Edited] (SPARK-41743) groupBy(...).agg(...).sort does not actually sort the output

2022-12-29 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652934#comment-17652934
 ] 

Martin Grund edited comment on SPARK-41743 at 12/29/22 8:24 PM:


In the following example, I cannot reproduce this:


{code:python}
df = spark.createDataFrame([(10, "Martin"), (11, "Anton")])
df.select(df["_1"].alias("age"),
          df["_2"].alias("name")).groupBy("name").agg({"age": "min"}).sort("name").show()
{code}

produces



{noformat}
+------+--------+
|  name|min(age)|
+------+--------+
| Anton|      11|
|Martin|      10|
+------+--------+
{noformat}

vs


{code:python}
df.select(df["_1"].alias("age"), 
df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).show()
{code}

produces


{noformat}
+------+--------+
|  name|min(age)|
+------+--------+
|Martin|      10|
| Anton|      11|
+------+--------+
{noformat}




was (Author: JIRAUSER290467):
In the following example, I cannot reproduce this:

```
df =  spark.createDataFrame([{"age":10, "name": "Martin"},{"age":11, "name": 
"Anton"}])
df.select(df["_1"].alias("age"), 
df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).sort("name").show()
```

produces

```
+------+--------+
|  name|min(age)|
+------+--------+
| Anton|      11|
|Martin|      10|
+------+--------+
```


vs

```
 df.select(df["_1"].alias("age"), 
df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).show()
```

produces

```
+------+--------+
|  name|min(age)|
+------+--------+
|Martin|      10|
| Anton|      11|
+------+--------+
```

> groupBy(...).agg(...).sort does not actually sort the output
> 
>
> Key: SPARK-41743
> URL: https://issues.apache.org/jira/browse/SPARK-41743
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> **********************************************************************
> File "/.../spark/python/pyspark/sql/connect/group.py", line 211, in pyspark.sql.connect.group.GroupedData.agg
> Failed example:
> df.groupBy(df.name).agg(F.min(df.age)).sort("name").show()
> Differences (ndiff with -expected +actual):
>   +-----+--------+
>   | name|min(age)|
>   +-----+--------+
> + |  Bob|       5|
>   |Alice|       2|
> - |  Bob|       5|
>   +-----+--------+
> + 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41743) groupBy(...).agg(...).sort does not actually sort the output

2022-12-29 Thread Martin Grund (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652934#comment-17652934
 ] 

Martin Grund commented on SPARK-41743:
--

In the following example, I cannot reproduce this:

```
df = spark.createDataFrame([(10, "Martin"), (11, "Anton")])
df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age": "min"}).sort("name").show()
```

produces

```
+------+--------+
|  name|min(age)|
+------+--------+
| Anton|      11|
|Martin|      10|
+------+--------+
```


vs

```
 df.select(df["_1"].alias("age"), 
df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).show()
```

produces

```
+------+--------+
|  name|min(age)|
+------+--------+
|Martin|      10|
| Anton|      11|
+------+--------+
```

> groupBy(...).agg(...).sort does not actually sort the output
> 
>
> Key: SPARK-41743
> URL: https://issues.apache.org/jira/browse/SPARK-41743
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> **********************************************************************
> File "/.../spark/python/pyspark/sql/connect/group.py", line 211, in pyspark.sql.connect.group.GroupedData.agg
> Failed example:
> df.groupBy(df.name).agg(F.min(df.age)).sort("name").show()
> Differences (ndiff with -expected +actual):
>   +-----+--------+
>   | name|min(age)|
>   +-----+--------+
> + |  Bob|       5|
>   |Alice|       2|
> - |  Bob|       5|
>   +-----+--------+
> + 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41738) Client ID should be mixed into SparkSession cache

2022-12-27 Thread Martin Grund (Jira)
Martin Grund created SPARK-41738:


 Summary: Client ID should be mixed into SparkSession cache
 Key: SPARK-41738
 URL: https://issues.apache.org/jira/browse/SPARK-41738
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41664) Support streaming client data to create large DataFrames

2022-12-21 Thread Martin Grund (Jira)
Martin Grund created SPARK-41664:


 Summary: Support streaming client data to create large DataFrames
 Key: SPARK-41664
 URL: https://issues.apache.org/jira/browse/SPARK-41664
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Support client-side streaming so that large DataFrames can be created from local data on the client.
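
The mechanics still need to be designed; as a rough illustration of the idea (not the actual protocol), local data could be chunked into Arrow record batches on the client and sent to the server as a stream of bounded messages:

{code:python}
import pyarrow as pa

# Purely illustrative: split a large local table into bounded Arrow batches
# that a client could send as a stream instead of one oversized request.
rows = [{"id": i, "payload": "x" * 100} for i in range(100_000)]
table = pa.Table.from_pylist(rows)

for batch in table.to_batches(max_chunksize=10_000):
    # In a real client, each batch would become one message in a streaming RPC.
    print(batch.num_rows, "rows,", batch.nbytes, "bytes")
{code}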



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41662) Minimal support for pickled Python UDFs

2022-12-21 Thread Martin Grund (Jira)
Martin Grund created SPARK-41662:


 Summary: Minimal support for pickled Python UDFs
 Key: SPARK-41662
 URL: https://issues.apache.org/jira/browse/SPARK-41662
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Minimal support for UDFs as part of queries
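
As a reference for what "minimal support" should enable, this is the standard PySpark UDF usage the Connect client would need to accept (sketch only; the pickling and transport details are the open part of this ticket):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

@udf(returnType=IntegerType())
def plus_one(x):
    return x + 1

# The client needs to pickle plus_one and ship it along with the plan.
spark.range(5).select(plus_one("id").alias("id_plus_one")).show()
{code}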



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41661) Support for Python UDFs

2022-12-21 Thread Martin Grund (Jira)
Martin Grund created SPARK-41661:


 Summary: Support for Python UDFs
 Key: SPARK-41661
 URL: https://issues.apache.org/jira/browse/SPARK-41661
 Project: Spark
  Issue Type: Umbrella
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Spark Connect should support Python UDFs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41629) Support for protocol extensions

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41629:


 Summary: Support for protocol extensions
 Key: SPARK-41629
 URL: https://issues.apache.org/jira/browse/SPARK-41629
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Spark comes with many different extension points. Many of those simply become 
available through the shared classpath between Spark and the user application. 
To support arbitrary plugins, e.g. for Delta or Iceberg, we need a way to make 
the Spark Connect protocol extensible and let users register their own 
handlers.
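
One possible shape, sketched here only as an illustration: the protocol could carry an opaque google.protobuf.Any payload that a registered server-side handler knows how to unpack. The snippet uses StringValue as a stand-in for a plugin-defined message; the actual extension message and registration API are exactly what this ticket needs to define.

{code:python}
from google.protobuf.any_pb2 import Any
from google.protobuf.wrappers_pb2 import StringValue

# Stand-in for a plugin-defined message (e.g. a Delta- or Iceberg-specific relation).
payload = StringValue(value="DESCRIBE DETAIL my_table")

extension = Any()
extension.Pack(payload)

# A server-side handler registered for this type URL would unpack it again.
print(extension.type_url)

unpacked = StringValue()
assert extension.Unpack(unpacked)
print(unpacked.value)
{code}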



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41625) Feature parity: Streaming support

2022-12-20 Thread Martin Grund (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grund updated SPARK-41625:
-
Description: We need to design what support for structured streaming will look 
like in Spark Connect.

> Feature parity: Streaming support
> -
>
> Key: SPARK-41625
> URL: https://issues.apache.org/jira/browse/SPARK-41625
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> We need to design what support for structured streaming will look like in 
> Spark Connect.
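
For reference, the classic Structured Streaming surface that parity would have to cover looks roughly like this (standard PySpark API, shown only as the compatibility target, not a Connect design):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# A trivial streaming pipeline: built-in rate source to the console sink.
source = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
query = (
    source.withColumn("doubled", F.col("value") * 2)
    .writeStream.format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination(10)
query.stop()
{code}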



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41628) Support async query execution

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41628:


 Summary: Support async query execution
 Key: SPARK-41628
 URL: https://issues.apache.org/jira/browse/SPARK-41628
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Today, query execution is completely synchronous. Add an asynchronous API that 
allows clients to submit a query and poll for its result.
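
Until a first-class async API exists, a client-side sketch of the submit/poll pattern could look like this (using only the blocking API plus a thread pool; the eventual server-side API may look quite different):

{code:python}
import time
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).selectExpr("id % 10 AS bucket").groupBy("bucket").count()

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(df.collect)   # "submit": run the blocking call on a worker thread

while not future.done():               # "poll": the caller stays responsive
    print("query still running ...")
    time.sleep(0.5)

print(future.result()[:3])
{code}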



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41627) Spark Connect Server Development

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41627:


 Summary: Spark Connect Server Development
 Key: SPARK-41627
 URL: https://issues.apache.org/jira/browse/SPARK-41627
 Project: Spark
  Issue Type: Umbrella
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41626) Document validation choices for local and remote input validation

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41626:


 Summary: Document validation choices for local and remote input 
validation
 Key: SPARK-41626
 URL: https://issues.apache.org/jira/browse/SPARK-41626
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41625) Feature parity: Streaming support

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41625:


 Summary: Feature parity: Streaming support
 Key: SPARK-41625
 URL: https://issues.apache.org/jira/browse/SPARK-41625
 Project: Spark
  Issue Type: Umbrella
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41624) Support Python logging

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41624:


 Summary: Support Python logging 
 Key: SPARK-41624
 URL: https://issues.apache.org/jira/browse/SPARK-41624
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Since the Spark Connect client cannot leverage the JVM-based logging, we need to 
add additional instrumentation to make sure we provide enough insight for 
users to understand and debug errors.
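
A minimal sketch of what that client-side instrumentation could build on, using only the standard library logging module (the logger name and format below are illustrative, not an agreed convention):

{code:python}
import logging

# Illustrative logger for the Connect client; the real name/format is TBD.
logger = logging.getLogger("pyspark.sql.connect")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug("Sending ExecutePlan request (session_id=%s)", "abc-123")
logger.error("RPC failed: %s", "StatusCode.UNAVAILABLE")
{code}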



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41623) Support Catalog.uncacheTable

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41623:


 Summary: Support Catalog.uncacheTable
 Key: SPARK-41623
 URL: https://issues.apache.org/jira/browse/SPARK-41623
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41620) Support Catalog.registerFunction

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41620:


 Summary: Support Catalog.registerFunction
 Key: SPARK-41620
 URL: https://issues.apache.org/jira/browse/SPARK-41620
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41619) Support Catalog.refreshTable

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41619:


 Summary: Support Catalog.refreshTable
 Key: SPARK-41619
 URL: https://issues.apache.org/jira/browse/SPARK-41619
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41622) Support Catalog.setCurrentDatabase

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41622:


 Summary: Support Catalog.setCurrentDatabase
 Key: SPARK-41622
 URL: https://issues.apache.org/jira/browse/SPARK-41622
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41621) Support Catalog.setCurrentCatalog

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41621:


 Summary: Support Catalog.setCurrentCatalog
 Key: SPARK-41621
 URL: https://issues.apache.org/jira/browse/SPARK-41621
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41618) Support Catalog.recoverPartitions

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41618:


 Summary: Support Catalog.recoverPartitions
 Key: SPARK-41618
 URL: https://issues.apache.org/jira/browse/SPARK-41618
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41617) Support Catalog.listTables

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41617:


 Summary: Support Catalog.listTables
 Key: SPARK-41617
 URL: https://issues.apache.org/jira/browse/SPARK-41617
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41616) Support Catalog.listFunctions

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41616:


 Summary: Support Catalog.listFunctions
 Key: SPARK-41616
 URL: https://issues.apache.org/jira/browse/SPARK-41616
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41614) Support Catalog.listColumns

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41614:


 Summary: Support Catalog.listColumns
 Key: SPARK-41614
 URL: https://issues.apache.org/jira/browse/SPARK-41614
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41615) Support Catalog.listDatabases

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41615:


 Summary: Support Catalog.listDatabases
 Key: SPARK-41615
 URL: https://issues.apache.org/jira/browse/SPARK-41615
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41610) Support Catalog.getFunction

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41610:


 Summary: Support Catalog.getFunction
 Key: SPARK-41610
 URL: https://issues.apache.org/jira/browse/SPARK-41610
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41612) Support Catalog.isCached

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41612:


 Summary: Support Catalog.isCached
 Key: SPARK-41612
 URL: https://issues.apache.org/jira/browse/SPARK-41612
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41613) Support Catalog.listCatalogs

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41613:


 Summary: Support Catalog.listCatalogs
 Key: SPARK-41613
 URL: https://issues.apache.org/jira/browse/SPARK-41613
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41611) Support Catalog.getTable

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41611:


 Summary: Support Catalog.getTable
 Key: SPARK-41611
 URL: https://issues.apache.org/jira/browse/SPARK-41611
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41608) Support Catalog.functionExists

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41608:


 Summary: Support Catalog.functionExists
 Key: SPARK-41608
 URL: https://issues.apache.org/jira/browse/SPARK-41608
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41609) Support Catalog.getDatabase

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41609:


 Summary: Support Catalog.getDatabase
 Key: SPARK-41609
 URL: https://issues.apache.org/jira/browse/SPARK-41609
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41607) Support Catalog.dropTempView

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41607:


 Summary: Support Catalog.dropTempView
 Key: SPARK-41607
 URL: https://issues.apache.org/jira/browse/SPARK-41607
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41604) Support Catalog.currentCatalog

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41604:


 Summary: Support Catalog.currentCatalog
 Key: SPARK-41604
 URL: https://issues.apache.org/jira/browse/SPARK-41604
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41606) Support Catalog dropGlobalTempView

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41606:


 Summary: Support Catalog dropGlobalTempView
 Key: SPARK-41606
 URL: https://issues.apache.org/jira/browse/SPARK-41606
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41605) Support Catalog.currentDatabase

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41605:


 Summary: Support Catalog.currentDatabase
 Key: SPARK-41605
 URL: https://issues.apache.org/jira/browse/SPARK-41605
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41602) Support Catalog.createExternalTable

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41602:


 Summary: Support Catalog.createExternalTable
 Key: SPARK-41602
 URL: https://issues.apache.org/jira/browse/SPARK-41602
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41603) Support Catalog.createTable

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41603:


 Summary: Support Catalog.createTable
 Key: SPARK-41603
 URL: https://issues.apache.org/jira/browse/SPARK-41603
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41601) Support Catalog.clearCache

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41601:


 Summary: Support Catalog.clearCache
 Key: SPARK-41601
 URL: https://issues.apache.org/jira/browse/SPARK-41601
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41600) Support Catalog.cacheTable

2022-12-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-41600:


 Summary: Support Catalog.cacheTable
 Key: SPARK-41600
 URL: https://issues.apache.org/jira/browse/SPARK-41600
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41560) Document how to add new functions

2022-12-17 Thread Martin Grund (Jira)
Martin Grund created SPARK-41560:


 Summary: Document how to add new functions
 Key: SPARK-41560
 URL: https://issues.apache.org/jira/browse/SPARK-41560
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Please add documentation that outlines how to add support for a new function 
based on the concept of unresolved functions, and documentation for supporting 
functions based on the approach used for case/when and lambda expressions.
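
To make the documentation concrete, it could include a toy version of the unresolved-function idea along these lines (a purely illustrative sketch, not the client's actual classes):

{code:python}
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnRef:
    name: str


@dataclass
class UnresolvedFunction:
    # The client only records the function name and its arguments;
    # the server resolves the name against the catalog during analysis.
    name: str
    args: List[ColumnRef]


def sqrt(col: str) -> UnresolvedFunction:
    """Adding a new function is mostly a thin wrapper like this."""
    return UnresolvedFunction(name="sqrt", args=[ColumnRef(col)])


print(sqrt("age"))
{code}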




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41537) Protobuf backwards compatibility testing

2022-12-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-41537:


 Summary: Protobuf backwards compatibility testing
 Key: SPARK-41537
 URL: https://issues.apache.org/jira/browse/SPARK-41537
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41533) GRPC Errors on the client should be cleaned up

2022-12-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-41533:


 Summary: GRPC Errors on the client should be cleaned up
 Key: SPARK-41533
 URL: https://issues.apache.org/jira/browse/SPARK-41533
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


When the server throws an exception, we report a very deep stack trace that is 
not helpful for the user.

We need to separate the cause from the user-visible exception and wrap the 
error in a custom exception instead of publishing the raw RPCError from gRPC.
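
A hedged sketch of the wrapping, assuming a hypothetical SparkConnectException class; it relies only on the grpc.RpcError surface (code()/details()) and keeps the gRPC error as the hidden cause:

{code:python}
import grpc


class SparkConnectException(Exception):
    """Hypothetical user-facing exception carrying only the relevant message."""


def execute(stub_call, request):
    try:
        return stub_call(request)
    except grpc.RpcError as e:
        # Chain the gRPC status as the cause, surface a short readable message.
        raise SparkConnectException(
            f"Query failed ({e.code().name}): {e.details()}"
        ) from e
{code}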



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41532) DF operations that involve multiple data frames should fail if sessions don't match

2022-12-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-41532:


 Summary: DF operations that involve multiple data frames should 
fail if sessions don't match
 Key: SPARK-41532
 URL: https://issues.apache.org/jira/browse/SPARK-41532
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


We do not support, for example, joining two DataFrames that come from different 
Spark Connect sessions. To avoid server-side exceptions, the client should fail 
clearly when it tries to construct such a composition.
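
A rough client-side guard could look like this (sketch only; session_id is an assumed attribute, and the real field name on the Connect DataFrame may differ):

{code:python}
class DifferentSparkSessionError(ValueError):
    """Hypothetical error raised when DataFrames from different sessions are mixed."""


def check_same_session(left, right):
    # Assumes both objects expose the session they were created from.
    if left.session_id != right.session_id:
        raise DifferentSparkSessionError(
            "Cannot combine DataFrames from different Spark Connect sessions: "
            f"{left.session_id} vs {right.session_id}"
        )


# A join/union implementation would call check_same_session(self, other)
# before building the plan.
{code}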



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41531) Debugging and Stability

2022-12-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-41531:


 Summary: Debugging and Stability
 Key: SPARK-41531
 URL: https://issues.apache.org/jira/browse/SPARK-41531
 Project: Spark
  Issue Type: Umbrella
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Umbrella JIRA for items on debugging, logging and stability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41366) DF.groupby.agg() API should be compatible

2022-12-02 Thread Martin Grund (Jira)
Martin Grund created SPARK-41366:


 Summary: DF.groupby.agg() API should be compatible
 Key: SPARK-41366
 URL: https://issues.apache.org/jira/browse/SPARK-41366
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund
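
For reference, these are the agg() call shapes that classic PySpark accepts and that the Connect client should match (standard API, shown here only as the compatibility target):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])

# Dict form: column name -> aggregate function name.
df.groupBy("name").agg({"age": "min"}).show()

# Column-expression form, including aliases.
df.groupBy("name").agg(F.min("age").alias("min_age"), F.count("*")).show()
{code}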






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41362) Better type errors when passing wrong parameters

2022-12-02 Thread Martin Grund (Jira)
Martin Grund created SPARK-41362:


 Summary: Better type errors when passing wrong parameters
 Key: SPARK-41362
 URL: https://issues.apache.org/jira/browse/SPARK-41362
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Raise clearer error messages when arguments of the wrong type are passed.
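
As a sketch of the kind of check meant here (illustrative only; the real client would likely centralize this in its argument-validation helpers):

{code:python}
from pyspark.sql import Column


def validate_column_arg(name, value):
    # Fail fast on the client with a readable message instead of a server error.
    if not isinstance(value, (Column, str)):
        raise TypeError(
            f"Argument `{name}` should be a Column or str, "
            f"got {type(value).__name__}."
        )
{code}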



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


