[jira] [Reopened] (SPARK-47986) [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server

2024-05-08 Thread Niranjan Jayakar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Jayakar reopened SPARK-47986:
--

This issue was not fully resolved by the previous pull request.

A follow-up fix is here: https://github.com/apache/spark/pull/46435

> [CONNECT][PYTHON] Unable to create a new session when the default session is 
> closed by the server
> -
>
> Key: SPARK-47986
> URL: https://issues.apache.org/jira/browse/SPARK-47986
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When the server closes a session, usually after a cluster restart, the client 
> is unaware of this until it receives an error.
> Once it does, the client has no way to create a new session, since the stale 
> session is still recorded as the default and active session.
> The only workarounds today are to restart the Python interpreter on the 
> client, or to reach into the session builder and replace the active or 
> default session.
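The "replace the stale default session" pattern can be sketched as follows. This is a toy illustration of the workaround's shape only; the class and attribute names are assumptions, not PySpark's actual internals:

```python
# Toy sketch of discarding a stale default session on lookup.
# Names here are illustrative, NOT PySpark's actual API.
import threading

class Session:
    _default_session = None          # process-wide default, as described above
    _lock = threading.Lock()

    def __init__(self, url):
        self.url = url
        self.closed = False

    @classmethod
    def get_or_create(cls, url):
        with cls._lock:
            # A stale (closed) default must be discarded, or every caller
            # keeps receiving the dead session -- the bug described above.
            if cls._default_session is None or cls._default_session.closed:
                cls._default_session = Session(url)
            return cls._default_session

s1 = Session.get_or_create("sc://host")
s1.closed = True                      # simulate the server closing the session
s2 = Session.get_or_create("sc://host")
assert s2 is not s1 and not s2.closed
```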



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47544) [Pyspark] SparkSession builder method is incompatible with vs code intellisense

2024-03-25 Thread Niranjan Jayakar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Jayakar updated SPARK-47544:
-
Attachment: old.mov

> [Pyspark] SparkSession builder method is incompatible with vs code 
> intellisense
> ---
>
> Key: SPARK-47544
> URL: https://issues.apache.org/jira/browse/SPARK-47544
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Niranjan Jayakar
>Priority: Major
> Attachments: old.mov
>
>
> VS Code's IntelliSense is unable to recognize the methods on 
> `SparkSession.builder`.
>  
> See attachment.
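A likely cause (an assumption on my part, not stated in this report) is that `builder` is exposed through a custom class-level descriptor, a construct static analyzers often cannot see through. A minimal toy illustration of the pattern, not PySpark's actual implementation:

```python
# A classproperty-style descriptor: the kind of construct whose return type
# static analyzers frequently fail to infer, hiding builder methods from
# completion. Toy code, not PySpark's implementation.
class classproperty:
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, owner):
        return self.fget(owner)

class Builder:
    def app_name(self, name):
        self.name = name
        return self

class SparkSession:
    @classproperty
    def builder(cls):
        # A type checker may report this attribute as "Any" rather than
        # Builder, so .app_name() never shows up in completions.
        return Builder()

b = SparkSession.builder.app_name("demo")
assert b.name == "demo"
```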






[jira] [Created] (SPARK-47544) [Pyspark] SparkSession builder method is incompatible with vs code intellisense

2024-03-25 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-47544:


 Summary: [Pyspark] SparkSession builder method is incompatible 
with vs code intellisense
 Key: SPARK-47544
 URL: https://issues.apache.org/jira/browse/SPARK-47544
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Niranjan Jayakar


VS Code's IntelliSense is unable to recognize the methods on 
`SparkSession.builder`.

 

See attachment.






[jira] [Created] (SPARK-46265) New assertions in AddArtifact RPC make the connect client incompatible with older clusters

2023-12-05 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-46265:


 Summary: New assertions in AddArtifact RPC make the connect client 
incompatible with older clusters
 Key: SPARK-46265
 URL: https://issues.apache.org/jira/browse/SPARK-46265
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Niranjan Jayakar


We added new assertions to the AddArtifact RPC - 
[https://github.com/apache/spark/commit/d9c5f9d6#diff-d4744c7abd099c57d04746140aba3c20b93f1ac011f5915f963e0a3e0758690e]

As part of this change, the RPC implementation was also updated to return the 
session id in its response.

Since the assertion requires the session id to be present, newer Connect 
clients that apply it are incompatible with older clusters that lack the 
corresponding server-side changes.
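A backward-compatible shape for the check (a sketch under assumed names, not the actual patch) is to assert on the session id only when the server actually returned one:

```python
# Sketch: tolerate servers that predate the session-id response field.
# Function and field names are illustrative, not Spark's protobuf API.
def check_response(response_session_id, expected_session_id):
    # Older servers return no session id (empty string here); skip the
    # assertion for them instead of failing the whole RPC.
    if not response_session_id:
        return
    if response_session_id != expected_session_id:
        raise ValueError(
            f"Session id mismatch: got {response_session_id!r}, "
            f"expected {expected_session_id!r}")

check_response("", "abc-123")         # old server: no session id, no error
check_response("abc-123", "abc-123")  # new server, matching id: no error
```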

 






[jira] [Created] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails

2023-11-23 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-46074:


 Summary: [CONNECT][SCALA] Insufficient details in error when a UDF 
fails
 Key: SPARK-46074
 URL: https://issues.apache.org/jira/browse/SPARK-46074
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Niranjan Jayakar


Currently, when a UDF fails, the Connect client does not receive the actual 
error that caused the failure.

As an example, the error message looks like -
{code:java}
Exception in thread "main" org.apache.spark.SparkException: 
grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage 
failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 
in stage 0.0 (TID 10) (10.68.141.158 executor 0): 
org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user 
defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). SQLSTATE: 
39000 {code}
In this case, the actual error was a {{java.lang.NoClassDefFoundError}}.
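One way to surface the root cause (an illustrative pattern, not the actual Spark fix) is to chain the original exception and name its class in the wrapper's message:

```python
# Sketch: wrap a failing UDF so the wrapper's message names the root cause
# instead of only reporting a generic FAILED_EXECUTE_UDF.
def run_udf(f, value):
    try:
        return f(value)
    except Exception as e:
        raise RuntimeError(
            "[FAILED_EXECUTE_UDF] Failed to execute user defined function: "
            f"caused by {type(e).__name__}: {e}") from e

try:
    run_udf(lambda x: 1 // x, 0)
except RuntimeError as e:
    # The wrapper now carries the underlying error's class name.
    assert "ZeroDivisionError" in str(e)
    assert isinstance(e.__cause__, ZeroDivisionError)
```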






[jira] [Created] (SPARK-44816) Cryptic error message when UDF associated class is not found

2023-08-15 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-44816:


 Summary: Cryptic error message when UDF associated class is not 
found
 Key: SPARK-44816
 URL: https://issues.apache.org/jira/browse/SPARK-44816
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Niranjan Jayakar


When a Dataset API that either requires or is modeled as a UDF is used, the 
class defining the UDF/function should first be uploaded to the service using 
the `addArtifact()` API.

When this is not done, an error is thrown. However, the error message is 
cryptic and does not make the problem clear.

Improve this error message to make it clear that an expected class was not 
found.
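The kind of improvement this asks for can be sketched as below; the message text, function, and data structure are hypothetical, not the actual patch:

```python
# Sketch: turn a bare "class not found" failure into an actionable message
# that points the user at addArtifact(). Names and wording are illustrative.
def resolve_udf_class(class_name, uploaded_artifacts):
    if class_name not in uploaded_artifacts:
        raise RuntimeError(
            f"Class '{class_name}' was not found on the server. "
            "If it is defined in your application, upload its JAR first "
            "with session.addArtifact(...).")
    return uploaded_artifacts[class_name]

try:
    resolve_udf_class("com.example.MyUdf", {})
except RuntimeError as e:
    assert "addArtifact" in str(e)
```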






[jira] [Created] (SPARK-44291) [CONNECT][SCALA] range query returns incorrect schema

2023-07-03 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-44291:


 Summary: [CONNECT][SCALA] range query returns incorrect schema
 Key: SPARK-44291
 URL: https://issues.apache.org/jira/browse/SPARK-44291
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.1
Reporter: Niranjan Jayakar


The following code on Spark Connect produces the following output

Code:

 
{code:java}
val df = spark.range(3)

df.show()
df.printSchema(){code}
 

Output:
{code:java}
+---+
| id|
+---+
|  0|
|  1|
|  2|
+---+

root
 |-- value: long (nullable = true) {code}
The mismatch: {{show()}} labels the column "id", while {{printSchema()}} 
reports it as "value".






[jira] [Created] (SPARK-43457) [PYTHON][CONNECT] user agent should include the OS and Python versions

2023-05-11 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-43457:


 Summary: [PYTHON][CONNECT] user agent should include the OS and 
Python versions
 Key: SPARK-43457
 URL: https://issues.apache.org/jira/browse/SPARK-43457
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar


Including the OS and Python versions in the user agent makes it possible to 
track how Spark Connect is used across Python versions and the platforms it 
is used from.
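A minimal sketch of how such a user agent could be assembled; the format is an assumption, not the exact string Spark Connect emits:

```python
# Sketch: append OS and Python version to a base user-agent string using the
# stdlib platform module. The exact format Spark uses may differ.
import platform

def build_user_agent(base="_SPARK_CONNECT_PYTHON"):
    return (f"{base} os/{platform.uname().system.lower()} "
            f"python/{platform.python_version()}")

ua = build_user_agent()
assert ua.startswith("_SPARK_CONNECT_PYTHON")
assert " os/" in ua and " python/" in ua
```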






[jira] [Created] (SPARK-43456) [SCALA][CONNECT] user agent should include the OS and Python versions

2023-05-11 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-43456:


 Summary: [SCALA][CONNECT] user agent should include the OS and 
Python versions
 Key: SPARK-43456
 URL: https://issues.apache.org/jira/browse/SPARK-43456
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar


Including the OS and Python versions in the user agent makes it possible to 
track how Spark Connect is used across Python versions and the platforms it 
is used from.






[jira] [Created] (SPARK-43192) Spark connect's user agent validations are too restrictive

2023-04-19 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-43192:


 Summary: Spark connect's user agent validations are too restrictive
 Key: SPARK-43192
 URL: https://issues.apache.org/jira/browse/SPARK-43192
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar


The current restrictions on the allowed character set and length are too strict

 

https://github.com/apache/spark/blob/cac6f58318bb84d532f02d245a50d3c66daa3e4b/python/pyspark/sql/connect/client.py#L274-L275
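For illustration, a more permissive validation might look like this; the character class and the length cap below are assumptions, not the values Spark settled on:

```python
# Sketch: validate a user-agent string against printable ASCII with a
# generous length cap. Both limits are illustrative.
import re

MAX_USER_AGENT_LEN = 2048
_USER_AGENT_RE = re.compile(r"^[\x20-\x7e]+$")  # printable ASCII only

def validate_user_agent(ua):
    if len(ua.encode("ascii", errors="replace")) > MAX_USER_AGENT_LEN:
        raise ValueError(f"user agent longer than {MAX_USER_AGENT_LEN} bytes")
    if not _USER_AGENT_RE.match(ua):
        raise ValueError("user agent contains non-printable characters")
    return ua

assert validate_user_agent("my-app/1.2 (os/linux; python/3.11)")
```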






[jira] [Created] (SPARK-43172) Expose host and bearer tokens from the spark connect client

2023-04-18 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-43172:


 Summary: Expose host and bearer tokens from the spark connect 
client
 Key: SPARK-43172
 URL: https://issues.apache.org/jira/browse/SPARK-43172
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar


The `SparkConnectClient` class takes a connection string used to connect to 
the Spark Connect service.

As part of setting up the connection, it parses the connection string. Expose 
the parsed host and bearer token on the class, so consumers can access them.
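The Spark Connect connection string has the shape `sc://host:port/;param=value;...`. A sketch of extracting the host and a bearer token; the `token` parameter name is an assumption, and error handling is minimal:

```python
# Sketch: parse a Spark Connect style connection string and expose the host,
# port, and bearer token, using only the stdlib.
from urllib.parse import urlparse

def parse_connection_string(conn):
    url = urlparse(conn)
    if url.scheme != "sc":
        raise ValueError(f"expected sc:// scheme, got {conn!r}")
    # Parameters follow the path as ";key=value" segments.
    params = {}
    for part in conn.split(";")[1:]:
        if part:
            key, _, value = part.partition("=")
            params[key] = value
    return url.hostname, url.port, params.get("token")

host, port, token = parse_connection_string(
    "sc://example.com:15002/;token=abc123")
assert (host, port, token) == ("example.com", 15002, "abc123")
```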






[jira] [Updated] (SPARK-42502) scala: accept user_agent in spark connect's connection string

2023-02-20 Thread Niranjan Jayakar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Jayakar updated SPARK-42502:
-
Description: 
Currently, the Spark Connect service's {{client_type}} attribute (which is 
really a user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb 
it down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user 
agent, which in turn allows visibility into, and measurement of, integrations 
and usage of Spark Connect.

This is already done for the Python client: 
https://github.com/apache/spark/commit/b887d3de954ae5b2482087fe08affcc4ac60c669

  was:
Currently, the Spark Connect service's {{client_type}} attribute (which is 
really user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb 
this down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user 
agent,
which then allows visibility and measurement of integrations and usages of spark
connect.


> scala: accept user_agent in spark connect's connection string
> -
>
> Key: SPARK-42502
> URL: https://issues.apache.org/jira/browse/SPARK-42502
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the Spark Connect service's {{client_type}} attribute (which is 
> really user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.
> Accept an optional {{user_agent}} parameter in the connection string and 
> plumb this down to the Spark Connect service.
> This enables partners using Spark Connect to set their application as the 
> user agent,
> which then allows visibility and measurement of integrations and usages of 
> spark
> connect.
> This is already done for the Python client: 
> https://github.com/apache/spark/commit/b887d3de954ae5b2482087fe08affcc4ac60c669






[jira] [Created] (SPARK-42502) scala: accept user_agent in spark connect's connection string

2023-02-20 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-42502:


 Summary: scala: accept user_agent in spark connect's connection 
string
 Key: SPARK-42502
 URL: https://issues.apache.org/jira/browse/SPARK-42502
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.3.2
Reporter: Niranjan Jayakar
Assignee: Niranjan Jayakar
 Fix For: 3.4.0


Currently, the Spark Connect service's {{client_type}} attribute (which is 
really a user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb 
it down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user 
agent, which in turn allows visibility into, and measurement of, integrations 
and usage of Spark Connect.
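The client-side behavior being requested can be sketched like this; the simplified `;key=value` parsing and fallback logic are illustrative, not the actual implementation:

```python
# Sketch: take the user agent from the connection string when present,
# falling back to the library default otherwise.
DEFAULT_USER_AGENT = "_SPARK_CONNECT_PYTHON"

def user_agent_from(conn):
    # Connection strings look like "sc://host:port/;key=value;key2=value2".
    for part in conn.split(";")[1:]:
        key, _, value = part.partition("=")
        if key == "user_agent" and value:
            return value
    return DEFAULT_USER_AGENT

assert user_agent_from("sc://host:15002/") == DEFAULT_USER_AGENT
assert user_agent_from(
    "sc://host:15002/;user_agent=my-partner-app") == "my-partner-app"
```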






[jira] [Updated] (SPARK-42477) python: accept user_agent in spark connect's connection string

2023-02-20 Thread Niranjan Jayakar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Jayakar updated SPARK-42477:
-
Summary:  python: accept user_agent in spark connect's connection string  
(was:  accept user_agent in spark connect's connection string)

>  python: accept user_agent in spark connect's connection string
> ---
>
> Key: SPARK-42477
> URL: https://issues.apache.org/jira/browse/SPARK-42477
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the Spark Connect service's {{client_type}} attribute (which is 
> really user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.
> Accept an optional {{user_agent}} parameter in the connection string and 
> plumb this down to the Spark Connect service.
> This enables partners using Spark Connect to set their application as the 
> user agent,
> which then allows visibility and measurement of integrations and usages of 
> spark
> connect.






[jira] [Resolved] (SPARK-42498) reduce spark connect service retry time

2023-02-20 Thread Niranjan Jayakar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Jayakar resolved SPARK-42498.
--
Resolution: Abandoned

> reduce spark connect service retry time
> ---
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>  
> Currently, 15 retries with the current backoff strategy result in the client 
> sitting in
> the retry loop for ~400 seconds in the worst case. This means, applications 
> and
> users using the spark connect client will hang for >6 minutes with no 
> response.






[jira] [Updated] (SPARK-42498) reduce spark connect service retry time

2023-02-20 Thread Niranjan Jayakar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Jayakar updated SPARK-42498:
-
Summary: reduce spark connect service retry time  (was: make spark connect 
retries configurat)

> reduce spark connect service retry time
> ---
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>  
> Currently, 15 retries with the current backoff strategy result in the client 
> sitting in
> the retry loop for ~400 seconds in the worst case. This means, applications 
> and
> users using the spark connect client will hang for >6 minutes with no 
> response.






[jira] [Updated] (SPARK-42498) make spark connect retries configurat

2023-02-20 Thread Niranjan Jayakar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niranjan Jayakar updated SPARK-42498:
-
Summary: make spark connect retries configurat  (was: reduce spark connect 
service retry time)

> make spark connect retries configurat
> -
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>  
> Currently, 15 retries with the current backoff strategy result in the client 
> sitting in
> the retry loop for ~400 seconds in the worst case. This means, applications 
> and
> users using the spark connect client will hang for >6 minutes with no 
> response.






[jira] [Created] (SPARK-42498) reduce spark connect service retry time

2023-02-19 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-42498:


 Summary: reduce spark connect service retry time
 Key: SPARK-42498
 URL: https://issues.apache.org/jira/browse/SPARK-42498
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.3.2
Reporter: Niranjan Jayakar


https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411

 

Currently, 15 retries with the current backoff strategy can leave the client 
sitting in the retry loop for roughly 400 seconds in the worst case. This 
means applications and users of the Spark Connect client may hang for over 
six minutes with no response.
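The worst case can be estimated by summing the capped exponential backoffs. The starting delay, multiplier, and cap below are assumptions chosen to land in this ballpark, not Spark's actual defaults:

```python
# Sketch: total worst-case sleep time for n retries of capped exponential
# backoff. The parameter values used below are assumed, not Spark's.
def total_backoff(retries, initial, multiplier, cap):
    total, delay = 0.0, initial
    for _ in range(retries):
        total += min(delay, cap)
        delay *= multiplier
    return total

# 15 retries, 50 ms initial delay, doubling, capped at 60 s:
# about 342 s of sleeping before any jitter, the same order as ~400 s.
assert abs(total_backoff(15, 0.05, 2.0, 60.0) - 342.35) < 1e-9
```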






[jira] [Created] (SPARK-42477) accept user_agent in spark connect's connection string

2023-02-17 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-42477:


 Summary:  accept user_agent in spark connect's connection string
 Key: SPARK-42477
 URL: https://issues.apache.org/jira/browse/SPARK-42477
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.3.2
Reporter: Niranjan Jayakar


Currently, the Spark Connect service's {{client_type}} attribute (which is 
really a user agent) is set to {{_SPARK_CONNECT_PYTHON}} to signify PySpark.

Accept an optional {{user_agent}} parameter in the connection string and plumb 
it down to the Spark Connect service.

This enables partners using Spark Connect to set their application as the user 
agent, which in turn allows visibility into, and measurement of, integrations 
and usage of Spark Connect.






[jira] [Created] (SPARK-42106) [Pyspark] Hide parameters when re-printing user provided remote URL in REPL

2023-01-18 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-42106:


 Summary: [Pyspark] Hide parameters when re-printing user provided 
remote URL in REPL
 Key: SPARK-42106
 URL: https://issues.apache.org/jira/browse/SPARK-42106
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Niranjan Jayakar


The Spark Connect client is initialized in the PySpark REPL using the 
{{--remote}} option, which takes a Spark Connect endpoint URL.

The URL may contain auth tokens as URL parameters or query parameters.

Hide these values when the URL is re-printed during REPL start-up.
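A sketch of the redaction step; the set of parameter names treated as secret is an assumption, and the `;key=value` parsing is simplified:

```python
# Sketch: redact secret-bearing parameters before echoing a connection URL
# back to the user at REPL start-up. Secret key names are illustrative.
SECRET_KEYS = {"token", "api_key", "password"}

def redact_url(conn):
    head, *params = conn.split(";")
    redacted = []
    for part in params:
        key, sep, _value = part.partition("=")
        if sep and key in SECRET_KEYS:
            part = f"{key}=***"
        redacted.append(part)
    return ";".join([head] + redacted)

assert redact_url("sc://host:15002/;token=secret;use_ssl=true") == \
    "sc://host:15002/;token=***;use_ssl=true"
```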


