[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect

2023-08-14 Thread Wei Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Liu updated SPARK-44808:

Description: 
I’m thinking of an improvement for the Spark Connect Python listener and foreachBatch. Currently, if you define a variable outside of the function, you can’t actually see the update on the client side when the variable is modified inside the function, because the function runs on the server side, e.g.

 

 
{code:python}
from pyspark.sql.streaming import StreamingQueryListener

x = 0

class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        # Runs in the server process, not on the client.
        global x
        x = 100
        self.y = 200
    def onQueryProgress(self, event):
        pass
    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(MyListener())
q = spark.readStream.format("rate").load().writeStream.format("console").start()

# On the client, x is still 0 and self.y is not defined on the listener.
{code}
 

 

But for the self.y case, there could be an improvement. 

 

The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client.
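
As a rough illustration of that round trip (this only sketches the idea with PySpark's cloudpickle-based serializer; it is not the actual server code, and `FakeListenerState` is just a stand-in for the server's mutated listener instance):

{code:python}
from pyspark.serializers import CloudPickleSerializer

ser = CloudPickleSerializer()

# Pretend this object is the server's copy of the listener after
# onQueryStarted() ran and set self.y = 200:
class FakeListenerState:
    pass

server_copy = FakeListenerState()
server_copy.y = 200

payload = ser.dumps(server_copy)   # server side: re-serialize the live instance
client_copy = ser.loads(payload)   # client side: rebuild it from the bytes
print(client_copy.y)               # 200
{code}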

 

So we could define a new interface on the streaming query manager, maybe called refreshListener(listener), used e.g. as `spark.streams.refreshListener(listener)`:

 

 
{code:python}
def refreshListener(self, listener: StreamingQueryListener) -> StreamingQueryListener:
    # Send the listener id with this refresh request. The server receives the
    # request, serializes the listener again, and sends it back to the client,
    # so the returned new listener contains the updated value of self.y.
    ...
{code}
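
A rough usage sketch of how the proposed API could look from the client (the refreshListener() call itself is the proposal and does not exist today; the query and listener names are placeholders):

{code:python}
my_listener = MyListener()
spark.streams.addListener(my_listener)

q = spark.readStream.format("rate").load().writeStream.format("console").start()
q.awaitTermination(5)

# The client copy of the listener still has no attribute y at this point.
# The proposed call would ask the server to re-pickle its copy and return it:
refreshed = spark.streams.refreshListener(my_listener)
print(refreshed.y)  # expected to print 200 under this proposal
{code}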
 

For `foreachBatch`, we could wrap the function in a new class, like the listener case.
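
One possible shape for that wrapper (purely a sketch; `ForeachBatchWrapper` and the refreshForeachBatch() call in the comment are hypothetical names, not existing APIs):

{code:python}
class ForeachBatchWrapper:
    """Wraps a foreachBatch function so its state could be refreshed like a listener."""

    def __init__(self, func):
        self.func = func
        self.state = {}  # mutated on the server; invisible to the client copy today

    def __call__(self, batch_df, batch_id):
        self.func(self, batch_df, batch_id)

def count_rows(wrapper, batch_df, batch_id):
    wrapper.state["last_batch_id"] = batch_id
    wrapper.state["last_count"] = batch_df.count()

wrapper = ForeachBatchWrapper(count_rows)
df = spark.readStream.format("rate").load()
q = df.writeStream.foreachBatch(wrapper).start()

# A hypothetical counterpart of refreshListener() could then return the
# server's updated copy of the wrapper:
# refreshed = spark.streams.refreshForeachBatch(wrapper)
# print(refreshed.state["last_batch_id"], refreshed.state["last_count"])
{code}

The wrapper passes itself to the user function, so any state the function records lands on the wrapper instance, which is exactly what the server could re-serialize and send back.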

 

 


> refreshListener() API on StreamingQueryManager for spark connect
> 
>
> Key: SPARK-44808
> URL: https://issues.apache.org/jira/browse/SPARK-44808
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>






[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect

2023-08-14 Thread Wei Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Liu updated SPARK-44808:

Description: 
I’m thinking of an improvement for the Spark Connect Python listener and foreachBatch. Currently, if you define a variable outside of the function, you can’t actually see the update on the client side when the variable is modified inside the function, because the function runs on the server side, e.g.

{code:python}
from pyspark.sql.streaming import StreamingQueryListener

x = 0

class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        # Runs in the server process, not on the client.
        global x
        x = 100
        self.y = 200
    def onQueryProgress(self, event):
        pass
    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(MyListener())
q = spark.readStream.format("rate").load().writeStream.format("console").start()

# On the client, x is still 0 and self.y is not defined on the listener.
{code}

But for the self.y case, there could be an improvement.

The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client.

So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), we could use it e.g. as `spark.streams.refreshListener(listener)`:

{code:python}
def refreshListener(self, listener: StreamingQueryListener) -> StreamingQueryListener:
    # Send the listener id with this refresh request. The server receives the
    # request, serializes the listener again, and sends it back to the client,
    # so the returned new listener contains the updated value of self.y.
    ...
{code}

For `foreachBatch`, we could wrap the function in a new class.

 









[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect

2023-08-14 Thread Wei Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Liu updated SPARK-44808:

Description: 
I’m thinking of an improvement for the Spark Connect Python listener and foreachBatch. Currently, if you define a variable outside of the function, you can’t actually see the update on the client side when the variable is modified inside the function, because the function runs on the server side, e.g.

{code:python}
from pyspark.sql.streaming import StreamingQueryListener

x = 0

class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        # Runs in the server process, not on the client.
        global x
        x = 100
        self.y = 200
    def onQueryProgress(self, event):
        pass
    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(MyListener())
q = spark.readStream.format("rate").load().writeStream.format("console").start()

# On the client, x is still 0 and self.y is not defined on the listener.
{code}

But for the self.y case, there could be an improvement.

The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client.

So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), we could use it e.g. as `spark.streams.refreshListener(listener)`:

{code:python}
def refreshListener(self, listener: StreamingQueryListener) -> StreamingQueryListener:
    # Send the listener id with this refresh request. The server receives the
    # request, serializes the listener again, and sends it back to the client,
    # so the returned new listener contains the updated value of self.y.
    ...
{code}

For `foreachBatch`, we might be able to wrap the function in a new class, like the listener case.

 









[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect

2023-08-14 Thread Wei Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Liu updated SPARK-44808:

Description: 
I’m thinking of an improvement for the Spark Connect Python listener and foreachBatch. Currently, if you define a variable outside of the function, you can’t actually see the update on the client side when the variable is modified inside the function, because the function runs on the server side, e.g.

{code:python}
from pyspark.sql.streaming import StreamingQueryListener

x = 0

class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        # Runs in the server process, not on the client.
        global x
        x = 100
        self.y = 200
    def onQueryProgress(self, event):
        pass
    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(MyListener())
q = spark.readStream.format("rate").load().writeStream.format("console").start()

# On the client, x is still 0 and self.y is not defined on the listener.
{code}

But for the self.y case, there could be an improvement.

The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client.

So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), we could use it e.g. as `spark.streams.refreshListener(listener)`:

{code:python}
def refreshListener(self, listener: StreamingQueryListener) -> StreamingQueryListener:
    # Send the listener id with this refresh request. The server receives the
    # request, serializes the listener again, and sends it back to the client,
    # so the returned new listener contains the updated value of self.y.
    ...
{code}
 

 

 









[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect

2023-08-14 Thread Wei Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Liu updated SPARK-44808:

Description: 
I’m thinking of an improvement for the Python listener and foreachBatch. Currently, if you define a variable outside of the function, you can’t actually see the update on the client side when the variable is modified inside the function, because the function runs on the server side, e.g.

{code:python}
from pyspark.sql.streaming import StreamingQueryListener

x = 0

class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        # Runs in the server process, not on the client.
        global x
        x = 100
        self.y = 200
    def onQueryProgress(self, event):
        pass
    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(MyListener())
q = spark.readStream.format("rate").load().writeStream.format("console").start()

# On the client, x is still 0 and self.y is not defined on the listener.
{code}

But for the self.y case, there could be an improvement.

The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client.

So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), we could use it e.g. as `spark.streams.refreshListener(listener)`:

{code:python}
def refreshListener(self, listener: StreamingQueryListener) -> StreamingQueryListener:
    # Send the listener id with this refresh request. The server receives the
    # request, serializes the listener again, and sends it back to the client,
    # so the returned new listener contains the updated value of self.y.
    ...
{code}
 

 

 





