[jira] [Created] (SPARK-48567) Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress
Wei Liu created SPARK-48567: --- Summary: Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress Key: SPARK-48567 URL: https://issues.apache.org/jira/browse/SPARK-48567 Project: Spark Issue Type: New Feature Components: PySpark, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
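For context, a minimal PySpark sketch (the rate source and noop sink are assumptions made purely for illustration) of the behavior this ticket targets: today `lastProgress` returns a plain dict in PySpark, while the proposal is for it to return an actual `StreamingQueryProgress` object, matching `recentProgress` and the Scala API.

```
# Hedged sketch; rate source and noop sink are chosen only for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
q = spark.readStream.format("rate").load().writeStream.format("noop").start()

progress = q.lastProgress
if progress is not None:
    # Current behavior: a plain dict, so fields are accessed by key.
    print(progress["id"], progress["batchId"])
    # Proposed behavior: a StreamingQueryProgress object, so attribute access
    # such as progress.batchId (and progress.json) would work, as in Scala.
q.stop()
```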
[jira] [Created] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
Wei Liu created SPARK-48482: --- Summary: dropDuplicates and dropDuplicatesWithinWatermark should accept varargs Key: SPARK-48482 URL: https://issues.apache.org/jira/browse/SPARK-48482 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
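A small sketch of the API change being requested (the varargs call is the proposed form, not something current releases necessarily accept): today the PySpark methods take a list of column names, and the ask is to also accept them as positional varargs.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "name"])

# Current API: the subset is passed as a list.
df.dropDuplicates(["id", "name"]).show()

# Proposed API (illustrative only): the same call with varargs,
# df.dropDuplicates("id", "name"), and likewise for
# dropDuplicatesWithinWatermark on a watermarked streaming DataFrame.
```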
[jira] [Created] (SPARK-48480) StreamingQueryListener thread should not be interruptable
Wei Liu created SPARK-48480: --- Summary: StreamingQueryListener thread should not be interruptable Key: SPARK-48480 URL: https://issues.apache.org/jira/browse/SPARK-48480 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48411) Add E2E test for DropDuplicateWithinWatermark
[ https://issues.apache.org/jira/browse/SPARK-48411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849348#comment-17849348 ] Wei Liu commented on SPARK-48411: - Sorry I tagged the wrong Yuchen > Add E2E test for DropDuplicateWithinWatermark > - > > Key: SPARK-48411 > URL: https://issues.apache.org/jira/browse/SPARK-48411 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > Currently we do not have a e2e test for DropDuplicateWithinWatermark, we > should add one. We can simply use one of the test written in Scala here (with > the testStream API) and replicate it to python: > [https://github.com/apache/spark/commit/0e9e34c1bd9bd16ad5efca77ce2763eb950f3103] > > The change should happen in > [https://github.com/apache/spark/blob/eee179135ed21dbdd8b342d053c9eda849e2de77/python/pyspark/sql/tests/streaming/test_streaming.py#L29] > > so we can test it in both connect and non-connect. > > Test with: > ``` > python/run-tests --testnames pyspark.sql.tests.streaming.test_streaming > python/run-tests --testnames > pyspark.sql.tests.connect.streaming.test_parity_streaming > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48411) Add E2E test for DropDuplicateWithinWatermark
Wei Liu created SPARK-48411: --- Summary: Add E2E test for DropDuplicateWithinWatermark Key: SPARK-48411 URL: https://issues.apache.org/jira/browse/SPARK-48411 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu Currently we do not have an e2e test for DropDuplicateWithinWatermark; we should add one. We can simply take one of the tests written in Scala here (with the testStream API) and replicate it in Python: [https://github.com/apache/spark/commit/0e9e34c1bd9bd16ad5efca77ce2763eb950f3103] The change should happen in [https://github.com/apache/spark/blob/eee179135ed21dbdd8b342d053c9eda849e2de77/python/pyspark/sql/tests/streaming/test_streaming.py#L29] so we can test it in both connect and non-connect. Test with: ``` python/run-tests --testnames pyspark.sql.tests.streaming.test_streaming python/run-tests --testnames pyspark.sql.tests.connect.streaming.test_parity_streaming ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
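A rough sketch, not the actual test added for this ticket, of what such an e2e check in pyspark.sql.tests.streaming.test_streaming could look like; the rate source, memory sink, and test name are assumptions made for illustration, and the method would live inside the existing test class.

```
# Rough sketch only; names and data source are illustrative.
def test_drop_duplicates_within_watermark(self):
    df = (
        self.spark.readStream.format("rate")
        .option("rowsPerSecond", "10")
        .load()
        .withWatermark("timestamp", "10 seconds")
        .dropDuplicatesWithinWatermark(["value"])
    )
    q = (
        df.writeStream.format("memory")
        .queryName("dedup_within_watermark")
        .outputMode("append")
        .start()
    )
    try:
        q.processAllAvailable()
        values = [
            r.value
            for r in self.spark.sql("SELECT value FROM dedup_within_watermark").collect()
        ]
        # The rate source emits unique values, so deduplication must not drop
        # rows, and no value may appear twice in the output.
        self.assertEqual(len(values), len(set(values)))
    finally:
        q.stop()
```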
[jira] [Commented] (SPARK-48411) Add E2E test for DropDuplicateWithinWatermark
[ https://issues.apache.org/jira/browse/SPARK-48411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849202#comment-17849202 ] Wei Liu commented on SPARK-48411: - [~liuyuchen777] is going to work on this > Add E2E test for DropDuplicateWithinWatermark > - > > Key: SPARK-48411 > URL: https://issues.apache.org/jira/browse/SPARK-48411 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > Currently we do not have a e2e test for DropDuplicateWithinWatermark, we > should add one. We can simply use one of the test written in Scala here (with > the testStream API) and replicate it to python: > [https://github.com/apache/spark/commit/0e9e34c1bd9bd16ad5efca77ce2763eb950f3103] > > The change should happen in > [https://github.com/apache/spark/blob/eee179135ed21dbdd8b342d053c9eda849e2de77/python/pyspark/sql/tests/streaming/test_streaming.py#L29] > > so we can test it in both connect and non-connect. > > Test with: > ``` > python/run-tests --testnames pyspark.sql.tests.streaming.test_streaming > python/run-tests --testnames > pyspark.sql.tests.connect.streaming.test_parity_streaming > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48181) Unify StreamingPythonRunner and PythonPlannerRunner
Wei Liu created SPARK-48181: --- Summary: Unify StreamingPythonRunner and PythonPlannerRunner Key: SPARK-48181 URL: https://issues.apache.org/jira/browse/SPARK-48181 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu We should unify the two driver-side Python runners for PySpark. To do this we should move away from StreamingPythonRunner and enhance PythonPlannerRunner with streaming support (a repeated read-write loop) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
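A purely conceptual sketch of the difference between the two runners' protocols; the function names and length-prefixed framing below are assumptions for illustration and are not Spark's internal API. The planner runner does a single request/response exchange with its Python worker, while the streaming runner needs to keep exchanging messages (the "repeated read-write loop") until the stream ends.

```
# Conceptual sketch; not Spark internals. Length-prefixed framing is an
# assumption used only to make the loop concrete.
import socket
import struct

def read_message(sock: socket.socket):
    header = sock.recv(4)
    if len(header) < 4:          # the JVM side closed the stream
        return None
    (length,) = struct.unpack(">i", header)
    return sock.recv(length)

def write_message(sock: socket.socket, data: bytes) -> None:
    sock.sendall(struct.pack(">i", len(data)) + data)

def one_shot_runner(sock, handle):
    # PythonPlannerRunner style: one request, one response.
    write_message(sock, handle(read_message(sock)))

def streaming_runner(sock, handle):
    # StreamingPythonRunner style: keep serving messages (e.g. one per
    # microbatch or listener event) until the other side closes the stream.
    while True:
        payload = read_message(sock)
        if payload is None:
            break
        write_message(sock, handle(payload))
```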
[jira] [Created] (SPARK-48147) Remove all client listeners when local spark session is deleted
Wei Liu created SPARK-48147: --- Summary: Remove all client listeners when local spark session is deleted Key: SPARK-48147 URL: https://issues.apache.org/jira/browse/SPARK-48147 Project: Spark Issue Type: New Feature Components: Connect, PySpark, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48093) Add config to switch between client side listener and server side listener
Wei Liu created SPARK-48093: --- Summary: Add config to switch between client side listener and server side listener Key: SPARK-48093 URL: https://issues.apache.org/jira/browse/SPARK-48093 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 3.5.1, 3.5.0, 3.5.2 Reporter: Wei Liu We are moving the implementation of Streaming Query Listener from server to client. For clients already running the client side listener, to prevent regression, we should add a config to let them decide what type of listener the user wants to use. This is only added to the 3.5.x published versions. For 4.0 and upwards we only use the client side listener. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners
Wei Liu created SPARK-48002: --- Summary: Add Observed metrics test in PySpark StreamingQueryListeners Key: SPARK-48002 URL: https://issues.apache.org/jira/browse/SPARK-48002 Project: Spark Issue Type: New Feature Components: SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
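For reference, a minimal sketch (the rate source, noop sink, and metric name are assumptions) of what such a test would exercise: attach a named observation with DataFrame.observe and read it back from the progress in onQueryProgress.

```
# Minimal sketch; the source, sink, and metric name are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import count, lit
from pyspark.sql.streaming import StreamingQueryListener

spark = SparkSession.builder.getOrCreate()

class ObservedMetricsListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        # Observed metrics show up per batch, keyed by the observation name.
        row = event.progress.observedMetrics.get("my_metric")
        if row is not None:
            print("rows in batch:", row.cnt)

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(ObservedMetricsListener())
df = (
    spark.readStream.format("rate").load()
    .observe("my_metric", count(lit(1)).alias("cnt"))
)
q = df.writeStream.format("noop").start()
```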
[jira] [Created] (SPARK-47877) Speed up test_parity_listener
Wei Liu created SPARK-47877: --- Summary: Speed up test_parity_listener Key: SPARK-47877 URL: https://issues.apache.org/jira/browse/SPARK-47877 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-47718: Labels: (was: pull-request-available) > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > > I have a data pipeline set up in such a way that it reads data from a Kafka > source, does some transformation on the data using pyspark, then writes the > output into a sink (Kafka, Redis, etc). > > My entire pipeline in written in SQL, so I wish to use the .sql() method to > execute SQL on my streaming source directly. > > However, I'm running into the issue where my watermark is not being > recognized by the downstream query via the .sql() method. > > ``` > Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) > [Clang 16.0.6 ] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import pyspark > >>> print(pyspark.__version__) > 3.5.1 > >>> from pyspark.sql import SparkSession > >>> > >>> session = SparkSession.builder \ > ... .config("spark.jars.packages", > "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\ > ... .getOrCreate() > >>> from pyspark.sql.functions import col, from_json > >>> from pyspark.sql.types import StructField, StructType, TimestampType, > >>> LongType, DoubleType, IntegerType > >>> schema = StructType( > ... [ > ... StructField('createTime', TimestampType(), True), > ... StructField('orderId', LongType(), True), > ... StructField('payAmount', DoubleType(), True), > ... StructField('payPlatform', IntegerType(), True), > ... StructField('provinceId', IntegerType(), True), > ... ]) > >>> > >>> streaming_df = session.readStream\ > ... .format("kafka")\ > ... .option("kafka.bootstrap.servers", "localhost:9092")\ > ... .option("subscribe", "payment_msg")\ > ... .option("startingOffsets","earliest")\ > ... .load()\ > ... .select(from_json(col("value").cast("string"), > schema).alias("parsed_value"))\ > ... .select("parsed_value.*")\ > ... .withWatermark("createTime", "10 seconds") > >>> > >>> streaming_df.createOrReplaceTempView("streaming_df") > >>> session.sql(""" > ... SELECT > ... window.start, window.end, provinceId, sum(payAmount) as totalPayAmount > ... FROM streaming_df > ... GROUP BY provinceId, window('createTime', '1 hour', '30 minutes') > ... ORDER BY window.start > ... """)\ > ... .writeStream\ > ... .format("kafka") \ > ... .option("checkpointLocation", "checkpoint") \ > ... .option("kafka.bootstrap.servers", "localhost:9092") \ > ... .option("topic", "sink") \ > ... .start() > ``` > > This throws exception > ``` > pyspark.errors.exceptions.captured.AnalysisException: Append output mode not > supported when there are streaming aggregations on streaming > DataFrames/DataSets without watermark; line 6 pos 4; > ``` > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47722) Wait until RocksDB background work finish before closing
Wei Liu created SPARK-47722: --- Summary: Wait until RocksDB background work finish before closing Key: SPARK-47722 URL: https://issues.apache.org/jira/browse/SPARK-47722 Project: Spark Issue Type: New Feature Components: SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47332) StreamingPythonRunner don't need redundant logic for starting python process
Wei Liu created SPARK-47332: --- Summary: StreamingPythonRunner don't need redundant logic for starting python process Key: SPARK-47332 URL: https://issues.apache.org/jira/browse/SPARK-47332 Project: Spark Issue Type: New Feature Components: Connect, SS, Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu https://github.com/apache/spark/pull/45023#discussion_r1516609093 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47292) safeMapToJValue should consider when map is null
Wei Liu created SPARK-47292: --- Summary: safeMapToJValue should consider when map is null Key: SPARK-47292 URL: https://issues.apache.org/jira/browse/SPARK-47292 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 3.5.1, 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47277) PySpark util function assertDataFrameEqual should not support streaming DF
Wei Liu created SPARK-47277: --- Summary: PySpark util function assertDataFrameEqual should not support streaming DF Key: SPARK-47277 URL: https://issues.apache.org/jira/browse/SPARK-47277 Project: Spark Issue Type: New Feature Components: Connect, PySpark, SQL, Structured Streaming Affects Versions: 3.5.1, 3.5.0, 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
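A short sketch of the situation (the raised error shown is the intended behavior described by the ticket, not necessarily the exact current message): comparing DataFrames requires collecting their rows, which a streaming DataFrame cannot do, so the utility should reject streaming input up front.

```
# Sketch; the rate source is an assumption used for illustration.
from pyspark.sql import SparkSession
from pyspark.testing import assertDataFrameEqual

spark = SparkSession.builder.getOrCreate()

batch_df = spark.range(10)
stream_df = spark.readStream.format("rate").load()

assertDataFrameEqual(batch_df, batch_df)        # fine: both sides are batch DataFrames

try:
    assertDataFrameEqual(stream_df, stream_df)  # should fail fast with a clear error
except Exception as e:
    print(type(e).__name__, e)
```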
[jira] [Created] (SPARK-47174) Client Side Listener - Server side implementation
Wei Liu created SPARK-47174: --- Summary: Client Side Listener - Server side implementation Key: SPARK-47174 URL: https://issues.apache.org/jira/browse/SPARK-47174 Project: Spark Issue Type: Improvement Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47174) Client Side Listener - Server side implementation
[ https://issues.apache.org/jira/browse/SPARK-47174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820828#comment-17820828 ] Wei Liu commented on SPARK-47174: - I'm working on this > Client Side Listener - Server side implementation > - > > Key: SPARK-47174 > URL: https://issues.apache.org/jira/browse/SPARK-47174 > Project: Spark > Issue Type: Improvement > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47173) fix typo in new streaming query listener explanation
Wei Liu created SPARK-47173: --- Summary: fix typo in new streaming query listener explanation Key: SPARK-47173 URL: https://issues.apache.org/jira/browse/SPARK-47173 Project: Spark Issue Type: Improvement Components: SS, UI Affects Versions: 4.0.0 Reporter: Wei Liu flatMapGroupsWithState is misspelled as flatMapGroupWithState (missing an "s" after "Group") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46873) PySpark spark.streams should not recreate new StreamingQueryManager
Wei Liu created SPARK-46873: --- Summary: PySpark spark.streams should not recreate new StreamingQueryManager Key: SPARK-46873 URL: https://issues.apache.org/jira/browse/SPARK-46873 Project: Spark Issue Type: Task Components: Connect, PySpark, SS Affects Versions: 4.0.0 Reporter: Wei Liu In Scala, there is only one streaming query manager for one spark session: ``` scala> spark.streams val *res0*: *org.apache.spark.sql.streaming.StreamingQueryManager* = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba scala> spark.streams val *res1*: *org.apache.spark.sql.streaming.StreamingQueryManager* = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba scala> spark.streams val *res2*: *org.apache.spark.sql.streaming.StreamingQueryManager* = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba scala> spark.streams val *res3*: *org.apache.spark.sql.streaming.StreamingQueryManager* = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba ``` In Python, this is currently not the case (each access creates a new StreamingQueryManager): ``` >>> spark.streams >>> spark.streams >>> spark.streams >>> spark.streams ``` Python should align with the Scala behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
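A minimal sketch, not the actual patch, of the shape of the fix on the Python side: memoize the manager on the session so repeated `spark.streams` accesses return the same object. All names below are illustrative.

```
# Illustrative only; these are not PySpark's real classes.
class _QueryManager:
    pass

class _SessionLike:
    def __init__(self):
        self._streams = None

    @property
    def streams(self) -> "_QueryManager":
        if self._streams is None:      # create once, then reuse
            self._streams = _QueryManager()
        return self._streams

s = _SessionLike()
assert s.streams is s.streams          # same instance, matching the Scala behavior
```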
[jira] [Closed] (SPARK-44460) Pass user auth credential to Python workers for foreachBatch and listener
[ https://issues.apache.org/jira/browse/SPARK-44460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu closed SPARK-44460. --- > Pass user auth credential to Python workers for foreachBatch and listener > - > > Key: SPARK-44460 > URL: https://issues.apache.org/jira/browse/SPARK-44460 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > No user specific credentials are sent to Python worker that runs user > functions like foreachBatch() and streaming listener. > We might need to pass in these. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46627) Streaming UI hover-over shows incorrect value
[ https://issues.apache.org/jira/browse/SPARK-46627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-46627: Attachment: Screenshot 2024-01-08 at 15.06.24.png > Streaming UI hover-over shows incorrect value > - > > Key: SPARK-46627 > URL: https://issues.apache.org/jira/browse/SPARK-46627 > Project: Spark > Issue Type: Task > Components: Structured Streaming, UI, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Attachments: Screenshot 2024-01-08 at 1.55.57 PM.png, Screenshot > 2024-01-08 at 15.06.24.png > > > Running a simple streaming query: > val df = spark.readStream.format("rate").option("rowsPerSecond", > "5000").load() > val q = df.writeStream.format("noop").start() > > The hover-over value is incorrect in the streaming ui (shows 321.00 at > undefined) > > !Screenshot 2024-01-08 at 1.55.57 PM.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46627) Streaming UI hover-over shows incorrect value
[ https://issues.apache.org/jira/browse/SPARK-46627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804513#comment-17804513 ] Wei Liu commented on SPARK-46627: - Also batch percent doesn't add to 100% now: !Screenshot 2024-01-08 at 15.06.24.png! > Streaming UI hover-over shows incorrect value > - > > Key: SPARK-46627 > URL: https://issues.apache.org/jira/browse/SPARK-46627 > Project: Spark > Issue Type: Task > Components: Structured Streaming, UI, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Attachments: Screenshot 2024-01-08 at 1.55.57 PM.png, Screenshot > 2024-01-08 at 15.06.24.png > > > Running a simple streaming query: > val df = spark.readStream.format("rate").option("rowsPerSecond", > "5000").load() > val q = df.writeStream.format("noop").start() > > The hover-over value is incorrect in the streaming ui (shows 321.00 at > undefined) > > !Screenshot 2024-01-08 at 1.55.57 PM.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46627) Streaming UI hover-over shows incorrect value
[ https://issues.apache.org/jira/browse/SPARK-46627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-46627: Attachment: Screenshot 2024-01-08 at 1.55.57 PM.png > Streaming UI hover-over shows incorrect value > - > > Key: SPARK-46627 > URL: https://issues.apache.org/jira/browse/SPARK-46627 > Project: Spark > Issue Type: Task > Components: Structured Streaming, UI, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Attachments: Screenshot 2024-01-08 at 1.55.57 PM.png > > > Running a simple streaming query: > val df = spark.readStream.format("rate").option("rowsPerSecond", > "5000").load() > val q = df.writeStream.format("noop").start() > > The hover-over value is incorrect in the streaming ui: > > !https://files.slack.com/files-tmb/T02727P8HV4-F06CJ83D3JT-b44210f391/image_720.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46627) Streaming UI hover-over shows incorrect value
[ https://issues.apache.org/jira/browse/SPARK-46627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804481#comment-17804481 ] Wei Liu commented on SPARK-46627: - Hi Kent : ) [~yao] I was wondering if you have context in this issue? Thank you so much! > Streaming UI hover-over shows incorrect value > - > > Key: SPARK-46627 > URL: https://issues.apache.org/jira/browse/SPARK-46627 > Project: Spark > Issue Type: Task > Components: Structured Streaming, UI, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Attachments: Screenshot 2024-01-08 at 1.55.57 PM.png > > > Running a simple streaming query: > val df = spark.readStream.format("rate").option("rowsPerSecond", > "5000").load() > val q = df.writeStream.format("noop").start() > > The hover-over value is incorrect in the streaming ui (shows 321.00 at > undefined) > > !Screenshot 2024-01-08 at 1.55.57 PM.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46627) Streaming UI hover-over shows incorrect value
[ https://issues.apache.org/jira/browse/SPARK-46627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-46627: Description: Running a simple streaming query: val df = spark.readStream.format("rate").option("rowsPerSecond", "5000").load() val q = df.writeStream.format("noop").start() The hover-over value is incorrect in the streaming ui (shows 321.00 at undefined) !Screenshot 2024-01-08 at 1.55.57 PM.png! was: Running a simple streaming query: val df = spark.readStream.format("rate").option("rowsPerSecond", "5000").load() val q = df.writeStream.format("noop").start() The hover-over value is incorrect in the streaming ui: !https://files.slack.com/files-tmb/T02727P8HV4-F06CJ83D3JT-b44210f391/image_720.png! > Streaming UI hover-over shows incorrect value > - > > Key: SPARK-46627 > URL: https://issues.apache.org/jira/browse/SPARK-46627 > Project: Spark > Issue Type: Task > Components: Structured Streaming, UI, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Attachments: Screenshot 2024-01-08 at 1.55.57 PM.png > > > Running a simple streaming query: > val df = spark.readStream.format("rate").option("rowsPerSecond", > "5000").load() > val q = df.writeStream.format("noop").start() > > The hover-over value is incorrect in the streaming ui (shows 321.00 at > undefined) > > !Screenshot 2024-01-08 at 1.55.57 PM.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46627) Streaming UI hover-over shows incorrect value
Wei Liu created SPARK-46627: --- Summary: Streaming UI hover-over shows incorrect value Key: SPARK-46627 URL: https://issues.apache.org/jira/browse/SPARK-46627 Project: Spark Issue Type: Task Components: Structured Streaming, UI, Web UI Affects Versions: 4.0.0 Reporter: Wei Liu Running a simple streaming query: val df = spark.readStream.format("rate").option("rowsPerSecond", "5000").load() val q = df.writeStream.format("noop").start() The hover-over value is incorrect in the streaming ui: !https://files.slack.com/files-tmb/T02727P8HV4-F06CJ83D3JT-b44210f391/image_720.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46384) Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-46384: Summary: Streaming UI doesn't display graph correctly (was: Streaming UI doesn't show graph) > Streaming UI doesn't display graph correctly > > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > The Streaming UI is broken currently at spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > Would make the spark UI shows empty graph for "operation duration": > !https://private-user-images.githubusercontent.com/10248890/289990561-fdb78c92-2d6f-41a9-ba23-3068d128caa8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjEtZmRiNzhjOTItMmQ2Zi00MWE5LWJhMjMtMzA2OGQxMjhjYWE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI4N2FkZDYyZGQwZGZmNWJhN2IzMTM3ZmI1MzNhOGExZGY2MThjZjMwZDU5MzZiOTI4ZGVkMjc3MjBhMTNhZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KXhI0NnwIpfTRjVcsXuA82AnaURHgtkLOYVzifI-mp8! > Here is the error: > !https://private-user-images.githubusercontent.com/10248890/289990953-cd477d48-a45e-4ee9-b45e-06dc1dbeb9d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA5NTMtY2Q0NzdkNDgtYTQ1ZS00ZWU5LWI0NWUtMDZkYzFkYmViOWQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2MzIyYjQ5OWQ3YWZlMGYzYjFmYTljZjIwZjBmZDBiNzQyZmE3OTI2ZjkxNzVhNWU0ZDAwYTA4NDRkMTRjOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JAqEG_4NCEiRvHO6yv59hZdkH_5_tSUuaOkpEbH-I20! > > I verified the same query runs fine on spark 3.5, as in the following graph. > !https://private-user-images.githubusercontent.com/10248890/289990563-642aa6c3-7728-43c7-8a11-cbf79c4362c5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjMtNjQyYWE2YzMtNzcyOC00M2M3LThhMTEtY2JmNzljNDM2MmM1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0YmZiYTkyNGZkNzBkOGMzMmUyYTYzZTMyYzQ1ZTZkNDU3MDk2M2ZlOGNlZmQxNGYzNTFjZWRiNTQ2ZmQzZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.MJ8-78Qv5KkoLNGBrLXS-8gcC7LZepFsOD4r7pcnzSI! 
> > This should be a problem from the library updates, this could be a potential > source of error: [https://github.com/apache/spark/pull/42879] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-46384: Summary: Structured Streaming UI doesn't display graph correctly (was: Streaming UI doesn't display graph correctly) > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > The Streaming UI is broken currently at spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > Would make the spark UI shows empty graph for "operation duration": > !https://private-user-images.githubusercontent.com/10248890/289990561-fdb78c92-2d6f-41a9-ba23-3068d128caa8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjEtZmRiNzhjOTItMmQ2Zi00MWE5LWJhMjMtMzA2OGQxMjhjYWE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI4N2FkZDYyZGQwZGZmNWJhN2IzMTM3ZmI1MzNhOGExZGY2MThjZjMwZDU5MzZiOTI4ZGVkMjc3MjBhMTNhZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KXhI0NnwIpfTRjVcsXuA82AnaURHgtkLOYVzifI-mp8! > Here is the error: > !https://private-user-images.githubusercontent.com/10248890/289990953-cd477d48-a45e-4ee9-b45e-06dc1dbeb9d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA5NTMtY2Q0NzdkNDgtYTQ1ZS00ZWU5LWI0NWUtMDZkYzFkYmViOWQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2MzIyYjQ5OWQ3YWZlMGYzYjFmYTljZjIwZjBmZDBiNzQyZmE3OTI2ZjkxNzVhNWU0ZDAwYTA4NDRkMTRjOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JAqEG_4NCEiRvHO6yv59hZdkH_5_tSUuaOkpEbH-I20! > > I verified the same query runs fine on spark 3.5, as in the following graph. > !https://private-user-images.githubusercontent.com/10248890/289990563-642aa6c3-7728-43c7-8a11-cbf79c4362c5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjMtNjQyYWE2YzMtNzcyOC00M2M3LThhMTEtY2JmNzljNDM2MmM1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0YmZiYTkyNGZkNzBkOGMzMmUyYTYzZTMyYzQ1ZTZkNDU3MDk2M2ZlOGNlZmQxNGYzNTFjZWRiNTQ2ZmQzZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.MJ8-78Qv5KkoLNGBrLXS-8gcC7LZepFsOD4r7pcnzSI! 
> > This should be a problem from the library updates, this could be a potential > source of error: [https://github.com/apache/spark/pull/42879] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46384) Streaming UI doesn't show graph
Wei Liu created SPARK-46384: --- Summary: Streaming UI doesn't show graph Key: SPARK-46384 URL: https://issues.apache.org/jira/browse/SPARK-46384 Project: Spark Issue Type: Task Components: Structured Streaming, Web UI Affects Versions: 4.0.0 Reporter: Wei Liu The Streaming UI is broken currently at spark master. Running a simple query: ``` q = spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() ``` Would make the spark UI shows empty graph for "operation duration": !https://private-user-images.githubusercontent.com/10248890/289990561-fdb78c92-2d6f-41a9-ba23-3068d128caa8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjEtZmRiNzhjOTItMmQ2Zi00MWE5LWJhMjMtMzA2OGQxMjhjYWE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI4N2FkZDYyZGQwZGZmNWJhN2IzMTM3ZmI1MzNhOGExZGY2MThjZjMwZDU5MzZiOTI4ZGVkMjc3MjBhMTNhZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.KXhI0NnwIpfTRjVcsXuA82AnaURHgtkLOYVzifI-mp8! Here is the error: !https://private-user-images.githubusercontent.com/10248890/289990953-cd477d48-a45e-4ee9-b45e-06dc1dbeb9d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA5NTMtY2Q0NzdkNDgtYTQ1ZS00ZWU5LWI0NWUtMDZkYzFkYmViOWQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2MzIyYjQ5OWQ3YWZlMGYzYjFmYTljZjIwZjBmZDBiNzQyZmE3OTI2ZjkxNzVhNWU0ZDAwYTA4NDRkMTRjOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JAqEG_4NCEiRvHO6yv59hZdkH_5_tSUuaOkpEbH-I20! I verified the same query runs fine on spark 3.5, as in the following graph. !https://private-user-images.githubusercontent.com/10248890/289990563-642aa6c3-7728-43c7-8a11-cbf79c4362c5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0MjI2NjksIm5iZiI6MTcwMjQyMjM2OSwicGF0aCI6Ii8xMDI0ODg5MC8yODk5OTA1NjMtNjQyYWE2YzMtNzcyOC00M2M3LThhMTEtY2JmNzljNDM2MmM1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEyVDIzMDYwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0YmZiYTkyNGZkNzBkOGMzMmUyYTYzZTMyYzQ1ZTZkNDU3MDk2M2ZlOGNlZmQxNGYzNTFjZWRiNTQ2ZmQzZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.MJ8-78Qv5KkoLNGBrLXS-8gcC7LZepFsOD4r7pcnzSI! This should be a problem from the library updates, this could be a potential source of error: [https://github.com/apache/spark/pull/42879] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46250) Deflake test_parity_listener
Wei Liu created SPARK-46250: --- Summary: Deflake test_parity_listener Key: SPARK-46250 URL: https://issues.apache.org/jira/browse/SPARK-46250 Project: Spark Issue Type: Task Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45845) Streaming UI add number of evicted state rows
Wei Liu created SPARK-45845: --- Summary: Streaming UI add number of evicted state rows Key: SPARK-45845 URL: https://issues.apache.org/jira/browse/SPARK-45845 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu The UI is missing this chart, and people always confuse "aggregated number of rows dropped by watermark" with this newly added metric -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45637) Time window aggregation in separate streams followed by stream-stream join not returning results
[ https://issues.apache.org/jira/browse/SPARK-45637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-45637: Description: According to documentation update (SPARK-42591) resulting from SPARK-42376, Spark 3.5.0 should support time-window aggregations in two separate streams followed by stream-stream window join: [https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995] However, I failed to reproduce this example and the query I built doesn't return any results: {code:java} from pyspark.sql.functions import rand from pyspark.sql.functions import expr, window, window_time spark.conf.set("spark.sql.shuffle.partitions", "1") impressions = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .selectExpr("value AS adId", "timestamp AS impressionTime") ) impressionsWithWatermark = impressions \ .selectExpr("adId AS impressionAdId", "impressionTime") \ .withWatermark("impressionTime", "10 seconds") clicks = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .where((rand() * 100).cast("integer") < 10) # 10 out of every 100 impressions result in a click .selectExpr("(value - 10) AS adId ", "timestamp AS clickTime") # -10 so that a click with same id as impression is generated later (i.e. delayed data). .where("adId > 0") ) clicksWithWatermark = clicks \ .selectExpr("adId AS clickAdId", "clickTime") \ .withWatermark("clickTime", "10 seconds") clicksWindow = clicksWithWatermark.groupBy( window(clicksWithWatermark.clickTime, "1 minute") ).count() impressionsWindow = impressionsWithWatermark.groupBy( window(impressionsWithWatermark.impressionTime, "1 minute") ).count() clicksAndImpressions = clicksWindow.join(impressionsWindow, "window", "inner") clicksAndImpressions.writeStream \ .format("memory") \ .queryName("clicksAndImpressions") \ .outputMode("append") \ .start() {code} My intuition is that I'm getting no results because to output results of the first stateful operator (time window aggregation), a watermark needs to pass the end timestamp of the window. And once the watermark is after the end timestamp of the window, this window is ignored at the second stateful operator (stream-stream) join because it's behind the watermark. 
Indeed, a small hack done to event time column (adding one minute) between two stateful operators makes it possible to get results: {code:java} clicksWindow2 = clicksWithWatermark.groupBy( window(clicksWithWatermark.clickTime, "1 minute") ).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window") impressionsWindow2 = impressionsWithWatermark.groupBy( window(impressionsWithWatermark.impressionTime, "1 minute") ).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window") clicksAndImpressions2 = clicksWindow2.join(impressionsWindow2, "window_time", "inner") clicksAndImpressions2.writeStream \ .format("memory") \ .queryName("clicksAndImpressions2") \ .outputMode("append") \ .start() {code} was: According to documentation update (SPARK-42591) resulting from SPARK-42376, Spark 3.5.0 should support time-window aggregations in two separate streams followed by stream-stream window join: https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995 However, I failed to reproduce this example and the query I built doesn't return any results: {code:java} from pyspark.sql.functions import rand from pyspark.sql.functions import expr, window, window_time spark.conf.set("spark.sql.shuffle.partitions", "1") impressions = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .selectExpr("value AS adId", "timestamp AS impressionTime") ) impressionsWithWatermark = impressions \ .selectExpr("adId AS impressionAdId", "impressionTime") \ .withWatermark("impressionTime", "10 seconds") clicks = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .where((rand() * 100).cast("integer") < 10) # 10 out of every 100 impressions result in a click .selectExpr("(value - 10) AS adId ", "timestamp AS clickTime") # -10 so that a click with same id as impression is generated later (i.e. delayed data). .where("adId > 0") ) clicksWithWatermark = clicks \ .selectExpr("adId AS clickAdId", "clickTime") \ .withWatermark("clickTime", "10 seconds") clicksWindow = clicksWithWatermark.groupBy( window(clicksWithWatermark.clickTime, "1 minute") ).count() impressionsWindow
[jira] [Created] (SPARK-45677) Observe API error logging
Wei Liu created SPARK-45677: --- Summary: Observe API error logging Key: SPARK-45677 URL: https://issues.apache.org/jira/browse/SPARK-45677 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu We should tell the user why it's not supported and what to do instead: [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45053) Improve python version mismatch logging
[ https://issues.apache.org/jira/browse/SPARK-45053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-45053: Description: Currently the wording of the Python version mismatch message is a little confusing: it uses (3,9) to represent Python version 3.9. Just a minor update to make it more straightforward > Improve python version mismatch logging > --- > > Key: SPARK-45053 > URL: https://issues.apache.org/jira/browse/SPARK-45053 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.1 >Reporter: Wei Liu >Priority: Trivial > > Currently the wording of the Python version mismatch message is a little > confusing: it uses (3,9) to represent Python version 3.9. Just a minor > update to make it more straightforward -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
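To make the complaint concrete, a small sketch (the message text is an approximation, not Spark's exact wording) of rendering the version tuple as a dotted string instead of printing it as (3, 9):

```
# Approximate message text; only the formatting change is the point here.
import sys

worker_version = sys.version_info[:2]    # e.g. (3, 9)
driver_version = (3, 10)                 # hypothetical mismatching driver version

def dotted(v):
    return ".".join(str(part) for part in v)

confusing = ("Python in worker has different version %s than that in driver %s"
             % (worker_version, driver_version))
clearer = ("Python in worker has different version %s than that in driver %s"
           % (dotted(worker_version), dotted(driver_version)))

print(confusing)   # ... version (3, 9) ... driver (3, 10)
print(clearer)     # ... version 3.9 ... driver 3.10
```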
[jira] [Created] (SPARK-45056) Add process termination tests for Python foreachBatch and StreamingQueryListener
Wei Liu created SPARK-45056: --- Summary: Add process termination tests for Python foreachBatch and StreamingQueryListener Key: SPARK-45056 URL: https://issues.apache.org/jira/browse/SPARK-45056 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45053) Improve python version mismatch logging
Wei Liu created SPARK-45053: --- Summary: Improve python version mismatch logging Key: SPARK-45053 URL: https://issues.apache.org/jira/browse/SPARK-45053 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44971) [BUG Fix] PySpark StreamingQuerProgress fromJson
[ https://issues.apache.org/jira/browse/SPARK-44971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44971: Issue Type: Bug (was: New Feature) > [BUG Fix] PySpark StreamingQuerProgress fromJson > - > > Key: SPARK-44971 > URL: https://issues.apache.org/jira/browse/SPARK-44971 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.5.0, 3.5.1 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44971) [BUG Fix] PySpark StreamingQuerProgress fromJson
Wei Liu created SPARK-44971: --- Summary: [BUG Fix] PySpark StreamingQuerProgress fromJson Key: SPARK-44971 URL: https://issues.apache.org/jira/browse/SPARK-44971 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.5.0, 3.5.1 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44917) PySpark Streaming DataStreamWriter table API
[ https://issues.apache.org/jira/browse/SPARK-44917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu resolved SPARK-44917. - Resolution: Not A Problem > PySpark Streaming DataStreamWriter table API > > > Key: SPARK-44917 > URL: https://issues.apache.org/jira/browse/SPARK-44917 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44917) PySpark Streaming DataStreamWriter table API
Wei Liu created SPARK-44917: --- Summary: PySpark Streaming DataStreamWriter table API Key: SPARK-44917 URL: https://issues.apache.org/jira/browse/SPARK-44917 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44460) Pass user auth credential to Python workers for foreachBatch and listener
[ https://issues.apache.org/jira/browse/SPARK-44460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757087#comment-17757087 ] Wei Liu commented on SPARK-44460: - [~rangadi] This seems to be a Databricks internal issue. See the updates in SC-138245 > Pass user auth credential to Python workers for foreachBatch and listener > - > > Key: SPARK-44460 > URL: https://issues.apache.org/jira/browse/SPARK-44460 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > No user specific credentials are sent to Python worker that runs user > functions like foreachBatch() and streaming listener. > We might need to pass in these. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44839) Better error logging when user accesses spark session in foreachBatch and Listener
Wei Liu created SPARK-44839: --- Summary: Better error logging when user accesses spark session in foreachBatch and Listener Key: SPARK-44839 URL: https://issues.apache.org/jira/browse/SPARK-44839 Project: Spark Issue Type: New Feature Components: Connect, Structured Streaming Affects Versions: 3.5.0, 4.0.0 Reporter: Wei Liu The user gets `TypeError: cannot pickle '_thread._local' object` when accessing `spark` inside the function; we need a better error message for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
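A simplified sketch of the pattern that triggers this (the sc://localhost endpoint is a placeholder): under Spark Connect the function is pickled and shipped to the server, and the captured client-side `spark` session cannot be pickled, which surfaces as the opaque TypeError above.

```
# Simplified sketch; sc://localhost is a placeholder Connect endpoint.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

def process_batch(batch_df, batch_id):
    # Referencing the outer client session here makes the closure unpicklable,
    # which is what produces "cannot pickle '_thread._local' object".
    spark.sql("SELECT 1").show()

q = (
    spark.readStream.format("rate").load()
    .writeStream.foreachBatch(process_batch)
    .start()
)
```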
[jira] [Commented] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
[ https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754280#comment-17754280 ] Wei Liu commented on SPARK-44808: - This seems to be against the design principle of spark connect – we pass everything to the server side. The client only keeps IDs to get status. That's the reason why the foreachBatch and Listener are run on the server side. This can be closed > refreshListener() API on StreamingQueryManager for spark connect > > > Key: SPARK-44808 > URL: https://issues.apache.org/jira/browse/SPARK-44808 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > I’m thinking of an improvement for connect python listener and foreachBatch. > Currently if you define a variable outside of the function, you can’t > actually see it on client side if it’s touched in the function because the > operation is on server side, e.g. > > > {code:java} > x = 0 > class MyListener(StreamingQueryListener): > def onQueryStarted(e): > x = 100 > self.y = 200 > spark.streams.addListener(MyListener()) > q = spark.readStream.start() > > # x is still 0, self.y is not defined > {code} > > > But for the self.y case, there could be an improvement. > > The server side is capable of pickle serialize the whole listener instance > again, and send back to the client. > > So we could define a new interface on the streaming query manager, maybe > called refreshListener(listener), e.g. use it as > `spark.streams.refreshListener()` > > > {code:java} > def refreshListener(listener: StreamingQueryListener) -> > StreamingQueryListener > # send the listener id with this refresh request, server receives this > request and serializes the listener again, and send back to client, so the > returned new listener contains the updated value self.y > > {code} > > For `foreachBatch`, we could wrap the function to a new class > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
[ https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu resolved SPARK-44808. - Resolution: Won't Do > refreshListener() API on StreamingQueryManager for spark connect > > > Key: SPARK-44808 > URL: https://issues.apache.org/jira/browse/SPARK-44808 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > I’m thinking of an improvement for connect python listener and foreachBatch. > Currently if you define a variable outside of the function, you can’t > actually see it on client side if it’s touched in the function because the > operation is on server side, e.g. > > > {code:java} > x = 0 > class MyListener(StreamingQueryListener): > def onQueryStarted(e): > x = 100 > self.y = 200 > spark.streams.addListener(MyListener()) > q = spark.readStream.start() > > # x is still 0, self.y is not defined > {code} > > > But for the self.y case, there could be an improvement. > > The server side is capable of pickle serialize the whole listener instance > again, and send back to the client. > > So we could define a new interface on the streaming query manager, maybe > called refreshListener(listener), e.g. use it as > `spark.streams.refreshListener()` > > > {code:java} > def refreshListener(listener: StreamingQueryListener) -> > StreamingQueryListener > # send the listener id with this refresh request, server receives this > request and serializes the listener again, and send back to client, so the > returned new listener contains the updated value self.y > > {code} > > For `foreachBatch`, we could wrap the function to a new class > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
[ https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44808: Description: I’m thinking of an improvement for the Connect Python listener and foreachBatch. Currently, if you define a variable outside of the function, you can’t actually see the change on the client side when it’s modified inside the function, because the function runs on the server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client. So we could define a new interface on the streaming query manager, maybe called refreshListener(listener), used as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request; the server receives the request, serializes the listener again, and sends it back to the client, so the returned listener contains the updated value of self.y {code} For `foreachBatch`, we could wrap the function in a new class.
was: I’m thinking of an improvement for the Connect Python listener and foreachBatch. Currently, if you define a variable outside of the function, you can’t actually see the change on the client side when it’s modified inside the function, because the function runs on the server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), used as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request; the server receives the request, serializes the listener again, and sends it back to the client, so the returned listener contains the updated value of self.y {code} For `foreachBatch`, we could wrap the function in a new class.
> refreshListener() API on StreamingQueryManager for spark connect > > > Key: SPARK-44808 > URL: https://issues.apache.org/jira/browse/SPARK-44808 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming > Affects Versions: 4.0.0 > Reporter: Wei Liu > Priority: Major > > I’m thinking of an improvement for the Connect Python listener and foreachBatch. > Currently, if you define a variable outside of the function, you can’t actually see the change on the client side when it’s modified inside the function, because the function runs on the server side, e.g. > > {code:java} > x = 0 > class MyListener(StreamingQueryListener): > def onQueryStarted(e): > x = 100 > self.y = 200 > spark.streams.addListener(MyListener()) > q = spark.readStream.start() > > # x is still 0, self.y is not defined > {code} > > But for the self.y case, there could be an improvement. > > The server side is capable of pickle-serializing the whole listener instance again and sending it back to the client. > > So we could define a new interface on the streaming query manager, maybe called refreshListener(listener), used as `spark.streams.refreshListener()` > > {code:java} > def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener > # send the listener id with this refresh request; the server receives the request, serializes the listener again, and sends it back to the client, so the returned listener contains the updated value of self.y {code} > > For `foreachBatch`, we could wrap the function in a new class. >
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
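To make the proposal above concrete, here is a minimal sketch of how the proposed API might be used from a Connect client. Note that refreshListener() does not exist yet, and the CountingListener class and rate/console query below are hypothetical, used only for illustration:
{code:python}
from pyspark.sql.streaming import StreamingQueryListener

class CountingListener(StreamingQueryListener):
    def __init__(self):
        self.progress_count = 0  # in Spark Connect this state is updated only on the server

    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        self.progress_count += 1  # runs server side; the client's copy never sees this

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        pass

listener = CountingListener()
spark.streams.addListener(listener)

df = spark.readStream.format("rate").load()
query = df.writeStream.format("console").start()

# ... after a few micro-batches ...
print(listener.progress_count)   # still 0 on the client

# Proposed API (does not exist yet): ask the server to re-serialize its copy
# of the listener and return it, so the client sees the updated state.
refreshed = spark.streams.refreshListener(listener)
print(refreshed.progress_count)  # would reflect the server-side increments
{code}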
[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
[ https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44808: Description: I’m thinking of an improvement for connect python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y {code} For `foreachBatch`, we could wrap the function to a new class was: I’m thinking of an improvement for connect python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y {code} For `foreachBatch`, we might could wrap the function to a new class, like the listener case > refreshListener() API on StreamingQueryManager for spark connect > > > Key: SPARK-44808 > URL: https://issues.apache.org/jira/browse/SPARK-44808 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > I’m thinking of an improvement for connect python listener and foreachBatch. > Currently if you define a variable outside of the function, you can’t > actually see it on client side if it’s touched in the function because the > operation is on server side, e.g. > > > {code:java} > x = 0 > class MyListener(StreamingQueryListener): > def onQueryStarted(e): > x = 100 > self.y = 200 > spark.streams.addListener(MyListener()) > q = spark.readStream.start() > > # x is still 0, self.y is not defined > {code} > > > But for the self.y case, there could be an improvement. > > The server side is capable of pickle serialize the whole listener instance > again, and send back to the client. 
> > So if we define a new interface on the streaming query manager, maybe called > refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` > > > {code:java} > def refreshListener(listener: StreamingQueryListener) -> > StreamingQueryListener > # send the listener id with this refresh request, server receives this > request and serializes the listener again, and send back to client, so the > returned new listener contains the updated value self.y > > {code} > > For `foreachBatch`, we could wrap the function to a new class > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
[ https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44808: Description: I’m thinking of an improvement for connect python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y {code} For `foreachBatch`, we might could wrap the function to a new class, like the listener case was: I’m thinking of an improvement for connect python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y {code} > refreshListener() API on StreamingQueryManager for spark connect > > > Key: SPARK-44808 > URL: https://issues.apache.org/jira/browse/SPARK-44808 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > I’m thinking of an improvement for connect python listener and foreachBatch. > Currently if you define a variable outside of the function, you can’t > actually see it on client side if it’s touched in the function because the > operation is on server side, e.g. > > > {code:java} > x = 0 > class MyListener(StreamingQueryListener): > def onQueryStarted(e): > x = 100 > self.y = 200 > spark.streams.addListener(MyListener()) > q = spark.readStream.start() > > # x is still 0, self.y is not defined > {code} > > > But for the self.y case, there could be an improvement. > > The server side is capable of pickle serialize the whole listener instance > again, and send back to the client. > > So if we define a new interface on the streaming query manager, maybe called > refreshListener(listener), e.g. 
use it as `spark.streams.refreshListener()` > > > {code:java} > def refreshListener(listener: StreamingQueryListener) -> > StreamingQueryListener > # send the listener id with this refresh request, server receives this > request and serializes the listener again, and send back to client, so the > returned new listener contains the updated value self.y > > {code} > > For `foreachBatch`, we might could wrap the function to a new class, like the > listener case > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
[ https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44808: Description: I’m thinking of an improvement for connect python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y {code} was: I’m thinking of an improvement for python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y {code} > refreshListener() API on StreamingQueryManager for spark connect > > > Key: SPARK-44808 > URL: https://issues.apache.org/jira/browse/SPARK-44808 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > I’m thinking of an improvement for connect python listener and foreachBatch. > Currently if you define a variable outside of the function, you can’t > actually see it on client side if it’s touched in the function because the > operation is on server side, e.g. > > > {code:java} > x = 0 > class MyListener(StreamingQueryListener): > def onQueryStarted(e): > x = 100 > self.y = 200 > spark.streams.addListener(MyListener()) > q = spark.readStream.start() > > # x is still 0, self.y is not defined > {code} > > > But for the self.y case, there could be an improvement. > > The server side is capable of pickle serialize the whole listener instance > again, and send back to the client. > > So if we define a new interface on the streaming query manager, maybe called > refreshListener(listener), e.g. 
use it as `spark.streams.refreshListener()` > > > {code:java} > def refreshListener(listener: StreamingQueryListener) -> > StreamingQueryListener > # send the listener id with this refresh request, server receives this > request and serializes the listener again, and send back to client, so the > returned new listener contains the updated value self.y > > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
Wei Liu created SPARK-44808: --- Summary: refreshListener() API on StreamingQueryManager for spark connect Key: SPARK-44808 URL: https://issues.apache.org/jira/browse/SPARK-44808 Project: Spark Issue Type: New Feature Components: Connect, Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu I’m thinking of an improvement for python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. ``` x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined ``` But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` ``` def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44808) refreshListener() API on StreamingQueryManager for spark connect
[ https://issues.apache.org/jira/browse/SPARK-44808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44808: Description: I’m thinking of an improvement for python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. {code:java} x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined {code} But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` {code:java} def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y {code} was: I’m thinking of an improvement for python listener and foreachBatch. Currently if you define a variable outside of the function, you can’t actually see it on client side if it’s touched in the function because the operation is on server side, e.g. ``` x = 0 class MyListener(StreamingQueryListener): def onQueryStarted(e): x = 100 self.y = 200 spark.streams.addListener(MyListener()) q = spark.readStream.start() # x is still 0, self.y is not defined ``` But for the self.y case, there could be an improvement. The server side is capable of pickle serialize the whole listener instance again, and send back to the client. So if we define a new interface on the streaming query manager, maybe called refreshListener(listener), e.g. use it as `spark.streams.refreshListener()` ``` def refreshListener(listener: StreamingQueryListener) -> StreamingQueryListener # send the listener id with this refresh request, server receives this request and serializes the listener again, and send back to client, so the returned new listener contains the updated value self.y ``` > refreshListener() API on StreamingQueryManager for spark connect > > > Key: SPARK-44808 > URL: https://issues.apache.org/jira/browse/SPARK-44808 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > I’m thinking of an improvement for python listener and foreachBatch. > Currently if you define a variable outside of the function, you can’t > actually see it on client side if it’s touched in the function because the > operation is on server side, e.g. > > > {code:java} > x = 0 > class MyListener(StreamingQueryListener): > def onQueryStarted(e): > x = 100 > self.y = 200 > spark.streams.addListener(MyListener()) > q = spark.readStream.start() > > # x is still 0, self.y is not defined > {code} > > > But for the self.y case, there could be an improvement. > > The server side is capable of pickle serialize the whole listener instance > again, and send back to the client. > > So if we define a new interface on the streaming query manager, maybe called > refreshListener(listener), e.g. 
use it as `spark.streams.refreshListener()` > > > {code:java} > def refreshListener(listener: StreamingQueryListener) -> > StreamingQueryListener > # send the listener id with this refresh request, server receives this > request and serializes the listener again, and send back to client, so the > returned new listener contains the updated value self.y > > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44764) Streaming process improvement
Wei Liu created SPARK-44764: --- Summary: Streaming process improvement Key: SPARK-44764 URL: https://issues.apache.org/jira/browse/SPARK-44764 Project: Spark Issue Type: New Feature Components: Connect, Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu # Deduplicate or remove StreamingPythonRunner if possible; it is very similar to the existing PythonRunner # Use the DAEMON mode for starting a new process -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44460) Pass user auth credential to Python workers for foreachBatch and listener
[ https://issues.apache.org/jira/browse/SPARK-44460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44460: Description: No user-specific credentials are sent to the Python worker that runs user functions like foreachBatch() and the streaming listener. We might need to pass these in. was: No user-specific credentials (like UC creds?) are sent to the Python worker that runs user functions like foreachBatch() and the streaming listener. We might need to pass these in. > Pass user auth credential to Python workers for foreachBatch and listener > - > > Key: SPARK-44460 > URL: https://issues.apache.org/jira/browse/SPARK-44460 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > No user-specific credentials are sent to the Python worker that runs user > functions like foreachBatch() and the streaming listener. > We might need to pass these in. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44516) Spark Connect Python StreamingQueryListener removeListener method actually shut down the listener process
[ https://issues.apache.org/jira/browse/SPARK-44516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-44516: Summary: Spark Connect Python StreamingQueryListener removeListener method actually shut down the listener process (was: Spark Connect Python StreamingQueryListener removeListener method) > Spark Connect Python StreamingQueryListener removeListener method actually > shut down the listener process > - > > Key: SPARK-44516 > URL: https://issues.apache.org/jira/browse/SPARK-44516 > Project: Spark > Issue Type: New Feature > Components: Connect, Structured Streaming >Affects Versions: 3.5.0, 4.0.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44516) Spark Connect Python StreamingQueryListener removeListener method
Wei Liu created SPARK-44516: --- Summary: Spark Connect Python StreamingQueryListener removeListener method Key: SPARK-44516 URL: https://issues.apache.org/jira/browse/SPARK-44516 Project: Spark Issue Type: New Feature Components: Connect, Structured Streaming Affects Versions: 3.5.0, 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44515) Code Improvement: PySpark add util function to set python version
Wei Liu created SPARK-44515: --- Summary: Code Improvement: PySpark add util function to set python version Key: SPARK-44515 URL: https://issues.apache.org/jira/browse/SPARK-44515 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 3.5.0, 4.0.0 Reporter: Wei Liu Searching for `"%d.%d" % sys.version_info[:2]` shows multiple occurrences of this pattern; we should create a separate util method for it -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
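As a rough illustration of the cleanup this ticket suggests (the helper name and placement below are made up here, not an existing PySpark API):
{code:python}
import sys

def python_version_string() -> str:
    """Return the running interpreter's version as 'major.minor', e.g. '3.11'."""
    return "%d.%d" % sys.version_info[:2]

# Call sites that currently inline `"%d.%d" % sys.version_info[:2]`
# would import and call this single helper instead.
{code}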
[jira] [Created] (SPARK-44502) Add missing versionchanged field to docs
Wei Liu created SPARK-44502: --- Summary: Add missing versionchanged field to docs Key: SPARK-44502 URL: https://issues.apache.org/jira/browse/SPARK-44502 Project: Spark Issue Type: New Feature Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44484) Add missing json field batchDuration to StreamingQueryProgress
Wei Liu created SPARK-44484: --- Summary: Add missing json field batchDuration to StreamingQueryProgress Key: SPARK-44484 URL: https://issues.apache.org/jira/browse/SPARK-44484 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42962) Allow access to stopped streaming queries
[ https://issues.apache.org/jira/browse/SPARK-42962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17735867#comment-17735867 ] Wei Liu commented on SPARK-42962: - Closing this as it's a duplicate of SPARK-42940. Please kindly reopen the ticket if this is a mistake! > Allow access to stopped streaming queries > - > > Key: SPARK-42962 > URL: https://issues.apache.org/jira/browse/SPARK-42962 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > In Spark connect, when a query is stopped, the client can't access it > anymore. That implies they cannot fetch any information about the query > after that. > The server might have to cache the queries for some time (up to the time the > session is closed). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42962) Allow access to stopped streaming queries
[ https://issues.apache.org/jira/browse/SPARK-42962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu resolved SPARK-42962. - Resolution: Duplicate > Allow access to stopped streaming queries > - > > Key: SPARK-42962 > URL: https://issues.apache.org/jira/browse/SPARK-42962 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > In Spark connect, when a query is stopped, the client can't access it > anymore. That implies they cannot fetch any information about the query > after that. > The server might have to cache the queries for some time (up to the time the > session is closed). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42941) Add support for streaming listener in Python
[ https://issues.apache.org/jira/browse/SPARK-42941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17735866#comment-17735866 ] Wei Liu commented on SPARK-42941: - Hmmm this is actually still in progress. I was thinking two PRs could share the same Jira ticket... sorry if I did it wrong > Add support for streaming listener in Python > > > Key: SPARK-42941 > URL: https://issues.apache.org/jira/browse/SPARK-42941 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Assignee: Wei Liu >Priority: Major > Fix For: 3.5.0 > > > Add support of streaming listener in Python. > This likely requires a design doc to hash out the details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-42941) Add support for streaming listener in Python
[ https://issues.apache.org/jira/browse/SPARK-42941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu reopened SPARK-42941: - > Add support for streaming listener in Python > > > Key: SPARK-42941 > URL: https://issues.apache.org/jira/browse/SPARK-42941 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Assignee: Wei Liu >Priority: Major > Fix For: 3.5.0 > > > Add support of streaming listener in Python. > This likely requires a design doc to hash out the details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44046) Pyspark StreamingQueryListener listListener
Wei Liu created SPARK-44046: --- Summary: Pyspark StreamingQueryListener listListener Key: SPARK-44046 URL: https://issues.apache.org/jira/browse/SPARK-44046 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44029) Observation.py supports streaming dataframes
[ https://issues.apache.org/jira/browse/SPARK-44029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu resolved SPARK-44029. - Resolution: Not A Problem > Observation.py supports streaming dataframes > > > Key: SPARK-44029 > URL: https://issues.apache.org/jira/browse/SPARK-44029 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44029) Observation.py supports streaming dataframes
Wei Liu created SPARK-44029: --- Summary: Observation.py supports streaming dataframes Key: SPARK-44029 URL: https://issues.apache.org/jira/browse/SPARK-44029 Project: Spark Issue Type: Documentation Components: Documentation, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44010) Python StreamingQueryProgress rowsPerSecond type fix
Wei Liu created SPARK-44010: --- Summary: Python StreamingQueryProgress rowsPerSecond type fix Key: SPARK-44010 URL: https://issues.apache.org/jira/browse/SPARK-44010 Project: Spark Issue Type: Bug Components: PySpark, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43796) Streaming ForeachWriter can't accept custom user defined class
Wei Liu created SPARK-43796: --- Summary: Streaming ForeachWriter can't accept custom user defined class Key: SPARK-43796 URL: https://issues.apache.org/jira/browse/SPARK-43796 Project: Spark Issue Type: Bug Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu [https://github.com/apache/spark/pull/41129] The last example in the PR description doesn't work with current REPL implementation. Code: {code:java} import org.apache.spark.sql.{ForeachWriter, Row} import java.io._ val filePath = "/home/wei.liu/test_foreach/output-custom" case class MyTestClass(value: Int) { override def toString: String = value.toString } val writer = new ForeachWriter[MyTestClass] { var fileWriter: FileWriter = _ def open(partitionId: Long, version: Long): Boolean = { fileWriter = new FileWriter(filePath, true) true } def process(row: MyTestClass): Unit = { fileWriter.write(row.toString) fileWriter.write("\n") } def close(errorOrNull: Throwable): Unit = { fileWriter.close() } } val df = spark.readStream .format("rate") .option("rowsPerSecond", "10") .load() val query = df .selectExpr("CAST(value AS INT)") .as[MyTestClass] .writeStream .foreach(writer) .outputMode("update") .start() {code} Error: {code:java} 23/05/24 19:17:31 ERROR Utils: Aborting task java.lang.NoClassDefFoundError: Could not initialize class ammonite.$sess.cmd4$ at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:35) at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:30) at org.apache.spark.util.Utils$.classForName(Utils.scala:94) at org.apache.spark.sql.catalyst.encoders.OuterScopes$.$anonfun$getOuterScope$1(OuterScopes.scala:59) at org.apache.spark.sql.catalyst.expressions.objects.NewInstance.$anonfun$doGenCode$1(objects.scala:598) at scala.Option.map(Option.scala:230) at org.apache.spark.sql.catalyst.expressions.objects.NewInstance.doGenCode(objects.scala:598) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:201) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196) at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156) at scala.collection.immutable.List.map(List.scala:293) at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153) at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369) at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:171) at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:168) at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:51) at org.apache.spark.sql.catalyst.expressions.SafeProjection$.create(Projection.scala:194) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:173) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:166) at org.apache.spark.sql.execution.streaming.sources.ForeachDataWriter.write(ForeachWriterTable.scala:147) at 
org.apache.spark.sql.execution.streaming.sources.ForeachDataWriter.write(ForeachWriterTable.scala:132) at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.write(WriteToDataSourceV2Exec.scala:493) at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:448) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1521) at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486) at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425) at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491) at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at
[jira] [Created] (SPARK-43761) Streaming ForeachWriter with UnboundRowEncoder
Wei Liu created SPARK-43761: --- Summary: Streaming ForeachWriter with UnboundRowEncoder Key: SPARK-43761 URL: https://issues.apache.org/jira/browse/SPARK-43761 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu https://github.com/apache/spark/pull/41129#discussion_r1196873903 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42941) Add support for streaming listener in Python
[ https://issues.apache.org/jira/browse/SPARK-42941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-42941: Summary: Add support for streaming listener in Python (was: Add support for streaming listener.) > Add support for streaming listener in Python > > > Key: SPARK-42941 > URL: https://issues.apache.org/jira/browse/SPARK-42941 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > Add support of streaming listener in Python. > This likely requires a design doc to hash out the details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43360) Scala Connect: Add StreamingQueryManager API
Wei Liu created SPARK-43360: --- Summary: Scala Connect: Add StreamingQueryManager API Key: SPARK-43360 URL: https://issues.apache.org/jira/browse/SPARK-43360 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43132) Add foreach streaming API in Python
[ https://issues.apache.org/jira/browse/SPARK-43132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17717347#comment-17717347 ] Wei Liu commented on SPARK-43132: - i'm working on this > Add foreach streaming API in Python > --- > > Key: SPARK-43132 > URL: https://issues.apache.org/jira/browse/SPARK-43132 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43054) Support foreach() in streaming spark connect
[ https://issues.apache.org/jira/browse/SPARK-43054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu resolved SPARK-43054. - Resolution: Duplicate Duplicate of SPARK-43132 and SPARK-43133 > Support foreach() in streaming spark connect > > > Key: SPARK-43054 > URL: https://issues.apache.org/jira/browse/SPARK-43054 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43299) JVM Client throw StreamingQueryException when error handling is implemented
Wei Liu created SPARK-43299: --- Summary: JVM Client throw StreamingQueryException when error handling is implemented Key: SPARK-43299 URL: https://issues.apache.org/jira/browse/SPARK-43299 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu Currently the awaitTermination() method of the Connect JVM client's StreamingQuery won't throw an error when there is an exception. In Python Connect this is handled directly by the Python client's error-handling framework, but no such framework exists in the JVM client right now. We should verify this works once the JVM client adds it -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
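For reference, a rough sketch (in Python, since the Python client already behaves this way) of the behavior the JVM client should eventually match; the failing UDF below is a made-up example used only to force a query failure:
{code:python}
from pyspark.errors import StreamingQueryException
from pyspark.sql.functions import udf

@udf("int")
def boom(v):
    raise RuntimeError("failure inside the streaming query")

df = spark.readStream.format("rate").load().select(boom("value"))
query = df.writeStream.format("console").start()

try:
    query.awaitTermination()
except StreamingQueryException as e:
    # The Python client's error-handling framework surfaces the server-side
    # failure as a StreamingQueryException on the client.
    print(e)
{code}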
[jira] [Updated] (SPARK-43287) Connect JVM client REPL not correctly shut down if killed
[ https://issues.apache.org/jira/browse/SPARK-43287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43287: Description: How to reproduce: # Start a scala client `./connector/connect/bin/spark-connect-scala-client` # in another terminal, kill the process `kill ` # Back to the client terminal, you can't see anything you type, but the command still works {code:java} Spark session available as 'spark'. _ __ __ __ / ___/ __/ /__ / /___ ___ _/ /_ \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ /_/ @ wei.liu:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf examples logs resource-managers target LICENSE artifacts connector graphx mllib sbin tools LICENSE-binary assembly core hadoop-cloud mllib-local scalastyle-config.xml NOTICE bin data hs_err_pid9062.log pom.xml scalastyle-on-compile.generated.xml NOTICE-binary binder dependency-reduced-pom.xml launcher project spark-warehouse R build dev licenses python sql README.md common docs licenses-binary repl streaming wei.liu:~/oss-spark$ wei.liu:~/oss-spark$ wei.liu:~/oss-spark$ {code} I ran 'ls' above, and clicked return multiple times was: How to reproduce: # Start a scala client `./connector/connect/bin/spark-connect-scala-client` # in another terminal, kill the process `kill ` # Back to the client terminal, you can't see anything you type, but the command still works {code:java} Spark session available as 'spark'. _ __ __ __ / ___/ __/ /__ / /___ ___ _/ /_ \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ /_/ @ wei.liu:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf examples logs resource-managers target LICENSE artifacts connector graphx mllib sbin tools LICENSE-binary assembly core hadoop-cloud mllib-local scalastyle-config.xml NOTICE bin data hs_err_pid9062.log pom.xml scalastyle-on-compile.generated.xml NOTICE-binary binder dependency-reduced-pom.xml launcher project spark-warehouse R build dev licenses python sql README.md common docs licenses-binary repl streaming wei.liu:~/oss-spark$ wei.liu@ip-10-110-19-234:~/oss-spark$ wei.liu:~/oss-spark$ {code} I ran 'ls' above, and clicked return multiple times > Connect JVM client REPL not correctly shut down if killed > - > > Key: SPARK-43287 > URL: https://issues.apache.org/jira/browse/SPARK-43287 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > How to reproduce: > # Start a scala client `./connector/connect/bin/spark-connect-scala-client` > # in another terminal, kill the process `kill ` > # Back to the client terminal, you can't see anything you type, but the > command still works > > > {code:java} > Spark session available as 'spark'. > _ __ __ __ > / ___/ __/ /__ / /___ ___ _/ /_ > \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ > ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ > // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ > /_/ > @ wei.liu:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf > examples logs resource-managers > target > LICENSE artifacts connector graphx > mllib sbin tools > LICENSE-binary assembly core hadoop-cloud
[jira] [Updated] (SPARK-43287) Connect JVM client REPL not correctly shut down if killed
[ https://issues.apache.org/jira/browse/SPARK-43287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43287: Description: How to reproduce: # Start a scala client `./connector/connect/bin/spark-connect-scala-client` # in another terminal, kill the process `kill ` # Back to the client terminal, you can't see anything you type, but the command still works {code:java} Spark session available as 'spark'. _ __ __ __ / ___/ __/ /__ / /___ ___ _/ /_ \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ /_/ @ wei.liu:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf examples logs resource-managers target LICENSE artifacts connector graphx mllib sbin tools LICENSE-binary assembly core hadoop-cloud mllib-local scalastyle-config.xml NOTICE bin data hs_err_pid9062.log pom.xml scalastyle-on-compile.generated.xml NOTICE-binary binder dependency-reduced-pom.xml launcher project spark-warehouse R build dev licenses python sql README.md common docs licenses-binary repl streaming wei.liu:~/oss-spark$ wei.liu@ip-10-110-19-234:~/oss-spark$ wei.liu:~/oss-spark$ {code} I ran 'ls' above, and clicked return multiple times was: How to reproduce: # Start a scala client `./connector/connect/bin/spark-connect-scala-client` # in another terminal, kill the process `kill ` # Back to the client terminal, you can't see anything you type, but the command still works {code:java} Spark session available as 'spark'. _ __ __ __ / ___/ __/ /__ / /___ ___ _/ /_ \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ /_/ @ wei.liu@ip-10-110-19-234:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf examples logs resource-managers target LICENSE artifacts connector graphx mllib sbin tools LICENSE-binary assembly core hadoop-cloud mllib-local scalastyle-config.xml NOTICE bin data hs_err_pid9062.log pom.xml scalastyle-on-compile.generated.xml NOTICE-binary binder dependency-reduced-pom.xml launcher project spark-warehouse R build dev licenses python sql README.md common docs licenses-binary repl streaming wei.liu@ip-10-110-19-234:~/oss-spark$ wei.liu@ip-10-110-19-234:~/oss-spark$ wei.liu@ip-10-110-19-234:~/oss-spark$ {code} I ran 'ls' above, and clicked return multiple times > Connect JVM client REPL not correctly shut down if killed > - > > Key: SPARK-43287 > URL: https://issues.apache.org/jira/browse/SPARK-43287 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > How to reproduce: > # Start a scala client `./connector/connect/bin/spark-connect-scala-client` > # in another terminal, kill the process `kill ` > # Back to the client terminal, you can't see anything you type, but the > command still works > > > {code:java} > Spark session available as 'spark'. > _ __ __ __ > / ___/ __/ /__ / /___ ___ _/ /_ > \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ > ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ > // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ > /_/ > @ wei.liu:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf > examples logs resource-managers > target > LICENSE artifacts connector graphx > mllib sbin tools > LICENSE-binary
[jira] [Updated] (SPARK-43287) Connect JVM client REPL not correctly shut down if killed
[ https://issues.apache.org/jira/browse/SPARK-43287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43287: Description: How to reproduce: # Start a scala client `./connector/connect/bin/spark-connect-scala-client` # in another terminal, kill the process `kill ` # Back to the client terminal, you can't see anything you type, but the command still works {code:java} Spark session available as 'spark'. _ __ __ __ / ___/ __/ /__ / /___ ___ _/ /_ \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ /_/ @ wei.liu@ip-10-110-19-234:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf examples logs resource-managers target LICENSE artifacts connector graphx mllib sbin tools LICENSE-binary assembly core hadoop-cloud mllib-local scalastyle-config.xml NOTICE bin data hs_err_pid9062.log pom.xml scalastyle-on-compile.generated.xml NOTICE-binary binder dependency-reduced-pom.xml launcher project spark-warehouse R build dev licenses python sql README.md common docs licenses-binary repl streaming wei.liu@ip-10-110-19-234:~/oss-spark$ wei.liu@ip-10-110-19-234:~/oss-spark$ wei.liu@ip-10-110-19-234:~/oss-spark$ {code} I ran 'ls' above, and clicked return multiple times was: How to reproduce: # Start a scala client `./connector/connect/bin/spark-connect-scala-client` # in another terminal, kill the process `kill ` # Back to the client terminal, you can't see anything you type, but the command still works ``` Spark session available as 'spark'. _ __ __ __ / ___/ __/ /__ / /___ ___ _/ /_ \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ /_/ @ *wei.liu*:*~/oss-spark*$ CONTRIBUTING.md appveyor.yml *conf* *examples* *logs* *resource-managers* *target* LICENSE *artifacts* *connector* *graphx* *mllib* *sbin* *tools* LICENSE-binary *assembly* *core* *hadoop-cloud* *mllib-local* scalastyle-config.xml NOTICE *bin* *data* hs_err_pid9062.log pom.xml scalastyle-on-compile.generated.xml NOTICE-binary *binder* dependency-reduced-pom.xml *launcher* *project* *spark-warehouse* *R* *build* *dev* *licenses* *python* *sql* README.md *common* *docs* *licenses-binary* *repl* *streaming* *wei.liu*:*~/oss-spark*$ *wei.liu*:*~/oss-spark*$ *wei.liu*:*~/oss-spark*$ ``` I ran 'ls' above, and clicked return multiple times > Connect JVM client REPL not correctly shut down if killed > - > > Key: SPARK-43287 > URL: https://issues.apache.org/jira/browse/SPARK-43287 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > How to reproduce: > # Start a scala client `./connector/connect/bin/spark-connect-scala-client` > # in another terminal, kill the process `kill ` > # Back to the client terminal, you can't see anything you type, but the > command still works > > > {code:java} > Spark session available as 'spark'. > _ __ __ __ > / ___/ __/ /__ / /___ ___ _/ /_ > \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ > ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ > // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ > /_/ > @ wei.liu@ip-10-110-19-234:~/oss-spark$ CONTRIBUTING.md appveyor.yml conf > examples logs resource-managers > target > LICENSE artifacts connector graphx >
[jira] [Created] (SPARK-43287) Connect JVM client REPL not correctly shut down if killed
Wei Liu created SPARK-43287: --- Summary: Connect JVM client REPL not correctly shut down if killed Key: SPARK-43287 URL: https://issues.apache.org/jira/browse/SPARK-43287 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Wei Liu How to reproduce: # Start a scala client `./connector/connect/bin/spark-connect-scala-client` # in another terminal, kill the process `kill ` # Back to the client terminal, you can't see anything you type, but the command still works ``` Spark session available as 'spark'. _ __ __ __ / ___/ __/ /__ / /___ ___ _/ /_ \__ \/ __ \/ __ `/ ___/ //_/ / / / __ \/ __ \/ __ \/ _ \/ ___/ __/ ___/ / /_/ / /_/ / / / ,< / /___/ /_/ / / / / / / / __/ /__/ /_ // .___/\__,_/_/ /_/|_| \/\/_/ /_/_/ /_/\___/\___/\__/ /_/ @ *wei.liu*:*~/oss-spark*$ CONTRIBUTING.md appveyor.yml *conf* *examples* *logs* *resource-managers* *target* LICENSE *artifacts* *connector* *graphx* *mllib* *sbin* *tools* LICENSE-binary *assembly* *core* *hadoop-cloud* *mllib-local* scalastyle-config.xml NOTICE *bin* *data* hs_err_pid9062.log pom.xml scalastyle-on-compile.generated.xml NOTICE-binary *binder* dependency-reduced-pom.xml *launcher* *project* *spark-warehouse* *R* *build* *dev* *licenses* *python* *sql* README.md *common* *docs* *licenses-binary* *repl* *streaming* *wei.liu*:*~/oss-spark*$ *wei.liu*:*~/oss-spark*$ *wei.liu*:*~/oss-spark*$ ``` I ran 'ls' above, and clicked return multiple times -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43206) Connect Better StreamingQueryException
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43206: Summary: Connect Better StreamingQueryException (was: Streaming query exception() also include stack trace) > Connect Better StreamingQueryException > -- > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43206) Connect Better StreamingQueryException
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715178#comment-17715178 ] Wei Liu commented on SPARK-43206: - Also cause, offsets, stack trace... > Connect Better StreamingQueryException > -- > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43206) Streaming query exception() also include stack trace
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715171#comment-17715171 ] Wei Liu commented on SPARK-43206: - I'll work on this. To myself: don't forget jvm exceptions > Streaming query exception() also include stack trace > > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43032) Add StreamingQueryManager API
[ https://issues.apache.org/jira/browse/SPARK-43032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715147#comment-17715147 ] Wei Liu commented on SPARK-43032: - [https://github.com/apache/spark/pull/40861] still draft > Add StreamingQueryManager API > - > > Key: SPARK-43032 > URL: https://issues.apache.org/jira/browse/SPARK-43032 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > Add StreamingQueryManager API. It would include API that can be directly > support. API like registering streaming listener will be handled separately. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43143) Scala: Add StreamingQuery awaitTermination() API
[ https://issues.apache.org/jira/browse/SPARK-43143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715149#comment-17715149 ] Wei Liu commented on SPARK-43143: - I'm working on this > Scala: Add StreamingQuery awaitTermination() API > > > Key: SPARK-43143 > URL: https://issues.apache.org/jira/browse/SPARK-43143 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43134) Add streaming query exception API in Scala
[ https://issues.apache.org/jira/browse/SPARK-43134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715148#comment-17715148 ] Wei Liu commented on SPARK-43134: - I'm working on this > Add streaming query exception API in Scala > -- > > Key: SPARK-43134 > URL: https://issues.apache.org/jira/browse/SPARK-43134 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43206) Streaming query exception() also include stack trace
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43206: Description: [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > Streaming query exception() also include stack trace > > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43206) Streaming query exception() also include stack trace
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43206: Epic Link: SPARK-42938 > Streaming query exception() also include stack trace > > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43167) Streaming Connect console output format support
[ https://issues.apache.org/jira/browse/SPARK-43167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu resolved SPARK-43167. - Resolution: Not A Problem. Automatically supported by the existing Connect implementation. > Streaming Connect console output format support > --- > > Key: SPARK-43167 > URL: https://issues.apache.org/jira/browse/SPARK-43167 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43206) Streaming query exception() also include stack trace
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43206: Environment: (was: https://github.com/apache/spark/pull/40785#issuecomment-1515522281 ) > Streaming query exception() also include stack trace > > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43206) Streaming query exception() also include stack trace
Wei Liu created SPARK-43206: --- Summary: Streaming query exception() also include stack trace Key: SPARK-43206 URL: https://issues.apache.org/jira/browse/SPARK-43206 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 3.5.0 Environment: https://github.com/apache/spark/pull/40785#issuecomment-1515522281 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43167) Streaming Connect console output format support
[ https://issues.apache.org/jira/browse/SPARK-43167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713787#comment-17713787 ] Wei Liu commented on SPARK-43167: - Should be:
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0-SNAPSHOT
      /_/

Using Python version 3.10.8 (main, Oct 13 2022 09:48:40)
Spark context Web UI available at http://10.10.105.160:4040
Spark context available as 'sc' (master = local[*], app id = local-1681856185012).
SparkSession available as 'spark'.
>>> spark
>>> q = spark.readStream.format("rate").load().writeStream.format("console").start()
23/04/18 15:17:12 WARN ResolveWriteToStream: Temporary checkpoint location created which is deleted normally when the query didn't fail: /private/var/folders/9k/pbxb4_690wv4smwhwbzwmqkwgp/T/temporary-64d68668-bc6f-46aa-8ea5-b66ddae09f91. If it's required to delete it under any circumstances, please set spark.sql.streaming.forceDeleteTempCheckpointLocation to true. Important to know deleting temp checkpoint folder is best effort.
23/04/18 15:17:12 WARN ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
-------------------------------------------
Batch: 0
-------------------------------------------
+---------+-----+
|timestamp|value|
+---------+-----+
+---------+-----+

-------------------------------------------
Batch: 1
-------------------------------------------
+--------------------+-----+
|           timestamp|value|
+--------------------+-----+
|2023-04-18 15:17:...|    0|
|2023-04-18 15:17:...|    1|
+--------------------+-----+

-------------------------------------------
Batch: 2
-------------------------------------------
+--------------------+-----+
|           timestamp|value|
+--------------------+-----+
|2023-04-18 15:17:...|    2|
|2023-04-18 15:17:...|    3|
+--------------------+-----+

-------------------------------------------
Batch: 3
-------------------------------------------
+--------------------+-----+
|           timestamp|value|
+--------------------+-----+
|2023-04-18 15:17:...|    4|
+--------------------+-----+
```
> Streaming Connect console output format support > --- > > Key: SPARK-43167 > URL: https://issues.apache.org/jira/browse/SPARK-43167 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43167) Streaming Connect console output format support
Wei Liu created SPARK-43167: --- Summary: Streaming Connect console output format support Key: SPARK-43167 URL: https://issues.apache.org/jira/browse/SPARK-43167 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43147) Python lint local config
Wei Liu created SPARK-43147: --- Summary: Python lint local config Key: SPARK-43147 URL: https://issues.apache.org/jira/browse/SPARK-43147 Project: Spark Issue Type: Task Components: PySpark, python Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42962) Allow access to stopped streaming queries
[ https://issues.apache.org/jira/browse/SPARK-42962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711155#comment-17711155 ] Wei Liu commented on SPARK-42962: - Please be aware of the TODOs in connect query.py and readwriter.py when creating a PR for this > Allow access to stopped streaming queries > - > > Key: SPARK-42962 > URL: https://issues.apache.org/jira/browse/SPARK-42962 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > In Spark connect, when a query is stopped, the client can't access it > anymore. That implies they can not fetch any information about the query > after that. > The server might have to cache the queries for some time (upto the time the > session is closed). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
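A short sketch of the client-side pattern this server-side caching would keep working; in Connect today the last two calls have nothing to resolve against once stop() returns (rate/noop are placeholder source and sink):
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

q = (spark.readStream.format("rate").load()
     .writeStream.format("noop").start())
q.processAllAvailable()   # let at least one batch run
q.stop()

# Post-mortem inspection of a stopped query: with the server caching stopped
# queries until the session closes, these should keep working over Connect.
print(q.lastProgress)
print(q.exception())
```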
[jira] [Comment Edited] (SPARK-43015) Python: await_termination & exception() for Streaming Connect
[ https://issues.apache.org/jira/browse/SPARK-43015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709070#comment-17709070 ] Wei Liu edited comment on SPARK-43015 at 4/6/23 10:05 PM: -- Duplicate of SPARK-42960 was (Author: JIRAUSER295948): SPARK-42960 > Python: await_termination & exception() for Streaming Connect > - > > Key: SPARK-43015 > URL: https://issues.apache.org/jira/browse/SPARK-43015 > Project: Spark > Issue Type: Story > Components: Connect, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43054) Support foreach() in streaming spark connect
Wei Liu created SPARK-43054: --- Summary: Support foreach() in streaming spark connect Key: SPARK-43054 URL: https://issues.apache.org/jira/browse/SPARK-43054 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
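For context, the API being brought to Connect; a minimal sketch of the two forms DataStreamWriter.foreach() accepts in PySpark (the row handlers are illustrative only):
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.readStream.format("rate").load()

# Simplest form: a plain function invoked once per row.
q1 = df.writeStream.foreach(lambda row: print(row.value)).start()

# Full form: an object with open/process/close for per-partition setup/teardown.
class RowPrinter:
    def open(self, partition_id, epoch_id):
        return True            # process this partition
    def process(self, row):
        print(row.value)
    def close(self, error):
        pass

q2 = df.writeStream.foreach(RowPrinter()).start()
```
The bulk of the Connect work is presumably shipping the serialized Python handler from the client to the server so it can run inside the server-side query, rather than any change to the user-facing API above.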
[jira] [Updated] (SPARK-42951) Spark Connect: Streaming DataStreamReader API except table()
[ https://issues.apache.org/jira/browse/SPARK-42951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-42951: Issue Type: Task (was: Story) > Spark Connect: Streaming DataStreamReader API except table() > > > Key: SPARK-42951 > URL: https://issues.apache.org/jira/browse/SPARK-42951 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43042) Spark Connect: Streaming readerwriter table() API
[ https://issues.apache.org/jira/browse/SPARK-43042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43042: Issue Type: Task (was: Story) > Spark Connect: Streaming readerwriter table() API > - > > Key: SPARK-43042 > URL: https://issues.apache.org/jira/browse/SPARK-43042 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43042) Spark Connect: Streaming readerwriter table() API
[ https://issues.apache.org/jira/browse/SPARK-43042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43042: Summary: Spark Connect: Streaming readerwriter table() API (was: Spark Connect: Streaming DataStreamReader table() API) > Spark Connect: Streaming readerwriter table() API > - > > Key: SPARK-43042 > URL: https://issues.apache.org/jira/browse/SPARK-43042 > Project: Spark > Issue Type: Story > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42951) Spark Connect: Streaming DataStreamReader API except table()
[ https://issues.apache.org/jira/browse/SPARK-42951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-42951: Affects Version/s: (was: 3.4.0) > Spark Connect: Streaming DataStreamReader API except table() > > > Key: SPARK-42951 > URL: https://issues.apache.org/jira/browse/SPARK-42951 > Project: Spark > Issue Type: Story > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43042) Spark Connect: Streaming DataStreamReader table() API
Wei Liu created SPARK-43042: --- Summary: Spark Connect: Streaming DataStreamReader table() API Key: SPARK-43042 URL: https://issues.apache.org/jira/browse/SPARK-43042 Project: Spark Issue Type: Story Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42951) Spark Connect: Streaming DataStreamReader API except table()
[ https://issues.apache.org/jira/browse/SPARK-42951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-42951: Epic Link: SPARK-42938 > Spark Connect: Streaming DataStreamReader API except table() > > > Key: SPARK-42951 > URL: https://issues.apache.org/jira/browse/SPARK-42951 > Project: Spark > Issue Type: Story > Components: Connect, Structured Streaming >Affects Versions: 3.4.0, 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org