Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-29 Thread via GitHub


chaoqin-li1123 commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2138544456

   Yes, but these imports are wrapped in `if not is_remote_only()`, so the Spark 
Connect tests should already skip these import statements. This is weird; I will 
take another look.
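
To make the expectation concrete, here is a minimal sketch of the guarded-import pattern being discussed. The helpers `guarded_import` and `import_fails` and the `remote_only` flag are hypothetical stand-ins for illustration; they are not PySpark's actual `is_remote_only()` logic. The sketch only shows that the guard helps when it evaluates to true before the import is attempted.

```python
import importlib


def guarded_import(module_name, remote_only):
    """Import module_name unless remote_only is set (hypothetical helper)."""
    if remote_only:
        # Guard is true: the import is never attempted.
        return None
    # Guard is false: the import runs and may raise ModuleNotFoundError.
    return importlib.import_module(module_name)


def import_fails(module_name, remote_only):
    """Return True if the guarded import raises ImportError."""
    try:
        guarded_import(module_name, remote_only)
    except ImportError:
        return True
    return False
```

For a module that is not installed, `import_fails(name, remote_only=True)` is false and `import_fails(name, remote_only=False)` is true, so a failing import inside the guard suggests the guard evaluated to false in that environment.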


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-29 Thread via GitHub


HyukjinKwon commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2138460399

   ```
   from pyspark.core.rdd import RDD, RDDBarrier
   from pyspark.core.files import SparkFiles
   from pyspark.core.status import StatusTracker, SparkJobInfo, SparkStageInfo
   from pyspark.core.broadcast import Broadcast
   from pyspark.core import rdd, files, status, broadcast
   ```
   
   are not supported in the pure Python library for Spark Connect, so we 
shouldn't use them in Spark Connect related modules.





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-29 Thread via GitHub


chaoqin-li1123 commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2138108922

   This import statement is supposed to be skipped for the Spark Connect tests: 
https://github.com/apache/spark/blob/master/python/pyspark/__init__.py#L55
   Is the `is_remote_only()` function working correctly?
   @HyukjinKwon 





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-21 Thread via GitHub


HyukjinKwon commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2123879472

   @chaoqin-li1123 gentle ping on this.





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-08 Thread via GitHub


HyukjinKwon commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2099809962

   We should follow the steps in 
https://github.com/apache/spark/blob/master/.github/workflows/build_python_connect.yml#L69-L94
 and try to reproduce why it fails. Spark Connect doesn't need to import 
`pyspark`, only `pyspark.sql`. 
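
One Python detail worth keeping in mind while reproducing: importing a submodule always executes the parent package's `__init__.py` first, so importing `pyspark.sql` still runs `pyspark/__init__.py`. The sketch below demonstrates this general behaviour with a throwaway package; the package name `demo_pkg_sketch` and the helper function are hypothetical and have nothing to do with PySpark's layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_submodule_and_list_loaded(pkg="demo_pkg_sketch"):
    """Import pkg.sub from a temporary package and list what got loaded."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / pkg
        root.mkdir()
        (root / "__init__.py").write_text("INIT_RAN = True\n")
        (root / "sub.py").write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Only the submodule is requested here.
            importlib.import_module(f"{pkg}.sub")
            # The parent package shows up in sys.modules as well.
            return sorted(n for n in sys.modules if n.split(".")[0] == pkg)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.split(".")[0] == pkg]:
                del sys.modules[name]
```

The returned list contains both the parent package and the submodule, which is why an unguarded import in a package's `__init__.py` can break a test that only imports a subpackage.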





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-07 Thread via GitHub


chaoqin-li1123 commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2099774647

   This seems to break in pyspark's top-level `__init__.py`. What action should 
we take here? @HyukjinKwon 





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-06 Thread via GitHub


HyukjinKwon commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2097190957

   Follow the steps in 
https://github.com/apache/spark/blob/master/.github/workflows/build_python_connect.yml#L80-L113
 to reproduce the failure.





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-06 Thread via GitHub


HyukjinKwon commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2097139796

   py4j shouldn't be referenced in the Connect tests. Can we move those imports 
and import py4j only where it's actually used?
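
A minimal sketch of what "import when it's actually used" could look like. Both helper functions are hypothetical and only illustrate the pattern; the one real name is the `from py4j.java_collections import JavaArray` import that appears in the traceback in this thread. Moving the import into the function body means that merely importing the module no longer requires py4j.

```python
def java_array_type():
    """Hypothetical helper: resolve py4j's JavaArray only when called."""
    # Deferred import: evaluated at call time, not at module import time.
    from py4j.java_collections import JavaArray

    return JavaArray


def describe_job(job_id):
    """Hypothetical helper that never touches py4j."""
    return f"job {job_id}"


if __name__ == "__main__":
    # Works whether or not py4j is installed, because nothing above
    # imports py4j until java_array_type() is actually called.
    print(describe_job(3))
```

The trade-off is that a missing py4j now surfaces at the first call to `java_array_type()` rather than at import time, which is the desired behaviour for a Connect-only environment.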





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-05 Thread via GitHub


chaoqin-li1123 commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2095178573

   @HyukjinKwon Both test_python_datasource and test_python_streaming_datasource 
will fail with the same error if py4j*.zip is removed. 
   
   > Traceback (most recent call last):
   >   File "<frozen runpy>", line 189, in _run_module_as_main
   >   File "<frozen runpy>", line 112, in _get_module_details
   >   File "/Users/chaoqin.li/spark/python/pyspark/__init__.py", line 58, in <module>
   > from pyspark.core.status import StatusTracker, SparkJobInfo, SparkStageInfo
   >   File "/Users/chaoqin.li/spark/python/pyspark/core/status.py", line 22, in <module>
   > from py4j.java_collections import JavaArray
   > ModuleNotFoundError: No module named 'py4j'
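
When reproducing this, it may help to first confirm that py4j really is unavailable to the interpreter running the tests. The following is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list would mean py4j is still importable from somewhere.
    print(missing_modules(["py4j"]))
```

If the list comes back empty after removing the zip, py4j is probably still being picked up from another location on `sys.path`.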





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-05 Thread via GitHub


dongjoon-hyun commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2095027198

   Thank you for checking and mitigating this by reverting, @HyukjinKwon .





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-05 Thread via GitHub


HyukjinKwon commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2095022472

   @chaoqin-li1123 It seems this test does not work with the pure Python library. 
Can you check whether the tests pass after removing `python/lib/py4j*.zip`?
   
   Let me revert this for now because we're cutting the preview very soon.





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-04-11 Thread via GitHub


dongjoon-hyun closed pull request #45950: [SPARK-47777][PYTHON][SS][TESTS] Add 
spark connect test for python streaming data source
URL: https://github.com/apache/spark/pull/45950





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-04-09 Thread via GitHub


chaoqin-li1123 commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2046225155

   cc @allisonwang-db @HeartSaVioR 





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-04-09 Thread via GitHub


xinrong-meng commented on PR #45950:
URL: https://github.com/apache/spark/pull/45950#issuecomment-2045843912

   LGTM once CI passes, thanks!





Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-04-09 Thread via GitHub


chaoqin-li1123 commented on code in PR #45950:
URL: https://github.com/apache/spark/pull/45950#discussion_r1558088128


##
python/pyspark/sql/tests/connect/test_parity_python_streaming_datasource.py:
##
@@ -0,0 +1,35 @@
+#

Review Comment:
   Added.


