Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2138544456 Yes, but these imports are wrapped in `if not is_remote_only()`, so the Spark Connect test should already skip these import statements. This is weird; I will take another look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2138460399
```
from pyspark.core.rdd import RDD, RDDBarrier
from pyspark.core.files import SparkFiles
from pyspark.core.status import StatusTracker, SparkJobInfo, SparkStageInfo
from pyspark.core.broadcast import Broadcast
from pyspark.core import rdd, files, status, broadcast
```
are not supported in the pure Python library for Spark Connect, so we shouldn't use them in Spark Connect related modules.
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2138108922 This import statement is supposed to be skipped for the Spark Connect test: https://github.com/apache/spark/blob/master/python/pyspark/__init__.py#L55 Is the `is_remote_only()` function working correctly? @HyukjinKwon
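For readers following the thread, the kind of guard being discussed could be sketched roughly as below. This is only an illustration under assumptions, not the actual contents of `python/pyspark/__init__.py`: `module_available` is a hypothetical stand-in for a check like `is_remote_only()`, and it simply tests whether `py4j` can be located.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Hypothetical stand-in for a check like is_remote_only().

    Returns True when a top-level module can be located,
    without actually importing it.
    """
    return importlib.util.find_spec(name) is not None


# Import the JVM-backed symbol only when its dependency is present;
# otherwise leave a placeholder so this module still imports cleanly.
if module_available("py4j"):
    from py4j.java_collections import JavaArray
else:
    JavaArray = None  # pure-Python environment: py4j is not installed
```

If a guard like this evaluates the wrong way in a given environment, the guarded imports still run, which would be consistent with the failure reported in this thread.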
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2123879472 @chaoqin-li1123 gentle ping on this.
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2099809962 You should follow the steps in https://github.com/apache/spark/blob/master/.github/workflows/build_python_connect.yml#L69-L94 and try to reproduce why it fails. Spark Connect doesn't need to import `pyspark`, only `pyspark.sql`.
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2099774647 This seems to be broken in pyspark's `__init__.py` itself. What is the expected action item for us here? @HyukjinKwon
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2097190957 Follow https://github.com/apache/spark/blob/master/.github/workflows/build_python_connect.yml#L80-L113 to reproduce the failure.
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2097139796 py4j shouldn't be referenced in the Connect test. Can we move those imports and import them only where they're actually used?
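The "import when it's actually used" suggestion could look something like the sketch below. `is_java_array` is a hypothetical helper invented for illustration and is not part of PySpark; the only real name used is `py4j.java_collections.JavaArray`, which appears in the traceback in this thread.

```python
def is_java_array(obj) -> bool:
    """Hypothetical helper: report whether obj is a py4j JavaArray.

    The py4j import is deferred to call time, so merely importing the
    module that defines this function never requires py4j.
    """
    try:
        from py4j.java_collections import JavaArray
    except ImportError:
        # Pure-Python client without py4j: nothing can be a JavaArray.
        return False
    return isinstance(obj, JavaArray)


print(is_java_array([1, 2, 3]))  # False: a plain list is never a JavaArray
```

With this layout a missing py4j only matters on code paths that actually call the helper, rather than at module import time.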
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095178573 @HyukjinKwon both test_python_datasource and test_python_streaming_datasource fail with the same error if py4j*.zip is removed.
> Traceback (most recent call last):
>   File "", line 189, in _run_module_as_main
>   File "", line 112, in _get_module_details
>   File "/Users/chaoqin.li/spark/python/pyspark/__init__.py", line 58, in <module>
>     from pyspark.core.status import StatusTracker, SparkJobInfo, SparkStageInfo
>   File "/Users/chaoqin.li/spark/python/pyspark/core/status.py", line 22, in <module>
>     from py4j.java_collections import JavaArray
> ModuleNotFoundError: No module named 'py4j'
dongjoon-hyun commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095027198 Thank you for checking and mitigating this by reverting, @HyukjinKwon .
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095022472 @chaoqin-li1123 It seems this test does not work with the pure Python library. Can you check whether the tests pass after removing `python/lib/py4j*.zip`? Let me revert this for now because we're cutting the preview very soon.
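As a quick in-process approximation of that check (not a replacement for actually removing `python/lib/py4j*.zip`), one could temporarily hide a package from the import system. The sketch below is an assumption-laden illustration: `hide_module` and `import_fails` are hypothetical helpers, and the technique relies on the standard behaviour that a `None` entry in `sys.modules` makes the import raise `ModuleNotFoundError`.

```python
import contextlib
import importlib
import sys


@contextlib.contextmanager
def hide_module(name):
    """Hypothetical helper: make `import name` fail inside the with-block."""
    prefix = name + "."
    # Remember and drop any already-imported copies of the package.
    saved = {
        key: mod
        for key, mod in list(sys.modules.items())
        if key == name or key.startswith(prefix)
    }
    for key in saved:
        del sys.modules[key]
    # A None entry tells the import system that the module is unavailable.
    sys.modules[name] = None
    try:
        yield
    finally:
        sys.modules.pop(name, None)
        sys.modules.update(saved)


def import_fails(name):
    """Return True if importing `name` raises ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        return True
    return False


with hide_module("py4j"):
    print(import_fails("py4j"))  # True while py4j is hidden
```

This only affects imports in the current Python process, so anything launched as a separate process would still see the real package; deleting the zip as described above remains the more faithful reproduction.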
dongjoon-hyun closed pull request #45950: [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source URL: https://github.com/apache/spark/pull/45950
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2046225155 cc @allisonwang-db @HeartSaVioR
xinrong-meng commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2045843912 LGTM once CI passes, thanks!
chaoqin-li1123 commented on code in PR #45950: URL: https://github.com/apache/spark/pull/45950#discussion_r1558088128
## python/pyspark/sql/tests/connect/test_parity_python_streaming_datasource.py: ##
@@ -0,0 +1,35 @@
+#
Review Comment: Added.