Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-29 Thread via GitHub
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2138544456 Yes, but these imports are wrapped in `if not is_remote_only():`, so the Spark Connect test should already skip these import statements. This is weird; I will take another look.
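
The guard mentioned above is a conditional-import pattern. Below is a minimal sketch of it. It assumes a hypothetical `is_remote_only()` that only checks whether `py4j` is installed; PySpark's actual check may work differently.

```python
import importlib.util


def is_remote_only() -> bool:
    # Hypothetical stand-in, NOT pyspark's real implementation: treat the
    # environment as Connect-only when py4j cannot be located.
    return importlib.util.find_spec("py4j") is None


if not is_remote_only():
    # Classic (JVM-backed) install: the py4j-dependent import is safe here.
    from py4j.java_gateway import JavaObject
else:
    # Connect-only install: the JVM-only import is never executed.
    JavaObject = None
```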

2024-05-29 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2138460399
```
from pyspark.core.rdd import RDD, RDDBarrier
from pyspark.core.files import SparkFiles
from pyspark.core.status import StatusTracker, SparkJobInfo,
```

2024-05-29 Thread via GitHub
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2138108922 This import statement is supposed to be skipped for the Spark Connect test: https://github.com/apache/spark/blob/master/python/pyspark/__init__.py#L55 Is the is_remote_only()

2024-05-21 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2123879472 @chaoqin-li1123 gentle ping on this.

2024-05-08 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2099809962 Please follow the steps in https://github.com/apache/spark/blob/master/.github/workflows/build_python_connect.yml#L69-L94 and try to reproduce why it fails. Spark Connect doesn't need to

2024-05-07 Thread via GitHub
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2099774647 This seems to break in the main body of pyspark's `__init__`. What action item should we take? @HyukjinKwon

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2097190957 Follow https://github.com/apache/spark/blob/master/.github/workflows/build_python_connect.yml#L80-L113 to reproduce the failure.

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2097139796 py4j shouldn't be referenced in the Connect test. Can we move these imports and import py4j only where it's actually used?
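
The suggestion above amounts to a deferred, function-level import. Here is a minimal sketch of that idea. `convert_value` and its `use_jvm` flag are hypothetical and are not part of PySpark.

```python
def convert_value(value, use_jvm: bool = False):
    """Hypothetical helper showing a deferred (function-level) py4j import."""
    if not use_jvm:
        # Spark Connect / pure-Python path: no JVM involved, py4j never needed.
        return repr(value)
    # Deferred import: executed only on the classic JVM path, so importing this
    # module (or calling it with use_jvm=False) works when py4j is absent.
    from py4j.java_gateway import JavaObject

    # A real helper would hand `value` to the JVM here; this sketch only
    # reports whether it is already a py4j proxy object.
    return "jvm-proxy" if isinstance(value, JavaObject) else "python-value"
```

Because the import sits inside the function body, Python resolves it only when the JVM branch actually runs. Module import time therefore stays free of py4j.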

2024-05-05 Thread via GitHub
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095178573 @HyukjinKwon Both `test_python_datasource` and `test_python_streaming_datasource` fail with the same error if `py4j*.zip` is removed.
> Traceback (most recent call last):

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095027198 Thank you for checking and mitigating this by reverting, @HyukjinKwon.

2024-05-05 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095022472 @chaoqin-li1123 It seems this test does not work with the pure Python library. Can you check whether the tests pass after removing `python/lib/py4j*.zip`? Let me revert this for now.
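
One rough way to approximate removing `python/lib/py4j*.zip` without touching the checkout is to block the module in `sys.modules`. This is a substitute technique, not what the CI workflow does, and `importable_without` is a hypothetical helper.

```python
import importlib
import sys
from unittest import mock


def importable_without(blocked: str, module_name: str) -> bool:
    """Return True if `module_name` imports cleanly while `blocked` is unavailable.

    Mapping sys.modules[blocked] to None makes any `import blocked` raise
    ModuleNotFoundError, roughly approximating a run without
    python/lib/py4j*.zip on the path when blocked="py4j".
    """
    # patch.dict restores the original sys.modules contents on exit.
    with mock.patch.dict(sys.modules, {blocked: None}):
        # Evict any cached copy so the module's top-level imports re-run.
        sys.modules.pop(module_name, None)
        try:
            importlib.import_module(module_name)
        except ImportError:
            return False
    return True
```

Only the named module is re-executed. Submodules that are already cached (for example, anything under `pyspark.core`) are reused. A fresh interpreter that follows the CI steps is therefore still the faithful check.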

2024-04-11 Thread via GitHub
dongjoon-hyun closed pull request #45950: [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source URL: https://github.com/apache/spark/pull/45950

2024-04-09 Thread via GitHub
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2046225155 cc @allisonwang-db @HeartSaVioR

2024-04-09 Thread via GitHub
xinrong-meng commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2045843912 LGTM once CI passes, thanks!

2024-04-09 Thread via GitHub
chaoqin-li1123 commented on code in PR #45950: URL: https://github.com/apache/spark/pull/45950#discussion_r1558088128 ## python/pyspark/sql/tests/connect/test_parity_python_streaming_datasource.py: ## @@ -0,0 +1,35 @@ +# Review Comment: Added.