GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22480
[SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests failed on Python 3.6 and macOS High Sierra

## What changes were proposed in this pull request?

This PR does not fix the underlying problem; it only adds a few comments about running PySpark tests on Python 3.6 and macOS High Sierra, since the issue currently blocks running the tests on macOS. Others are likely already debugging this.

The problem appears to be that we fork Python workers, and the forked workers somehow end up calling Objective-C libraries from code inside CPython's implementation. I suspect `pickle` in Python 3.6 introduced some changes: https://github.com/apache/spark/blob/master/python/pyspark/serializers.py#L577

After debugging, the problem looks to be in the forked worker. These links were helpful for understanding it:

- http://sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html
- https://blog.phusion.nl/2017/10/13/why-ruby-app-servers-break-on-macos-high-sierra-and-what-can-be-done-about-it/

I am still debugging, but my gut says it will be difficult to fix or work around on the Spark side.

## How was this patch tested?

Manually tested.

Before:

```
/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py:766: ResourceWarning: subprocess 27563 is still running
  ResourceWarning, source=self)
[Stage 0:>                                                          (0 + 1) / 1]
objc[27586]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[27586]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
ERROR

======================================================================
ERROR: test_streaming_foreach_with_simple_function (pyspark.sql.tests.SQLTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o54.processAllAvailable.
: org.apache.spark.sql.streaming.StreamingQueryException: Writing job aborted.
=== Streaming Query ===
Identifier: [id = f508d634-407c-4232-806b-70e54b055c42, runId = 08d1435b-5358-4fb6-b167-811584a3163e]
Current Committed Offsets: {}
Current Available Offsets: {FileStreamSource[file:/var/folders/71/484zt4z10ks1vydt03bhp6hr0000gp/T/tmpolebys1s]: {"logOffset":0}}

Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan:
FileStreamSource[file:/var/folders/71/484zt4z10ks1vydt03bhp6hr0000gp/T/tmpolebys1s]

	at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
	at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.spark.SparkException: Writing job aborted.
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:91)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
```

After:

```
test_streaming_foreach_with_simple_function (pyspark.sql.tests.SQLTests) ...
ok
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-25473

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22480.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22480

----

commit 97e95afeba368dd06f747665c41f96a50141305a
Author: hyukjinkwon <gurwls223@...>
Date:   2018-09-20T03:03:42Z

    Add a note for streaming forech tests

----

---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
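For readers hitting the same crash locally: the blog posts linked above describe a workaround of disabling the Objective-C runtime's fork-safety check via an environment variable before launching the process that forks. A minimal sketch, assuming macOS High Sierra (10.13) or later; the `run-tests` invocation is shown only as an illustration of where the variable must be set:

```shell
# Workaround sketch (from the linked posts, not part of this PR's change):
# OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES tells the Objective-C runtime to
# skip the +[... initialize] fork-safety check that aborts forked workers.
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

# Then run the affected PySpark tests in the same shell, e.g.:
# ./python/run-tests --python-executables=python3.6 --modules=pyspark-sql
```

Note that this disables a safety check added in High Sierra, so it is only appropriate for local test runs, not production deployments.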