[jira] [Assigned] (SPARK-25473) PySpark ForeachWriter test fails on Python 3.6 and macOS High Sierra

2018-09-22 Thread Hyukjin Kwon (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-25473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-25473:


Assignee: Hyukjin Kwon

> PySpark ForeachWriter test fails on Python 3.6 and macOS High Sierra
> --------------------------------------------------------------------
>
> Key: SPARK-25473
> URL: https://issues.apache.org/jira/browse/SPARK-25473
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 2.5.0
>
>
> {code}
> PYSPARK_PYTHON=python3.6 SPARK_TESTING=1 ./bin/pyspark pyspark.sql.tests SQLTests
> {code}
> {code}
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> /usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py:766: ResourceWarning: subprocess 27563 is still running
>   ResourceWarning, source=self)
> [Stage 0:>  (0 + 1) / 1]objc[27586]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
> objc[27586]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
> ERROR
> ======================================================================
> ERROR: test_streaming_foreach_with_simple_function (pyspark.sql.tests.SQLTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/.../spark/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
>     format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling o54.processAllAvailable.
> : org.apache.spark.sql.streaming.StreamingQueryException: Writing job aborted.
> === Streaming Query ===
> Identifier: [id = f508d634-407c-4232-806b-70e54b055c42, runId = 08d1435b-5358-4fb6-b167-811584a3163e]
> Current Committed Offsets: {}
> Current Available Offsets: {FileStreamSource[file:/var/folders/71/484zt4z10ks1vydt03bhp6hrgp/T/tmpolebys1s]: {"logOffset":0}}
> Current State: ACTIVE
> Thread State: RUNNABLE
> Logical Plan:
> FileStreamSource[file:/var/folders/71/484zt4z10ks1vydt03bhp6hrgp/T/tmpolebys1s]
>   at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
> Caused by: org.apache.spark.SparkException: Writing job aborted.
>   at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:91)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
>   at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:294)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3384)
>   at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783)
>   at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2783)
>   at 
> 
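For context, the failing test drives DataStreamWriter.foreach with a plain Python function, which Spark executes inside forked Python worker processes. A minimal sketch of that pattern, assuming an illustrative input directory and row handler rather than the actual pyspark.sql.tests body:

{code}
# Sketch of the pattern under test: DataStreamWriter.foreach with a plain
# function. The input path and handler below are illustrative assumptions,
# not the real test code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()

def process_row(row):
    # Runs once per row in a forked Python worker; on macOS High Sierra
    # that fork() is what trips the Objective-C initialize check above.
    print(row)

# /tmp/stream-input is a hypothetical directory of text files.
lines = spark.readStream.format("text").load("/tmp/stream-input")
query = lines.writeStream.foreach(process_row).start()
query.processAllAvailable()  # the call that surfaces the error in the log
query.stop()
{code}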

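The objc fork() abort in the log is a known macOS High Sierra (10.13+) behavior: the Objective-C runtime kills a fork()ed child if class initialization may have been in progress in another thread. A commonly used local workaround, not necessarily the fix adopted for this ticket, is to disable that check with the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable before anything forks:

{code}
# Hedged sketch: OBJC_DISABLE_INITIALIZE_FORK_SAFETY is a real macOS
# environment variable that disables the fork-safety abort. Setting it
# before the SparkSession (and hence the forked Python workers) starts is
# a local-debugging assumption, not the confirmed fix for SPARK-25473.
import os
os.environ["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[2]").getOrCreate()
{code}

Equivalently, the variable can be exported in the shell before the ./bin/pyspark ... SQLTests command shown in the description.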
[jira] [Assigned] (SPARK-25473) PySpark ForeachWriter test fails on Python 3.6 and macOS High Sierra

2018-09-20 Thread Apache Spark (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-25473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25473:


Assignee: (was: Apache Spark)

> PySpark ForeachWriter test fails on Python 3.6 and macOS High Sierra
> --------------------------------------------------------------------
>
> Key: SPARK-25473
> URL: https://issues.apache.org/jira/browse/SPARK-25473
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Hyukjin Kwon
>Priority: Major

[jira] [Assigned] (SPARK-25473) PySpark ForeachWriter test fails on Python 3.6 and macOS High Sierra

2018-09-20 Thread Apache Spark (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-25473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25473:


Assignee: Apache Spark

> PySpark ForeachWriter test fails on Python 3.6 and macOS High Sierra
> --------------------------------------------------------------------
>
> Key: SPARK-25473
> URL: https://issues.apache.org/jira/browse/SPARK-25473
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major