[jira] [Updated] (SPARK-22468) subtract creating empty DataFrame that isn't initialised properly
[ https://issues.apache.org/jira/browse/SPARK-22468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-22468:
---------------------------------
    Labels: bulk-closed  (was: )

> subtract creating empty DataFrame that isn't initialised properly
> -----------------------------------------------------------------
>
>                 Key: SPARK-22468
>                 URL: https://issues.apache.org/jira/browse/SPARK-22468
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: James Porritt
>            Priority: Major
>              Labels: bulk-closed
>
> I have an issue whereby a subtract between two DataFrames that should correctly
> end up with an empty DataFrame seemingly leaves that DataFrame not initialised
> properly.
> In my code I try to do the subtract both ways:
> {code}
> x = a.subtract(b)
> y = b.subtract(a)
> {code}
> I then call .rdd.isEmpty() on both of them to check whether I need to do
> something with the results. Often the 'y' subtract will fail when the 'x'
> subtract is non-empty. It's hard to reproduce, however; I can't seem to reduce
> it to a sample. One of the errors I get is:
> {noformat}
> File "", line 642, in
>   if not y.rdd.isEmpty():
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 1377, in isEmpty
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 1343, in take
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 992, in runJob
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 2455, in _jrdd
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 2390, in _wrap_function
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1386, in __call__
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1372, in _get_args
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_collections.py", line 501, in convert
> AttributeError: 'NoneType' object has no attribute 'add'
> {noformat}
> Another error is:
> {noformat}
> File "", line 642, in
>   if not y.rdd.isEmpty():
> File "/python/lib/pyspark.zip/pyspark/rdd.py", line 1377, in isEmpty
> File "/python/lib/pyspark.zip/pyspark/rdd.py", line 1343, in take
> File "/python/lib/pyspark.zip/pyspark/context.py", line 992, in runJob
> File "/python/lib/pyspark.zip/pyspark/rdd.py", line 2458, in _jrdd
> File "/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
> File "/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
> File "/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o5751.asJavaRDD.
> Trace:
> py4j.Py4JException: Method asJavaRDD([]) does not exist
>         at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
>         at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
>         at py4j.Gateway.invoke(Gateway.java:272)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Another error is:
> {noformat}
>   if not y.rdd.isEmpty():
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 1377, in isEmpty
> File "/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 385, in getNumPartitions
> AttributeError: 'NoneType' object has no attribute 'size'
> {noformat}
> This is happening at multiple points in my code.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
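For reference, the two-way subtract in the report has straightforward set-difference semantics. The sketch below illustrates the behaviour the reporter expects using plain Python sets rather than Spark DataFrames (a hedged illustration only; the rows in `a` and `b` are hypothetical example data, and this says nothing about Spark's internal implementation of `subtract`):

```python
# Expected semantics of a two-way subtract (set difference), modelled
# with plain Python sets. Rows are hypothetical (name, id) tuples.
a = {("alice", 1), ("bob", 2), ("carol", 3)}
b = {("bob", 2), ("carol", 3)}

x = a - b  # rows in a but not in b -> {("alice", 1)}, non-empty
y = b - a  # rows in b but not in a -> empty set

# Mirroring the reporter's pattern: act only on non-empty results.
# In the bug report, the check on the empty side (y) is the one that
# intermittently fails inside PySpark.
if x:
    print("x has", len(x), "row(s)")  # x has 1 row(s)
if not y:
    print("y is empty")               # y is empty
```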