[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2015-01-08 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270097#comment-14270097
 ] 

Davies Liu commented on SPARK-1630:
---

We hit this issue with the Kafka Python API; it will be fixed in 
https://github.com/apache/spark/pull/3715

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
>Assignee: Davies Liu
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> If PythonRDDs receive a null element in iterators, they currently throw an 
> NPE. It would be better to log a DEBUG message and skip the write of NULL 
> elements. Here are the 2 stack traces:
> 14/04/22 03:44:19 ERROR executor.Executor: Uncaught exception in thread 
> Thread[stdin writer for python,5,main]
> java.lang.NullPointerException
>   at 
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:267)
>   at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:88)
> -
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.writeToFile.
> : java.lang.NullPointerException
>   at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:273)
>   at 
> org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$2.apply(PythonRDD.scala:247)
>   at 
> org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$2.apply(PythonRDD.scala:246)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:246)
>   at org.apache.spark.api.python.PythonRDD$.writeToFile(PythonRDD.scala:285)
>   at org.apache.spark.api.python.PythonRDD$.writeToFile(PythonRDD.scala:280)
>   at org.apache.spark.api.python.PythonRDD.writeToFile(PythonRDD.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:744)  
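
The behaviour proposed in the description (log at DEBUG and skip null elements) can be modelled with a small Python sketch. It is not the Spark code: it assumes each string is written as a 4-byte big-endian length followed by its UTF-8 bytes, uses Python's None as a stand-in for a JVM null, and the names write_utf and write_iterator_skipping_nulls are hypothetical.

```python
import io
import logging
import struct

log = logging.getLogger("null_skip_sketch")  # hypothetical logger name


def write_utf(value, out):
    """Write one string as a 4-byte big-endian length plus its UTF-8 bytes."""
    # Calling .encode on None raises AttributeError, the Python analogue of
    # the NullPointerException in the stack traces above.
    data = value.encode("utf-8")
    out.write(struct.pack(">i", len(data)))
    out.write(data)


def write_iterator_skipping_nulls(items, out):
    """Write every non-null item; log and skip nulls. Returns the skip count."""
    skipped = 0
    for item in items:
        if item is None:
            log.debug("skipping null element")
            skipped += 1
            continue
        write_utf(item, out)
    return skipped


buf = io.BytesIO()
skipped = write_iterator_skipping_nulls(["a", None, "bc"], buf)
```

Note that skipping changes the number of elements the reader sees, which is the trade-off discussed later in the thread.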



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-07-29 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077945#comment-14077945
 ] 

Josh Rosen commented on SPARK-1630:
---

Hi Kalpit,

Thanks for sharing your use-case; it seems like a reasonable thing that we 
should support.

As part of [~mlnick]'s patch for [SPARK-1416], we now have 
{{SerDeUtil.rddToPython}} for converting pair RDDs of arbitrary Java objects 
into RDDs that can be read by PySpark.  One alternative to the fix proposed 
here would be to add a similar converter from non-pair RDDs to PythonRDDs that 
uses the Java-side pickle library to pickle the strings, nulls included.  
However, this could have a negative performance impact since we'd be passing 
pickled objects instead of UTF-8 strings.

Given that the current proposed fix only affects RDD and seems unlikely 
to mask serious bugs, I'm inclined to merge it.
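
To make the trade-off above concrete, here is a rough Python sketch comparing the size of a pickled string with the same string sent as length-prefixed UTF-8. It uses Python's own pickle module rather than the Java-side pickle library, assumes a 4-byte length prefix for the UTF-8 case, and the helper names are hypothetical. Byte counts are only one part of the cost; serialization and deserialization time are not measured here.

```python
import pickle


def utf8_frame_size(text):
    """Bytes needed for a 4-byte length prefix plus the UTF-8 payload."""
    return 4 + len(text.encode("utf-8"))


def pickled_size(obj):
    """Bytes needed for the object pickled with protocol 2."""
    return len(pickle.dumps(obj, protocol=2))


sample = "hello"
sizes = {"utf8": utf8_frame_size(sample), "pickle": pickled_size(sample)}

# Pickle can represent a missing value directly, which plain UTF-8 cannot.
restored = pickle.loads(pickle.dumps(None, protocol=2))
```

For short strings the pickle framing adds a few bytes per element, while giving a natural encoding for None.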

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
>Assignee: Davies Liu
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-07-29 Thread Kalpit Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077899#comment-14077899
 ] 

Kalpit Shah commented on SPARK-1630:


Here's the case that led me to file this bug and a patch: I have a custom RDD 
implemented in Java. It implements the compute() and partitions() APIs with 
semantics specific to our application. In some of our cases, CustomRDD 
can contain NULL values, and we had no way to access those values from 
Python.

IMO, this patch serves two purposes:
1. If a CustomRDD is implemented in Java or Scala and a user wishes 
to access this RDD from Python, R, or some other language, they will be able 
to do so without loss of information (NULLs preserved).
2. It preserves the cardinality and order of elements within a 
partition.
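
The two purposes above can be illustrated with a minimal Python sketch in which nulls are written as a reserved negative length instead of being skipped, so the reader gets back the same number of elements in the same order. This is an illustration under stated assumptions, not the patch itself: the framing (4-byte big-endian length plus UTF-8 bytes), the NULL_MARKER value, and the function names are all hypothetical, and None stands in for a JVM null.

```python
import io
import struct

# Hypothetical sentinel: real lengths are never negative, so any negative
# value can mark a null without colliding with a valid frame.
NULL_MARKER = -1


def write_items(items, out):
    """Write strings as length-prefixed UTF-8; write None as NULL_MARKER."""
    for item in items:
        if item is None:
            out.write(struct.pack(">i", NULL_MARKER))
        else:
            data = item.encode("utf-8")
            out.write(struct.pack(">i", len(data)))
            out.write(data)


def read_items(inp):
    """Yield strings and None values in the order they were written."""
    while True:
        header = inp.read(4)
        if len(header) < 4:
            return  # end of stream
        (length,) = struct.unpack(">i", header)
        if length == NULL_MARKER:
            yield None
        else:
            yield inp.read(length).decode("utf-8")


buf = io.BytesIO()
write_items(["a", None, "", "b"], buf)
buf.seek(0)
round_tripped = list(read_items(buf))
```

The empty string and None stay distinct after the round trip, and the element count is unchanged.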

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
>Assignee: Davies Liu
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-07-29 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077492#comment-14077492
 ] 

Josh Rosen commented on SPARK-1630:
---

In the current Spark codebase, the PythonRDD constructor is only called from 
Python code.  In order for a user-created Scala/Java RDD to be passed 
to PySpark, the user would need to have dug deep into PySpark's private APIs to 
call out to Java/Scala code to create a transformed RDD and wrap it into a 
PythonRDD.  Given this, is it fair to say that any in-the-wild NPEs encountered 
here by using Spark's public APIs are due to bugs in Spark/PySpark, or is there 
a case that I'm overlooking (e.g. is TextInputFormat allowed to return nulls?)?

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
>Assignee: Davies Liu
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-07-29 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077465#comment-14077465
 ] 

Davies Liu commented on SPARK-1630:
---

If an RDD is generated in Scala/Java by user code, such as rdd.map(user_func), 
it can contain a null in some corner cases, which will then cause an 
NPE.

Given an RDD[String], it is valid for some rows to be null, so it's better to 
handle them gracefully.

This issue cannot be reproduced in pure Python code.

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
>Assignee: Davies Liu
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-07-29 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077452#comment-14077452
 ] 

Josh Rosen commented on SPARK-1630:
---

We aren't passing completely arbitrary iterators of Java objects to 
{{writeIteratorToStream}}; instead, we only handle iterators of strings and 
byte arrays.  Nulls in data read from Hadoop input formats should already be 
converted to None by the Java pickling code.  Do you have an example where 
PythonRDD receives a null element and it's not due to a bug?  I'm worried that 
this patch will mask the presence of other errors.
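
The concern above, that tolerating nulls could hide other errors, corresponds to a strict writer that accepts only strings and byte arrays and fails loudly on anything else. The Python sketch below models that position. It is not the Spark implementation: the 4-byte length framing and the names write_framed and write_element are assumptions made for illustration, with None standing in for a JVM null.

```python
import io
import struct


def write_framed(payload, out):
    """Write a 4-byte big-endian length followed by the raw payload bytes."""
    out.write(struct.pack(">i", len(payload)))
    out.write(payload)


def write_element(obj, out):
    """Accept only byte arrays and strings; reject everything else."""
    if isinstance(obj, (bytes, bytearray)):
        write_framed(bytes(obj), out)
    elif isinstance(obj, str):
        write_framed(obj.encode("utf-8"), out)
    else:
        # None lands here too, so an unexpected null is reported rather than
        # silently dropped.
        raise TypeError("unexpected element type: " + type(obj).__name__)


buf = io.BytesIO()
write_element("hi", buf)
write_element(b"\x01", buf)
```

With this design a null surfaces as an explicit error that names the offending type, at the cost of failing the whole write.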

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
>Assignee: Davies Liu
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-07-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072235#comment-14072235
 ] 

Apache Spark commented on SPARK-1630:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/1551

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
>Assignee: Davies Liu
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-07-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067729#comment-14067729
 ] 

Apache Spark commented on SPARK-1630:
-

User 'kalpit' has created a pull request for this issue:
https://github.com/apache/spark/pull/554

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
> Fix For: 1.1.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully

2014-04-25 Thread Kalpit Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981458#comment-13981458
 ] 

Kalpit Shah commented on SPARK-1630:


https://github.com/apache/spark/pull/554

> PythonRDDs don't handle nulls gracefully
> 
>
> Key: SPARK-1630
> URL: https://issues.apache.org/jira/browse/SPARK-1630
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 0.9.0, 0.9.1
>Reporter: Kalpit Shah
> Fix For: 1.0.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h