Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1551#discussion_r15569151
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -344,7 +345,12 @@ private[spark] object PythonRDD extends Logging {
                  throw new SparkException("Unexpected Tuple2 element type " + pair._1.getClass)
               }
             case other =>
    -          throw new SparkException("Unexpected element type " + first.getClass)
    +          if (other == null) {
    +            dataOut.writeInt(SpecialLengths.NULL)
    +            writeIteratorToStream(iter, dataOut)
    --- End diff --
    
    Again, sorry, I don't think this improves stability:
    1) Users are not supposed to call private APIs. In fact, even *Scala* code can't call PythonRDD, because it is private[spark] -- it's just an artifact of the way Scala implements package-private that the class becomes public in Java. If you'd like support for UDFs, we need to add that as a separate, top-level feature.
    2) This change would mask bugs in the current way we write Python converters. Our current converters only pass in Strings and arrays of bytes, which shouldn't be null. (For datasets that contain null, they already convert it to a pickled form of None.) This means that if someone introduces a bug in one of our existing code paths, that bug will be harder to fix, because instead of surfacing as an NPE, it will show up as some weird value coming out in Python.
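    The fail-fast behavior argued for in point 2 can be sketched as follows. This is a hypothetical, simplified illustration, not Spark's actual code; `FailFastSketch` and `describe` are made-up names standing in for the dispatch in `writeIteratorToStream`:

    ```scala
    // Hypothetical sketch of fail-fast element dispatch: unexpected or null
    // elements throw immediately, so a converter bug surfaces as an exception
    // close to its cause instead of as a strange value downstream in Python.
    object FailFastSketch {
      def describe(elem: Any): String = elem match {
        case _: String      => "string" // converters are expected to emit Strings
        case _: Array[Byte] => "bytes"  // ...or raw byte arrays
        case null =>
          // Writing a NULL marker here would mask the converter bug;
          // throwing keeps the failure visible and debuggable.
          throw new NullPointerException("null element: likely a converter bug")
        case other =>
          throw new RuntimeException("Unexpected element type " + other.getClass)
      }

      def main(args: Array[String]): Unit = {
        assert(describe("hello") == "string")
        assert(describe(Array[Byte](1, 2, 3)) == "bytes")
        val failedFast =
          try { describe(null); false }
          catch { case _: NullPointerException => true }
        assert(failedFast) // null fails fast rather than being silently encoded
      }
    }
    ```

    Note that Scala type patterns such as `case _: String` never match null, so the explicit `case null` is what makes the failure mode deliberate rather than accidental.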

