Github user e-dorigatti commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21383#discussion_r190603773
  
    --- Diff: python/pyspark/tests.py ---
    @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self):
             result = rdd.pipe('cat').collect()
             self.assertEqual(data, result)
     
    +    def test_stopiteration_in_client_code(self):
    +
    +        def stopit(*x):
    +            raise StopIteration()
    +
    +        seq_rdd = self.sc.parallelize(range(10))
    +        keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10))
    +        exc = Py4JJavaError, RuntimeError
    --- End diff --
    
    Both of them can happen, depending on where the `StopIteration` is raised.
Consider `RDD.reduce`, for example: if the exception is raised while reducing
inside a partition, the user will get a `Py4JJavaError`, but if it is raised
while the driver reduces the partial results locally
[here](https://github.com/e-dorigatti/spark/blob/fix_spark_23754/python/pyspark/rdd.py#L858),
it will be a `RuntimeError` (the one we raise in `fail_on_stopiteration`).
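
    To illustrate why the wrapper is needed at all: a `StopIteration` escaping
from user code into surrounding iterator machinery is indistinguishable from
normal exhaustion, so results get silently truncated. The sketch below shows
the idea behind `fail_on_stopiteration` in plain Python (the error message and
exact behavior of the real helper may differ):

    ```python
    from functools import wraps

    def fail_on_stopiteration(f):
        """Convert a StopIteration raised inside the user function into a
        RuntimeError, so it cannot be mistaken for iterator exhaustion
        (sketch of the idea; the real helper lives in PySpark)."""
        @wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except StopIteration as e:
                raise RuntimeError(
                    "StopIteration raised in user code; failing the task"
                ) from e
        return wrapper

    def stopit(*x):
        raise StopIteration()

    # Without the wrapper, the StopIteration escaping into map() is taken
    # as end-of-iteration and the result is silently truncated to []:
    print(list(map(stopit, range(10))))  # -> []

    # With the wrapper, the error surfaces explicitly:
    try:
        list(map(fail_on_stopiteration(stopit), range(10)))
    except RuntimeError as e:
        print("RuntimeError:", e)
    ```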


---
