Github user e-dorigatti commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21383#discussion_r190603773

    --- Diff: python/pyspark/tests.py ---
    @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self):
             result = rdd.pipe('cat').collect()
             self.assertEqual(data, result)

    +    def test_stopiteration_in_client_code(self):
    +
    +        def stopit(*x):
    +            raise StopIteration()
    +
    +        seq_rdd = self.sc.parallelize(range(10))
    +        keyed_rdd = self.sc.parallelize((x % 2, x) for x in range(10))
    +        exc = Py4JJavaError, RuntimeError
    --- End diff --

    Both of them can happen, depending on where the `StopIteration` is raised. Consider `RDD.reduce`, for example: if the exception is raised while reducing inside a partition, the user gets a `Py4JJavaError`, but if it is raised while reducing the partial results locally [here](https://github.com/e-dorigatti/spark/blob/fix_spark_23754/python/pyspark/rdd.py#L858), the user gets a `RuntimeError` (the one we raise in `fail_on_stopiteration`).
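
    The idea behind `fail_on_stopiteration` can be sketched as a plain decorator (this is a simplified, hypothetical version for illustration, not the exact PySpark implementation): a `StopIteration` escaping user code into a surrounding generator would silently truncate iteration, so it is converted into a `RuntimeError` that propagates normally.

    ```python
    def fail_on_stopiteration(f):
        """Wrap f so that a StopIteration raised inside it surfaces as a
        RuntimeError instead of silently ending an enclosing generator."""
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except StopIteration as exc:
                raise RuntimeError(
                    "Caught StopIteration thrown from user's code; failing the task"
                ) from exc
        return wrapper

    # Hypothetical user function that misbehaves, like `stopit` in the test above
    def bad_reducer(a, b):
        raise StopIteration()

    safe = fail_on_stopiteration(bad_reducer)
    try:
        safe(1, 2)
    except RuntimeError as e:
        print("converted:", e)
    ```

    This is why the locally-reduced path raises `RuntimeError` directly, while the same error raised inside a partition first crosses the Py4J boundary and reaches the user wrapped in a `Py4JJavaError`.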