[ https://issues.apache.org/jira/browse/SPARK-24034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emilio Dorigatti updated SPARK-24034:
-------------------------------------
    Description: 
Consider the following code:
{noformat}
def mapper(xx):
    if xx % 2 == 0:
        raise StopIteration()
    else:
        return xx

sc.parallelize(range(100)).map(mapper).collect()
{noformat}
The result I get is {{[57, 71, 85]}}.

I think this happens because {{map}} is implemented in terms of {{mapPartitionsWithIndex}} using a custom iterator, so the {{StopIteration}} raised by the mapper is handled by that iterator as the normal end of the partition, and the remaining elements of the partition are silently dropped. The exception should instead be propagated to the user.

I think I can take care of this, if I am allowed to (this would be my first contribution, so I am not sure how the process works).

NB: this may be the underlying cause of https://issues.apache.org/jira/browse/SPARK-23754

  was:
Consider the following code:
{noformat}
def mapper(xx):
    if xx % 2 == 0:
        raise StopIteration()
    else:
        return xx

sc.parallelize(range(100)).map(mapper)collect()
{noformat}
The result I get is {{[57, 71, 85]}}.

I think this happens because {{map}} is implemented in terms of {{mapPartitionsWithIndex}} using a custom iterator, so the {{StopIteration}} raised by the mapper is handled by that iterator as the normal end of the partition, and the remaining elements of the partition are silently dropped. The exception should instead be propagated to the user.
I think I can take care of this, if I am allowed to (this would be my first contribution, so I am not sure how the process works).

NB: this may be the underlying cause of https://issues.apache.org/jira/browse/SPARK-23754

> StopIteration in pyspark mapper results in partial results
> ----------------------------------------------------------
>
>                 Key: SPARK-24034
>                 URL: https://issues.apache.org/jira/browse/SPARK-24034
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Emilio Dorigatti
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
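The truncation described in this report can be illustrated without a Spark cluster. The sketch below is a pure-Python simulation, not PySpark's implementation. It makes the following assumptions:

* `simulate_map` and `guard_stop_iteration` are hypothetical names.
* Partitions are contiguous, evenly sized slices of the input.
* The builtin `map()` is applied lazily to each partition.
* There are 7 partitions. The report does not state the partition count, so this is also an assumption.

Under these assumptions the sketch returns the same `[57, 71, 85]`, because each partition keeps only the elements before its first even number.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def mapper(xx: int) -> int:
    # Same user function as in the report: raises StopIteration on even input.
    if xx % 2 == 0:
        raise StopIteration()
    return xx


def simulate_map(data: List[T], num_slices: int, f: Callable[[T], R]) -> List[R]:
    # Hypothetical stand-in for rdd.map(f).collect(): split the data into
    # contiguous partitions (assumed slicing scheme), apply f lazily with the
    # builtin map() to each partition, and concatenate the results.
    n = len(data)
    result: List[R] = []
    for i in range(num_slices):
        part = data[i * n // num_slices:(i + 1) * n // num_slices]
        # list.extend() treats a StopIteration escaping from f exactly like
        # normal exhaustion of the lazy map object, so the rest of this
        # partition is dropped without any error being reported.
        result.extend(map(f, part))
    return result


def guard_stop_iteration(f: Callable[[T], R]) -> Callable[[T], R]:
    # Hypothetical wrapper: convert a StopIteration raised by user code into
    # a RuntimeError, so it surfaces as a failure instead of truncating output.
    def wrapper(x: T) -> R:
        try:
            return f(x)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised by user function") from exc
    return wrapper


if __name__ == "__main__":
    data = list(range(100))
    # Assuming 7 partitions; the report does not say how many were used.
    print(simulate_map(data, 7, mapper))  # [57, 71, 85]
    try:
        simulate_map(data, 7, guard_stop_iteration(mapper))
    except RuntimeError as err:
        print("RuntimeError:", err)
```

With the guard in place, the silent truncation becomes a visible error. PEP 479 does not cover this case: it only converts a `StopIteration` into a `RuntimeError` when the exception escapes a generator function, and the builtin `map()` is not a generator.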