[ https://issues.apache.org/jira/browse/SPARK-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584761#comment-15584761 ]
Xiangrui Meng commented on SPARK-14393:
---------------------------------------

This is a bigger issue. It would happen with {`monotonically_increasing_id`, `rand`, `randn`, etc} x {`coalesce`, `union`, etc}. The root cause is that the partition ID used to initialize the operator is not the partition ID associated with the DataFrame where the column was originally defined, which is expected by users.

cc [~r...@databricks.com] [~yhuai]

> monotonicallyIncreasingId not monotonically increasing with downstream coalesce
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-14393
>                 URL: https://issues.apache.org/jira/browse/SPARK-14393
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Jason Piper
>
> When utilising monotonicallyIncreasingId with a coalesce, it appears that
> every partition uses the same offset (0) leading to non-monotonically
> increasing IDs.
> See examples below
> {code}
> >>> sqlContext.range(10).select(monotonicallyIncreasingId()).show()
> +---------------------------+
> |monotonicallyincreasingid()|
> +---------------------------+
> |                25769803776|
> |                51539607552|
> |                77309411328|
> |               103079215104|
> |               128849018880|
> |               163208757248|
> |               188978561024|
> |               214748364800|
> |               240518168576|
> |               266287972352|
> +---------------------------+
> >>> sqlContext.range(10).select(monotonicallyIncreasingId()).coalesce(1).show()
> +---------------------------+
> |monotonicallyincreasingid()|
> +---------------------------+
> |                          0|
> |                          0|
> |                          0|
> |                          0|
> |                          0|
> |                          0|
> |                          0|
> |                          0|
> |                          0|
> |                          0|
> +---------------------------+
> >>> sqlContext.range(10).repartition(5).select(monotonicallyIncreasingId()).coalesce(1).show()
> +---------------------------+
> |monotonicallyincreasingid()|
> +---------------------------+
> |                          0|
> |                          1|
> |                          0|
> |                          0|
> |                          1|
> |                          2|
> |                          3|
> |                          0|
> |                          1|
> |                          2|
> +---------------------------+
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
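For readers following the thread, the root cause described above can be sketched without a Spark cluster. The snippet below is a plain-Python simulation, not Spark's implementation. It relies on the documented ID layout (partition ID in the upper 31 bits, per-partition record number in the lower 33 bits). The helper names (`make_id`, `decode_id`, `assign_ids`) are hypothetical, and the partition sizes `[2, 1, 4, 3, 0]` are an assumption inferred from the counter restarts in the third example.

```python
RECORD_BITS = 33  # documented layout: lower 33 bits hold the record number


def make_id(partition_id, record_number):
    """Pack a partition ID and a per-partition record number into one ID."""
    return (partition_id << RECORD_BITS) + record_number


def decode_id(value):
    """Split an ID back into (partition_id, record_number)."""
    return value >> RECORD_BITS, value & ((1 << RECORD_BITS) - 1)


def assign_ids(partition_sizes, init_partition_id=None):
    """Hypothetical model of ID assignment over a list of upstream partitions.

    With init_partition_id=None, each partition is initialized with its own
    index. Passing a value models the reported behaviour: every upstream
    partition is initialized with the downstream (coalesced) partition ID,
    while the record counter still restarts for each upstream partition.
    """
    ids = []
    for pid, size in enumerate(partition_sizes):
        effective = pid if init_partition_id is None else init_partition_id
        ids.extend(make_id(effective, n) for n in range(size))
    return ids


# The first value in the first example decodes to partition 3, record 0.
print(decode_id(25769803776))

# Assumed sizes of the 5 partitions produced by repartition(5).
sizes = [2, 1, 4, 3, 0]

expected = assign_ids(sizes)                      # strictly increasing
buggy = assign_ids(sizes, init_partition_id=0)    # models coalesce(1)
print(buggy)  # [0, 1, 0, 0, 1, 2, 3, 0, 1, 2], as in the third example
```

Under these assumptions the simulation reproduces the third output exactly, which is consistent with the explanation that the operator is initialized with the coalesced partition's ID (0) instead of the ID of the partition where the column was defined.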