[jira] [Commented] (SPARK-10868) monotonicallyIncreasingId() supports offset for indexing
[ https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425607#comment-15425607 ]

Ted Yu commented on SPARK-10868:

bq. this makes sense

What was the reasoning behind the initial assessment?

> monotonicallyIncreasingId() supports offset for indexing
>
> Key: SPARK-10868
> URL: https://issues.apache.org/jira/browse/SPARK-10868
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: Martin Senne
>
> With SPARK-7135 and https://github.com/apache/spark/pull/5709,
> `monotonicallyIncreasingID()` allows creating an index column with unique
> ids. The indexing always starts at 0 (no offset).
>
> *Feature wish*
> A parameter `offset`, such that the function can be used as
> {{monotonicallyIncreasingID( offset )}}
> and indexing _starts at *offset* instead of 0_.
>
> *Use-case*
> Add rows to a DataFrame that has already been written to a DB (via
> _.write.jdbc(...)_).
> In detail:
> - A DataFrame *A* (containing an ID column with indices from 0 to 199)
> already exists in the DB.
> - New rows need to be added to *A*. This includes
> -- creating a DataFrame *A'* with the new rows, but without an id column
> -- adding the index column to *A'*, this time starting at *200*, as there are
> already entries with ids from 0 to 199 (*here, monotonicallyIncreasingID( 200 ) is required.*)
> -- union *A* and *A'*
> -- store into DB

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
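A minimal sketch in pure Python (no Spark needed) shows why an offset is easy to add but does not by itself produce the contiguous 200, 201, ... ids the use case describes. It assumes the layout from the Spark `monotonically_increasing_id` documentation: partition ID in the upper 31 bits and record number within the partition in the lower 33 bits. `monotonic_ids` and `contiguous_ids` are hypothetical helpers, not Spark APIs. The first models adding a constant to the generated id column. The second models an `RDD.zipWithIndex`-style index that counts rows per partition first.

```python
from typing import List, Sequence

# Assumed layout (per Spark docs): record number in the lower 33 bits,
# partition ID in the bits above it.
RECORD_BITS = 33


def monotonic_ids(partitions: Sequence[Sequence[object]], offset: int = 0) -> List[int]:
    """Hypothetical model of monotonically_increasing_id() plus a constant offset.

    IDs are unique and increasing, but jump by 2**33 at each partition boundary.
    """
    ids = []
    for pid, rows in enumerate(partitions):
        for rec, _ in enumerate(rows):
            ids.append(offset + (pid << RECORD_BITS) + rec)
    return ids


def contiguous_ids(partitions: Sequence[Sequence[object]], offset: int = 0) -> List[int]:
    """Hypothetical model of a zipWithIndex-style index starting at `offset`.

    Row counts per partition are gathered first (in Spark this needs an extra
    job); each partition then starts at offset + rows in earlier partitions.
    """
    counts = [len(rows) for rows in partitions]
    ids = []
    start = offset
    for n in counts:
        ids.extend(range(start, start + n))
        start += n
    return ids


if __name__ == "__main__":
    # A' with five new rows spread over three partitions; A already holds ids 0..199.
    parts = [["a", "b"], ["c"], ["d", "e"]]
    print(monotonic_ids(parts, 200))   # [200, 201, 8589934792, 17179869384, 17179869385]
    print(contiguous_ids(parts, 200))  # [200, 201, 202, 203, 204]
```

Both variants avoid collisions with the existing 0 to 199 range. Only the second keeps the ids dense, and it costs an extra pass over the data to count rows.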
[ https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414164#comment-15414164 ] Ted Yu commented on SPARK-10868: I can pick up this one if Martin hasn't started working on it.
[ https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414162#comment-15414162 ] Apache Spark commented on SPARK-10868: User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/14568
[ https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944137#comment-14944137 ] Martin Senne commented on SPARK-10868: I will do it (and give it my best!). Thanks for offering this opportunity.
[ https://issues.apache.org/jira/browse/SPARK-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944108#comment-14944108 ] Reynold Xin commented on SPARK-10868: [~MartinSenne] this makes sense, and shouldn't be too hard to do. Are you interested in submitting a pull request for this?