[ https://issues.apache.org/jira/browse/SPARK-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon poortman updated SPARK-7150:
----------------------------------
    Attachment: Network Management Downloads.zip

> SQLContext.range()
> ------------------
>
>                 Key: SPARK-7150
>                 URL: https://issues.apache.org/jira/browse/SPARK-7150
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, SQL
>            Reporter: Joseph K. Bradley
>            Assignee: Adrian Wang
>            Priority: Minor
>              Labels: starter
>             Fix For: 1.4.0
>
>         Attachments: Network Management Downloads.zip
>
>
> It would be handy to have easy ways to construct random columns for
> DataFrames. Proposed API:
> {code}
> class SQLContext {
>   // Return a DataFrame with a single column named "id" that has
>   // consecutive values from 0 to n.
>   def range(n: Long): DataFrame
>   def range(n: Long, numPartitions: Int): DataFrame
> }
> {code}
> Usage:
> {code}
> // uniform distribution
> ctx.range(1000).select(rand())
> // normal distribution
> ctx.range(1000).select(randn())
> {code}
> We should add a RangeIterator that supports long start/stop positions, and
> then use it to create an RDD as the basis for this DataFrame.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
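
The last sentence of the description proposes a RangeIterator over long start/stop positions as the basis for the RDD. A minimal sketch of that idea in plain Scala, without any Spark dependency, might look like the following. It is not the implementation that was merged for this issue. The names `RangeIterator`, `RangeSketch` and `partitionBounds` are hypothetical, and treating the range as half-open (`end` excluded) is an assumption, since the description only says "from 0 to n".

```scala
// Hypothetical sketch only; not the code merged for SPARK-7150.
// Iterates over the half-open range [start, end) of Long values.
// Excluding `end` is an assumption made for this example.
final class RangeIterator(start: Long, end: Long) extends Iterator[Long] {
  private var current: Long = start

  override def hasNext: Boolean = current < end

  override def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("range exhausted")
    val value = current
    current += 1
    value
  }
}

object RangeSketch {
  // Splits [0, n) into numPartitions contiguous half-open slices, so that
  // each slice could back one partition of an RDD. BigInt is used so that
  // i * n cannot overflow for large n.
  def partitionBounds(n: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(n >= 0, "n must be non-negative")
    require(numPartitions > 0, "numPartitions must be positive")
    (0 until numPartitions).map { i =>
      val lo = (BigInt(i) * n / numPartitions).toLong
      val hi = (BigInt(i + 1) * n / numPartitions).toLong
      (lo, hi)
    }
  }

  // Builds one iterator per slice; concatenating them yields 0 until n.
  def partitionIterators(n: Long, numPartitions: Int): Seq[Iterator[Long]] =
    partitionBounds(n, numPartitions).map { case (lo, hi) =>
      new RangeIterator(lo, hi)
    }
}
```

With this sketch, `RangeSketch.partitionBounds(10L, 3)` gives the slices (0, 3), (3, 6) and (6, 10). Each slice can be turned into a `RangeIterator` independently, which is the property a partitioned RDD would need.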