[jira] [Commented] (SPARK-2789) Apply names to RDD to becoming SchemaRDD

Apache Spark (JIRA) Fri, 06 Feb 2015 00:13:55 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308825#comment-14308825
 ]


Apache Spark commented on SPARK-2789:
-------------------------------------

User 'dwmclary' has created a pull request for this issue:
https://github.com/apache/spark/pull/4421

> Apply names to RDD to becoming SchemaRDD
> ----------------------------------------
>
>                 Key: SPARK-2789
>                 URL: https://issues.apache.org/jira/browse/SPARK-2789
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Davies Liu
>
> In order to simplify applying a schema, we could add an API called 
> applyNames(), which will infer the types in the RDD and create a schema with 
> names, then apply this schema to the RDD to produce a SchemaRDD. The names 
> could be provided as a string with names separated by spaces.
> For example:
> rdd = sc.parallelize([("Alice", 10)])
> srdd = sqlCtx.applyNames(rdd, "name age")
> Users don't need to create a case class or StructType to have all the power 
> of Spark SQL.
> The string representation of the schema could also support nested structures 
> (MapType, ArrayType and StructType), for example:
> "name age address(city zip) likes[title stars] props{[value type]}"
> It would be equivalent to this unnamed schema:
> root
> |--name
> |--age
> |--address
> |--|--city
> |--|--zip
> |--likes
> |--|--element
> |--|--|--title
> |--|--|--stars
> |--props
> |--|--key:
> |--|--value:
> |--|--|--element
> |--|--|--|--value
> |--|--|--|--type
> All field names are separated by spaces. The structure of a field (if it is a 
> nested type) follows the name without a space, and should start with "(" 
> (StructType), "[" (ArrayType) or "{" (MapType).
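
The notation described above can be sketched as a small recursive-descent parser in plain Python. This is only an illustration, not Spark code and not what the pull request implements: parse_schema, render_tree and the tuple-based node layout are hypothetical names. The description does not fully specify the grammar, so the sketch assumes that the content of "[...]" or "{...}" is a nested type when it starts with another opening delimiter, and is otherwise a field list forming an implicit struct. Leaf fields carry no type (None), since their types would be inferred from the data.

```python
# Hypothetical sketch of the proposed schema-string notation (not Spark API).
# Node layout (an assumption): ("struct", [(name, type_or_None), ...]),
# ("array", element_type) or ("map", value_type).

OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}


def _skip_spaces(text, pos):
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return pos


def _parse_fields(text, pos):
    """Parse space-separated fields until a closing delimiter or end of input."""
    fields = []
    while True:
        pos = _skip_spaces(text, pos)
        if pos >= len(text) or text[pos] in ")]}":
            return fields, pos
        start = pos
        while pos < len(text) and text[pos] not in " ()[]{}":
            pos += 1
        name = text[start:pos]
        if not name:
            raise ValueError("expected a field name at position %d" % start)
        ftype = None  # leaf field: its type would be inferred from the data
        # A nested type must follow the name directly, without a space.
        if pos < len(text) and text[pos] in OPEN_TO_CLOSE:
            ftype, pos = _parse_type(text, pos)
        fields.append((name, ftype))


def _parse_type(text, pos):
    """Parse one bracketed type whose opening delimiter is text[pos]."""
    opener = text[pos]
    closer = OPEN_TO_CLOSE[opener]
    pos = _skip_spaces(text, pos + 1)
    if opener != "(" and pos < len(text) and text[pos] in OPEN_TO_CLOSE:
        # Assumption: "[" or "{" directly followed by an opener nests a type.
        inner, pos = _parse_type(text, pos)
    else:
        fields, pos = _parse_fields(text, pos)
        inner = ("struct", fields)
    pos = _skip_spaces(text, pos)
    if pos >= len(text) or text[pos] != closer:
        raise ValueError("expected %r at position %d" % (closer, pos))
    if opener == "(":
        return inner, pos + 1
    return ("array" if opener == "[" else "map", inner), pos + 1


def parse_schema(text):
    """Parse a whole schema string into a top-level struct node."""
    fields, pos = _parse_fields(text, 0)
    if pos != len(text):
        raise ValueError("unexpected %r at position %d" % (text[pos], pos))
    return ("struct", fields)


def render_tree(node, depth=1):
    """Return the tree lines for a node, in the style used in the description."""
    kind, body = node
    prefix = "|--" * depth
    lines = []
    if kind == "struct":
        for name, ftype in body:
            lines.append(prefix + name)
            if ftype is not None:
                lines.extend(render_tree(ftype, depth + 1))
    elif kind == "array":
        lines.append(prefix + "element")
        lines.extend(render_tree(body, depth + 1))
    else:  # map: the key has no nested structure in this notation
        lines.append(prefix + "key:")
        lines.append(prefix + "value:")
        lines.extend(render_tree(body, depth + 1))
    return lines


schema = parse_schema(
    "name age address(city zip) likes[title stars] props{[value type]}")
print("\n".join(["root"] + render_tree(schema)))
```

For the example string from the description, this prints the tree listed there, using the field name "stars" as it appears in the schema string. Unbalanced input such as "a(b" raises ValueError instead of being silently accepted.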
