[ https://issues.apache.org/jira/browse/SPARK-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308825#comment-14308825 ]
Apache Spark commented on SPARK-2789:
-------------------------------------

User 'dwmclary' has created a pull request for this issue:
https://github.com/apache/spark/pull/4421

> Apply names to RDD to becoming SchemaRDD
> ----------------------------------------
>
>                 Key: SPARK-2789
>                 URL: https://issues.apache.org/jira/browse/SPARK-2789
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Davies Liu
>
> To simplify applying a schema, we could add an API called applyNames(),
> which would infer the types in the RDD, create a schema with the given names,
> and then apply this schema to the RDD to turn it into a SchemaRDD. The names
> could be provided as a single String, with the names separated by spaces.
> For example:
>     rdd = sc.parallelize([("Alice", 10)])
>     srdd = sqlCtx.applyNames(rdd, "name age")
> Users would not need to create a case class or a StructType to get the full
> power of Spark SQL.
> The string representation of the schema could also support nested structures
> (MapType, ArrayType and StructType), for example:
>     "name age address(city zip) likes[title stars] props{[value type]}"
> This would be equivalent to the following schema (types omitted):
> root
>  |--name
>  |--age
>  |--address
>  |--|--city
>  |--|--zip
>  |--likes
>  |--|--element
>  |--|--|--title
>  |--|--|--stars
>  |--props
>  |--|--key:
>  |--|--value:
>  |--|--|--element
>  |--|--|--|--value
>  |--|--|--|--type
> All field names are separated by spaces. If a field has a nested type, its
> structure follows the name directly (without a space) and starts with "("
> (StructType), "[" (ArrayType) or "{" (MapType).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
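The name-string grammar proposed in the issue can be sketched in plain Python to make its nesting rules concrete. This is a minimal illustration under stated assumptions, not Spark's implementation: `parse_names` and `apply_names` are hypothetical helpers, the tuple-based tree (`None` for a leaf, `("struct", ...)`, `("array", ...)`, `("map", ...)`) is an assumed representation, and type inference, which the proposal leaves to Spark, is not attempted. Map keys stay unnamed, matching the `key:` entry in the schema tree.

```python
# Hypothetical sketch of the proposed applyNames() name-string grammar.
# Not a Spark API: names, tree shape, and error handling are assumptions.

_OPENERS = {"(": ")", "[": "]", "{": "}"}
_DELIMITERS = set(" ()[]{}")


def parse_names(spec):
    """Parse a name string such as "name age address(city zip)" into a tree.

    Assumed representation:
      None                      -> leaf field (type left to inference)
      ("struct", [(name, t)])   -> "(...)" or a bare list of fields
      ("array", element)        -> "[...]"
      ("map", value)            -> "{...}" (keys stay unnamed)
    """
    pos = 0

    def peek():
        # Current character, or "" at end of input.
        return spec[pos] if pos < len(spec) else ""

    def skip_spaces():
        nonlocal pos
        while peek() == " ":
            pos += 1

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos} in {spec!r}")
        pos += 1

    def parse_fields(closer):
        # Space-separated fields up to `closer` ("" means end of input).
        fields = []
        skip_spaces()
        while peek() != closer:
            if peek() == "":
                raise ValueError(f"missing {closer!r} in {spec!r}")
            fields.append(parse_field())
            skip_spaces()
        return ("struct", fields)

    def parse_field():
        nonlocal pos
        start = pos
        while pos < len(spec) and spec[pos] not in _DELIMITERS:
            pos += 1
        name = spec[start:pos]
        if not name:
            raise ValueError(f"expected a field name at position {pos} in {spec!r}")
        # A nested type must follow the name directly, with no space.
        return (name, parse_type() if peek() in _OPENERS else None)

    def parse_type():
        # Called with `pos` on "(", "[" or "{".
        nonlocal pos
        opener = peek()
        closer = _OPENERS[opener]
        pos += 1
        if opener == "(":
            inner = parse_fields(closer)
        else:
            skip_spaces()
            if peek() in _OPENERS:
                inner = parse_type()  # e.g. the "[...]" inside "{[...]}"
                skip_spaces()
            else:
                inner = parse_fields(closer)  # implicit struct element
        expect(closer)
        if opener == "(":
            return inner
        return ("array" if opener == "[" else "map", inner)

    return parse_fields("")


def apply_names(tree, value):
    """Recursively attach names from a parse_names() tree to plain Python data."""
    if tree is None:
        return value
    kind, inner = tree
    if kind == "struct":
        if len(inner) != len(value):
            raise ValueError(f"expected {len(inner)} values, got {len(value)}")
        return {name: apply_names(t, v) for (name, t), v in zip(inner, value)}
    if kind == "array":
        return [apply_names(inner, v) for v in value]
    # kind == "map": keep keys, name the values.
    return {k: apply_names(inner, v) for k, v in value.items()}
```

For example, `apply_names(parse_names("name age"), ("Alice", 10))` returns `{"name": "Alice", "age": 10}`. Because the grammar forbids a space between a name and its nested type, the parser can treat a space purely as a field separator and any of `(`, `[`, `{` as the start of a nested type.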