[ https://issues.apache.org/jira/browse/SPARK-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-2789:
------------------------------------
    Component/s: SQL

> Apply names to RDD to becoming SchemaRDD
> ----------------------------------------
>
>                 Key: SPARK-2789
>                 URL: https://issues.apache.org/jira/browse/SPARK-2789
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Davies Liu
>
> To simplify applying a schema, we could add an API called applyNames(),
> which would infer the types in the RDD, create a schema with the given
> names, and apply that schema to the RDD to produce a SchemaRDD. The names
> could be provided as a single string, separated by spaces.
> For example:
>     rdd = sc.parallelize([("Alice", 10)])
>     srdd = sqlCtx.applyNames(rdd, "name age")
> Users would not need to create a case class or StructType to get the full
> power of Spark SQL.
> The string representation of the schema could also support nested
> structures (MapType, ArrayType and StructType), for example:
>     "name age address(city zip) likes[title stars] props{[value type]}"
> It would be equivalent to this schema (shown without types):
> root
> |--name
> |--age
> |--address
> |--|--city
> |--|--zip
> |--likes
> |--|--element
> |--|--|--title
> |--|--|--stars
> |--props
> |--|--key:
> |--|--value:
> |--|--|--element
> |--|--|--|--value
> |--|--|--|--type
> Field names are separated by spaces. The structure of a field (if it is a
> nested type) follows the name without a space, and should start with "("
> (StructType), "[" (ArrayType) or "{" (MapType).
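
The names-string syntax described above could be parsed along the following lines. This is only a minimal sketch in plain Python with no Spark dependency: applyNames() is a proposal in this issue, and parse_names() and the tuple-based tree it returns are hypothetical names chosen for illustration, not Spark APIs. A type of None marks a leaf field whose type would be inferred from the data. The sketch also tolerates whitespace before an opening bracket, which is more lenient than the rule stated in the issue.

```python
import re

# A token is either a field name or a single bracket character.
_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[()\[\]{}])")
_CLOSERS = {"(": ")", "[": "]", "{": "}"}
_BRACKETS = set("()[]{}")


def _tokenize(spec):
    """Split the names string into field names and bracket tokens."""
    spec = spec.rstrip()
    tokens, pos = [], 0
    while pos < len(spec):
        match = _TOKEN.match(spec, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _parse_fields(tokens, pos):
    """Parse consecutive fields; return ([(name, type), ...], next_pos)."""
    fields = []
    while pos < len(tokens) and tokens[pos] not in _BRACKETS:
        name = tokens[pos]
        pos += 1
        field_type = None  # leaf: type left to inference
        if pos < len(tokens) and tokens[pos] in _CLOSERS:
            field_type, pos = _parse_nested(tokens, pos)
        fields.append((name, field_type))
    return fields, pos


def _parse_nested(tokens, pos):
    """Parse a bracketed type starting at an opening bracket."""
    opener = tokens[pos]
    pos += 1
    if opener == "(":
        fields, pos = _parse_fields(tokens, pos)
        node = ("struct", fields)
    else:
        # Array element / map value: either another bracketed type,
        # or a bare field list that is treated as a struct.
        if pos < len(tokens) and tokens[pos] in _CLOSERS:
            inner, pos = _parse_nested(tokens, pos)
        else:
            fields, pos = _parse_fields(tokens, pos)
            inner = ("struct", fields)
        node = ("array" if opener == "[" else "map", inner)
    if pos >= len(tokens) or tokens[pos] != _CLOSERS[opener]:
        raise ValueError("missing %r" % _CLOSERS[opener])
    return node, pos + 1


def parse_names(spec):
    """Parse a names string into a ("struct", [(name, type), ...]) tree."""
    tokens = _tokenize(spec)
    fields, pos = _parse_fields(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % tokens[pos])
    return ("struct", fields)
```

For instance, parse_names("name age") returns ("struct", [("name", None), ("age", None)]), and the nested example from the issue yields a struct whose "props" field is a map with an array-of-struct value. Keeping the tree as plain tuples means the sketch says nothing about how Spark SQL would represent or infer the actual types.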