[jira] [Updated] (SPARK-7178) Improve DataFrame documentation and code samples

Reynold Xin (JIRA) Tue, 04 Aug 2015 19:30:23 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Reynold Xin updated SPARK-7178:
-------------------------------
    Sprint: Spark 1.5 doc/QA sprint

> Improve DataFrame documentation and code samples
> ------------------------------------------------
>
>                 Key: SPARK-7178
>                 URL: https://issues.apache.org/jira/browse/SPARK-7178
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Chris Fregly
>              Labels: dataframe
>
> AND and OR are not straightforward when using the new DataFrame API.
> The current convention, familiar to Pandas users, is to use the bitwise 
> operators & and | instead of AND and OR. When using these, however, you need 
> to wrap each expression in parentheses, because & and | bind more tightly 
> than comparison operators in Python.
> Also, working with StructTypes is a bit confusing. The following link:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
> (Python tab) implies that you can work with tuples directly when creating a 
> DataFrame.
> However, the following code errors out unless we explicitly use Rows:
> {code}
> from pyspark.sql import Row
> from pyspark.sql.types import *
> # The schema is encoded in a string.
> schemaString = "a"
> fields = [StructField(field_name, MapType(StringType(), IntegerType()))
>           for field_name in schemaString.split()]
> schema = StructType(fields)
> df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
> {code}
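
The parenthesization point in the description can be checked without Spark at all, since it comes from Python's own operator precedence. The snippet below is only a sketch: it uses the standard ast module to show how Python groups a filter-style expression, and the names age and height are hypothetical column names that do not appear in the report.

```python
import ast


def outermost_node(expr):
    """Parse a Python expression and return its outermost AST node."""
    return ast.parse(expr, mode="eval").body


# Hypothetical column names; only the grouping of the operators matters here.
bare = outermost_node("age > 21 & height < 180")
wrapped = outermost_node("(age > 21) & (height < 180)")

# Without parentheses, & binds first, so Python sees the chained
# comparison age > (21 & height) < 180.
bare_kind = type(bare).__name__                   # "Compare"
bare_middle = type(bare.comparators[0]).__name__  # "BinOp", i.e. 21 & height

# With parentheses, the outermost operation is the intended bitwise AND
# of two separate comparisons.
wrapped_kind = type(wrapped).__name__             # "BinOp"
wrapped_op = type(wrapped.op).__name__            # "BitAnd"

print(bare_kind, bare_middle, wrapped_kind, wrapped_op)
```

The same grouping applies to | in place of &, which is why each comparison has to be wrapped before the two are combined.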



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
