[ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517858#comment-14517858 ]
Chris Fregly edited comment on SPARK-7178 at 4/28/15 8:07 PM:
--------------------------------------------------------------
added these to the forums

AND and OR: https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html
Nested Map Columns in DataFrames: https://forums.databricks.com/questions/764/how-do-i-create-a-dataframe-with-nested-map-column.html
Casting columns of DataFrames: https://forums.databricks.com/questions/767/how-do-i-cast-within-a-dataframe.html

was (Author: cfregly):
added this to the forums to address the AND and OR: https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html

> Improve DataFrame documentation and code samples
> ------------------------------------------------
>
>                 Key: SPARK-7178
>                 URL: https://issues.apache.org/jira/browse/SPARK-7178
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Chris Fregly
>              Labels: dataframe
>
> AND and OR are not straightforward when using the new DataFrame API.
>
> The current convention, familiar to pandas users, is to use the bitwise & and | in place of AND and OR. When using these, however, you need to wrap each expression in parentheses, because the bitwise operators bind more tightly than the comparison operators.
>
> Also, working with StructTypes is a bit confusing. The following link:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
> (Python tab) implies that you can work with tuples directly when creating a DataFrame.
>
> However, the following code errors out unless we explicitly use Rows:
> {code}
> from pyspark.sql import Row
> from pyspark.sql.types import *
>
> # The schema is encoded in a string.
> schemaString = "a"
> fields = [StructField(field_name, MapType(StringType(), IntegerType()))
>           for field_name in schemaString.split()]
> schema = StructType(fields)
> df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
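The parentheses requirement follows directly from Python's operator precedence: `&` and `|` bind more tightly than comparisons, so an unparenthesized filter groups the wrong way before Spark ever evaluates it. A minimal plain-Python sketch of the pitfall (the commented `df` filter lines are a hypothetical DataFrame equivalent, not from this issue):

```python
# Python parses "a > 1 & b" as "a > (1 & b)" because the bitwise
# operators (&, |) bind more tightly than comparison operators.
a, b = 3, 6

wrong = a > 1 & b          # a > (1 & b)  ->  3 > 0         -> True
right = (a > 1) & (b < 5)  # explicit grouping: True & False -> False

print(wrong, right)  # True False

# The same precedence applies to DataFrame column expressions, e.g.
# (hypothetical df with columns a and b):
#   df.filter((df.a > 1) & (df.b < 5))   # correct: each side parenthesized
#   df.filter(df.a > 1 & df.b < 5)      # mis-grouped chained comparison
```

The same grouping rule applies to `|` (OR) and `~` (NOT) on DataFrame columns.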