[ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517858#comment-14517858 ]
Chris Fregly edited comment on SPARK-7178 at 4/28/15 8:07 PM:
--------------------------------------------------------------
added these to the forums

AND and OR: https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html
Nested Map Columns in DataFrames: https://forums.databricks.com/questions/764/how-do-i-create-a-dataframe-with-nested-map-column.html
Casting columns of DataFrames: https://forums.databricks.com/questions/767/how-do-i-cast-within-a-dataframe.html

was (Author: cfregly):
added this to the forums to address the AND and OR: https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html

> Improve DataFrame documentation and code samples
> ------------------------------------------------
>
>                 Key: SPARK-7178
>                 URL: https://issues.apache.org/jira/browse/SPARK-7178
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Chris Fregly
>              Labels: dataframe
>
> AND and OR are not straightforward when using the new DataFrame API.
>
> The current convention, familiar to pandas users, is to use the bitwise & and | in place of AND and OR. When using these, however, you need to wrap each expression in parentheses, because the bitwise operators bind more tightly than the comparison operators.
>
> Also, working with StructTypes is a bit confusing. The following link:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
> (Python tab) implies that you can work with tuples directly when creating a DataFrame.
>
> However, the following code errors out unless we explicitly use Rows:
> {code}
> from pyspark.sql import Row
> from pyspark.sql.types import *
>
> # The schema is encoded in a string.
> schemaString = "a"
> fields = [StructField(field_name, MapType(StringType(), IntegerType()))
>           for field_name in schemaString.split()]
> schema = StructType(fields)
> df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
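The parentheses requirement follows directly from Python's operator precedence: `&` and `|` bind more tightly than comparisons, so an unparenthesized filter groups the wrong way before Spark ever evaluates it. A minimal plain-Python sketch of the pitfall (the commented `df` filter lines are a hypothetical DataFrame equivalent, not from this issue):

```python
# Python parses "a > 1 & b" as "a > (1 & b)" because the bitwise
# operators (&, |) bind more tightly than comparison operators.
a, b = 3, 6

wrong = a > 1 & b          # a > (1 & b)  ->  3 > 0         -> True
right = (a > 1) & (b < 5)  # explicit grouping: True & False -> False

print(wrong, right)  # True False

# The same precedence applies to DataFrame column expressions, e.g.
# (hypothetical df with columns a and b):
#   df.filter((df.a > 1) & (df.b < 5))   # correct: each side parenthesized
#   df.filter(df.a > 1 & df.b < 5)      # mis-grouped chained comparison
```

The same grouping rule applies to `|` (OR) and `~` (NOT) on DataFrame columns.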