[ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-7178:
-------------------------------
    Sprint: Spark 1.5 doc/QA sprint

> Improve DataFrame documentation and code samples
> ------------------------------------------------
>
>                 Key: SPARK-7178
>                 URL: https://issues.apache.org/jira/browse/SPARK-7178
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Chris Fregly
>              Labels: dataframe
>
> AND and OR are not straightforward when using the new DataFrame API.
> The current convention, familiar to Pandas users, is to use the bitwise
> operators & and | instead of AND and OR. When using these, however, you
> need to wrap each expression in parentheses, because & and | bind more
> tightly than comparison operators in Python.
> Also, working with StructTypes is a bit confusing. The following link:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
> (Python tab) implies that you can work with tuples directly when creating a
> DataFrame.
> However, the following code errors out unless we explicitly use Row objects:
> {code}
> from pyspark.sql import Row
> from pyspark.sql.types import *
> # The schema is encoded in a string.
> schemaString = "a"
> # Build one map-typed field (string keys, integer values) per name.
> fields = [StructField(field_name, MapType(StringType(), IntegerType()))
>           for field_name in schemaString.split()]
> schema = StructType(fields)
> # Each record is wrapped in a Row rather than passed as a bare tuple.
> df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
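
To illustrate the parenthesization point in the description, here is a minimal sketch in plain Python. The `Expr` class is a hypothetical stand-in written for this note, not Spark's `Column` class; it only mimics the idea of overloading `&`, `|`, and the comparison operators to build an expression. It shows that Python parses `age > 21 & height < 180` as the chained comparison `age > (21 & height) < 180`, which forces a truth-value test on an expression object and fails, while the parenthesized form builds the intended AND.

```python
class Expr(object):
    """Hypothetical stand-in for a column expression (NOT Spark's Column)."""

    def __init__(self, text):
        self.text = text

    def _binary(self, op, other):
        # Render the right-hand side as text, whether it is an Expr or a literal.
        rhs = other.text if isinstance(other, Expr) else repr(other)
        return Expr("(%s %s %s)" % (self.text, op, rhs))

    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __and__(self, other):
        return self._binary("AND", other)

    def __or__(self, other):
        return self._binary("OR", other)

    def __rand__(self, other):
        # Reached for `21 & height`, where the left operand is a plain int.
        return Expr(repr(other))._binary("AND", self)

    def __bool__(self):
        # An expression has no single truth value, so refuse the conversion.
        raise ValueError("cannot convert an expression to bool")

    __nonzero__ = __bool__  # Python 2 spelling of __bool__


age = Expr("age")
height = Expr("height")

# Parenthesized: each comparison is evaluated first, then combined with &.
good = (age > 21) & (height < 180)
print(good.text)

# Unparenthesized: & binds tighter than > and <, so this becomes a chained
# comparison and Python calls bool() on the intermediate expression.
failed = False
try:
    bad = age > 21 & height < 180
except ValueError as err:
    failed = True
    print("unparenthesized form failed: %s" % err)
```

With a real DataFrame the same rule would apply to a filter such as `df.filter((df.age > 21) & (df.height < 180))`, where `age` and `height` are assumed column names used only for illustration.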