[jira] [Comment Edited] (SPARK-7178) Improve DataFrame documentation and code samples
[ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702520#comment-14702520 ]

Reynold Xin edited comment on SPARK-7178 at 8/19/15 5:37 AM:

Closing this one since we will update DataFrame documentation in other tickets. Also, and/or now have better error messages in Python.

was (Author: rxin):
Closing this one since we will update DataFrame documentation in other tickets.

Improve DataFrame documentation and code samples

Key: SPARK-7178
URL: https://issues.apache.org/jira/browse/SPARK-7178
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.1
Reporter: Chris Fregly
Labels: dataframe

AND and OR are not straightforward when using the new DataFrame API. The current convention, accepted by Pandas users, is to use the bitwise operators & and | instead of AND and OR. When using these, however, you need to wrap each expression in parentheses, because the bitwise operators bind more tightly than the comparisons around them.

Also, working with StructTypes is a bit confusing. The following link:
https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
(Python tab) implies that you can work with tuples directly when creating a DataFrame. However, the following code errors out unless we explicitly use Rows:

{code}
from pyspark.sql import Row
from pyspark.sql.types import *

# The schema is encoded in a string.
schemaString = "a"
# One map<string,int> column per field name in the schema string.
fields = [StructField(field_name, MapType(StringType(), IntegerType())) for field_name in schemaString.split()]
schema = StructType(fields)
# sqlContext is the SQLContext provided by the pyspark shell.
df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
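As a rough illustration of the AND/OR point in the issue description, the sketch below shows why each comparison has to be parenthesized before it is combined with & or |. It does not use PySpark: `Expr` is a hypothetical stand-in class invented for this example, and its error message is made up rather than taken from Spark. It only mimics the general idea of an expression object that overloads Python operators.

```python
class Expr:
    """Hypothetical stand-in for a column expression; not PySpark's Column class."""

    def __init__(self, text):
        self.text = text

    @staticmethod
    def _render(value):
        # Nested expressions render as their text, plain literals via repr().
        return value.text if isinstance(value, Expr) else repr(value)

    def __gt__(self, other):
        return Expr("(%s > %s)" % (self.text, Expr._render(other)))

    def __lt__(self, other):
        return Expr("(%s < %s)" % (self.text, Expr._render(other)))

    def __and__(self, other):
        return Expr("(%s AND %s)" % (self.text, Expr._render(other)))

    def __rand__(self, other):
        # Called for `literal & expr`, e.g. the `1 & b` hidden inside `a > 1 & b < 2`.
        return Expr("(%s AND %s)" % (Expr._render(other), self.text))

    def __or__(self, other):
        return Expr("(%s OR %s)" % (self.text, Expr._render(other)))

    def __bool__(self):
        # Chained comparisons and the `and`/`or` keywords need a real bool,
        # which an unevaluated expression cannot provide.
        raise ValueError("cannot convert an expression to bool; use & or | with parentheses")


a, b = Expr("a"), Expr("b")

# Parenthesized: each comparison is built first, then the two are combined.
good = (a > 1) & (b < 2)
print(good.text)  # ((a > 1) AND (b < 2))

# Unparenthesized: Python parses this as a > (1 & b) < 2, a chained
# comparison, which asks the intermediate expression for a bool and fails.
try:
    bad = a > 1 & b < 2
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

The same precedence rule applies to |, so `(a > 1) | (b < 2)` needs its parentheses as well.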
[jira] [Comment Edited] (SPARK-7178) Improve DataFrame documentation and code samples
[ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517858#comment-14517858 ]

Chris Fregly edited comment on SPARK-7178 at 4/28/15 8:07 PM:

Added these to the forums:

AND and OR:
https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html

Nested Map Columns in DataFrames:
https://forums.databricks.com/questions/764/how-do-i-create-a-dataframe-with-nested-map-column.html

Casting columns of DataFrames:
https://forums.databricks.com/questions/767/how-do-i-cast-within-a-dataframe.html

was (Author: cfregly):
Added this to the forums to address the AND and OR:
https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html
[jira] [Comment Edited] (SPARK-7178) Improve DataFrame documentation and code samples
[ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516021#comment-14516021 ]

Chris Fregly edited comment on SPARK-7178 at 4/28/15 12:46 AM:

I recommend updating all of the following:
1) scala/python/pyspark docs (e.g. https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext.createDataFrame)
2) SQL Programming guide (e.g. https://spark.apache.org/docs/latest/sql-programming-guide.html)

was (Author: cfregly):
I recommend updating both the scala docs and the SQL Programming guide.

Improve DataFrame documentation and code samples

Key: SPARK-7178
URL: https://issues.apache.org/jira/browse/SPARK-7178
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.1
Reporter: Chris Fregly
Labels: dataframe

AND and OR are not straightforward when using the new DataFrame API. The current convention, accepted by Pandas users, is to use the bitwise operators & and | instead of AND and OR. When using these, however, you need to wrap each expression in parentheses, because the bitwise operators bind more tightly than the comparisons around them.

Also, it's a bit confusing when creating a