[jira] [Comment Edited] (SPARK-7178) Improve DataFrame documentation and code samples

2015-08-18 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702520#comment-14702520
 ] 

Reynold Xin edited comment on SPARK-7178 at 8/19/15 5:37 AM:
-

Closing this one since we will update DataFrame documentation in other tickets.

Also, and/or now give better error messages in Python.



was (Author: rxin):
Closing this one since we will update DataFrame documentation in other tickets.


 Improve DataFrame documentation and code samples
 

 Key: SPARK-7178
 URL: https://issues.apache.org/jira/browse/SPARK-7178
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Chris Fregly
  Labels: dataframe

 AND and OR are not straightforward when using the new DataFrame API.
 the current convention - accepted by Pandas users - is to use the bitwise 
 & and | instead of AND and OR.  when using these, however, you need to wrap 
 each expression in parentheses, because & and | bind more tightly than 
 comparison operators such as == and < in Python.
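The precedence issue described above can be reproduced without Spark. The following is a minimal sketch in plain Python: `Col` is a hypothetical stand-in that only records the expression it builds (it is not the pyspark Column class), and the column names are made up for illustration.

```python
# Hypothetical stand-in for a DataFrame column; NOT pyspark.sql.Column.
# It records the expression it builds so the grouping is visible.
class Col:
    def __init__(self, expr):
        self.expr = expr

    def _binary(self, op, other):
        rhs = other.expr if isinstance(other, Col) else repr(other)
        return Col("(%s %s %s)" % (self.expr, op, rhs))

    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __and__(self, other):
        return self._binary("AND", other)

    def __or__(self, other):
        return self._binary("OR", other)

    def __rand__(self, other):
        # Reached for `21 & col`, after int.__and__ returns NotImplemented.
        return Col("(%r AND %s)" % (other, self.expr))

    def __bool__(self):
        # `and`, `or`, and chained comparisons call bool() on their operands.
        raise ValueError("cannot convert a column expression to bool")

    __nonzero__ = __bool__  # Python 2 spelling


age, height = Col("age"), Col("height")

# Parenthesized: each comparison is evaluated first, then combined with &.
ok = (age > 21) & (height < 180)
print(ok.expr)  # ((age > 21) AND (height < 180))

# Unparenthesized: Python parses this as `age > (21 & height) < 180`,
# a chained comparison, which needs a bool and therefore fails here.
try:
    age > 21 & height < 180
    failed = False
except ValueError as exc:
    failed = True
    print("error:", exc)
```

Python's grammar fixes the precedence of `&` and `|` above the comparison operators, so the same grouping applies to real DataFrame columns; assuming a DataFrame `df` with these columns, the filter would be written as `df.filter((df.age > 21) & (df.height < 180))`.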
 also, working with StructTypes is a bit confusing.  the following link:  
 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
  (Python tab) implies that you can work with tuples directly when creating a 
 DataFrame.
 however, the following code only works when the data is wrapped explicitly in Row objects:
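 {code}
 from pyspark.sql import Row
 from pyspark.sql.types import StructType, StructField, MapType, StringType, IntegerType
 # The schema is encoded in a string of space-separated field names.
 schemaString = "a"
 # Each field is a map from string keys to integer values.
 fields = [StructField(field_name, MapType(StringType(), IntegerType()))
           for field_name in schemaString.split()]
 schema = StructType(fields)
 # sqlContext is assumed to be an existing SQLContext.
 df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
 {code}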



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7178) Improve DataFrame documentation and code samples

2015-04-28 Thread Chris Fregly (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517858#comment-14517858
 ] 

Chris Fregly edited comment on SPARK-7178 at 4/28/15 8:07 PM:
--

added these to the forums

AND and OR:  
https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html

Nested Map Columns in DataFrames:
https://forums.databricks.com/questions/764/how-do-i-create-a-dataframe-with-nested-map-column.html

Casting columns of DataFrames:
https://forums.databricks.com/questions/767/how-do-i-cast-within-a-dataframe.html


was (Author: cfregly):
added this to the forums to address the AND and OR:  
https://forums.databricks.com/questions/758/how-do-i-use-and-and-or-within-my-dataframe-operat.html




[jira] [Comment Edited] (SPARK-7178) Improve DataFrame documentation and code samples

2015-04-27 Thread Chris Fregly (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516021#comment-14516021
 ] 

Chris Fregly edited comment on SPARK-7178 at 4/28/15 12:46 AM:
---

i recommend updating all of the following:
1)  scala/python/pyspark docs 
(e.g. 
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext.createDataFrame)
2)  SQL Programming guide 
(e.g. https://spark.apache.org/docs/latest/sql-programming-guide.html)



was (Author: cfregly):
i recommend updating both the scala docs and the SQL Programming guide.

 Improve DataFrame documentation and code samples
 

 Key: SPARK-7178
 URL: https://issues.apache.org/jira/browse/SPARK-7178
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Chris Fregly
  Labels: dataframe

 AND and OR are not straightforward when using the new DataFrame API.
 the current convention - accepted by Pandas users - is to use the bitwise 
 & and | instead of AND and OR.  when using these, however, you need to wrap 
 each expression in parentheses, because & and | bind more tightly than 
 comparison operators such as == and < in Python.
 also, it's a bit confusing when creating a 


