[jira] [Commented] (SPARK-12954) PySpark API 1.3.0: how can we partition by columns?
[ https://issues.apache.org/jira/browse/SPARK-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110596#comment-15110596 ]

malouke commented on SPARK-12954:
---------------------------------

OK, sorry.

> PySpark API 1.3.0: how can we partition by columns?
> ----------------------------------------------------
>
>                 Key: SPARK-12954
>                 URL: https://issues.apache.org/jira/browse/SPARK-12954
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.0
>         Environment: Spark 1.3.0, Cloudera Manager, Linux, PySpark
>            Reporter: malouke
>            Priority: Blocker
>              Labels: documentation, features, performance, test
>
> Hi,
> Before posting this question I tried a lot of things, but I did not find a solution.
> I have 9 tables and I join them in two ways:
> 1. First test with df.join(df2, df.id == df2.id2, 'left_outer')
> 2. sqlContext.sql("select * from t1 left join t2 on id_t1 = id_t2")
> After that, I want to partition the result of the join by date:
> - In PySpark 1.5.2 I tried partitionBy. If the table comes from a join of at most two tables, everything is OK, but when I join more than three tables I get no result after several hours.
> - In PySpark 1.3.0 I could not find a function in the API that lets me partition by date columns (see the sketch after this message).
> Q: Can someone help me resolve this problem?
> Thank you in advance.
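A minimal sketch of the two write paths in question, assuming illustrative data, column names (id, id2, date_col), and output paths that are not taken from the issue: Spark 1.4+ exposes DataFrameWriter.partitionBy for exactly this, while Spark 1.3.0 has no such API, so one workaround there is to save one Hive-style directory per distinct date value.

    # Hedged sketch -- the data, column names, and paths below are assumptions.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName='partition-by-date-sketch')
    sqlContext = SQLContext(sc)

    df = sqlContext.createDataFrame([(1, '2016-01-01'), (2, '2016-01-02')],
                                    ['id', 'date_col'])
    df2 = sqlContext.createDataFrame([(1, 'a'), (2, 'b')], ['id2', 'val'])

    joined = df.join(df2, df.id == df2.id2, 'left_outer')

    # Spark 1.4+ only: the DataFrameWriter partitions output by column values.
    # joined.write.partitionBy('date_col').parquet('/tmp/joined_by_date')

    # Spark 1.3.0 workaround: there is no partitionBy on the write path, so
    # save one directory per distinct date (slow for many distinct dates).
    for row in joined.select('date_col').distinct().collect():
        d = row.date_col
        joined.filter(joined.date_col == d) \
              .saveAsParquetFile('/tmp/joined_by_date/date_col=%s' % d)

Note that the 1.3.0 loop rescans the joined DataFrame once per distinct date, so calling joined.cache() first is usually worthwhile; the multi-way-join stalls reported on 1.5.2 may be a join/shuffle problem rather than a partitionBy one.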
[jira] [Commented] (SPARK-12954) PySpark API 1.3.0: how can we partition by columns?
[ https://issues.apache.org/jira/browse/SPARK-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110603#comment-15110603 ]

malouke commented on SPARK-12954:
---------------------------------

Hi Sean, where can I ask questions?