bucket joins on multiple data frames.

2021-09-08 Thread Adrian Stern
Sorry if this has been answered, but I have a question about bucketed joins that I can't seem to find the answer to online. I have a bunch of PySpark data frames (let's call them df1, df2, ... df10). I need to join them all together using the same key. joined = df1.join(df2, "key",
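Conceptually, a bucketed join works because every table is hash-partitioned on the join key into the same number of buckets. Rows with equal keys can then only meet within the same bucket index, so the tables can be joined bucket by bucket instead of being fully shuffled. In Spark this layout is declared when writing a table, e.g. `df.write.bucketBy(num_buckets, "key").saveAsTable(...)`, normally with the same bucket count for every table. The plain-Python sketch below only illustrates that idea under stated assumptions: `NUM_BUCKETS`, `bucket_id`, `bucketize`, and `join_bucketed` are hypothetical names, rows are dicts rather than DataFrames, and `zlib.crc32` stands in for Spark's own hash function.

```python
from collections import defaultdict
from itertools import product
from zlib import crc32

NUM_BUCKETS = 4  # hypothetical bucket count; every table must use the same value


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Deterministic hash so equal keys land in the same bucket in every table.
    # crc32 is only a stand-in; Spark uses its own hash for bucketing.
    return crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, key, num_buckets=NUM_BUCKETS):
    """Split a list of dict rows into buckets by hashing the join key."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_id(row[key], num_buckets)].append(row)
    return buckets


def join_bucketed(tables, key, num_buckets=NUM_BUCKETS):
    """Inner-join several pre-bucketed tables one bucket at a time.

    Because all tables share the hash and the bucket count, matching keys
    are guaranteed to sit at the same bucket index, so no data has to move
    between buckets.
    """
    result = []
    for b in range(num_buckets):
        # Index bucket b of each table by its join key.
        indexed = []
        for table in tables:
            idx = defaultdict(list)
            for row in table[b]:
                idx[row[key]].append(row)
            indexed.append(idx)
        # Inner-join semantics: keep keys present in every table's bucket b.
        common = set(indexed[0])
        for idx in indexed[1:]:
            common &= set(idx)
        for k in sorted(common, key=str):
            # Combine every matching row from each table (like a SQL join).
            for combo in product(*(idx[k] for idx in indexed)):
                merged = {}
                for row in combo:
                    merged.update(row)
                result.append(merged)
    return result


# Usage: three small tables; only key 1 appears in all of them.
df1 = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}]
df2 = [{"key": 1, "b": 10}, {"key": 3, "b": 30}]
df3 = [{"key": 1, "c": True}, {"key": 2, "c": False}]
joined = join_bucketed([bucketize(t, "key") for t in (df1, df2, df3)], "key")
print(joined)  # [{'key': 1, 'a': 'x', 'b': 10, 'c': True}]
```

The work for each bucket touches only that bucket of each table, which is the property that lets a real engine avoid a shuffle; the price is that all tables must agree on the hash function and the bucket count.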

Unsubscribe

2021-09-08 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
Unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Is BindingParquetOutputCommitter still used?

2021-09-08 Thread Vladimir Prus
Hi, per https://spark.apache.org/docs/latest/cloud-integration.html, when using S3 storage one is advised to set these options:
spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class

Fwd: issue in Apache Spark install

2021-09-08 Thread Mukhtar Ali
Dear Learning member of https://learning.oreilly.com, I have a problem installing Apache Spark. I tried both CMD and a Jupyter file and got the same issue: *Exception: Java gateway process exited before sending its port number*. Please help resolve this issue; the attachment shows the Jupyter output. In CMD: C:\Users\User>pyspark
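This error is commonly reported when PySpark cannot start the Java process it depends on, for example because Java is not installed, `JAVA_HOME` is unset or points to a wrong directory, or no `java` executable is on `PATH`. The sketch below is a hypothetical diagnostic helper, `diagnose_java`, written with the Python standard library only; it is not part of PySpark and only checks these common causes, not every possible one.

```python
import os
import shutil


def diagnose_java(env=None, which=shutil.which):
    """Return a list of likely causes for PySpark's
    'Java gateway process exited before sending its port number' error.

    Hypothetical helper: `env` defaults to the real environment and
    `which` to shutil.which; both can be overridden for testing.
    """
    env = os.environ if env is None else env
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing directory: {java_home}")
    # PySpark launches Java as a subprocess, so a java executable must be findable.
    if which("java") is None:
        problems.append("no 'java' executable found on PATH")
    return problems


if __name__ == "__main__":
    issues = diagnose_java()
    print("\n".join(issues) if issues else "Java setup looks OK")
```

An empty list only means these particular checks passed; a Java version that the installed Spark release does not support can still trigger the same exception.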