Sorry if this has been answered already, but I have a question about bucketed
joins that I can't seem to find an answer to online.
- I have a bunch of pyspark data frames (let's call them df1, df2,
...df10). I need to join them all together using the same key.
- joined = df1.join(df2, "key") (an example of joining 2 of them)
--
Adrian Stern, Senior Software Engineer at Vidora
www.vidora.com
Follow us on LinkedIn <https://www.linkedin.com/company/vidora>, Twitter
<https://twitter.com/vidoracorp> or Facebook
<https://www.facebook.com/vidoracorp>