Hi,
How can I write to multiple outputs for each key? I tried creating a
custom partitioner and setting the number of partitions, but it doesn't
work. Only a few tasks/partitions (as many as there are distinct key
combinations) end up with large datasets; the data is not being split
across all of the partitions.
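Roughly what I tried looks like the sketch below (class and variable names
are placeholders, not my exact code):

import org.apache.spark.Partitioner

// Partition purely by key, so each key's records land in one partition.
class KeyPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int =
    math.abs(key.hashCode) % numPartitions
}

// rdd: RDD[(String, String)] -- only as many partitions as there are
// distinct keys actually receive data, so a handful of partitions hold
// very large datasets while the rest stay empty.
// val partitioned = rdd.partitionBy(new KeyPartitioner(200))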
I believe groupByKey currently requires that all items for a specific key fit
into a single executor's memory:
http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
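A minimal sketch of the difference that page describes (not from this thread;
it assumes a SparkContext sc, as in the shell):

// Word-count style pairs.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

// groupByKey: every value for a key is shuffled to, and held on, a single
// executor before the sum runs, which is what blows up memory for hot keys.
val grouped = pairs.groupByKey().mapValues(_.sum)

// reduceByKey: values are combined map-side first, so far less data is
// shuffled and no single executor has to hold all of a key's values.
val reduced = pairs.reduceByKey(_ + _)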
This previous discussion has some pointers if you must use groupByKey: