write multiple outputs by key

2015-06-06 Thread patcharee
Hi, How can I write to multiple outputs, one for each key? I tried creating a custom partitioner and setting the number of partitions, but neither works: only a few tasks/partitions (equal to the number of distinct key combinations) receive large datasets, and the data is not spread across all of the tasks.
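For illustration, one commonly suggested way to get per-key output files without forcing one partition per key is to subclass Hadoop's MultipleTextOutputFormat and route each record to a file named after its key. This is a minimal sketch, not necessarily what was tried in the question; the class name and output path are hypothetical.

```scala
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Route each (key, value) record to a directory named after its key.
// Appending the task's part name avoids collisions when several tasks
// contain records for the same key.
class KeyBasedOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.toString + "/" + name
}

object WritePerKey {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("write-per-key"))

    // Hypothetical (key, value) input.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // Each key ends up under /tmp/output-by-key/<key>/part-*, regardless of
    // how the records are partitioned, so no repartitioning by key is needed.
    pairs.saveAsHadoopFile(
      "/tmp/output-by-key",        // hypothetical output path
      classOf[String],
      classOf[Int],
      classOf[KeyBasedOutputFormat])

    sc.stop()
  }
}
```

Because the output format (not the partitioner) decides which file a record lands in, the skew toward a handful of partitions described above does not arise from this step.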

Re: write multiple outputs by key

2015-06-06 Thread Will Briggs
I believe groupByKey currently requires that all items for a specific key fit into a single executor's memory: http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html This previous discussion has some pointers if you must
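A minimal sketch of the reduceByKey-over-groupByKey advice from the linked page, assuming a hypothetical (key, value) RDD: reduceByKey combines partial results on the map side, so no single executor has to hold every value for a key at once.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceOverGroup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("reduce-over-group"))

    // Hypothetical (key, value) pairs.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKey merges values within each partition first, then shuffles
    // only one small combined record per key per partition.
    val sums = pairs.reduceByKey(_ + _)

    // The memory-hungry alternative to avoid: groupByKey ships every value
    // for a key to a single task before anything is aggregated.
    // val sums = pairs.groupByKey().mapValues(_.sum)

    sums.collect().foreach(println)
    sc.stop()
  }
}
```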