Spark Lost executor && shuffle.FetchFailedException

2015-09-21 Thread biyan900116
Hi All: When I write data to a Hive dynamic-partition table, many errors and warnings like the following occur... Is the cause that the shuffle output is too large? 15/09/21 14:53:09 ERROR cluster.YarnClusterScheduler: Lost executor 402 on dn03.datanode.com: remote Rpc client

Re: Spark aggregateByKey Issues

2015-09-15 Thread biyan900116
Hi Alexis: Of course, it’s very useful to me, especially the part about the operations after the sort is done. I still have one question: how do I choose a reasonable number of partitions if it does not need to equal the number of keys? > On Sep 15, 2015, at 3:41 PM, Alexis Gillain

Re: Spark aggregateByKey Issues

2015-09-14 Thread biyan900116
Hi Alexis: Thank you for your reply. In my case, the operation on each record depends on a value that is set while processing the previous record. So your advice is that I can use “sortByKey”. “sortByKey” will put all records with the same key in one partition. Need I