Coalesce tries to reduce the number of partitions to a smaller number while moving as little data around as possible. Since most of the received data sits on a few machines (those running receivers), coalesce just makes bigger merged partitions on those machines.
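To make the locality point concrete, here is a toy model in plain Python. It is only a sketch, not Spark's actual coalescing logic: the function name `coalesce_local`, the machine names, the record counts, and the even per-host split are all assumptions made up for illustration. It merges partitions only with others on the same host, so nothing crosses the network.

```python
from collections import defaultdict

def coalesce_local(partitions, target):
    """Toy model of a shuffle-free coalesce (hypothetical helper, not a Spark API).

    partitions: list of (host, record_count) pairs.
    target: desired total number of partitions after merging.
    Returns a list of (host, record_count) pairs.
    """
    # Group partition sizes by the host that holds them.
    by_host = defaultdict(list)
    for host, size in partitions:
        by_host[host].append(size)

    # Assumption of this sketch: every host keeps at least one partition
    # and the target is split evenly across hosts.
    per_host = max(1, target // len(by_host))

    merged = []
    for host, sizes in by_host.items():
        groups = min(per_host, len(sizes))
        buckets = [0] * groups
        # Merge round-robin, but only among partitions on the same host.
        for i, size in enumerate(sizes):
            buckets[i % groups] += size
        merged.extend((host, total) for total in buckets)
    return merged

# 10 partitions on one machine, 2 on another, 100 records each (made-up sizes).
before = [("machine1", 100)] * 10 + [("machine2", 100)] * 2
after = coalesce_local(before, target=2)
print(after)  # [('machine1', 1000), ('machine2', 200)]
```

The merged partition on machine1 ends up five times larger than the one on machine2, so the skew that was spread over many parallel tasks is now concentrated in a single long task.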
Without coalesce
Machine 1: 10 partitions processed in parallel
Machine 2: 2 partitions processed in parallel

With coalesce
Machine 1: 10 partitions merged into 1 partition, processed together, taking 10 times longer
Machine 2: 2 partitions merged into 1 partition, processed together, taking 2 times longer

Hope this clarifies.

TD

On Fri, Apr 10, 2015 at 5:16 AM, 邓刚[技术中心] <triones.d...@vipshop.com> wrote:

> Hi All,
>
> We are running a Spark Streaming application. The data source is Kafka.
> The data is not well distributed across the Kafka partitions, but every
> receiver on every executor does receive data, just in different amounts.
>
> Our data is very large, so we tried a local repartition with
> coalesce(*, false), but we saw something odd.
>
> Most of the tasks run on one executor (see picture one). When we remove
> the coalesce call, the tasks are distributed better (see picture two).
> Does anyone know why?
>
> *Picture one*
>
> *Picture two*