What happens when a whole node running your " per node streaming engine (built-in checkpoint and recovery)" fails? Can its checkpoint and recovery mechanism handle whole node failure? Can you recover from the checkpoint on a different node?
Spark and Spark Streaming were designed with the idea that executors are disposable, and there should not be any node-specific long term state that you rely on unless you can recover that state on a different node. On Mon, Oct 5, 2015 at 3:03 PM, Renyi Xiong <renyixio...@gmail.com> wrote: > if RDDs from same DStream not guaranteed to run on same worker, then the > question becomes: > > is it possible to specify an unlimited duration in ssc to have a > continuous stream (as opposed to discretized). > > say, we have a per node streaming engine (built-in checkpoint and > recovery) we'd like to integrate with spark streaming. can we have a > never-ending batch (or RDD) this way? > > On Mon, Sep 28, 2015 at 4:31 PM, <mailer-dae...@apache.org> wrote: > >> Hi. This is the qmail-send program at apache.org. >> I'm afraid I wasn't able to deliver your message to the following >> addresses. >> This is a permanent error; I've given up. Sorry it didn't work out. >> >> <u...@spark.apache.org>: >> Must be sent from an @apache.org address or a subscriber address or an >> address in LDAP. >> >> --- Below this line is a copy of the message. >> >> Return-Path: <renyixio...@gmail.com> >> Received: (qmail 95559 invoked by uid 99); 28 Sep 2015 23:31:46 -0000 >> Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) >> by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 28 Sep 2015 23:31:46 >> +0000 >> Received: from localhost (localhost [127.0.0.1]) >> by spamd3-us-west.apache.org (ASF Mail Server at >> spamd3-us-west.apache.org) with ESMTP id 96E361809BA >> for <u...@spark.apache.org>; Mon, 28 Sep 2015 23:31:45 +0000 >> (UTC) >> X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org >> X-Spam-Flag: NO >> X-Spam-Score: 3.129 >> X-Spam-Level: *** >> X-Spam-Status: No, score=3.129 tagged_above=-999 required=6.31 >> tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, >> FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=3, >> RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] >> autolearn=disabled >> Authentication-Results: spamd3-us-west.apache.org (amavisd-new); >> dkim=pass (2048-bit key) header.d=gmail.com >> Received: from mx1-us-west.apache.org ([10.40.0.8]) >> by localhost (spamd3-us-west.apache.org [10.40.0.10]) >> (amavisd-new, port 10024) >> with ESMTP id FAGoohFE7Y7A for <u...@spark.apache.org>; >> Mon, 28 Sep 2015 23:31:44 +0000 (UTC) >> Received: from mail-la0-f51.google.com (mail-la0-f51.google.com >> [209.85.215.51]) >> by mx1-us-west.apache.org (ASF Mail Server at >> mx1-us-west.apache.org) with ESMTPS id 2ED40204C9 >> for <u...@spark.apache.org>; Mon, 28 Sep 2015 23:31:44 +0000 >> (UTC) >> Received: by labzv5 with SMTP id zv5so32919088lab.1 >> for <u...@spark.apache.org>; Mon, 28 Sep 2015 16:31:42 -0700 >> (PDT) >> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; >> d=gmail.com; s=20120113; >> >> h=mime-version:in-reply-to:references:date:message-id:subject:from:to >> :cc:content-type; >> bh=F36l+I4dfDHTL7nQ0K9mAW4aVtPpVpYc0rWWpPNjt4c=; >> >> b=QfRdLEWf4clJqwkZSH7n0oHjXLNifWdhYxvCDZ+P37oSfM0vd/8Bx962hTflRQkD1q >> >> 2B3go7g8bpnQlhZgMRrZfT6hk7vUtNA3lOZjYeN+cPyoVRaBwm3LIID5vF4cw5hFAWaM >> >> LUenU7E7b9kJY8JkyhIfpya8CLKz+Yo6EjCv3W6BAvv2YiNPgbOQkpx7u8y9dw0kHGig >> >> 1hv37Ey/DZpoKCgbSesv+sztYslevu+VBgxHFkveEyxH1saRr6OqTM7fnL2E6fP4E8qu >> >> W81G1ZfNW1Pp9i5IcCb/9S7YTZDnBlUj4yROsOfNANRGmed71QpQD9l8NnAQXmeqoeNF >> SyEg== >> MIME-Version: 1.0 >> X-Received: by 10.25.213.75 with SMTP id m72mr4047578lfg.17.1443483102618; >> Mon, 28 Sep 2015 16:31:42 -0700 (PDT) >> Received: by 10.25.207.18 with HTTP; Mon, 28 Sep 2015 16:31:42 -0700 (PDT) >> In-Reply-To: <CAPn6-YTZo+-A1HThmBO4mKO0sELrvxP7DZEpg3GoWN=Qz= >> 2...@mail.gmail.com> >> References: < >> cangsv6-k+33gvgtiynwhz2gsbudf_wwwnazvupbqe8qdcg_...@mail.gmail.com> >> <CAPn6-YQ3Q-=HMrqz5FLLPx_HmjmHkHP7cwsPYvsxw-tb7a8P= >> g...@mail.gmail.com> >> <CAPn6-YTZo+-A1HThmBO4mKO0sELrvxP7DZEpg3GoWN=Qz= >> 2...@mail.gmail.com> >> Date: Mon, 28 Sep 2015 16:31:42 -0700 >> Message-ID: < >> cangsv69hyqbbvb8_8zshstlrpdy-37fjnwyvxce-xf7dphq...@mail.gmail.com> >> Subject: Re: Spark streaming DStream state on worker >> From: Renyi Xiong <renyixio...@gmail.com> >> To: Shixiong Zhu <zsxw...@gmail.com> >> Cc: "u...@spark.apache.org" <u...@spark.apache.org> >> Content-Type: multipart/alternative; boundary=001a11411fde922c170520d71928 >> >> --001a11411fde922c170520d71928 >> Content-Type: text/plain; charset=UTF-8 >> >> you answered my question I think that RDDs from same DStream not >> guaranteed >> to run on same worker >> >> On Thu, Sep 24, 2015 at 1:51 AM, Shixiong Zhu <zsxw...@gmail.com> wrote: >> >> > +user, -dev >> > >> > It's not clear about `compute` in your question. There are two `compute` >> > here. >> > >> > 1. DStream.compute: it always runs in the driver, and all RDDs are >> created >> > in the driver. E.g., >> > >> > DStream.foreachRDD(rdd => rdd.count()) >> > >> > "rdd.count()" is called in the driver. >> > >> > 2. RDD.compute: this will run in the executor and the location is not >> > guaranteed. E.g., >> > >> > DStream.foreachRDD(rdd => rdd.foreach { v => >> > println(v) >> > }) >> > >> > "println(v)" is called in the executor. >> > >> > >> > Best Regards, >> > Shixiong Zhu >> > >> > 2015-09-17 3:47 GMT+08:00 Renyi Xiong <renyixio...@gmail.com>: >> > >> >> Hi, >> >> >> >> I want to do temporal join operation on DStream across RDDs, my >> question >> >> is: Are RDDs from same DStream always computed on same worker (except >> >> failover) ? >> >> >> >> thanks, >> >> Renyi. >> >> >> > >> > >> > >> >> --001a11411fde922c170520d71928 >> Content-Type: text/html; charset=UTF-8 >> Content-Transfer-Encoding: quoted-printable >> >> <div dir=3D"ltr">you answered my question I think that RDDs from same >> DStre= >> am not guaranteed to run on same worker</div><div >> class=3D"gmail_extra"><br= >> ><div class=3D"gmail_quote">On Thu, Sep 24, 2015 at 1:51 AM, Shixiong Zhu >> <= >> span dir=3D"ltr"><<a href=3D"mailto:zsxw...@gmail.com" >> target=3D"_blank"= >> >zsxw...@gmail.com</a>></span> wrote:<br><blockquote >> class=3D"gmail_quot= >> e" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc >> solid;padding-left:1ex">= >> <div dir=3D"ltr"><div class=3D"gmail_quote"><div dir=3D"ltr"><div>+user, >> -d= >> ev</div><div><div class=3D"h5"><div><br></div><div>It's not clear >> about= >> `compute` in your question. There are two `compute` >> here.</div><div><br></= >> div><div>1. DStream.compute: it always runs in the driver, and all RDDs >> are= >> created in the driver. >> E.g.,=C2=A0</div><div><br></div><div>DStream.foreac= >> hRDD(rdd =3D> >> rdd.count())</div><div><br></div><div>"rdd.count()&qu= >> ot; is called in the driver.</div><div><br></div><div>2. RDD.compute: >> this = >> will run in the executor and the location is not guaranteed. >> E.g.,</div><di= >> v><br></div><div>DStream.foreachRDD(rdd =3D> rdd.foreach { v >> =3D></di= >> v><div>=C2=A0 =C2=A0 >> println(v)</div><div>})<br></div><div><br></div><div>&= >> quot;println(v)" is called in the >> executor.</div><br></div></div></div= >> ><div><div class=3D"h5"><div class=3D"gmail_extra"><br >> clear=3D"all"><div><= >> div><div dir=3D"ltr"><div><div dir=3D"ltr"><div><div >> dir=3D"ltr"><div><div = >> dir=3D"ltr"><p>Best Regards,</p><div>Shixiong >> Zhu</div></div></div></div></= >> div></div></div></div></div></div><div><div> >> <br><div class=3D"gmail_quote">2015-09-17 3:47 GMT+08:00 Renyi Xiong >> <span = >> dir=3D"ltr"><<a href=3D"mailto:renyixio...@gmail.com" >> target=3D"_blank">= >> renyixio...@gmail.com</a>></span>:<br><blockquote >> class=3D"gmail_quote" = >> style=3D"margin:0 0 0 .8ex;border-left:1px #ccc >> solid;padding-left:1ex"><di= >> v dir=3D"ltr"><div>Hi,</div><div><br></div><div>I want to >> do=C2=A0temporal = >> join operation on DStream across RDDs, my question is: Are RDDs from same >> D= >> Stream always computed on same worker (except failover) >> ?</div><div><br></d= >> iv><div>thanks,</div><div>Renyi.</div></div> >> </blockquote></div><br></div></div></div> >> </div></div></div><br></div> >> </blockquote></div><br></div> >> >> --001a11411fde922c170520d71928-- >> > >