What happens when a whole node running your "per-node streaming engine
(built-in checkpoint and recovery)" fails? Can its checkpoint and recovery
mechanism handle whole-node failure? Can you recover from the checkpoint on
a different node?

Spark and Spark Streaming were designed with the idea that executors are
disposable: there should not be any node-specific long-term state that you
rely on unless you can recover that state on a different node.
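
Concretely, that usually means keeping state in checkpointed RDDs on shared
storage rather than on one worker's local disk, so a restarted driver can
rebuild everything anywhere in the cluster. A minimal sketch using the
standard Spark Streaming checkpointing APIs; the checkpoint path, host, and
port are placeholders, and the word-count state logic is just illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableApp {
  // Hypothetical checkpoint location; in practice this must be shared
  // storage (HDFS, S3, ...) so a different node can read it after a failure.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("RecoverableApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir) // state and metadata go to shared storage

    val lines = ssc.socketTextStream("localhost", 9999)
    // updateStateByKey keeps state in checkpointed RDDs, not on any
    // particular worker, so it survives the loss of a whole node.
    val counts = lines.map((_, 1L)).updateStateByKey[Long] {
      (values: Seq[Long], state: Option[Long]) =>
        Some(values.sum + state.getOrElse(0L))
    }
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart (possibly on a different node), rebuild the context from
    // the checkpoint instead of creating a fresh one.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key point is that `StreamingContext.getOrCreate` makes recovery
location-independent: any node that can reach the checkpoint directory can
resume the job.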

On Mon, Oct 5, 2015 at 3:03 PM, Renyi Xiong <renyixio...@gmail.com> wrote:

> if RDDs from the same DStream are not guaranteed to run on the same
> worker, then the question becomes:
>
> is it possible to specify an unlimited duration in the ssc so as to get a
> continuous stream (as opposed to a discretized one)?
>
> Say we have a per-node streaming engine (with built-in checkpoint and
> recovery) that we'd like to integrate with Spark Streaming. Can we have a
> never-ending batch (or RDD) this way?
>
> On Mon, Sep 28, 2015 at 4:31 PM, <mailer-dae...@apache.org> wrote:
>
>> Hi. This is the qmail-send program at apache.org.
>> I'm afraid I wasn't able to deliver your message to the following
>> addresses.
>> This is a permanent error; I've given up. Sorry it didn't work out.
>>
>> <u...@spark.apache.org>:
>> Must be sent from an @apache.org address or a subscriber address or an
>> address in LDAP.
>>
>> --- Below this line is a copy of the message.
>>
>> Date: Mon, 28 Sep 2015 16:31:42 -0700
>> Subject: Re: Spark streaming DStream state on worker
>> From: Renyi Xiong <renyixio...@gmail.com>
>> To: Shixiong Zhu <zsxw...@gmail.com>
>> Cc: "u...@spark.apache.org" <u...@spark.apache.org>
>>
>> You answered my question; I take it that RDDs from the same DStream are
>> not guaranteed to run on the same worker.
>>
>> On Thu, Sep 24, 2015 at 1:51 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>>
>> > +user, -dev
>> >
>> > It's not clear which `compute` you mean in your question; there are two
>> > `compute` methods here.
>> >
>> > 1. DStream.compute: it always runs in the driver, and all RDDs are
>> > created in the driver. E.g.,
>> >
>> > DStream.foreachRDD(rdd => rdd.count())
>> >
>> > "rdd.count()" is called in the driver.
>> >
>> > 2. RDD.compute: this will run in the executor and the location is not
>> > guaranteed. E.g.,
>> >
>> > DStream.foreachRDD(rdd => rdd.foreach { v =>
>> >     println(v)
>> > })
>> >
>> > "println(v)" is called in the executor.
>> >
>> >
>> > Best Regards,
>> > Shixiong Zhu
>> >
>> > 2015-09-17 3:47 GMT+08:00 Renyi Xiong <renyixio...@gmail.com>:
>> >
>> >> Hi,
>> >>
>> >> I want to do a temporal join operation on a DStream across RDDs. My
>> >> question is: are RDDs from the same DStream always computed on the
>> >> same worker (except during failover)?
>> >>
>> >> thanks,
>> >> Renyi.
>> >>
>> >
>> >
>> >
>>
>>
>
>
