Hi

Thanks for the reply. Here is my situation: I have a DB which enables
synchronous CDC; think of this as a DB trigger which writes the "changed"
values to a table as soon as something changes in the production table. My job
will need to pick up the data "as soon as it arrives", which can be as often
as every minute. Ideally it will pick up the changes, transform them into
JSON, and put them to Kinesis. In short, I am emulating a Kinesis producer
with a DB source (don't even ask why, let's say these are the constraints :) )
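
To make the pipeline described above concrete, here is a minimal sketch of one polling pass in plain Python. Everything named in it is an assumption for illustration: the `cdc_changes` table and its columns are hypothetical, an in-memory SQLite database stands in for the real CDC source, and `put_record` is a placeholder callable where a real Kinesis client call (for example boto3's `put_record`) would go.

```python
import json
import sqlite3


def fetch_changes(conn, last_id, limit=500):
    """Return change rows newer than last_id, oldest first, as dicts."""
    cur = conn.execute(
        "SELECT change_id, op, customer_id, email FROM cdc_changes "
        "WHERE change_id > ? ORDER BY change_id LIMIT ?",
        (last_id, limit),
    )
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def poll_once(conn, last_id, put_record):
    """Send each new change as a JSON string; return the highest id sent."""
    for change in fetch_changes(conn, last_id):
        # put_record(data, partition_key) is a stand-in for the Kinesis call.
        put_record(json.dumps(change, sort_keys=True), str(change["customer_id"]))
        last_id = change["change_id"]
    return last_id


def demo():
    """Build a tiny hypothetical change table and run a single poll."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cdc_changes ("
        "change_id INTEGER PRIMARY KEY, op TEXT, customer_id INTEGER, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO cdc_changes VALUES (?, ?, ?, ?)",
        [(1, "I", 42, "a@example.com"), (2, "U", 42, "b@example.com")],
    )
    sent = []
    last_id = poll_once(conn, 0, lambda data, key: sent.append((key, data)))
    return conn, last_id, sent
```

Calling `poll_once` again with the returned `last_id` sends nothing until new rows appear, so the same function could be driven by a loop with a sleep, a cron job, or a scheduled Spark job.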

Please advise: (a) is Spark a good choice here, and (b) what is your
suggestion either way?

I understand I can easily do it using a simple Java/Python app, but I am a
little worried about managing scaling and fault tolerance, and that's where
my concern is.
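
As a rough illustration of the fault-tolerance part of that concern, the sketch below shows one common approach for a standalone app: persist the last processed change id only after a batch has been sent, so a restart resumes from the last saved point and may resend records (at-least-once delivery). It assumes a single worker and a local offset file; `fetch` and `send` are hypothetical callables, and the sketch does not address scaling across machines.

```python
import json
import os
import tempfile


def load_offset(path):
    """Return the last saved change id, or 0 if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)["last_id"]
    except FileNotFoundError:
        return 0


def save_offset(path, last_id):
    """Write the offset to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"last_id": last_id}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run_batch(path, fetch, send):
    """Send everything after the saved offset, then advance the offset.

    fetch(last_id) returns a list of (change_id, payload) ordered by id;
    send(payload) delivers one record and raises on failure.
    """
    last_id = load_offset(path)
    batch = fetch(last_id)
    for _change_id, payload in batch:
        send(payload)
    if batch:
        # Only reached if every send succeeded; a crash before this line
        # means the whole batch is retried on the next run.
        save_offset(path, batch[-1][0])
    return len(batch)
```

Because the offset moves only after a successful batch, consumers downstream would need to tolerate duplicates, for example by de-duplicating on the change id.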

TIA
Ayan

On Mon, Jul 6, 2015 at 12:51 AM, Ashic Mahtab <as...@live.com> wrote:

> Hi Ayan,
> How "continuous" is your workload? As Akhil points out, with streaming,
> you'll give up at least one core for receiving, and will need at least one
> more core for processing. Unless you're running on something like Mesos, this
> means that those cores are dedicated to your app, and can't be leveraged by
> other apps / jobs.
>
> If it's something periodic (once an hour, once every 15 minutes, etc.),
> then I'd simply write a "normal" spark application, and trigger it
> periodically. There are many things that can take care of that - sometimes
> a simple cronjob is enough!
>
> ------------------------------
> Date: Sun, 5 Jul 2015 22:48:37 +1000
> Subject: Re: JDBC Streams
> From: guha.a...@gmail.com
> To: ak...@sigmoidanalytics.com
> CC: user@spark.apache.org
>
>
> Thanks Akhil. In case I go with Spark Streaming, I guess I have to
> implement a custom receiver, and Spark Streaming will call this receiver
> every batch interval, is that correct? Any gotchas you see in this plan?
> TIA...Best, Ayan
>
> On Sun, Jul 5, 2015 at 5:40 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
> If you want a long-running application, then go with Spark Streaming
> (which kind of blocks your resources). On the other hand, if you use the job
> server, then you can actually use the resources (CPUs) for other jobs
> when your DB job is not using them.
>
> Thanks
> Best Regards
>
> On Sun, Jul 5, 2015 at 5:28 AM, ayan guha <guha.a...@gmail.com> wrote:
>
> Hi All
>
> I have a requirement to connect to a DB every few minutes and bring data to
> HBase. Can anyone suggest if Spark Streaming would be appropriate for this
> scenario or if I should look into the job server?
>
> Thanks in advance
>
> --
> Best Regards,
> Ayan Guha
>
>
>
>
>
> --
> Best Regards,
> Ayan Guha
>



-- 
Best Regards,
Ayan Guha
