Re: Spark Streaming + Kudu

Ravi Kanth Mon, 26 Feb 2018 20:05:58 -0800

Thanks, Clifford. We are running Kudu 1.4. To date we haven't seen any
issues in production, and we are not losing tablet servers. However, as part
of testing I have to generate a few unforeseen failure cases to analyse the
application's performance. One of them is intentionally bringing down a
tablet server or the master server, during which I observed the loss of
records. I just wanted to test cases outside the happy path here. Once again,
thanks for taking the time to respond.
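
For a test like this, one way to make the failure visible instead of
silently dropping records is to bound the number of write attempts and
raise an error once they are used up, so the streaming job can fail the
batch and replay it. The following is only a minimal sketch under stated
assumptions, not the code used in this thread: BoundedRetryWriter,
BatchSink, WriteFailedException and writeWithRetry are hypothetical names
and are not part of the Kudu client API. The sink stands in for the real
write path (for example, applying operations to a session, flushing, and
checking getPendingErrors()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the Kudu client API.
final class BoundedRetryWriter {

    /** Stand-in for the real write path (for example, apply + flush on a session). */
    interface BatchSink<T> {
        void write(List<T> batch) throws Exception;
    }

    /** Thrown once all attempts are used up, so the caller can fail the batch. */
    static final class WriteFailedException extends RuntimeException {
        WriteFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Tries to write the batch at most maxAttempts times, sleeping
     * backoffMs * attempt between tries. Returns the attempt that succeeded.
     */
    static <T> int writeWithRetry(BatchSink<T> sink, List<T> batch,
                                  int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sink.write(batch);
                return attempt;
            } catch (Exception e) {
                last = e; // keep the cause; the batch is still held by the caller
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new WriteFailedException("interrupted while retrying", ie);
                }
            }
        }
        throw new WriteFailedException(
            "batch of " + batch.size() + " records not written after "
                + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        int[] calls = {0};
        // Simulated outage: the first two calls fail, the third succeeds.
        BatchSink<String> flaky = batch -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated: server unavailable");
            }
            stored.addAll(batch);
        };
        int attempts = writeWithRetry(flaky, Arrays.asList("e1", "e2"), 5, 10L);
        System.out.println("stored " + stored.size() + " records on attempt " + attempts);
    }
}
```

The point of the design is that the caller keeps the batch until the write
is acknowledged, and a WriteFailedException reaching the streaming job is
what would trigger the fail/retry behaviour. For this to help, the
underlying client call has to return or throw within a bounded time rather
than retry internally forever.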


- Ravi

On 26 February 2018 at 19:58, Clifford Resnick <cresn...@mediamath.com>
wrote:

> I'll have to get back to you on the code bits, but I'm pretty sure we're
> doing simple sync batching. We're not in production yet, but after some
> months of development I haven't seen any failures, even when pushing load
> doing multiple years' backfill. I think the real question is: why are you
> losing tablet servers? The only instability we ever had with Kudu was when
> it had that weird NTP sync issue that was fixed, I think, in 1.6. What
> version are you running?
>
> Anyway, I would think that infinite loop should be catchable somewhere. Our
> pipeline is set to fail/retry with Flink snapshots. I imagine there is
> something similar with Spark. Sorry I can't be of more help!
>
>
>
> On Feb 26, 2018 9:10 PM, Ravi Kanth <ravikanth....@gmail.com> wrote:
>
> Cliff,
>
> Thanks for the response. I do agree that it's simple and seamless. In my
> case, I am able to upsert ~25000 events/sec into Kudu. But I run into a
> problem when any Kudu tablet server or master server is down. I am not able
> to get hold of the exception from the client. The client goes into an
> infinite loop trying to connect to Kudu. Meanwhile, I am losing my
> records. I tried handling the errors through getPendingErrors(), but that
> did not help. I am using AsyncKuduClient to establish the connection and
> retrieving the sync client from the async client to open the session and
> table. Any help?
>
> Thanks,
> Ravi
>
> On 26 February 2018 at 18:00, Cliff Resnick <cre...@gmail.com> wrote:
>
> While I can't speak for Spark, we do use the client API from Flink
> streaming and it's simple and seamless. It's especially nice if you require
> upsert semantics.
>
> On Feb 26, 2018 7:51 PM, "Ravi Kanth" <ravikanth....@gmail.com> wrote:
>
> Hi,
>
> Is anyone using Spark Streaming to ingest data into Kudu, and using the
> Kudu client API to do so rather than the traditional KuduContext API? I am
> stuck at a point and couldn't find a solution.
>
> Thanks,
> Ravi
>
>
>
>
