Hi again,

Thought I'd share some numbers comparing the performance of Kafka Connect
and Apache Spark.

In my case, for ingesting data from Kafka into HDFS (HDP), Kafka Connect
was ~25% faster than Apache Spark. Both consumers read from a single,
non-partitioned Kafka topic containing 5 million records (~3 GB in size).
Kafka Connect ran in standalone mode, while Apache Spark distributed the
HDFS writes across 5 machines. It looks like Kafka Connect's performance
could be improved further by finding an optimal batch size for writing
into HDFS.
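
To make the batch-size question a bit more concrete, here is a rough
back-of-the-envelope sketch. The record count and total size are the ones
quoted above; the candidate batch sizes are purely hypothetical, and "~3 GB"
is assumed to mean decimal gigabytes. If the connector in use is Confluent's
HDFS sink, the records-per-file knob would be its `flush.size` setting, but
that is an assumption about the setup rather than something measured here.

```python
# Back-of-the-envelope sizing for the test described above.
# The record count and total size come from the message; the candidate
# batch sizes below are hypothetical values chosen for illustration.

TOTAL_RECORDS = 5_000_000
TOTAL_BYTES = 3 * 10**9  # "~3 GB", assumed to be decimal gigabytes


def avg_record_bytes(total_bytes: int, total_records: int) -> float:
    """Average size of one record, in bytes."""
    return total_bytes / total_records


def files_written(total_records: int, records_per_file: int) -> int:
    """Files produced if one file is committed every `records_per_file` records."""
    return -(-total_records // records_per_file)  # ceiling division


def approx_file_mb(records_per_file: int, record_bytes: float) -> float:
    """Approximate uncompressed size of one committed file, in MB."""
    return records_per_file * record_bytes / 10**6


if __name__ == "__main__":
    rec = avg_record_bytes(TOTAL_BYTES, TOTAL_RECORDS)
    print(f"average record size: {rec:.0f} bytes")
    for batch in (10_000, 100_000, 1_000_000):  # hypothetical batch sizes
        n_files = files_written(TOTAL_RECORDS, batch)
        size_mb = approx_file_mb(batch, rec)
        print(f"batch={batch}: {n_files} files of about {size_mb:.0f} MB each")
```

Small batches mean many small HDFS files, while large batches mean fewer,
bigger files but more data buffered before each commit, so the optimum
probably has to be found by measuring a few values in between.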

Regards,
Shiva


On Wed, May 3, 2017 at 10:57 AM, Roland Kuhn <goo...@rkuhn.info> wrote:

> Whoever this is: would you please abstain from such anonymous,
> unsubstantiated claims? I cannot comment on the truthfulness of your
> critique; common courtesy would entail that I at least know your name
> before asking for supporting evidence.
>
> Regards, Roland
>
> Sent from my iPhone
>
> On 3. May 2017, at 06:36, 'Ryan Tanner' via Akka User List <
> akka-user@googlegroups.com> wrote:
>
> Be careful with Flink.  IME it's got a long way to go.  The core model is
> fantastic but there's a lot of low-hanging fruit that needs to be fixed.
>
> On Tuesday, May 2, 2017 at 4:03:12 PM UTC-6, Evan Chan wrote:
>>
>> Hi Shiva,
>>
>> Spark will likely be too high-latency for you.  The practical minimum
>> batch interval is a couple of seconds.
>>
>> Think of Akka as the best fit if you want to deploy individual apps,
>> each reading from a fixed set of Kafka partitions.  Each one could then
>> write to HDFS.  However, you would need to handle failover, state, etc.
>> yourself.  It is likely that Flink has much more built in for you: HDFS
>> integration, checkpointing, failover, shuffling/routing of messages,
>> integration with Kafka, etc.
>>
>> You might want to look into Intel Gearpump. It is an Akka-based, very
>> low-latency, dynamic stream-processing engine, and they have already
>> handled distribution.
>>
>> -Evan
>>
>> On Friday, April 28, 2017 at 7:52:50 AM UTC-7, Shivakumar Ramagopal wrote:
>>>
>>> Viktor,
>>>
>>>
>>> On Fri, Apr 28, 2017 at 5:03 PM, Viktor Klang <viktor...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Apr 28, 2017 at 1:12 PM, Shiva Ramagopal <tr....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Viktor,
>>>>>
>>>>> On Fri, Apr 28, 2017 at 2:55 PM, Viktor Klang <viktor...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Shiva,
>>>>>>
>>>>>> On Fri, Apr 28, 2017 at 11:20 AM, Shiva Ramagopal <tr....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm looking to compare Kafka Streams vs Akka Streams in two areas:
>>>>>>>
>>>>>>> 1. For ingesting between Kafka and HDFS/RDBMS
>>>>>>>
>>>>>>> Requirements are mainly around performance and latency. A Kafka
>>>>>>> topic can have several million events, each corresponding to a database
>>>>>>> change capture. When ingesting this topic into HDFS I'm also looking to
>>>>>>> partition the data by day (typically based on a timestamp field in the
>>>>>>> event record), aggregate on the fly (say, by a userid field), and write
>>>>>>> Parquet, preferably on the fly, to improve performance by eliminating
>>>>>>> two I/O operations.
>>>>>>>
>>>>>>
>>>>>> Looking forward to seeing your benchmark!
>>>>>>
>>>>>
>>>>> Hey, you wanted requirements! :)
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2. Low-latency processing
>>>>>>>
>>>>>>> Experience reports on the performance of Storm/Flink and Akka Streams
>>>>>>> would be *very* nice. Typical use cases are de-duping and enrichment
>>>>>>> with metrics computation (# of duplicate events/records, aggregate
>>>>>>> metrics, etc.). Low latency and scalability are the main considerations.
>>>>>>>
>>>>>>
>>>>>> Low latency is not a metric, and scalability is not a profile. :)
>>>>>> In other words: What latency distribution are you targeting and what
>>>>>> scalability curve?
>>>>>>
>>>>>
>>>>> I know latency is a nuanced topic. I'm just looking for broad
>>>>> experiences with performance comparisons, if anyone has done that.
>>>>> Currently we have a Storm topology over 5 nodes doing enrichment of
>>>>> events from Kafka, which involves a lookup into a db per event. The
>>>>> 90th-percentile latency of this processing is under 200 ms and we are
>>>>> happy with this. While Storm is mature, Akka Streams seems more general
>>>>> purpose than Storm. I'd like to use Akka Streams for this reason if its
>>>>> performance is comparable to Storm's.
>>>>>
>>>>
>>>> There's one important architectural difference here tho: Akka Streams
>>>> are local-only (as in materialization). You can of course materialize Akka
>>>> Streams on multiple nodes and use a transport to coordinate data
>>>> processing. Interestingly, it doesn't lock you in to a particular backend
>>>> the way Kafka Streams or even Storm would.
>>>>
>>>
>>> The materialization part was something I was not aware of. Thanks for
>>> pointing it out, really appreciate it.
>>>
>>>>
>>>> All of this boils down to requirements. Something like Flink or Apache
>>>> Beam could be viable options here as well. In your situation I'd look at
>>>> the requirements and make a couple of prototypes before picking a winner.
>>>>
>>>
>>> I was also looking for something to unify batch and streaming - which is
>>> how I came to look at SMACK. I'd really like to keep the tech stack small
>>> and have the parts integrate tightly with each other. Guess I have to pick
>>> between Spark, Flink, Beam and Storm.
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 27, 2017 at 8:31 PM, Viktor Klang <viktor...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 27, 2017 at 10:39 AM, Shiva Ramagopal <tr....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have read through multiple articles describing the SMACK stack
>>>>>>>>> but I'm having difficulty understanding the role of Akka in the
>>>>>>>>> stack. How does Akka fit in?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Akka is for building the application itself.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, I would like to hear about experiences using Akka Streams vs
>>>>>>>>> Kafka Connect for ingesting from Kafka into HDFS (Hive) and RDBMS. Has
>>>>>>>>> anyone used Akka Streams for, say, dynamic partitioning of events from
>>>>>>>>> a Kafka topic into HDFS?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Based on what requirements?
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> TIA
>>>>>>>>> -Shiva
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>>>>>>> ---
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "Akka User List" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to akka-user+...@googlegroups.com.
>>>>>>>>> To post to this group, send email to akka...@googlegroups.com.
>>>>>>>>> Visit this group at https://groups.google.com/group/akka-user.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Cheers,
>>>>>>>> √
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> √
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Cheers,
>>>> √
>>>>
>>>>
>>>
>
