Re: Kafka streams vs Spark streaming

Sachin Mittal Wed, 11 Oct 2017 04:54:46 -0700

Well, it depends on the use case. Say the metric you are evaluating is grouped by
a key and you want to parallelize the operation by adding more instances, so
that each instance deals with only a particular group. In that case it is always
better to have the partitioning done on that key as well. This way a particular
instance will always compute on certain partitions and hence certain keys.


So in such a case you need to make sure the producers are also producing
based on that key.

It's optional, yes, but for good performance one needs to ensure topics are
partitioned based on key hashes.
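
To make the point about key hashes concrete, here is a minimal sketch in plain
Python, with no Kafka client involved. Everything in it is an assumption for
illustration: the names partition_for, instance_for and route are hypothetical,
CRC32 is only a stand-in hash (Kafka's default partitioner uses murmur2), and
the partition-to-instance mapping is a simplified static one rather than what
the consumer group protocol actually does.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical topic size
NUM_INSTANCES = 2   # hypothetical number of running app instances


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: deterministic, so the same key always maps to the
    # same partition. Kafka itself uses murmur2 here, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def instance_for(partition: int, num_instances: int = NUM_INSTANCES) -> int:
    # Simplified static assignment of partitions to instances (assumption).
    return partition % num_instances


def route(records):
    # Each instance keeps a running sum per key, like a local state store.
    per_instance = defaultdict(lambda: defaultdict(int))
    for key, value in records:
        instance = instance_for(partition_for(key))
        per_instance[instance][key] += value
    return per_instance


if __name__ == "__main__":
    records = [("user-1", 1), ("user-2", 2), ("user-1", 3), ("user-3", 4)]
    for instance, totals in sorted(route(records).items()):
        print(instance, dict(totals))
```

Because the producer side picks the partition from the key, every record for a
given key ends up on one instance, and that instance can aggregate locally
without looking at anyone else's data.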

In Spark this is not needed, as it is not backed by a topic.

In short, Kafka Streams is backed by a topic, and that does create some
downsides (along with some upsides too).


On Wed, Oct 11, 2017 at 2:00 PM, Sabarish Sasidharan <sabarish....@gmail.com
> wrote:

> @Sachin
> >>The partition key is very important if you need to run multiple
> instances of a streams application, with certain instances processing certain
> partitions only.
>
> Again, depending on partition key is optional. It's actually a feature
> enabler, so we can use local state stores to improve throughput. I don't
> see this as a downside.
>
> Regards
> Sab
>
> On 11 Oct 2017 1:44 pm, "Sachin Mittal" <sjmit...@gmail.com> wrote:
>
>> Kafka Streams has a gentler learning curve, and if your source data is in
>> Kafka topics it is pretty simple to integrate with.
>> It can run as a library inside your main program.
>>
>> So compared to Spark Streaming, it
>> 1. Is much simpler to implement.
>> 2. Is not as heavy on hardware as Spark.
>>
>>
>> On the downside
>> 1. It is not elastic. You need to anticipate beforehand the volume of
>> data you will have. It is very difficult to add or remove topic partitions
>> later on.
>> 2. The partition key is very important if you need to run multiple
>> instances of a streams application, with certain instances processing certain
>> partitions only.
>>      In case you need aggregation on a different key, you may need to
>> re-partition the data to a new topic and run a new streams app against that.
>>
>> So yes, if you have a good idea about your data, it comes from Kafka,
>> and you want to build something quick without much hardware, Kafka Streams
>> is the way to go.
>>
>> We had first tried Spark Streaming, but given hardware limitations and the
>> complexity of fetching data from MongoDB we decided on Kafka Streams as the
>> way forward.
>>
>> Thanks
>> Sachin
>>
>>
>>
>>
>>
>> On Wed, Oct 11, 2017 at 1:01 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Has anyone had experience of using Kafka Streams versus Spark?
>>>
>>> I am not familiar with the Kafka Streams concept except that it is a set of
>>> libraries.
>>>
>>> Any feedback will be appreciated.
>>>
>>> Regards,
>>>
>>> Mich
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>
>>
