Re: Flink, Kappa and Lambda

2015-11-13 Thread Welly Tambunan
Hi rss rss,

Yes. I have already read that book.

However, given the current state of streaming and the Kappa Architecture, I
don't think we need the Lambda Architecture anymore.

Any thoughts?
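A minimal sketch of the reprocessing step in the Kappa Architecture as it is commonly described: the retained Kafka topic is the source of truth, and changing the logic means running a new version of the same streaming job from offset 0 into a new output table, then pointing readers at it. The in-memory list standing in for the topic, the click-count job, and the names `run_job`, `logic_v1` and `logic_v2` are hypothetical; this is not Flink or Kafka API code.

```python
from collections import Counter
from typing import Callable, Dict, List

Event = Dict[str, str]  # hypothetical event shape, e.g. {"page": "Home"}

def run_job(log: List[Event], logic: Callable[[Event], str], from_offset: int = 0) -> Counter:
    """Replay the retained log from `from_offset` and build a key -> count table."""
    table: Counter = Counter()
    for event in log[from_offset:]:
        table[logic(event)] += 1
    return table

def logic_v1(event: Event) -> str:
    # Original job: group clicks by page name as-is.
    return event["page"]

def logic_v2(event: Event) -> str:
    # Fixed job: normalize case so "Home" and "home" count as one page.
    return event["page"].lower()

# Stand-in for a Kafka topic whose retention covers the full history.
log: List[Event] = [{"page": "Home"}, {"page": "home"}, {"page": "Cart"}]

tables = {"counts_v1": run_job(log, logic_v1)}
# Reprocess: run v2 over the whole log into a new table while v1 keeps serving.
tables["counts_v2"] = run_job(log, logic_v2, from_offset=0)
# Once v2 has caught up, switch readers and drop the old output.
serving_table = "counts_v2"
del tables["counts_v1"]

print(dict(tables[serving_table]))  # {'home': 2, 'cart': 1}
```

The point of the sketch is that there is no separate batch code path; whether that is enough in practice depends on the topic's retention covering all the history you may need to reprocess.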

On Thu, Nov 12, 2015 at 12:29 PM, rss rss  wrote:

> Hello,
>
>   regarding the Lambda Architecture, there is the following book:
> https://www.manning.com/books/big-data (Big Data: Principles and best
> practices of scalable realtime data systems, by Nathan Marz and James
> Warren).
>
> Regards,
> Roman
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Flink, Kappa and Lambda

2015-11-11 Thread Nick Dimiduk
The first and third points here aren't very fair -- they apply to all data
systems. Systems downstream of your database can lose data in the same way:
the database retention policy expires old data, the downstream system fails,
and back to the tapes you must go. Likewise with the third point: a bug in
any ETL system can cause problems. Neither is specific to streaming in
general, or to Kafka/Flink in particular.

I'm much more curious about the second claim. The whole point of high
availability in these systems is to not lose data during failures. The
post's author is not specific on any of these points, but just as I look
to a distributed database's community to prove to me that it doesn't lose
data in these corner cases, so too do I expect Kafka to prove that it is
resilient. In the absence of software formally proven correct, I look to
empirical evidence in the form of Chaos Monkey-style tests.
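To make the second claim concrete, here is a toy model (not Kafka's actual replication protocol) of how an acknowledged message can disappear on leader failover. It assumes one partition with a leader and a single follower that has not fetched anything when the leader crashes. Acknowledging after the leader's local write only corresponds roughly to the producer setting `acks=1`; replicating before acknowledging corresponds roughly to `acks=all`. In real Kafka the latter is usually combined with `min.insync.replicas` and `unclean.leader.election.enable=false`, so that an out-of-sync replica cannot become leader. The function name `lost_after_failover` is hypothetical.

```python
from typing import List

def lost_after_failover(replicate_before_ack: bool, messages: List[str]) -> List[str]:
    """Toy model of one partition with a leader and one follower.

    Returns the messages that were acknowledged to the producer but are
    missing on the follower after the leader crashes and the follower is
    elected as the new leader.
    """
    leader_log: List[str] = []    # the leader's copy, lost with the crash
    follower_log: List[str] = []
    acknowledged: List[str] = []
    for msg in messages:
        leader_log.append(msg)
        if replicate_before_ack:
            # acks=all-like: the follower must have the message before the ack.
            follower_log.append(msg)
        # acks=1-like otherwise: ack after the leader's local write only; in
        # this toy model the follower never fetches before the crash.
        acknowledged.append(msg)
    # The leader crashes here; the follower becomes the new leader.
    new_leader_log = follower_log
    return [m for m in acknowledged if m not in new_leader_log]

print(lost_after_failover(False, ["m1", "m2"]))  # ['m1', 'm2']
print(lost_after_failover(True, ["m1", "m2"]))   # []
```

Real replication differs in many details (followers fetch continuously, the in-sync replica set shrinks and grows), so this only illustrates the failure mode, not how likely it is.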

On Wednesday, November 11, 2015, Welly Tambunan  wrote:

> No matter the message queue or broker you rely on whether it be RabbitMQ,
> JMS, ActiveMQ, Websphere, MSMQ and yes even Kafka you can lose messages in
> any of the following ways:
>
>- A downstream system from the broker can have data loss
>- All message queues today can lose already acknowledged messages
>during failover or leader election.
>- A bug can send the wrong messages to the wrong systems.


Re: Flink, Kappa and Lambda

2015-11-11 Thread Welly Tambunan
Hi Stephan,


Thanks for your response.


We are trying to determine whether the Kappa Architecture with Flink is
sufficient. This is mostly about resiliency, message loss, etc.

The article is concerned about message loss even when using Kafka.

No matter the message queue or broker you rely on, whether it be RabbitMQ,
JMS, ActiveMQ, WebSphere, MSMQ and, yes, even Kafka, you can lose messages in
any of the following ways:

   - A downstream system from the broker can have data loss.
   - All message queues today can lose already acknowledged messages during
   failover or leader election.
   - A bug can send the wrong messages to the wrong systems.

Cheers

On Wed, Nov 11, 2015 at 4:13 PM, Stephan Ewen  wrote:

> Hi!
>
> Can you explain a little more what you want to achieve? Maybe then we can
> give a few more comments...
>
> I briefly read through some of the articles you linked, but did not quite
> understand their train of thought.
> For example, letting Tomcat write to Cassandra directly, and to Kafka,
> might just be redundant. Why not let the streaming job that reads the Kafka
> queue move the data to Cassandra as one of its results? Furthermore, durably
> storing the sequence of events is exactly what Kafka does, but the article
> suggests using Cassandra for that, which I find very counterintuitive.
> It looks a bit like the suggested approach only adopts streaming for
> half the task.
>
> Greetings,
> Stephan
>
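Stephan's suggestion can be illustrated with a toy comparison, assuming an in-memory list as the log and a dict as the Cassandra table (the names and the event shape are hypothetical). With a dual write, a crash between the two writes leaves the stores disagreeing; when the table is written only by a consumer of the log that tracks its offset, it can always be rebuilt or caught up from the log. In a real Flink job, recovering the offset and the derived state consistently is what checkpointing is for; this sketch ignores that.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # hypothetical (account id, amount delta)

def dual_write(events: List[Event], crash_after: int) -> Tuple[List[Event], Dict[str, int]]:
    """Web tier appends each event to the log, then updates the table.
    It crashes right after the `crash_after`-th log append, before that table write."""
    log: List[Event] = []
    table: Dict[str, int] = {}
    for i, (key, delta) in enumerate(events, start=1):
        log.append((key, delta))
        if i == crash_after:
            break  # crash: this event never reaches the table
        table[key] = table.get(key, 0) + delta
    return log, table

def derive_table(log: List[Event], state: Dict[str, int], offset: int) -> Tuple[Dict[str, int], int]:
    """Single writer: only a consumer of the log updates the table,
    resuming from its last processed offset."""
    for key, delta in log[offset:]:
        state[key] = state.get(key, 0) + delta
        offset += 1
    return state, offset

events = [("a", 5), ("a", -2), ("b", 7)]

# Dual write: the log holds two events, the table reflects only the first.
log, table = dual_write(events, crash_after=2)
print(log, table)  # [('a', 5), ('a', -2)] {'a': 5}

view, offset = derive_table(log, {}, 0)
print(view, offset)  # {'a': 3} 2

# New events arrive; the consumer catches up from its stored offset.
log.append(("b", 7))
view, offset = derive_table(log, view, offset)
print(view, offset)  # {'a': 3, 'b': 7} 3
```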




Flink, Kappa and Lambda

2015-11-09 Thread Welly Tambunan
Hi All,

I read a couple of articles about the Kappa and Lambda Architectures.

http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink/

I'm convinced that Flink will simplify this with streaming.

However, I also stumbled upon this blog post, which makes a valid argument
for having system-of-record storage (event sourcing), and in the end the
Lambda Architecture appears in the solution. Basically, it writes twice, to
the queuing system and to C* (Cassandra), for safety. The system of record
here basically stores the events (deltas).

https://lostechies.com/ryansvihla/2015/09/17/event-sourcing-and-system-of-record-sane-distributed-development-in-the-modern-era-2/
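The system-of-record idea described above (storing the events as deltas rather than the current state) can be sketched as a fold over the stored deltas; because the deltas are kept, the state as of any earlier point can also be rebuilt. The cart-like event shape and the name `replay` are hypothetical, and where the deltas live (Kafka or C*) does not change the fold itself.

```python
from typing import Dict, List, Tuple

Delta = Tuple[str, int]  # hypothetical (item, quantity change)

def replay(deltas: List[Delta], upto: int) -> Dict[str, int]:
    """Rebuild state by folding the first `upto` stored deltas, in order."""
    state: Dict[str, int] = {}
    for item, change in deltas[:upto]:
        state[item] = state.get(item, 0) + change
        if state[item] == 0:
            del state[item]  # an item whose quantity drops to zero leaves the state
    return state

system_of_record = [("apple", 3), ("pear", 1), ("apple", -3), ("pear", 2)]

print(replay(system_of_record, 2))                      # {'apple': 3, 'pear': 1}
print(replay(system_of_record, len(system_of_record)))  # {'pear': 3}
```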

Another post is about using the Lambda Architecture to maintain the
correctness of the system.

https://lostechies.com/ryansvihla/2015/09/17/real-time-analytics-with-spark-streaming-and-cassandra/
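For comparison, a minimal sketch of the Lambda Architecture's query path as described in Marz and Warren's book: a batch layer recomputes a view from all data up to a cutoff, a speed layer covers only newer data, and a serving layer merges the two at query time. Here the speed layer is assumed to be lossy (it drops the event at timestamp 3), to show how the next batch run restores correctness. The function names and the event shape are hypothetical.

```python
from collections import Counter
from typing import AbstractSet, List, Tuple

Event = Tuple[int, str]  # hypothetical (timestamp, key)

def batch_view(events: List[Event], cutoff: int) -> Counter:
    """Batch layer: recomputed from scratch over all events before `cutoff`."""
    return Counter(key for ts, key in events if ts < cutoff)

def speed_view(events: List[Event], cutoff: int, dropped: AbstractSet[int] = frozenset()) -> Counter:
    """Speed layer: covers only events at or after `cutoff`; may miss some (`dropped`)."""
    return Counter(key for ts, key in events if ts >= cutoff and ts not in dropped)

def query(events: List[Event], cutoff: int, key: str, dropped: AbstractSet[int] = frozenset()) -> int:
    """Serving layer: merge the batch and real-time views at read time."""
    return batch_view(events, cutoff)[key] + speed_view(events, cutoff, dropped)[key]

events = [(1, "a"), (2, "b"), (3, "a"), (4, "a")]

print(query(events, cutoff=3, key="a", dropped={3}))  # 2: the speed layer lost ts=3
print(query(events, cutoff=5, key="a", dropped={3}))  # 3: the next batch run repairs it
```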


Given that he's using Spark as the stream processor, do we have to do the
same thing with Apache Flink?



Cheers