Re: Processing time series data in order

2016-12-21 Thread Ali Akhtar
The batch size can be large, so in-memory ordering isn't an option,
unfortunately.

On Thu, Dec 22, 2016 at 7:09 AM, Jesse Hodges wrote:

> Depending on the expected max out-of-order window, why not order them in
> memory? Then you don't need to reread from Cassandra; in case of a problem
> you can reread the data from Kafka.
>
> -Jesse
>
> > On Dec 21, 2016, at 7:24 PM, Ali Akhtar  wrote:
> >
> > - I'm receiving a batch of messages to a Kafka topic.
> >
> > Each message has a timestamp; however, the messages can arrive / get
> processed out of order. I.e. event 1's timestamp could've been a few seconds
> before event 2's, and event 2 could still get processed before event 1.
> >
> > - I know the number of messages that are sent per batch.
> >
> > - I need to process the messages in order. The messages are basically
> providing the history of an item. I need to be able to track the history
> accurately (i.e., if an event occurred 3 times, I need to accurately log the
> dates of the first, 2nd, and 3rd times it occurred).
> >
> > The approach I'm considering is:
> >
> > - Creating a Cassandra table which is ordered by the timestamp of the
> messages.
> >
> > - Once a batch of messages has arrived, writing them all to Cassandra,
> counting on them being ordered by the timestamp even if they are processed
> out of order.
> >
> > - Then iterating over the messages in the Cassandra table, to process
> them in order.
> >
> > However, I'm concerned about Cassandra's eventual consistency. Could it
> be that even though I wrote the messages, they are not there when I try to
> read them (which would be almost immediately after they are written)?
> >
> > Should I enforce consistency = ALL to make sure the messages will be
> available immediately after being written?
> >
> > Is there a better way to handle this through either Kafka Streams or
> Cassandra?
>


Re: Processing time series data in order

2016-12-21 Thread Jesse Hodges
Depending on the expected max out-of-order window, why not order them in
memory? Then you don't need to reread from Cassandra; in case of a problem you
can reread the data from Kafka.
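
Something like this is what I had in mind (an untested sketch; the class name
and the window constant are made up):

import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Untested sketch: hold events for up to MAX_LATENESS_MS, then emit them in
// timestamp order. One event per timestamp for brevity; use a List value (or
// a compound key) if two events can share a timestamp.
public class OrderingBuffer<T> {
    private static final long MAX_LATENESS_MS = 10_000; // assumed max out-of-order window
    private final TreeMap<Long, T> byTimestamp = new TreeMap<>();

    public void add(long timestampMs, T event) {
        byTimestamp.put(timestampMs, event);
    }

    // Emit every event at least MAX_LATENESS_MS older than the newest one seen so far.
    public void flush(long newestSeenMs, Consumer<T> processInOrder) {
        Map<Long, T> ready = byTimestamp.headMap(newestSeenMs - MAX_LATENESS_MS, true);
        for (T event : ready.values()) {
            processInOrder.accept(event); // TreeMap iterates keys in ascending order
        }
        ready.clear(); // headMap is a live view, so this drops the emitted entries
    }
}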

-Jesse 

> On Dec 21, 2016, at 7:24 PM, Ali Akhtar  wrote:
> 
> - I'm receiving a batch of messages to a Kafka topic.
> 
> Each message has a timestamp; however, the messages can arrive / get processed
> out of order. I.e. event 1's timestamp could've been a few seconds before
> event 2's, and event 2 could still get processed before event 1.
> 
> - I know the number of messages that are sent per batch.
> 
> - I need to process the messages in order. The messages are basically 
> providing the history of an item. I need to be able to track the history 
> accurately (i.e., if an event occurred 3 times, I need to accurately log the
> dates of the first, 2nd, and 3rd times it occurred).
> 
> The approach I'm considering is:
> 
> - Creating a Cassandra table which is ordered by the timestamp of the
> messages.
> 
> - Once a batch of messages has arrived, writing them all to Cassandra,
> counting on them being ordered by the timestamp even if they are processed 
> out of order.
> 
> - Then iterating over the messages in the Cassandra table, to process them in
> order.
> 
> However, I'm concerned about Cassandra's eventual consistency. Could it be 
> that even though I wrote the messages, they are not there when I try to read 
> them (which would be almost immediately after they are written)?
> 
> Should I enforce consistency = ALL to make sure the messages will be 
> available immediately after being written?
> 
> Is there a better way to handle this through either Kafka Streams or Cassandra?


Processing time series data in order

2016-12-21 Thread Ali Akhtar
- I'm receiving a batch of messages to a Kafka topic.

Each message has a timestamp; however, the messages can arrive / get
processed out of order. I.e. event 1's timestamp could've been a few seconds
before event 2's, and event 2 could still get processed before event 1 (see
the consumer sketch after these points).

- I know the number of messages that are sent per batch.

- I need to process the messages in order. The messages are basically
providing the history of an item. I need to be able to track the history
accurately (i.e., if an event occurred 3 times, I need to accurately log the
dates of the first, 2nd, and 3rd times it occurred).
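
For context, the consuming side is roughly this (topic name and deserializers
are placeholders, and I'm assuming the timestamp I care about is the Kafka
record timestamp rather than a field inside the payload):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "item-history"); // made-up group id
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("item-events")); // assumed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    long ts = record.timestamp();   // may not match arrival/processing order
                    String payload = record.value();
                    // ... buffer or persist (ts, payload) for ordered processing ...
                }
            }
        }
    }
}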

The approach I'm considering is:

- Creating a Cassandra table which is ordered by the timestamp of the
messages (a rough code sketch of these steps follows the list).

- Once a batch of messages has arrived, writing them all to Cassandra,
counting on them being ordered by the timestamp even if they are processed
out of order.

- Then iterating over the messages in the Cassandra table, to process them
in order.
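
Roughly, in code, what I mean is something like this (using the DataStax Java
driver; the keyspace, table, and column names are placeholders I made up, and
Event is a stand-in for the deserialized message):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.Date;
import java.util.List;

public class OrderedEventStore {

    // Placeholder for whatever the deserialized Kafka message looks like.
    public static class Event {
        public final long timestampMs;
        public final String payload;
        public Event(long timestampMs, String payload) {
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    private final Session session;

    public OrderedEventStore(Session session) {
        this.session = session;
        // One partition per item; rows within the partition are clustered by
        // event_time, so reads come back sorted by timestamp regardless of the
        // order the rows were written in.
        session.execute(
            "CREATE TABLE IF NOT EXISTS item_events ("
            + " item_id text,"
            + " event_time timestamp,"
            + " payload text,"
            + " PRIMARY KEY (item_id, event_time)"
            + ") WITH CLUSTERING ORDER BY (event_time ASC)");
    }

    // Step 2: write the whole batch; write order within the batch doesn't matter.
    public void writeBatch(String itemId, List<Event> batch) {
        // In real code this would be prepared once, not per batch.
        PreparedStatement insert = session.prepare(
            "INSERT INTO item_events (item_id, event_time, payload) VALUES (?, ?, ?)");
        for (Event e : batch) {
            session.execute(insert.bind(itemId, new Date(e.timestampMs), e.payload));
        }
    }

    // Step 3: read back per item; rows arrive in clustering (timestamp) order.
    public void processInOrder(String itemId) {
        ResultSet rs = session.execute(
            "SELECT event_time, payload FROM item_events WHERE item_id = ?", itemId);
        for (Row row : rs) {
            Date when = row.getTimestamp("event_time"); // driver 3.x
            String payload = row.getString("payload");
            System.out.println(when + " " + payload);   // placeholder for real processing
        }
    }

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            // Keyspace assumed to already exist.
            OrderedEventStore store = new OrderedEventStore(cluster.connect("events_ks"));
            // store.writeBatch(...); store.processInOrder(...);
        }
    }
}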

However, I'm concerned about Cassandra's eventual consistency. Could it be
that even though I wrote the messages, they are not there when I try to
read them (which would be almost immediately after they are written)?

Should I enforce consistency = ALL to make sure the messages will be
available immediately after being written?
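
If so, I assume I'd set it per statement, something like the below. From what
I've read, QUORUM writes plus QUORUM reads should also guarantee that the read
sees the write, without needing every replica to be up, but correct me if I'm
wrong:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import java.util.Date;

public class ConsistencyExample {
    // Write and read the same row with QUORUM on both sides, so the read
    // overlaps with at least one replica that acknowledged the write
    // (ALL would also work, but fails whenever a single replica is down).
    static void writeThenRead(Session session, String itemId, Date eventTime, String payload) {
        Statement write = new SimpleStatement(
                "INSERT INTO item_events (item_id, event_time, payload) VALUES (?, ?, ?)",
                itemId, eventTime, payload)
            .setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(write);

        Statement read = new SimpleStatement(
                "SELECT event_time, payload FROM item_events WHERE item_id = ?", itemId)
            .setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(read);
    }
}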

Is there a better way to handle this through either Kafka Streams or Cassandra?