Hey Roger,

I'm not sure if I understand the case you are describing.

As Chris says, we don't yet give you fine-grained control over when history
starts to disappear (though we designed it with the intention of making that
configurable later). However, I'm not sure you need that for the case you
describe.
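
For reference, the "history disappearing" here is log compaction, which
eventually keeps only the latest value per key (the t1/t2/t3 scenario quoted
below). A toy model in Python, just to pin down the behavior: compact() and
restore() are made-up helpers for illustration, not Kafka's cleaner or any
Samza API, and the keys and values are invented.

```python
def compact(log):
    """Keep only the most recent record for each key (toy model of compaction)."""
    latest = {}
    for offset, (key, _value) in enumerate(log):
        latest[key] = offset  # remember the last offset seen for this key
    return [rec for offset, rec in enumerate(log) if latest[rec[0]] == offset]


def restore(log):
    """Rebuild a key-value store by replaying a changelog from the start."""
    state = {}
    for key, value in log:
        state[key] = value
    return state


# t1: the job has written its state and then gone quiet.
changelog = [("k1", "old1"), ("k2", "old2")]
state_t1 = restore(changelog)

# t2: the job updates k1; the old value is still in the log for now.
changelog = changelog + [("k1", "new1")]

# t3: compaction runs and the old value for k1 is discarded.
compacted = compact(changelog)

print(restore(compacted))         # latest state is still fully recoverable
print(restore(compacted) == state_t1)  # the t1 state is not
```

So the changelog alone is enough to restore the current state, but not an
arbitrary past state once compaction has run.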

Say you have a job J that takes inputs I1...IN and produces outputs O1...ON
and in the process accumulates state in a topic S. I think the approach is
to launch a J' (changed or improved in some way) that reprocesses I1...IN
from the beginning of time (or some past point) into O1'...ON' and
accumulates state in S'. So the state for J and the state for J' are
totally independent. J' can't reuse J's state in general because the code
that generates that state may have changed.
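
A minimal sketch of that idea, in Python rather than real Samza code: the
run_job() helper, the two process functions, and the sample messages are all
hypothetical and only stand in for J, J', the input streams, and the state
topics S and S'.

```python
def run_job(inputs, process):
    """Replay every input from the beginning into fresh state and fresh output."""
    state = {}    # stands in for this job's own state topic (S or S')
    output = []   # stands in for this job's own output stream (O or O')
    for message in inputs:
        output.append(process(state, message))
    return state, output


def process_j(state, message):
    # J: counts how many messages were seen per key.
    key, _value = message
    state[key] = state.get(key, 0) + 1
    return (key, state[key])


def process_j_prime(state, message):
    # J': the logic changed to sum values per key, so J's counts are of no use to it.
    key, value = message
    state[key] = state.get(key, 0) + value
    return (key, state[key])


inputs = [("a", 5), ("b", 2), ("a", 3)]  # I1...IN, retained from the beginning

state_j, out_j = run_job(inputs, process_j)               # S, O
state_j_prime, out_j_prime = run_job(inputs, process_j_prime)  # S', O'

print(state_j)
print(state_j_prime)
```

J' never reads S; it only needs the inputs to still be available from the
point where reprocessing starts.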

-Jay

On Thu, Feb 19, 2015 at 9:30 AM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Chris + Samza Devs,
>
> I was wondering whether Samza could support re-processing as described by
> the Kappa architecture or Liquid (
> http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper25u.pdf).
>
> It seems that a changelog is not sufficient to be able to restore state
> backward in time.  Kafka compaction will guarantee that local state can be
> restored from where it left off but I don't see how it can restore past
> state.
>
> Imagine the case where a stream job has a lot of state in its local store
> but it has not updated any keys in a long time.
>
> Time t1: All of the data would be in the tail of the Kafka log (past the
> cleaner point).
> Time t2:  The job updates some keys.   Now we're in a state where the next
> compaction will blow away the old values for those keys.
> Time t3:  Compaction occurs and old values are discarded.
>
> Say we want to launch a re-processing job that would begin from t1.  If we
> launch that job before t3, it will correctly restore its state.  However,
> if we launch the job after t3, it will be missing old values, right?
>
> Unless I'm misunderstanding something, the only way around this is to keep
> snapshots in addition to the changelog.  Has there been any discussion of
> providing an option in Samza of taking RocksDB snapshots and persisting
> them to an object store or HDFS?
>
> Thanks,
>
> Roger
>
