I'm actually a bit surprised to see METRON-691, because a while back I ran
some experiments to confirm that data was being written to the indexes
corresponding to the timestamp in the message, not the current time, and I
thought that messages were getting written to the proper historical
indexes, not the current one.  That was long enough ago, though, that it
would need another look, and I only verified it operationally (put a
message on a topic with a certain timestamp, search for it in Kibana).
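To be clear about what I was checking: the index a message lands in should
be derived from the message's own timestamp, not the wall clock.  Roughly
this sketch (field and index names are made up for illustration, not
Metron's actual writer code):

```python
from datetime import datetime, timezone

def index_for(message, base="squid_index", fmt="%Y.%m.%d.%H"):
    """Derive the date-partitioned index name from the message's
    own event timestamp (epoch millis), not from the current time."""
    ts = message["timestamp"] / 1000.0  # event time, epoch millis
    suffix = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
    return f"{base}_{suffix}"

# A message from 2015 should land in a 2015 index, even if it is
# replayed today.
old_msg = {"timestamp": 1420070400000}  # 2015-01-01T00:00:00Z
print(index_for(old_msg))  # squid_index_2015.01.01.00
```

If the writer instead partitions on system time, a replayed 2015 message
would land in today's index, which is what METRON-691 describes.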

If that is not the case currently (which I should be able to verify later
this week), then that would be pretty concerning, and somewhat separate
from the previous "Metron Batch"-style discussions, which are more focused
on bulk data loads or historical analysis.

I will wait to see how the rest of this conversation pans out before giving
my thoughts on the bigger picture.

Jon

On Tue, Feb 28, 2017 at 9:19 AM Casey Stella <ceste...@gmail.com> wrote:

> I think this is a really tricky topic, but necessary.  I've given it a bit
> of thought over the last few months and I don't really see a great way to
> do it given the Profiler.  Here's what I've come up with so far, though, in
> my thinking.
>
>
>    - Replaying events will compress events in time (e.g. 2 years of data
>    may come through in 10 minutes)
>    - Replaying events may result in events being temporally out of order
>    even if they are written to Kafka in order (just by virtue of hitting
>    different Kafka partitions)
>
> Given both of these, I think we should handle replaying of data *not*
> within a streaming context, so that we can control the order and grouping
> of the data.  This is essentially the advent of batch Metron.  Off the top
> of my head, though, I'm having trouble seeing how to parallelize this
> cleanly.
>
> Imagine a scenario where telemetry A has an enrichment E1 that depends on
> profile P1, and profile P1 depends on the previous 10 minutes of data.
> How, in a batch or streaming context, can we ever hope to ensure that the
> profiles for P1 for the last 10 minutes are in place as data flows
> through, across all data points?  Now what if the values that P1 depends
> on are computed from a profile P2?  Essentially you have a data dependency
> graph between enrichments, profiles, and raw data that you need to process
> in order.
>
>
>
> On Tue, Feb 28, 2017 at 8:03 AM, Justin Leet <justinjl...@gmail.com>
> wrote:
>
> > There are a couple of JIRAs related to the use of system time vs. event
> > time.
> >
> > METRON-590 Enable Use of Event Time in Profiler
> > <https://issues.apache.org/jira/browse/METRON-590>
> > METRON-691 Elastic Writer index partitions on system time, not event time
> > <https://issues.apache.org/jira/browse/METRON-691>
> >
> > Is there anything else that needs to make this distinction, and if so,
> > do we need to be able to support both system time and event time for it?
> >
> > My immediate thought on this is that, once we work on replaying
> > historical data, we'll want system time for geo data passing through.
> > Given that the geo files can update, we'd want to know which geo file we
> > actually need to be using at the appropriate time.
> >
> > We'll probably also want to double-check anything else that writes out
> > data to a location and provides some sort of timestamping on it.
> >
> > Justin
> >
>
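One way to picture the dependency problem Casey describes above: a batch
replay would have to process raw data, profiles, and enrichments in
topological order over that dependency graph.  A rough sketch with made-up
names (not actual Metron code), using Casey's E1/P1/P2 example:

```python
from graphlib import TopologicalSorter

# Edges point from a node to what it depends on:
# enrichment E1 needs profile P1, P1 needs P2, and both
# profiles are computed from the raw telemetry.
deps = {
    "E1": {"P1"},
    "P1": {"P2", "raw"},
    "P2": {"raw"},
}

# Dependencies come out first, so each stage's inputs are ready
# before it runs.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['raw', 'P2', 'P1', 'E1']
```

This only captures the stage ordering, of course; the harder part Casey
points out is that each profile also depends on a trailing time window of
the data, so the batches themselves have to respect event-time order.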
-- 

Jon

Sent from my mobile device
