@Jon, it looks like it is based on system date. From ElasticsearchWriter.write:

String indexPostfix = dateFormat.format(new Date());
...
indexName = indexName + "_index_" + indexPostfix;
...
IndexRequestBuilder indexRequestBuilder =
    client.prepareIndex(indexName, sensorType + "_doc");

Justin
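For reference, here's a minimal sketch of what an event-time-based index name might look like. The helper name, the fallback to system time, and the assumption that the parsed message carries an epoch-millis "timestamp" field are mine, not the actual writer code:

import java.text.SimpleDateFormat;
import java.util.Date;
import org.json.simple.JSONObject;

public class EventTimeIndexing {
  // Hypothetical sketch: derive the index postfix from the message's event
  // time instead of the wall clock, falling back to system time when the
  // message has no usable "timestamp" field.
  public static String eventTimeIndexName(String baseName, JSONObject message,
                                          SimpleDateFormat dateFormat) {
    Object ts = message.get("timestamp");
    Date eventTime = (ts instanceof Number)
        ? new Date(((Number) ts).longValue())
        : new Date();
    return baseName + "_index_" + dateFormat.format(eventTime);
  }
}

With something like that in place, a replayed 2015 message would land in the 2015-dated index rather than today's.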
On Tue, Feb 28, 2017 at 10:44 AM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> I'm actually a bit surprised to see METRON-691, because I know a while back
> I did some experiments to ensure that data was being written to the indexes
> that relate to the timestamp in the message, not the current time, and I
> thought that messages were getting written to the proper historical
> indexes, not the current one. This was so long ago now, though, that it
> would require another look, and I only reviewed it operationally (put a
> message on a topic with a certain timestamp, search for it in Kibana).
>
> If that is not the case currently (which I should be able to verify later
> this week), then that would be pretty concerning and somewhat separate from
> the previous "Metron Batch" style discussions, which are more focused on
> bulk data load or historical analysis.
>
> I will wait to see how the rest of this conversation pans out before giving
> my thoughts on the bigger picture.
>
> Jon
>
> On Tue, Feb 28, 2017 at 9:19 AM Casey Stella <ceste...@gmail.com> wrote:
>
> > I think this is a really tricky topic, but a necessary one. I've given
> > it a bit of thought over the last few months, and I don't really see a
> > great way to do it given the Profiler. Here's what I've come up with so
> > far in my thinking:
> >
> > - Replaying events will compress events in time (e.g. 2 years of data
> >   may come through in 10 minutes)
> > - Replaying events may result in events being out of order temporally,
> >   even if they are written to Kafka in order (just by virtue of hitting
> >   a different Kafka partition)
> >
> > Given both of these, in my mind we should handle replaying of data *not*
> > within a streaming context, so we can control the order and the grouping
> > of the data. In my mind, this is essentially the advent of batch Metron.
> > Off the top of my head, though, I'm having trouble thinking of a clean
> > way to parallelize this.
> >
> > Imagine a scenario where telemetry A has an enrichment E1 that depends
> > on profile P1, and profile P1 depends on the previous 10 minutes of
> > data. How, in a batch or streaming context, can we ever hope to ensure
> > that the profiles for P1 for the last 10 minutes are in place as data
> > flows through across all data points? Now how about if the values that
> > P1 depends on are computed from a profile P2? Essentially you have a
> > data dependency graph between enrichments, profiles, and raw data that
> > you need to work through in order.
> >
> > On Tue, Feb 28, 2017 at 8:03 AM, Justin Leet <justinjl...@gmail.com>
> > wrote:
> >
> > > There are a couple of JIRAs related to the use of system time vs.
> > > event time:
> > >
> > > METRON-590 Enable Use of Event Time in Profiler
> > > <https://issues.apache.org/jira/browse/METRON-590>
> > > METRON-691 Elastic Writer index partitions on system time, not event
> > > time
> > > <https://issues.apache.org/jira/browse/METRON-691>
> > >
> > > Is there anything else that needs to be making this distinction, and
> > > if so, do we need to be able to support both system time and event
> > > time for it?
> > >
> > > My immediate thought on this is that, once we work on replaying
> > > historical data, we'll want system time for geo data passing through.
> > > Given that the geo files can update, we'd want to know which geo file
> > > we actually need to be using at the appropriate time.
> > >
> > > We'll probably also want to double check anything else that writes
> > > out data to a location and provides some sort of timestamping on it.
> > >
> > > Justin
>
> --
> Jon
> Sent from my mobile device
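As a footnote to Casey's point about the data dependency graph: if the dependencies between enrichments, profiles, and raw data were modeled explicitly, a batch replay could at least compute a processing order with a topological sort. All names below are hypothetical; this is a sketch of the ordering problem, not anything in Metron:

import java.util.*;

public class ReplayOrdering {
  // Kahn's algorithm over a dependency map: each key depends on the nodes
  // in its value list. Returns an order in which every dependency is
  // computed before its dependents, or throws if the graph has a cycle.
  public static List<String> topologicalOrder(Map<String, List<String>> deps) {
    Map<String, Integer> inDegree = new HashMap<>();
    Map<String, List<String>> dependents = new HashMap<>();
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      inDegree.merge(e.getKey(), e.getValue().size(), Integer::sum);
      for (String d : e.getValue()) {
        dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
        inDegree.putIfAbsent(d, 0);
      }
    }
    Deque<String> ready = new ArrayDeque<>();
    inDegree.forEach((node, degree) -> { if (degree == 0) ready.add(node); });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String node = ready.remove();
      order.add(node);
      for (String dependent : dependents.getOrDefault(node, List.of())) {
        if (inDegree.merge(dependent, -1, Integer::sum) == 0) {
          ready.add(dependent);
        }
      }
    }
    if (order.size() != inDegree.size()) {
      throw new IllegalStateException("Cycle in enrichment/profile dependencies");
    }
    return order;
  }

  public static void main(String[] args) {
    // Casey's example: enrichment E1 depends on profile P1, P1 on a second
    // profile P2, and both profiles on the raw telemetry.
    Map<String, List<String>> deps = new HashMap<>();
    deps.put("E1", List.of("P1"));
    deps.put("P1", List.of("P2", "raw"));
    deps.put("P2", List.of("raw"));
    System.out.println(topologicalOrder(deps)); // [raw, P2, P1, E1]
  }
}

That only sequences the stages, of course; the hard part Casey raises (making sure P1's windows are materialized before dependent data flows through) would still need per-window ordering within each stage.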