@Jon, it looks like it is based on system date. From ElasticsearchWriter.write:

String indexPostfix = dateFormat.format(new Date());
...
indexName = indexName + "_index_" + indexPostfix;
...
IndexRequestBuilder indexRequestBuilder =
    client.prepareIndex(indexName, sensorType + "_doc");

Justin
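For reference, here's a minimal sketch of what an event-time-based index name might look like. The helper name, the fallback to system time, and the assumption that the parsed message carries an epoch-millis "timestamp" field are mine, not the actual writer code:

import java.text.SimpleDateFormat;
import java.util.Date;
import org.json.simple.JSONObject;

public class EventTimeIndexing {
  // Hypothetical sketch: derive the index postfix from the message's event
  // time instead of the wall clock, falling back to system time when the
  // message has no usable "timestamp" field.
  public static String eventTimeIndexName(String baseName, JSONObject message,
                                          SimpleDateFormat dateFormat) {
    Object ts = message.get("timestamp");
    Date eventTime = (ts instanceof Number)
        ? new Date(((Number) ts).longValue())
        : new Date();
    return baseName + "_index_" + dateFormat.format(eventTime);
  }
}

With something like that in place, a replayed 2015 message would land in the 2015-dated index rather than today's.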
On Tue, Feb 28, 2017 at 10:44 AM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> I'm actually a bit surprised to see METRON-691, because I know a while back
> I did some experiments to ensure that data was being written to the indexes
> that relate to the timestamp in the message, not the current time, and I
> thought that messages were getting written to the proper historical
> indexes, not the current one. This was so long ago now, though, that it
> would require another look, and I only reviewed it operationally (put a
> message on a topic with a certain timestamp, search for it in Kibana).
>
> If that is not the case currently (which I should be able to verify later
> this week), then that would be pretty concerning and somewhat separate from
> the previous "Metron Batch" style discussions, which are more focused on
> bulk data load or historical analysis.
>
> I will wait to see how the rest of this conversation pans out before giving
> my thoughts on the bigger picture.
>
> Jon
>
> On Tue, Feb 28, 2017 at 9:19 AM Casey Stella <ceste...@gmail.com> wrote:
>
> > I think this is a really tricky topic, but a necessary one. I've given
> > it a bit of thought over the last few months, and I don't really see a
> > great way to do it given the Profiler. Here's what I've come up with so
> > far in my thinking:
> >
> > - Replaying events will compress events in time (e.g. 2 years of data
> >   may come through in 10 minutes)
> > - Replaying events may result in events being out of order temporally,
> >   even if they are written to Kafka in order (just by virtue of hitting
> >   a different Kafka partition)
> >
> > Given both of these, in my mind we should handle replaying of data *not*
> > within a streaming context, so we can control the order and the grouping
> > of the data. In my mind, this is essentially the advent of batch Metron.
> > Off the top of my head, though, I'm having trouble thinking of a clean
> > way to parallelize this.
> >
> > Imagine a scenario where telemetry A has an enrichment E1 that depends
> > on profile P1, and profile P1 depends on the previous 10 minutes of
> > data. How, in a batch or streaming context, can we ever hope to ensure
> > that the profiles for P1 for the last 10 minutes are in place as data
> > flows through across all data points? Now how about if the values that
> > P1 depends on are computed from a profile P2? Essentially you have a
> > data dependency graph between enrichments, profiles, and raw data that
> > you need to work through in order.
> >
> > On Tue, Feb 28, 2017 at 8:03 AM, Justin Leet <justinjl...@gmail.com>
> > wrote:
> >
> > > There are a couple of JIRAs related to the use of system time vs.
> > > event time:
> > >
> > > METRON-590 Enable Use of Event Time in Profiler
> > > <https://issues.apache.org/jira/browse/METRON-590>
> > > METRON-691 Elastic Writer index partitions on system time, not event
> > > time
> > > <https://issues.apache.org/jira/browse/METRON-691>
> > >
> > > Is there anything else that needs to be making this distinction, and
> > > if so, do we need to be able to support both system time and event
> > > time for it?
> > >
> > > My immediate thought on this is that, once we work on replaying
> > > historical data, we'll want system time for geo data passing through.
> > > Given that the geo files can update, we'd want to know which geo file
> > > we actually need to be using at the appropriate time.
> > >
> > > We'll probably also want to double check anything else that writes
> > > out data to a location and provides some sort of timestamping on it.
> > >
> > > Justin
>
> --
> Jon
> Sent from my mobile device
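As a footnote to Casey's point about the data dependency graph: if the dependencies between enrichments, profiles, and raw data were modeled explicitly, a batch replay could at least compute a processing order with a topological sort. All names below are hypothetical; this is a sketch of the ordering problem, not anything in Metron:

import java.util.*;

public class ReplayOrdering {
  // Kahn's algorithm over a dependency map: each key depends on the nodes
  // in its value list. Returns an order in which every dependency is
  // computed before its dependents, or throws if the graph has a cycle.
  public static List<String> topologicalOrder(Map<String, List<String>> deps) {
    Map<String, Integer> inDegree = new HashMap<>();
    Map<String, List<String>> dependents = new HashMap<>();
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      inDegree.merge(e.getKey(), e.getValue().size(), Integer::sum);
      for (String d : e.getValue()) {
        dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
        inDegree.putIfAbsent(d, 0);
      }
    }
    Deque<String> ready = new ArrayDeque<>();
    inDegree.forEach((node, degree) -> { if (degree == 0) ready.add(node); });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String node = ready.remove();
      order.add(node);
      for (String dependent : dependents.getOrDefault(node, List.of())) {
        if (inDegree.merge(dependent, -1, Integer::sum) == 0) {
          ready.add(dependent);
        }
      }
    }
    if (order.size() != inDegree.size()) {
      throw new IllegalStateException("Cycle in enrichment/profile dependencies");
    }
    return order;
  }

  public static void main(String[] args) {
    // Casey's example: enrichment E1 depends on profile P1, P1 on a second
    // profile P2, and both profiles on the raw telemetry.
    Map<String, List<String>> deps = new HashMap<>();
    deps.put("E1", List.of("P1"));
    deps.put("P1", List.of("P2", "raw"));
    deps.put("P2", List.of("raw"));
    System.out.println(topologicalOrder(deps)); // [raw, P2, P1, E1]
  }
}

That only sequences the stages, of course; the hard part Casey raises (making sure P1's windows are materialized before dependent data flows through) would still need per-window ordering within each stage.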