Hi,

This is an interesting discussion. Would you mind moving it to the JIRA or its own DISCUSS thread?
Thanks!
-D...

On Thu, Nov 3, 2016 at 7:26 AM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> To clarify, it only needs to truncate fields > 32766 which need a
> full/exact string match search to be run on them (analyzed fields generally
> would not hit this limitation but I guess in theory they could). However,
> that's probably every field which can get > 32766 because I'm assuming
> those will all be strings.
>
> I also think using the profiler to monitor the truncation action could be a
> useful default.
>
> Jon
>
> On Wed, Nov 2, 2016, 21:08 zeo...@gmail.com <zeo...@gmail.com> wrote:
>
> > That would break searching on uri entirely unless you queried and knew to
> > truncate at 32766 because it's not analyzed. I don't like pushing that
> > complication to the end user.
> >
> > I would suggest truncation in the indexingBolt (not using stellar because
> > you'd want this across the board) for all fields > 32766 (how do we make
> > sure this gets updated if the limitation changes in Lucene?) and adding
> > metadata key-value pairs (pre-trunc length, hash, truncated bool, etc.).
> > In the URI scenario I would also suggest doing a multifield mapping by
> > default because of the way that data is useful (not sure which analyser to
> > use though - maybe write or find a good URI analyzer?). Since timestamp is
> > a required field for all messages (I'm pretty sure?) I'm ok with timestamp
> > and field value used as the UID, but would prefer something better.
> >
> > Jon
> >
> > On Wed, Nov 2, 2016, 20:33 James Sirota <jsir...@apache.org> wrote:
> >
> > Jon,
> >
> > For METRON-517 would it suffice to have a stellar statement to take a URI
> > string and truncate it to length of 32766 in the ES writer? But still
> > write the actual string to HDFS? You can then search against ES on the
> > truncated portion, but retrieve the actual timestamp from HDFS. It's easy
> > to do because you know the timestamp from the original message. So you
> > know which logs in HDFS to search through to find the data.
> >
> > 02.11.2016, 14:12, "zeo...@gmail.com" <zeo...@gmail.com>:
> > > I personally would like to see the following things done before things
> > > leave BETA:
> > > (1) Address data integrity concerns (Specifically thinking of METRON-370,
> > > METRON-517)
> > > (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
> > > and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
> > > for).
> > >
> > > I would also want to see the upgrade path (as opposed to rebuild) be more
> > > thoroughly and regularly tested once things leave BETA. From my
> > > perspective I think the project is very close but not yet ready.
> > >
> > > Jon
> > >
> > > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella <ceste...@gmail.com> wrote:
> > >
> > > Hello Everyone,
> > >
> > > Now that the discussion around the next release has started, it has been
> > > proposed and I think it's a good time to discuss what to name this next
> > > release. Before, we have adopted the BETA suffix. I think it might be
> > > time to drop it and call the next release 0.2.2
> > >
> > > Thoughts?
> > >
> > > Best,
> > >
> > > Casey
> > >
> > > --
> > >
> > > Jon
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> > --
> >
> > Jon
> >
> --
>
> Jon
>
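
For reference, the truncation proposal quoted above (cut every string field longer than 32766 and record the pre-truncation length, a hash, and a truncated flag as metadata) could look roughly like the sketch below. This is not Metron code. The function name, the `_truncated` / `_original_length` / `_sha256` key suffixes, and the choice of SHA-256 are all assumptions made for illustration; the thread only says "hash". The sketch also assumes the limit is counted in UTF-8 bytes, which is how Lucene measures term length.

```python
import hashlib

# Limit cited in the thread; assumed here to be counted in UTF-8 bytes.
MAX_TERM_BYTES = 32766


def truncate_fields(message, limit=MAX_TERM_BYTES):
    """Return a copy of `message` with oversized string fields truncated.

    For each truncated field, three hypothetical metadata keys are added:
    <field>_truncated, <field>_original_length, and <field>_sha256.
    """
    result = dict(message)
    for key, value in message.items():
        if not isinstance(value, str):
            continue  # only string fields are considered
        encoded = value.encode("utf-8")
        if len(encoded) <= limit:
            continue  # within the limit, leave untouched
        # Cut at the byte limit and drop any partial trailing character.
        result[key] = encoded[:limit].decode("utf-8", errors="ignore")
        result[key + "_truncated"] = True
        result[key + "_original_length"] = len(encoded)
        # Hash of the full original value, so the untruncated string
        # can still be matched against the copy kept elsewhere.
        result[key + "_sha256"] = hashlib.sha256(encoded).hexdigest()
    return result
```

Calling `truncate_fields({"uri": very_long_uri, "timestamp": ts})` would leave `timestamp` alone and shorten `uri`. Keeping the limit in a single named constant is one way to address the question of updating it if the Lucene limitation changes.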