So, having Stellar operate on the whole message is definitely something that would be cool. That said, it's also nice to motivate building functions for simple transformations/normalizations. That way, common, useful capabilities can be reused everywhere Stellar is used (which is all over the place at this point).
If we had some example normalizations, we might be able to address the gaps and it'd be a win-win. :)

On Wed, Apr 26, 2017 at 9:28 AM, Otto Fowler <[email protected]> wrote:

> What if you could implement your cleaning in Stellar functions, which would
> be in libraries that were loaded as plugins and available to all your
> parsers?
>
> my_field = ALI_CLEANMYFIELD(my_field)
>
> The idea would be:
>
> * Metron has an archetype for creating Stellar 'libraries'
> * You write your Stellar functions and the unit/integration tests for them,
>   and maintain that project outside the Metron tree (as hopefully you will
>   be able to do soon with parsers: METRON-777, METRON-258)
> * You use the Metron management UI to install your Stellar libraries
> * You call your Stellar functions from your parser configuration
>
>
> On April 26, 2017 at 09:11:25, Ali Nazemian ([email protected]) wrote:
>
> Currently, we are using normal regex at the Java source code to handle
> those situations. However, it would be nice to have a separate bolt and
> deal with them separately. Yeah, I can create a Jira issue regarding that.
> The main reason I am asking for such a feature is the fact that lack of
> such a feature makes the process of creating some parser for the community
> a little painful for us. We need to maintain two different versions, one
> for community another for the internal use case. Clearly, noise is an
> inevitable part of real world use cases.
>
> Cheers,
> Ali
>
> On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <[email protected]>
> wrote:
>
> > Hi,
> >
> > Are you doing this cleansing all in the parser or are you using any
> > Stellar to do it?
> > Can you create a jira?
> >
> >
> > On April 26, 2017 at 08:59:16, Ali Nazemian ([email protected])
> > wrote:
> >
> > Hi all,
> >
> >
> > We are facing certain use cases in Metron production that happen to be
> > related to noisy stream. For example, a wrong timestamp, duplicate
> > hostname/IP address, etc. To deal with the normalization we have added an
> > additional step for the corresponding parsers to do the data cleaning.
> > Clearly, parsing is a standard factor which is mostly related to the
> > device that is generating the data and can be used for the same type of
> > device everywhere, but normalization is very production dependent and
> > there is no point of mixing normalization with parsing. It would be nice
> > to have a separate bolt in a parsing topologies to dedicate to production
> > related cleaning process. In that case, everybody can easily contribute
> > to Metron community with additional parsers without being worried about
> > mixing parsers and data cleaning process.
> >
> >
> > Regards,
> >
> > Ali
>
> --
> A.Nazemian
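
To make "example normalizations" a bit more concrete, here is a minimal Java sketch of the two kinds of noise mentioned in the thread (duplicate hostname/IP values and wrong timestamps). Everything in it is an assumption for illustration: the class name `NoiseCleaner`, the method names, the separator rules, and the seconds-versus-milliseconds threshold are hypothetical, and none of it is Metron or Stellar API. Wrapping such helpers in a Stellar function library is not shown.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Metron or Stellar.
final class NoiseCleaner {
    private NoiseCleaner() {}

    // Collapse repeated hostnames/IPs in a single field value.
    // Assumes tokens are separated by commas and/or whitespace and that
    // comparison should be case-insensitive. First-seen order is kept.
    static String dedupeHosts(String raw) {
        if (raw == null) {
            return null;
        }
        Set<String> seen = new LinkedHashSet<>();
        for (String token : raw.trim().split("[,\\s]+")) {
            if (!token.isEmpty()) {
                seen.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return String.join(",", seen);
    }

    // Normalize a timestamp that may have been emitted in epoch seconds.
    // Assumes any non-negative value below 100_000_000_000 is in seconds,
    // since that value in milliseconds would fall in early 1973.
    static long toEpochMillis(long ts) {
        return ts < 100_000_000_000L ? ts * 1000L : ts;
    }
}
```

With helpers like these, a call such as `NoiseCleaner.dedupeHosts("web01, WEB01 10.0.0.5")` would yield `web01,10.0.0.5`, and the production-specific rules stay out of the parser itself, which is the separation the thread is asking for.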
