What if you could implement your cleaning in Stellar functions, which would be in libraries that were loaded as plugins and available to all your parsers?
my_field = ALI_CLEANMYFIELD(my_field)

The idea would be:

* Metron has an archetype for creating Stellar ‘libraries’
* You write your Stellar functions and the unit/integration tests for them, and maintain that project outside the Metron tree (as hopefully you will be able to do soon with parsers: METRON-777, METRON-258)
* You use the Metron management UI to install your Stellar libraries
* You call your Stellar functions from your parser configuration

On April 26, 2017 at 09:11:25, Ali Nazemian ([email protected]) wrote:

Currently, we are using plain regex in the Java source code to handle those situations. However, it would be nice to have a separate bolt and deal with them separately. Yeah, I can create a Jira issue for that.

The main reason I am asking for such a feature is that the lack of it makes creating parsers for the community a little painful for us. We need to maintain two different versions, one for the community and another for our internal use case. Clearly, noise is an inevitable part of real-world use cases.

Cheers,
Ali

On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <[email protected]> wrote:

> Hi,
>
> Are you doing this cleansing all in the parser or are you using any
> Stellar to do it?
> Can you create a jira?
>
> On April 26, 2017 at 08:59:16, Ali Nazemian ([email protected]) wrote:
>
> Hi all,
>
> We are facing certain use cases in Metron production that are related
> to noisy streams: for example, a wrong timestamp, a duplicate
> hostname/IP address, etc. To deal with the normalization we have added
> an additional step to the corresponding parsers to do the data cleaning.
> Clearly, parsing is a standard factor which is mostly related to the
> device that is generating the data and can be used for the same type of
> device everywhere, but normalization is very production dependent and
> there is no point in mixing normalization with parsing.
>
> It would be nice to have a separate bolt in the parsing topologies
> dedicated to the production-related cleaning process. In that case,
> everybody can easily contribute additional parsers to the Metron
> community without being worried about mixing parsers and the data
> cleaning process.
>
> Regards,
> Ali

--
A.Nazemian
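
For illustration, the last step of the proposal at the top of this thread (calling a Stellar function from the parser configuration) might look like the fragment below. This is only a sketch under assumptions: ALI_CLEANMYFIELD is the hypothetical function named above and is assumed to be already installed from a Stellar library, my_field is a placeholder field name, and only the fieldTransformations section of a sensor's parser configuration is shown, with the remaining keys omitted.

```json
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": ["my_field"],
      "config": {
        "my_field": "ALI_CLEANMYFIELD(my_field)"
      }
    }
  ]
}
```

With something like this in place, the parser class itself stays generic and community-shareable, while the production-specific cleaning lives in the configuration and the separately maintained Stellar library.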
