Ok, that's another story. Hmm, we don't generally pre-parse because we try not to assume any particular format there (e.g. it could be strings, could be byte arrays). Maybe the right answer is to pass the raw, non-normalized data (a best-effort type of thing) through the parser and do the normalization post-parse... or is there a problem with that?
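A minimal sketch of the parse-then-normalize idea, written in Python purely for illustration (Metron parsers are Java, and in Metron the post-parse step would normally be expressed as a Stellar field transformation). The names `best_effort_parse`, `dedupe_host`, and `normalize` are hypothetical, not Metron APIs, and the parser assumes a comma-separated line with a known field order.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Normalizer = Callable[[Message], Message]


def best_effort_parse(raw: str, fields: List[str], sep: str = ",") -> Message:
    """Hypothetical best-effort parser: map positional tokens to field names
    without validating their contents; missing trailing fields are omitted."""
    tokens = [t.strip() for t in raw.split(sep)]
    return dict(zip(fields, tokens))


def dedupe_host(msg: Message) -> Message:
    """Hypothetical normalizer: collapse a duplicated value such as
    'web01 web01' in the 'host' field down to 'web01'."""
    host = msg.get("host")
    if host:
        parts = host.split()
        if parts and all(p == parts[0] for p in parts):
            msg = {**msg, "host": parts[0]}
    return msg


def normalize(msg: Message, steps: List[Normalizer]) -> Message:
    """Apply deployment-specific normalizers in order, after parsing."""
    for step in steps:
        msg = step(msg)
    return msg


if __name__ == "__main__":
    raw = "2017-04-26T08:59:16Z, web01 web01, 10.0.0.1"
    parsed = best_effort_parse(raw, ["timestamp", "host", "ip"])
    print(normalize(parsed, [dedupe_host]))
```

This only helps when the noise leaves field positions intact; if a stray token shifts every later field, the best-effort parser has already mislabelled the fields before any normalizer runs.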
On Wed, Apr 26, 2017 at 9:33 AM, Ali Nazemian <alinazem...@gmail.com> wrote:

> Hi Casey,
>
> It is actually a pre-parse process, not a post-parse one. This type of
> noise affects the position of an attribute, for example, and gives us a
> parsing exception. The timestamp example was not a good one because that
> is actually a post-parse exception.
>
> On Wed, Apr 26, 2017 at 11:28 PM, Casey Stella <ceste...@gmail.com> wrote:
>
> > So, further transformation post-parse was one of the motivating reasons
> > for Stellar (to do that transformation post-parse). Is there a capability
> > that it's lacking that we can add to fit your use case?
> >
> > On Wed, Apr 26, 2017 at 9:24 AM, Ali Nazemian <alinazem...@gmail.com>
> > wrote:
> >
> > > I've created a Jira ticket regarding this feature.
> > >
> > > https://issues.apache.org/jira/browse/METRON-893
> > >
> > > On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian <alinazem...@gmail.com>
> > > wrote:
> > >
> > > > Currently, we are using plain regex in the Java source code to handle
> > > > those situations. However, it would be nice to have a separate bolt
> > > > and deal with them separately. Yeah, I can create a Jira issue
> > > > regarding that. The main reason I am asking for such a feature is that
> > > > the lack of it makes the process of creating parsers for the community
> > > > a little painful for us. We need to maintain two different versions,
> > > > one for the community and another for the internal use case. Clearly,
> > > > noise is an inevitable part of real-world use cases.
> > > >
> > > > Cheers,
> > > > Ali
> > > >
> > > > On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <ottobackwa...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Are you doing this cleansing all in the parser or are you using any
> > > >> Stellar to do it?
> > > >> Can you create a jira?
> > > >>
> > > >> On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com)
> > > >> wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >> We are facing certain use cases in Metron production that happen to
> > > >> be related to noisy streams, for example a wrong timestamp, a
> > > >> duplicate hostname/IP address, etc. To deal with the normalization we
> > > >> have added an additional step to the corresponding parsers to do the
> > > >> data cleaning. Clearly, parsing is a standard concern that mostly
> > > >> depends on the device generating the data and can be reused for the
> > > >> same type of device everywhere, but normalization is very
> > > >> production-dependent and there is no point in mixing normalization
> > > >> with parsing. It would be nice to have a separate bolt in the parsing
> > > >> topologies dedicated to the production-related cleaning process. In
> > > >> that case, everybody can easily contribute additional parsers to the
> > > >> Metron community without worrying about mixing parsers and the data
> > > >> cleaning process.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Ali
> > > >
> > > > --
> > > > A.Nazemian
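A minimal sketch of the separate pre-parse cleaning stage proposed in this thread, again in Python for illustration only. It assumes the site-specific cleaning can be expressed as ordered regex substitutions on the raw message, already decoded to text; `make_cleaner`, `parse_csv`, and `run_pipeline` are hypothetical names, and in a Storm topology this stage would presumably be its own bolt ahead of the parser rather than a function call.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical rule format: (compiled pattern, replacement), applied in order.
Rule = Tuple[re.Pattern, str]


def make_cleaner(rules: List[Rule]) -> Callable[[str], str]:
    """Build a deployment-specific cleaner from an ordered list of regex rules."""
    def clean(raw: str) -> str:
        for pattern, repl in rules:
            raw = pattern.sub(repl, raw)
        return raw
    return clean


def parse_csv(raw: str) -> List[str]:
    """Stand-in for a device-specific parser that could be shared with the community."""
    return [t.strip() for t in raw.split(",")]


def run_pipeline(raw: str,
                 cleaner: Callable[[str], str],
                 parser: Callable[[str], List[str]]) -> List[str]:
    """Clean first, then parse: the parser never sees site-specific noise rules."""
    return parser(cleaner(raw))


if __name__ == "__main__":
    noisy = "2017-04-26 08:59:16,,web01,10.0.0.1"
    print(parse_csv(noisy))  # doubled separator shifts later fields by one
    cleaner = make_cleaner([(re.compile(r",{2,}"), ",")])
    print(run_pipeline(noisy, cleaner, parse_csv))
```

Here the deployment-specific rule collapses a doubled separator that would otherwise shift every later field by one position, while the shared parser stays unchanged.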