Re: Normalization topology or separate normalization bolt for parsing topology

Casey Stella Wed, 26 Apr 2017 06:37:46 -0700

Ok, that's another story.  Hmmmm, we don't generally pre-parse because we
try not to assume any particular format there (i.e. it could be strings,
could be byte arrays).  Maybe the right answer is to pass the raw,
non-normalized data (best-effort type of thing) through the parser and do
the normalization post-parse... or is there a problem with that?


On Wed, Apr 26, 2017 at 9:33 AM, Ali Nazemian <[email protected]> wrote:

> Hi Casey,
>
> It is actually a pre-parse process, not a post-parse one. These types of
> noise affect the position of an attribute, for example, and give us parsing
> exceptions. The timestamp example was not a good one because that is
> actually a post-parse exception.
>
> On Wed, Apr 26, 2017 at 11:28 PM, Casey Stella <[email protected]> wrote:
>
> > So, further transformation post-parse was one of the motivating reasons
> > for Stellar (to do that transformation post-parse).  Is there a capability
> > that it's lacking that we can add to fit your use case?
> >
> > On Wed, Apr 26, 2017 at 9:24 AM, Ali Nazemian <[email protected]>
> > wrote:
> >
> > > I've created a Jira ticket regarding this feature.
> > >
> > > https://issues.apache.org/jira/browse/METRON-893
> > >
> > >
> > > On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian <[email protected]>
> > > wrote:
> > >
> > > > > Currently, we are using plain regexes in the Java source code to
> > > > > handle those situations. However, it would be nice to have a separate
> > > > > bolt and deal with them separately. Yeah, I can create a Jira issue
> > > > > for that. The main reason I am asking for such a feature is that the
> > > > > lack of it makes creating parsers for the community a little painful
> > > > > for us. We need to maintain two different versions, one for the
> > > > > community and another for the internal use case. Clearly, noise is an
> > > > > inevitable part of real-world use cases.
> > > >
> > > > Cheers,
> > > > Ali
> > > >
> > > > > On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <[email protected]>
> > > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Are you doing this cleansing all in the parser or are you using any
> > > >> Stellar to do it?
> > > >> Can you create a jira?
> > > >>
> > > >>
> > > >>
> > > >> On April 26, 2017 at 08:59:16, Ali Nazemian ([email protected])
> > > >> wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >>
> > > > >> We are facing certain use cases in Metron production that are
> > > > >> related to noisy streams: for example, a wrong timestamp, a duplicate
> > > > >> hostname/IP address, etc. To deal with the normalization we have
> > > > >> added an additional step to the corresponding parsers to do the data
> > > > >> cleaning. Clearly, parsing is a standard step that depends mostly on
> > > > >> the device generating the data and can be reused for the same type of
> > > > >> device everywhere, but normalization is very production dependent,
> > > > >> and there is no point in mixing normalization with parsing. It would
> > > > >> be nice to have a separate bolt in the parsing topologies dedicated
> > > > >> to the production-related cleaning process. In that case, everybody
> > > > >> can easily contribute additional parsers to the Metron community
> > > > >> without worrying about mixing parsers and the data cleaning process.
> > > >>
> > > >>
> > > >> Regards,
> > > >>
> > > >> Ali
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > A.Nazemian
> > > >
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
>
>
>
> --
> A.Nazemian
>
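
The split discussed in this thread (site-specific cleaning of the raw
message, kept separate from a reusable parser) can be sketched roughly as
follows. This is an illustration only, not Metron code: the rule list, the
toy positional parser, and every function name here are hypothetical, and
the raw message is assumed to be already available as a string.

```python
import re

# Hypothetical, site-specific cleaning rules as (compiled pattern, replacement).
# These would live outside the parser so the parser itself stays reusable.
CLEANING_RULES = [
    # Collapse a whole token (e.g. hostname or IP) emitted twice in a row.
    (re.compile(r"(?<!\S)(\S+) \1(?!\S)"), r"\1"),
    # Drop ASCII control characters other than tab, newline, carriage return.
    (re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]"), ""),
]


def clean(raw):
    """Apply every cleaning rule, in order, to the raw message."""
    for pattern, replacement in CLEANING_RULES:
        raw = pattern.sub(replacement, raw)
    return raw


def parse(line):
    """Toy positional parser for '<timestamp> <host> <message>'."""
    timestamp, host, message = line.split(" ", 2)
    return {"timestamp": timestamp, "host": host, "message": message}


def clean_then_parse(raw):
    """Pre-parse cleaning step followed by the unchanged parser."""
    return parse(clean(raw))


if __name__ == "__main__":
    noisy = "1493213866 web01 web01 login failed\x07"
    print(parse(noisy))             # duplicate host leaks into the message field
    print(clean_then_parse(noisy))  # fields line up after cleaning
```

Keeping the rules in a separate step means the same parser can be shared
as-is, while each deployment supplies its own rule list.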
