klockla commented on PR #1728: URL: https://github.com/apache/stormcrawler/pull/1728#issuecomment-3546305258
> Hi @klockla,
>
> Thanks for the PR and for proposing an abstraction for PII removal during crawls. I’ve added a few comments.
>
> I also have a couple of questions:
>
> What was the reasoning behind choosing a bolt instead of a (parse) filter for the redaction step?
>
> Since this is a larger contribution, we’ll likely need an [ICLA](https://www.apache.org/licenses/contributor-agreements.html) before we can accept it.

Hi @rzo1,

I didn't really consider implementing it as a parse filter. Since the processing (text analysis by the NLP engine in Presidio) is quite costly, I think it is better to keep it in a separate bolt, so that we get better measurements of tuple processing time and latency.

I will fix the points raised in your other comments and will look into the ICLA.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
