GitHub user chenlica closed a discussion: Sample Zika Extraction (from old wiki)
> From the page https://github.com/apache/texera/wiki/Sample-Zika-Extraction
> (may be dangling)

====

For all the operators, leave limit and offset empty.

1. Create KeywordSource with properties:
   keyword: zika
   data source: promed
   matching type: conjunction (default)
   attribute: content
2. Create Projection
   attributes: _id, webpage, content
3. Connect KeywordSource with Projection
4. Create Regex_Person
   regex: (A|a|(an)|(An)) .{1,40} ((woman)|(man))
   attribute: content
5. Connect Projection with Regex_Person
6. Create NLP_Location
   type: location
   attribute: content
7. Connect Projection with NLP_Location
8. Create Regex_Date
   regex: (((0?[1-9])|(1[0-2]))(\s|-|.|\/)((0?[1-9])|([12][0-9])|(3[01]))(\s|-|.|\/)([0-9]{4}|[0-9]{2}))|((0?[1-9])|([12][0-9])|(3[01])) ((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)|(july?)|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|(dec(ember)?))
   attribute: content
9. Connect Projection with Regex_Date
10. Create Join1
    Join attribute: content
    id attribute: _id (default)
    PredicateType: CharacterDistance (default)
    distance: 100
11. Connect Regex_Person and NLP_Location with Join1
12. Create Join2 (same properties as Join1)
13. Connect Join1 and Regex_Date with Join2
14. Create TupleStreamSink (view results)
15. Connect Join2 with TupleStreamSink

Here's a screenshot of the query plan:

GitHub link: https://github.com/apache/texera/discussions/3984

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
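As a rough sanity check of the two regexes from steps 4 and 8, the sketch below runs them against a few made-up sentences. It uses Python's `re` module as a stand-in for the Regex operator, which is an assumption: Texera's own matching engine and its case-sensitivity settings may behave differently. The helper `first_match` and the sample strings are hypothetical and not part of the original page.

```python
import re

# Patterns copied verbatim from steps 4 and 8 of the walkthrough.
PERSON_RE = re.compile(r"(A|a|(an)|(An)) .{1,40} ((woman)|(man))")
DATE_RE = re.compile(
    # Numeric form: month, separator, day, separator, 4- or 2-digit year.
    r"(((0?[1-9])|(1[0-2]))(\s|-|.|\/)((0?[1-9])|([12][0-9])|(3[01]))(\s|-|.|\/)([0-9]{4}|[0-9]{2}))"
    # Textual form: day, a single space, lowercase month name.
    r"|((0?[1-9])|([12][0-9])|(3[01])) "
    r"((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)|(july?)"
    r"|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|(dec(ember)?))"
)


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # Hypothetical sample sentences, invented for illustration only.
    samples = [
        (PERSON_RE, "A 35-year-old woman from Brazil tested positive."),
        (DATE_RE, "symptom onset 02/14/2016"),
        (DATE_RE, "reported on 12 january"),
        (DATE_RE, "1x2y34"),  # the separator '.' is unescaped as written
    ]
    for pattern, text in samples:
        print(repr(text), "->", repr(first_match(pattern, text)))
```

Two things stand out when reading the date pattern this way. The separator group `(\s|-|.|\/)` contains an unescaped `.`, which matches any character, so strings such as `1x2y34` are accepted; whether that is intended or a lost backslash is not clear from the page. The month names are also lowercase, so under case-sensitive matching a capitalized month such as `12 January` is not matched.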
