Hi Chirag,

Since the Nutch urlfilter has been converted into a plugin, I am going to take on the idea of rule-based filtering that you suggested before, maybe as a new urlfilter plugin. Which commercial RETE engine did you use? Do you know of an open source one?
Thanks,

John

On Mon, Jan 31, 2005 at 08:03:03PM -0500, Chirag Chaman wrote:
>
> 3. As rules grow, filtering becomes slow -- prior to using Nutch we were
> using a commercial RETE rules engine into which we had loaded the REs as
> rules. This improved speed immensely. Maybe overkill for now. Below is a
> simpler way to do this.
>
> Here's what we're planning on building -- is this helpful? How would this
> play in with plugins...
>
> <GROUP> Rule Group Name
>   <RULE>
>     <MATCH> RE to match </MATCH>
>     <ACTION> Discard/Substitution/GoTo </ACTION>
>     <SUBSTITUTION> Substitution </SUBSTITUTION>
>     <GOTO>RuleGroupToSendProcess</GOTO>
>     <STOP> 0 or 1 - 0 would mean keep processing more rules </STOP>
>   </RULE>
> </GROUP>
>
> Here's how this would work.
>
> - Each file has a "Default" group, under which all rules are kept.
> - For more advanced rules, one could send control to another RuleGroup on
>   match (helpful when you want specific groups of rules for a certain domain,
>   extension, etc) -- this will cut down the number of rules to look at.
> - The Stop either exits upon a match or keeps processing more rules in the
>   same group.
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of John X
> Sent: Monday, January 31, 2005 7:53 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: [Nutch-dev] make URLFilter as plugin
>
> Hi, All,
>
> I propose to define a plugin extension point for URLFilter, and convert
> the current RegexURLFilter.java, PrefixURLFilter.java, etc., into plugins.
> However, there is one requirement that differs from other plugin extensions:
> we should be able to specify the order in which plugins are loaded and
> applied. I have not checked, but I assume that, by default, we can always
> name plugins in alphabetical order.
> Stefan: any better way to do this?
>
> If no one thinks this is a bad idea, I am going to start work on it right
> away.
>
> John

__________________________________________
http://www.neasys.com - A Good Place to Be
Come to visit us today!

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
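For reference, here is a minimal sketch of how the GROUP/RULE format quoted in this thread might be evaluated. It is written in Java because Nutch is, but it is not Nutch code and not the implementation discussed above. The class name RuleGroupSketch, the Rule fields, the demo rules, and the example.org URLs are all hypothetical. It also assumes three things the thread does not spell out: evaluation starts in the "Default" group, a GoTo hands control to the named group without returning, and a depth limit guards against GoTo cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the GROUP/RULE idea from the thread; not Nutch code. */
class RuleGroupSketch {
    enum Action { DISCARD, SUBSTITUTION, GOTO }

    /** One RULE: target is the substitution text or the name of the group to jump to. */
    static final class Rule {
        final Pattern match;
        final Action action;
        final String target;
        final boolean stop;

        Rule(String regex, Action action, String target, boolean stop) {
            this.match = Pattern.compile(regex);
            this.action = action;
            this.target = target;
            this.stop = stop;
        }
    }

    private final Map<String, List<Rule>> groups = new HashMap<>();

    void add(String group, Rule rule) {
        groups.computeIfAbsent(group, k -> new ArrayList<>()).add(rule);
    }

    /** Returns the (possibly rewritten) URL, or null if a rule discards it. */
    String filter(String url) {
        return apply("Default", url, 0);
    }

    private String apply(String group, String url, int depth) {
        if (depth > 16) {
            return url; // assumed guard against GoTo cycles
        }
        List<Rule> rules = groups.getOrDefault(group, Collections.<Rule>emptyList());
        for (Rule r : rules) {
            Matcher m = r.match.matcher(url);
            if (!m.find()) {
                continue; // rule does not apply; try the next one in this group
            }
            switch (r.action) {
                case DISCARD:
                    return null;
                case GOTO:
                    // Assumption: control moves to the other group and does not come back.
                    return apply(r.target, url, depth + 1);
                case SUBSTITUTION:
                    url = m.replaceAll(r.target);
                    break;
            }
            if (r.stop) {
                break; // STOP = 1: stop processing further rules in this group
            }
        }
        return url;
    }

    /** Builds a small demo rule set; the rules themselves are made up. */
    static RuleGroupSketch demo() {
        RuleGroupSketch f = new RuleGroupSketch();
        f.add("Default", new Rule("\\.(gif|jpg)$", Action.DISCARD, null, true));
        f.add("Default", new Rule("^http://example\\.org/", Action.GOTO, "example", false));
        f.add("Default", new Rule("[?&]sessionid=[^&]*", Action.SUBSTITUTION, "", true));
        f.add("example", new Rule("/private/", Action.DISCARD, null, true));
        return f;
    }

    public static void main(String[] args) {
        RuleGroupSketch f = demo();
        System.out.println(f.filter("http://example.com/p?sessionid=123"));
        System.out.println(f.filter("http://example.org/private/a"));
    }
}
```

In this sketch only URLs on example.org ever reach the "example" group, which is the point made above about per-domain groups cutting down the number of rules to look at. Returning null for a discarded URL is a convention chosen for the sketch.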
