[ 
https://issues.apache.org/jira/browse/PIG-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721012#action_12721012
 ] 

Dmitriy V. Ryaboy commented on PIG-855:
---------------------------------------

Jeff, the approach depends on whether you care more about false positives or 
false negatives.

The right way to do this is probably not to write a boolean function, but 
one that returns one of several codes -- known browser, known crawler, 
monitor, command-line tools like wget and curl, and "unknown".
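
A rough sketch of what a multi-code classifier could look like, in plain 
Java with no Pig dependencies. The class name, the category names, and the 
substring markers are all made up for illustration; a real filter would 
load a maintained list rather than hard-code a handful of tokens.

```java
import java.util.Locale;

// Hypothetical sketch: names and marker lists are illustrative assumptions,
// not part of PiggyBank or of any published bot list.
final class UserAgentClassifier {

    enum Category { BROWSER, CRAWLER, MONITOR, TOOL, UNKNOWN }

    // Small hand-picked marker lists, lower-cased for case-insensitive matching.
    private static final String[] TOOL_MARKERS = { "wget", "curl" };
    private static final String[] MONITOR_MARKERS = { "pingdom", "nagios" };
    private static final String[] CRAWLER_MARKERS =
            { "googlebot", "msnbot", "slurp", "spider", "crawler" };
    private static final String[] BROWSER_MARKERS =
            { "firefox", "chrome", "safari", "msie", "opera" };

    static Category classify(String userAgent) {
        if (userAgent == null || userAgent.trim().isEmpty()) {
            return Category.UNKNOWN;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Non-browser categories are checked first, because many crawlers
        // include "Mozilla" or other browser tokens in their strings.
        if (containsAny(ua, TOOL_MARKERS)) return Category.TOOL;
        if (containsAny(ua, MONITOR_MARKERS)) return Category.MONITOR;
        if (containsAny(ua, CRAWLER_MARKERS)) return Category.CRAWLER;
        if (containsAny(ua, BROWSER_MARKERS)) return Category.BROWSER;
        return Category.UNKNOWN;
    }

    private static boolean containsAny(String haystack, String[] needles) {
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

With codes like these, the caller decides how to treat UNKNOWN: count it as 
a bot if false negatives hurt more, or as a browser if false positives do.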

The IAB has a standard list of bots and spiders 
(http://www.iab.net/sites/login.php), and maintains an industry standard for 
the filters that should be applied before numbers are reported.

> Filter to determine if a UserAgent string is a bot
> --------------------------------------------------
>
>                 Key: PIG-855
>                 URL: https://issues.apache.org/jira/browse/PIG-855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Priority: Minor
>
> A PiggyBank contrib that would allow one to filter records by whether a 
> UserAgent string represents a bot.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
