[ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554971 ]
Edward Yoon commented on HADOOP-2480:
-------------------------------------

OK, I see. Also, I'd like to separate the hbase shell build.xml from the hbase build.xml.

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer using
> map/reduce on an hbase table at large scale.
> - 5 terabytes of logs will be used. You can see it here:
> http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800]|Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client, typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"|User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * Row by url is a user-by-document matrix.
> ** A cell can be a numeric value for document visit frequency or an incoming value from a specified web.
> * ... etc.
> {code}
> ip <row>    http                            url
> -------------------------------------------------------------------
> ip          http:agent    <agent>           url:URL <referrer>
>             http:protocol <protocol>        ...
>             http:method   <method>
>             http:code     <response code>
>             http:bytesize <bytesize>
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
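
The mapping from an access_log entry to the table schema quoted above could be sketched roughly as follows. This is only an illustration, not the code in v01.patch or v02.patch: the regular expression, the `to_row` helper, and the choice to store the referrer under a `url:<requested URL>` column are assumptions drawn from the field table and schema in the issue description, and no hbase client call is made.

```python
import re

# Assumed pattern for one access_log line in the layout described in the issue:
# ip, identity, user, [time], "method resource protocol", status, bytes,
# "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytesize>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def to_row(line):
    """Hypothetical helper: turn one log line into (row key, column dict).

    Returns None when the line does not match the assumed layout.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    columns = {
        "http:agent": fields["agent"],
        "http:protocol": fields["protocol"],
        "http:method": fields["method"],
        "http:code": fields["code"],
        "http:bytesize": fields["bytesize"],
        # Assumption: the url family is keyed by the requested URL and
        # holds the referrer as its value.
        "url:" + fields["resource"]: fields["referrer"],
    }
    return fields["ip"], columns


SAMPLE = (
    '208.177.157.164 - - [15/Aug/2004:10:59:38 -0800] '
    '"GET http://www.hadoop.co.kr/ HTTP/1.1" 200 - "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)

if __name__ == "__main__":
    row_key, columns = to_row(SAMPLE)
    print(row_key)
    for name, value in sorted(columns.items()):
        print(name, "=", value)
```

The sample line is assembled from the example data elements in the Access_log Entry table; a map task could emit the returned row key and columns for each input line.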