[ 
https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660968#comment-13660968
 ] 

Todd Grayson commented on HDFS-4829:
------------------------------------

Further testing shows the same behavior with every data set I examine using 
hadoop fs -tail.  It looks as though it may be related to how escape character 
sequences in the returned data are handled.
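
One explanation worth ruling out: the Hadoop FileSystem Shell documentation 
describes -tail as displaying the last kilobyte of the file, whereas the OS 
tail defaults to the last 10 lines.  If that is what is happening here, the 
output would start at an arbitrary byte offset rather than at a line boundary.  
The Python sketch below only illustrates that difference; the helper names 
tail_bytes and tail_lines and the synthetic log are assumptions made for 
illustration, not Hadoop code.

```python
# Illustration only: byte-based tail versus line-based tail.
# tail_bytes / tail_lines are hypothetical helpers, not part of Hadoop.

def tail_bytes(data: bytes, n: int = 1024) -> bytes:
    """Return the final n bytes, ignoring line boundaries."""
    return data[-n:] if n > 0 else b""

def tail_lines(data: bytes, n: int = 10) -> bytes:
    """Return the final n lines, as the OS tail does by default."""
    lines = data.splitlines(keepends=True)
    return b"".join(lines[-n:]) if n > 0 else b""

# Synthetic log: 20 copies of one access_log-style line (assumed data).
line = (b'10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] '
        b'"GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404\n')
log = line * 20

by_bytes = tail_bytes(log)   # may begin in the middle of a line
by_lines = tail_lines(log)   # always begins at the start of a line

print(by_bytes.splitlines()[0])
print(by_lines.splitlines()[0])
```

With a byte-based cut, lengthening every line (for example by replacing IP 
octets with names) would move the cut point further into the first displayed 
line, which may be what the second example in the description shows.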
                
> Strange loss of data displayed in hadoop fs -tail command
> ---------------------------------------------------------
>
>                 Key: HDFS-4829
>                 URL: https://issues.apache.org/jira/browse/HDFS-4829
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>         Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM 
> running under windows 7)
> Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
>            Reporter: Todd Grayson
>            Priority: Minor
>
> Strange behavior of the hadoop fs -tail command: by default it seems to 
> print 9 lines of output versus 10 lines for the OS version of the command 
> (minor issue).  The stranger behavior (a bug?) is that it appears to drop the 
> initial octet from the IP address on the first line displayed when examining 
> a file in HDFS.  
> [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
> .190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET 
> /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET 
> /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET 
> /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET 
> /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET 
> /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
> *When looking at the original log data outside of HDFS with the OS version of 
> the tail command, we see the following:*
> [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
> [training@localhost hands-on]$ tail access_log 
> 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET 
> /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET 
> /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET 
> /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET 
> /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET 
> /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET 
> /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
> When using non-IP data separated by periods, it gets even worse and even more 
> data is masked (same data, substituting names for IP octets).  Note that the 
> first line is lost well into the URI string:
> [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
> s/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET 
> /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET 
> /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET 
> /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET 
> /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
> *Verifying that the output of the normal tail matches.  Note that its first 
> line is not represented in the hadoop fs -tail output, which grabs only 9 
> lines instead of 10, as mentioned above.  Align the two text-based examples 
> along the javascript_combined line.*
> [training@localhost hands-on]$ tail test_log
> larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET 
> /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET 
> /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET 
> /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET 
> /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET 
> /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET 
> /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET 
> /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET 
> /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
