[ https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660968#comment-13660968 ]
Todd Grayson commented on HDFS-4829:
------------------------------------

In further testing, this is seen with any data set viewed with tail. It looks like it could be the handling of escape character sequences within the data being returned?

> Strange loss of data displayed in hadoop fs -tail command
> ---------------------------------------------------------
>
>                 Key: HDFS-4829
>                 URL: https://issues.apache.org/jira/browse/HDFS-4829
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>         Environment: OS CentOS 6.3 (on Intel Core2 Duo, VMware Player VM running under Windows 7)
> Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
>            Reporter: Todd Grayson
>            Priority: Minor
>
> Strange behavior of the hadoop fs -tail command: its default output seems to be 9 lines, versus 10 lines in the OS version of the command (minor issue). The stranger thing (bug behavior?) is that it appears to drop the initial octet from an IP address when examining a file over HDFS.
> [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
> .190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
>
> When looking at the original log data outside of HDFS with the OS version of the tail command, we see the following:
>
> [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
> [training@localhost hands-on]$ tail access_log
> 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
>
> When using non-IP data separated by periods, it gets even worse and even more data is masked (same data, substituting names for the IP octets). Note that we lose the first line well into the URI string:
>
> [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
> s/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
>
> Verifying that what we are looking at matches the normal tail output. Note that the first line is not represented in the hadoop fs -tail output, as it is only grabbing 9 lines instead of 10, as mentioned before. Align the two text-based examples along the javascript_combined line.
>
> [training@localhost hands-on]$ tail test_log
> larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
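
One explanation worth ruling out before blaming escape handling: the FsShell documentation describes -tail as displaying the last kilobyte of a file rather than the last 10 lines. A fixed-size byte window would start wherever the 1024-byte boundary happens to fall, which would fit both transcripts: the cut lands just inside the IP address in access_log and deeper into the line in test_log, where each line is longer. The Python sketch below only illustrates that difference and is not the HDFS client code; tail_lines, tail_bytes, and the sample data are hypothetical.

```python
def tail_lines(data: bytes, n: int = 10) -> bytes:
    """Return the last n complete lines, like the OS `tail` command."""
    lines = data.splitlines(keepends=True)
    return b"".join(lines[-n:]) if n > 0 else b""


def tail_bytes(data: bytes, size: int = 1024) -> bytes:
    """Return the last `size` bytes, ignoring line boundaries."""
    return data[-size:] if size > 0 else b""


# Hypothetical sample: 20 log-like lines shaped roughly like access_log.
sample = b"".join(
    b'10.190.174.142 - - [03/Dec/2011:13:28:%02d -0800] '
    b'"GET /assets/img/item-%02d.png HTTP/1.1" 200 3892\n' % (i, i)
    for i in range(20)
)

if __name__ == "__main__":
    # The line-based tail always starts at the beginning of a log entry.
    print("by lines:", tail_lines(sample).decode().splitlines()[0])
    # The byte-based tail starts wherever the 1024-byte window begins,
    # which is usually in the middle of an entry.
    print("by bytes:", tail_bytes(sample).decode().splitlines()[0])
```

If this is what is happening, running the OS command as tail -c 1024 access_log on the local copy should reproduce the same truncated first line as hadoop fs -tail.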