[ 
https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538557
 ] 

Chris Douglas commented on HADOOP-2113:
---------------------------------------

I think I've explained this command poorly. It attempts to render whatever 
exists at a given path as human-readable text. Right now, it includes 
SequenceFile and gzip formats; it's not trying to stuff a framework for 
computation on SequenceFiles into FsShell. I agree that such a toolchain should 
be independent, but this aspires to something else.
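
To illustrate what "render whatever exists at a given path" could involve, here is a minimal sketch of picking a decoder from a file's leading bytes. It is not the code in 2113-0.patch; the class FormatSniffer and its detect method are hypothetical names. The only facts it assumes are that SequenceFiles begin with the bytes 'S', 'E', 'Q' and that gzip streams begin with 0x1f 0x8b.

```java
// Hypothetical helper, not part of Hadoop: guesses a file format from its header bytes.
class FormatSniffer {
    enum Format { SEQUENCE_FILE, GZIP, PLAIN }

    // head holds the first bytes of the file; shorter inputs fall through to PLAIN.
    static Format detect(byte[] head) {
        if (head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return Format.SEQUENCE_FILE; // SequenceFile header: "SEQ" followed by a version byte
        }
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Format.GZIP; // gzip magic number
        }
        return Format.PLAIN; // anything else is passed through as text
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {'S', 'E', 'Q', 6}));
        System.out.println(detect(new byte[] {(byte) 0x1f, (byte) 0x8b, 8}));
        System.out.println(detect("hello".getBytes()));
    }
}
```

Anything not recognized is treated as plain text, which matches the limited goal here: display, not computation.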

While we're on the subject, though, I'm not sure I fully understand the 
motivation for this command-line tool. Isn't each of those commands easily 
implemented in map/reduce? As I see it, there are two ways to generalize the 
operations Enis suggests, since all of WritableComparable is fair game. Either 
a) everything is first converted to a string, or b) the framework understands 
that a user-specified InputFormat creating a RecordReader creating a key type 
comparable to IntWritable should select a comparator for its keys such that the 
user-supplied "70" is greater than "9" (unless the user actually intends a 
lexicographic ordering). Not to reveal my opinion. ;)
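
The "70" versus "9" problem can be shown without any Hadoop types. The sketch below uses plain Java strings in place of Writable keys; the class KeyOrdering and its comparators are hypothetical names for illustration only. It sorts the same keys under option a) (everything is a string) and under a numeric comparator of the kind option b) would have to select.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration: the same keys ordered as strings and as integers.
class KeyOrdering {
    // Option a): keys compared character by character.
    static final Comparator<String> LEXICOGRAPHIC = Comparator.naturalOrder();

    // Option b): keys parsed and compared by numeric value.
    static final Comparator<String> NUMERIC =
        Comparator.comparingInt((String s) -> Integer.parseInt(s));

    // Returns a sorted copy so the caller's array is left untouched.
    static String[] sorted(String[] keys, Comparator<String> order) {
        String[] copy = keys.clone();
        Arrays.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        String[] keys = {"70", "9", "100"};
        System.out.println(Arrays.toString(sorted(keys, LEXICOGRAPHIC))); // [100, 70, 9]
        System.out.println(Arrays.toString(sorted(keys, NUMERIC)));       // [9, 70, 100]
    }
}
```

The string ordering places "9" after "70", which is rarely what someone filtering on integer keys wants; choosing the numeric comparator requires knowing the key type, which is the hard part described above.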

In the latter case, code like this belongs in mapred, since merely working out 
the types is going to be either a hack or a significant effort. In the former 
case, for more than a single SequenceFile, such code still seems to belong in 
mapred; that said, piping the output of "text", as implemented, through a 
general text-processing utility is a reasonable hack for some purposes. For my 
purposes, I only needed to check the first few records for some of the output, 
and this suffices. I don't know why a comparable utility like HADOOP-175 never 
got committed. It would be a good base, though 1) it relies on UTF8 keys, which 
are currently deprecated, and 2) it solves some problems outside the limited 
domain of this issue. That no similar utility has been written in the last 
year makes me wary of over-complicating this. It's for human-readability, 
not processing.

> Add "-text" command to FsShell to decode SequenceFile to stdout
> ---------------------------------------------------------------
>
>                 Key: HADOOP-2113
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2113
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: 2113-0.patch
>
>
> FsShell should provide a command to examine SequenceFiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
