[
https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538557
]
Chris Douglas commented on HADOOP-2113:
---------------------------------------
I think I've explained this command poorly. It attempts to render whatever
exists at a given path as human-readable text. Right now, it includes
SequenceFile and gzip formats; it's not trying to stuff a framework for
computation on SequenceFiles into FsShell. I agree that such a toolchain should
be independent, but this aspires to something else.
While we're on the subject though, I'm not sure I fully understand the
motivation for this command-line tool. Isn't each of those commands easily
implemented in map/reduce? As I see it, there are two ways to generalize the
operations Enis suggests, since all of WritableComparable is fair game. Either
a) everything is first converted to a string or b) the framework works out
that, when a user-specified InputFormat creates a RecordReader whose key type
is comparable to IntWritable, it should select a comparator for the keys such
that the user-supplied "70" is greater than "9" (unless the user actually
intends a lexicographic ordering). Not to reveal my opinion. ;)
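To make the "70" versus "9" point concrete, here is a minimal sketch in plain
Java. It deliberately avoids any Hadoop types; the class and method names
(OrderingDemo, lexicographic, numeric, sorted) are made up for illustration and
are not part of the patch. Option a) corresponds to the string comparator,
option b) to the numeric one.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only: none of these names exist in Hadoop.
public class OrderingDemo {

    // Option a): every key is treated as text and compared character by character.
    static int lexicographic(String a, String b) {
        return a.compareTo(b);
    }

    // Option b): the key type is known to be integer-like, so compare by value.
    static int numeric(String a, String b) {
        return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
    }

    // Returns a sorted copy so the caller's array is left untouched.
    static String[] sorted(String[] keys, Comparator<String> cmp) {
        String[] copy = keys.clone();
        Arrays.sort(copy, cmp);
        return copy;
    }

    public static void main(String[] args) {
        String[] keys = {"70", "9", "100"};
        // Text ordering puts "9" last because '9' sorts after '1' and '7'.
        System.out.println(Arrays.toString(sorted(keys, OrderingDemo::lexicographic)));
        // Numeric ordering puts 9 first.
        System.out.println(Arrays.toString(sorted(keys, OrderingDemo::numeric)));
    }
}
```

The two comparators disagree on the same input, which is why the framework
would have to know the key type before it could pick one.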
In the latter case, code like this belongs in mapred, since merely working out
the types is going to be either a hack or a significant effort. In the former
case, for more than a single SequenceFile, such code still seems to belong in
mapred; that said, piping the output of "text", as implemented, through a
general text-processing utility is a reasonable hack for some purposes. For my
purposes, I only needed to check the first few records for some of the output,
and this suffices. I don't know why a comparable utility like HADOOP-175 never
got committed. It would be a good base, though 1) it relies on UTF8 keys, which
are currently deprecated, and 2) it solves some problems outside the limited
domain of this issue. That no similar utility has been written in the last
year makes me wary of over-complicating this one. It's for human-readability,
not processing.
> Add "-text" command to FsShell to decode SequenceFile to stdout
> ---------------------------------------------------------------
>
> Key: HADOOP-2113
> URL: https://issues.apache.org/jira/browse/HADOOP-2113
> Project: Hadoop
> Issue Type: Improvement
> Components: fs
> Reporter: Chris Douglas
> Assignee: Chris Douglas
> Priority: Minor
> Fix For: 0.16.0
>
> Attachments: 2113-0.patch
>
>
> FsShell should provide a command to examine SequenceFiles.