On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley <kwi...@keithwiley.com> wrote:
> On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
>
>> On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley <kwi...@keithwiley.com>
>> wrote:
>>>
>>> Is there a particularly good reason for why the "hadoop fs" command
>>> supports
>>> -cat and -tail, but not -head?
>>>
>>
>> Tail is needed to be done efficiently but head you can just do
>> yourself. Most people probably use
>>
>> hadoop dfs -cat file | head -5.
>
>
> I disagree with your use of the word "efficiently".  :-)  To my
> understanding (and perhaps that's the source of my error), the approach you
> suggested reads the entire file over the net from the cluster to your client
> machine.  That file could conceivably be of HDFS scales (100s of GBs, even
> TBs wouldn't be uncommon).
>
> What do you think?  Am I wrong in my interpretation of how
> hadoopCat-pipe-head would work?
>
> Cheers!
>
> ________________________________________________________________________________
> Keith Wiley     kwi...@keithwiley.com     keithwiley.com
>  music.keithwiley.com
>
> "And what if we picked the wrong religion?  Every week, we're just making
> God
> madder and madder!"
>                                           --  Homer Simpson
> ________________________________________________________________________________
>
>

'hadoop dfs -cat' writes the file to stdout as it is read. Once head -5
has printed 5 lines it exits, which closes the pipe and kills the cat
side on its next write. With buffering, more than 5 lines might
physically be read, but this invocation does not read the entire HDFS
file before piping it to head.
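
You can see the pipe behaviour without a cluster. This is only a rough
sketch: 'yes' is my stand-in for 'hadoop dfs -cat' on an arbitrarily
large file (an assumption for illustration, not anything Hadoop ships),
and the variable names are made up. If the producer had to finish before
head saw any data, this would never return.

```shell
# 'yes' prints the same line forever, standing in for a huge file streamed by cat.
# head -5 exits after five lines; the producer's next write to the closed pipe
# fails (SIGPIPE), so the whole pipeline ends almost immediately.
# '|| true' only guards against shells run with 'set -e -o pipefail', where the
# producer's SIGPIPE exit would otherwise be reported as a failure.
first_five=$(yes "a line from a very large file" | head -5) || true

# Show what head let through.
printf '%s\n' "$first_five"

# Count the lines; arithmetic expansion strips any padding from wc's output.
line_count=$(( $(printf '%s\n' "$first_five" | wc -l) ))
echo "lines received: $line_count"
```

How much extra data crosses the network before the cat side notices the
closed pipe depends on the buffer sizes involved, so treat this as a
picture of the mechanism, not a measurement.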
