On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley <kwi...@keithwiley.com> wrote:
> On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
>
>> On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley <kwi...@keithwiley.com>
>> wrote:
>>>
>>> Is there a particularly good reason for why the "hadoop fs" command
>>> supports -cat and -tail, but not -head?
>>>
>>
>> Tail is needed to be done efficiently but head you can just do
>> yourself. Most people probably use
>>
>> hadoop dfs -cat file | head -5.
>
>
> I disagree with your use of the word "efficiently". :-) To my
> understanding (and perhaps that's the source of my error), the approach you
> suggested reads the entire file over the net from the cluster to your client
> machine. That file could conceivably be of HDFS scales (100s of GBs, even
> TBs wouldn't be uncommon).
>
> What do you think? Am I wrong in my interpretation of how
> hadoopCat-pipe-head would work?
>
> Cheers!
>
> ________________________________________________________________________________
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "And what if we picked the wrong religion? Every week, we're just making
> God madder and madder!"
> -- Homer Simpson
> ________________________________________________________________________________
>
>

'hadoop dfs -cat' writes the file to stdout as it is read. head -5 exits after 5 lines, which closes the pipe, so the cat side fails on its next write and stops. Because of buffering, more than 5 lines may physically be read, but this invocation does not read the entire HDFS file before piping it to head.
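
You can see the same pipe behavior without a cluster. This is only a rough sketch: `seq` is my stand-in for 'hadoop dfs -cat', and `produce` and `first_five` are made-up names for illustration. I am assuming the Hadoop client reacts to the closed pipe the way an ordinary producer does.

```shell
#!/bin/sh
# Hypothetical stand-in for "hadoop dfs -cat file": a producer that would
# emit 10 million lines if nothing stopped it.
produce() {
    seq 1 10000000
}

# head exits after printing 5 lines and closes its end of the pipe.
# The producer's next write then fails (normally it is killed by SIGPIPE),
# so it stops long before reaching its last line.
first_five=$(produce | head -5)
echo "$first_five"
```

If you prefix the pipeline with `time`, it should come back almost immediately rather than after all 10 million lines have been generated.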