Re: hd fs -head?

2010-09-27 Thread Edward Capriolo
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote:
 Is there a particularly good reason for why the hadoop fs command supports
 -cat and -tail, but not -head?

 
 Keith Wiley     kwi...@keithwiley.com     keithwiley.com
  music.keithwiley.com

 I do not feel obliged to believe that the same God who has endowed us with
 sense, reason, and intellect has intended us to forgo their use.
                                           --  Galileo Galilei
 



Tail needs to be done efficiently, but head you can just do yourself.
Most people probably use

hadoop dfs -cat file | head -5


Re: hd fs -head?

2010-09-27 Thread Keith Wiley

On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:

On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote:
Is there a particularly good reason for why the hadoop fs command supports
-cat and -tail, but not -head?



Tail is needed to be done efficiently but head you can just do
yourself. Most people probably use

hadoop dfs -cat file | head -5.



I disagree with your use of the word efficiently.  :-)  To my
understanding (and perhaps that's the source of my error), the
approach you suggested reads the entire file over the net from the
cluster to your client machine.  That file could conceivably be at
HDFS scale (100s of GBs; even TBs wouldn't be uncommon).

What do you think?  Am I wrong in my interpretation of how
hadoopCat-pipe-head would work?


Cheers!


Keith Wiley kwi...@keithwiley.com keithwiley.com 
music.keithwiley.com


And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!
   --  Homer Simpson




Re: hd fs -head?

2010-09-27 Thread Edward Capriolo
On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley kwi...@keithwiley.com wrote:
 On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:

 On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com
 wrote:

 Is there a particularly good reason for why the hadoop fs command
 supports
 -cat and -tail, but not -head?


 Tail is needed to be done efficiently but head you can just do
 yourself. Most people probably use

 hadoop dfs -cat file | head -5.


 I disagree with your use of the word efficiently.  :-)  To my
 understanding (and perhaps that's the source of my error), the approach you
 suggested reads the entire file over the net from the cluster to your client
 machine.  That file could conceivably be of HDFS scales (100s of GBs, even
 TBs wouldn't be uncommon).

 What do you think?  Am I wrong in my interpretation of how
 hadoopCat-pipe-head would work?

 Cheers!

 
 



'hadoop dfs -cat' writes the file out as it is read. head -5 exits
after 5 lines, which kills the first half of the pipe. With buffering,
more than 5 lines might physically be read, but this invocation does
not read the entire HDFS file before piping it to head.
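
If you want to convince yourself without a cluster, here is a minimal
sketch of the same pipe behaviour. It assumes a POSIX shell and uses
`yes` as a hypothetical stand-in for 'hadoop dfs -cat file'; the
function names endless_cat and first_five are made up for illustration
and are not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical stand-in for `hadoop dfs -cat file`: it streams lines as
# fast as the reader takes them and would never stop on its own.
endless_cat() {
    yes "a line from a very large file"
}

# Same shape as `hadoop dfs -cat file | head -5`. head exits after five
# lines, the read end of the pipe closes, and the producer's next write
# fails (SIGPIPE or a broken-pipe error), which ends the producer.
first_five() {
    endless_cat | head -n 5
}

first_five
```

The script returns right away even though the producer is endless,
which is the point: only as much as the pipe buffers is ever produced.
(head -n 5 is the POSIX spelling of head -5.)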


Re: hd fs -head?

2010-09-27 Thread Keith Wiley
On Sep 27, 2010, at 13:46 , Edward Capriolo wrote:

 On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley kwi...@keithwiley.com wrote:
 On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
 
 On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com
 wrote:
 
 Is there a particularly good reason for why the hadoop fs command
 supports
 -cat and -tail, but not -head?
 
 
 Tail is needed to be done efficiently but head you can just do
 yourself. Most people probably use
 
 hadoop dfs -cat file | head -5.
 
 
 I disagree with your use of the word efficiently.  :-)  To my
 understanding (and perhaps that's the source of my error), the approach you
 suggested reads the entire file over the net from the cluster to your client
 machine.  That file could conceivably be of HDFS scales (100s of GBs, even
 TBs wouldn't be uncommon).
 
 What do you think?  Am I wrong in my interpretation of how
 hadoopCat-pipe-head would work?
 
 'hadoop dfs -cat' will output the file as it is read. head -5 will
 kill the first half of the pipe after 5 lines. With buffering more
 might be physically read than 5 lines but this invocation does not
 read the entire HDFS file before piping it to head.


Excellent.  Thank you.


Keith Wiley   kwi...@keithwiley.com   www.keithwiley.com

I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me.
  -- Abe (Grandpa) Simpson