Re: CDH2 or Apache Hadoop

2010-02-23 Thread Dali Kilani
I've had a great experience with CDH2 on various platforms (Ubuntu,
OpenSolaris). Worked as advertised.
My 2 cents.

On Tue, Feb 23, 2010 at 3:13 PM, Ananth Sarathy
ananth.t.sara...@gmail.com wrote:

 Just wanted to get the group's general feeling on what the preferred distro
 is and why? Obviously assuming one didn't have a service agreement with
 Cloudera.


 Ananth T Sarathy




-- 
Dali Kilani
===
Twitter :  @dadicool
Phone :  (650) 492-5921 (Google Voice)
E-Fax  :  (775) 552-2982


Re: Multiple disks for DFS

2009-09-14 Thread Dali Kilani
Just specify multiple directories (where the different local partitions are
mounted) for dfs.data.dir (HDFS block data) in hdfs-site.xml and for
mapred.local.dir (intermediate data) in mapred-site.xml. Data should then be
striped across the different partitions/disks.

See here : http://bit.ly/fbUkr
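
For example, a minimal sketch (the mount points below are just placeholders,
adjust them to wherever your partitions are actually mounted):

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/hdfs/data,/disk2/hdfs/data,/disk3/hdfs/data</value>
  </property>

  <!-- mapred-site.xml -->
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local</value>
  </property>

Each comma-separated directory should sit on a different physical disk,
otherwise you gain nothing from the striping.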

Dali

On Mon, Sep 14, 2009 at 11:50 AM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 Thanks for the explanation.

 Any idea if I can re-use this round robin mechanism for local disk writing?

 Or is it DFS only?

 Regards.

 2009/9/14 Jason Venner jason.had...@gmail.com

  When you have multiple partitions specified for hdfs storage, they are
  used for block storage in a round robin fashion.
  If a partition has insufficient space, it is dropped from the set used
  for storing new blocks.
 
  On Sun, Sep 13, 2009 at 3:01 AM, Stas Oskin stas.os...@gmail.com
 wrote:
 
   Hi.
  
   When I specify multiple disks for DFS, does Hadoop distribute the
   concurrent writes over the multiple disks?

   I mean, to avoid over-utilizing a single disk?

   Thanks for any info on the subject.
  
 
 
 
  --
  Pro Hadoop, a book to guide you from beginner to hadoop mastery,
  http://www.amazon.com/dp/1430219424?tag=jewlerymall
  www.prohadoopbook.com a community for Hadoop Professionals
 




-- 
Dali Kilani
===
Phone :  (650) 492-5921 (Google Voice)
E-Fax  :  (775) 552-2982


Re: Ubuntu/Hadoop incompatibilities?

2009-08-17 Thread Dali Kilani
Can you double-check that your DataNode doesn't have the same /etc/hosts
issue mentioned earlier in the thread? (i.e. the machine name resolves to
127.0.0.1)
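
Something like this is the usual culprit (the hostname and IP below are only
examples):

  # /etc/hosts -- problematic: the box's own hostname points at loopback
  127.0.0.1    localhost
  127.0.1.1    hadoop-node1

  # fix: map the hostname to the machine's real, routable IP instead
  127.0.0.1    localhost
  192.168.1.10 hadoop-node1

With the loopback mapping, the DataNode can end up registering itself under
127.0.0.1, and the NameNode then sees no usable node to replicate to, which
tends to produce exactly the "replicated to 0 nodes" symptom.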
Dali
On Mon, Aug 17, 2009 at 1:33 PM, CubicDesign cubicdes...@gmail.com wrote:

 Thank you all for your answers.

 My problem with Hadoop on Ubuntu is that I cannot make the DataNode server
 work properly (at least this is where I think the error is). I get a "File
 jobtracker.info could only be replicated to 0 nodes, instead of 1" error
 message. All other servers are running fine. I am running Hadoop on a
 single (test) machine.


 The results for jps and netstat are:


 jps
 4465 NameNode
 4553 DataNode
 5105 Jps
 4717 JobTracker
 4649 SecondaryNameNode
 4807 TaskTracker


   sudo netstat -plten | grep java
 tcp   0   0  0.0.0.0:50722    0.0.0.0:*   LISTEN   1000   13858   4553/java
 tcp   0   0  0.0.0.0:50020    0.0.0.0:*   LISTEN   1000   15130   4553/java
 tcp   0   0  127.0.0.1:54310  0.0.0.0:*   LISTEN   1000   13564   4465/java
 tcp   0   0  127.0.0.1:54311  0.0.0.0:*   LISTEN   1000   14571   4717/java
 tcp   0   0  0.0.0.0:59080    0.0.0.0:*   LISTEN   1000   14547   4717/java
 tcp   0   0  0.0.0.0:50090    0.0.0.0:*   LISTEN   1000   14943   4649/java
 tcp   0   0  127.0.0.1:40555  0.0.0.0:*   LISTEN   1000   15057   4807/java
 tcp   0   0  0.0.0.0:50060    0.0.0.0:*   LISTEN   1000   15031   4807/java
 tcp   0   0  0.0.0.0:47661    0.0.0.0:*   LISTEN   1000   14247   4649/java
 tcp   0   0  0.0.0.0:50030    0.0.0.0:*   LISTEN   1000   14941   4717/java
 tcp   0   0  0.0.0.0:57839    0.0.0.0:*   LISTEN   1000   13514   4465/java
 tcp   0   0  0.0.0.0:50070    0.0.0.0:*   LISTEN   1000   14533   4465/java
 tcp   0   0  0.0.0.0:50010    0.0.0.0:*   LISTEN   1000   14765   4553/java
 tcp   0   0  0.0.0.0:50075    0.0.0.0:*   LISTEN   1000   14946   4553/java




-- 
Dali Kilani
===
Phone :  (650) 492-5921 (Google Voice)
E-Fax  :  (775) 552-2982


Re: Why is Spilled Records always equal to Map output records

2009-07-13 Thread Dali Kilani
If I am not mistaken (I am new to this stuff), that's because you need a
checkpoint from which the reduce tasks that consume those spilled records
can be restarted in case of a reduce task failure.
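
In case it helps, these are the buffers discussed below (property names as of
the 0.20-era defaults, so double-check against your version; the values are
only examples):

  <!-- mapred-site.xml -->
  <!-- map side: in-memory sort buffer in MB; note that the final merged
       spill to disk still happens even if everything fits in this buffer -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>

  <!-- reduce side: fraction of the heap allowed to retain map outputs
       after the shuffle; raising it above 0.0 lets the reduce read them
       from memory instead of spilling them all to disk first -->
  <property>
    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.70</value>
  </property>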

Dali
On Mon, Jul 13, 2009 at 6:32 PM, Mu Qiao qiao...@gmail.com wrote:

 Thank you. But why do the map outputs need to be written to disk at least
 once? I think my io.sort.mb is large enough to do in-memory operations.
 Could you give me some more information about it?

 On Tue, Jul 14, 2009 at 1:27 AM, Owen O'Malley omal...@apache.org wrote:

 
  On Jul 12, 2009, at 3:55 AM, Mu Qiao wrote:
 
   I noticed it in the web console after I've tried to run several jobs.
   Every one of the jobs has the number of Spilled Records equal to Map
   output records, even if there are only 5 map output records.
 
 
 
  This is good. The map outputs need to be written to disk at least once. So
  if they are equal, things are fitting in memory. If multiple passes are
  needed, you'll see 2x or more spilled records.
 
   In the reduce phase, there are also spilled records, which are equal to
   the reduce input records.
 
 
  This is reasonable, although 0.19 and 0.20 don't need to spill the records
  in the reduce at all, if you make the buffer big enough.
 
  -- Owen
 



 --
 Best wishes,
 Qiao Mu




-- 
Dali Kilani
===
Phone :  (650) 492-5921 (Google Voice)
E-Fax  :  (775) 552-2982