Looking to hire an Engineer with experience dealing with large-scale data sets and MapReduce frameworks

2010-05-07 Thread Sudhir Vallamkondu
Primarily with these skill sets

- 4+ years of experience in J2EE design and development and 2+ years of
experience working with frameworks for processing vast amounts of data
- 1+ year building systems to process multi-terabyte data sets using
Hadoop/Hive/HBase, Google's Big Table, GridGain or any other large-scale
MapReduce frameworks.
- Design and develop data gathering, aggregation and analysis framework
components and expertise in the design, creation, management and business
use of extremely large datasets
- Passionate about working with huge data sets and about bringing datasets
together to answer business questions and drive change
- Experience with test-driven development using a unit testing framework
such as JUnit is a must.
- Design and create reusable ETL (extract transform and load) systems that
are operationally stable, scalable and tested





Re: Tasktracker appearing from "nowhere"

2010-06-01 Thread Sudhir Vallamkondu
This is exactly why one would need to maintain a list of authorized nodes.
Here's the relevant excerpt from the O'Reilly "Hadoop: The Definitive Guide"
book. The excerpt below cites datanodes, but it applies to tasktrackers as well.

"It is a potential security risk to allow any machine to connect to the
namenode and act as a datanode, since the machine may gain access to data
that it is not authorized to see. Furthermore, since such a machine is not a
real datanode, it is not under your control, and may stop at any time,
causing potential data loss. This scenario is a risk even inside a firewall,
through misconfiguration, so datanodes (and tasktrackers) should be
explicitly managed on all production clusters. Datanodes that are permitted
to connect to the namenode are specified in a file whose name is specified
by the dfs.hosts property. The file resides on the namenode's local
filesystem, and it contains a line for each datanode, specified by network
address (as reported by the datanode---you can see what this is by looking
at the namenode's web UI). If you need to specify multiple network addresses
for a datanode, put them on one line, separated by whitespace. Similarly,
tasktrackers that may connect to the jobtracker are specified in a file
whose name is specified by the mapred.hosts property. In most cases, there
is one shared file, referred to as the include file, that both dfs.hosts and
mapred.hosts refer to, since nodes in the cluster run both datanode and
tasktracker daemons. The file (or files) specified by the dfs.hosts and
mapred.hosts properties is different from the slaves file. The former is
used by the namenode and jobtracker to determine which worker nodes may
connect. The slaves file is used by the Hadoop control scripts to perform
cluster-wide operations, such as cluster restarts. It is never used by the
Hadoop daemons."
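
For reference, a minimal sketch of that setup (the include-file path here is an
assumption, not from the book): both properties point at one shared include
file that lists one worker hostname per line, and the namenode can be told to
re-read it with "hadoop dfsadmin -refreshNodes".

hdfs-site.xml:
  <property>
    <name>dfs.hosts</name>
    <value>/etc/hadoop/conf/include</value>
  </property>

mapred-site.xml:
  <property>
    <name>mapred.hosts</name>
    <value>/etc/hadoop/conf/include</value>
  </property>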





Re: error:Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzopCodec

2010-08-04 Thread Sudhir Vallamkondu
A couple of things you can try.

Did you restart the tasktrackers after installing the lzo libs? You can
check whether the lzo lib is on the classpath by running

ps auxw | grep tasktracker | grep lzo | wc -l

The above returns 1 if the classpath contains the lzo lib.

There is a local LZO indexer that you can run on each tasktracker to check
whether it works correctly:

hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.LzoIndexer test.lzo
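
If the jar and native libs are in place but the codec is still not found, it is
also worth double-checking the codec registration in core-site.xml. Below is a
sketch of the usual entries from the hadoop-lzo README; treat the exact list of
codecs as an assumption for your particular setup:

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>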


On Wed, Jul 28, 2010 at 7:42 AM, Alex Luya  wrote:

> Hello:
>I got source code from http://github.com/kevinweil/hadoop-lzo,compiled
> them successfully,and then
> 1,copy hadoop-lzo-0.4.4.jar to directory:$HADOOP_HOME/lib of each master
> and
> slave
> 2,Copy all files under directory:../Linux-amd64-64/lib to directory:
> $HADDOOP_HOME/lib/native/Linux-amd64-64 of each master and slave
> 3,and upload a file:test.lzo to HDFS
> 4,then run:hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar
> com.hadoop.compression.lzo.DistributedLzoIndexer test.lzo to test
>
> got errors:
>
> 


-
> 10/07/20 22:37:37 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 10/07/20 22:37:37 INFO lzo.LzoCodec: Successfully loaded & initialized
> native-
> lzo library [hadoop-lzo rev 5c25e0073d3dae9ace4bd9eba72e4dc43650c646]
> ##^_^^_^^_^^_^^_^^_^##
> (I think this says all native library got loaded successfully)
> 
> 10/07/20 22:37:37 INFO lzo.DistributedLzoIndexer: Adding LZO file
> target.lz:o
> to indexing list (no index currently exists)
> ...
> attempt_201007202234_0001_m_00_0, Status : FAILED
> java.lang.IllegalArgumentException: Compression codec
> com.hadoop.compression.lzo.LzopCodec
> not found.
>at
>
> 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(Compressio
nCodecFactory.java:96)
>at
>
> 
org.apache.hadoop.io.compress.CompressionCodecFactory.(CompressionCodecFac
tory.java:134)
>at
>
> 
com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:4
8)
>at
>
> 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java
:418)
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.ClassNotFoundException:
> com.hadoop.compression.lzo.LzopCodec
>
>at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:247)
>at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
>at
>
> 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(Compressio
nCodecFactory.java:89)
>... 6 more
>
> 


--
>
>
> There is a installation instruction in this
> link:http://github.com/kevinweil/hadoop-lzo,it says other configurings are
> needed :
>
> Once the libs are built and installed, you may want to add them to the
> class
> paths and library paths. That is, in hadoop-env.sh, set
>
>   (1)export HADOOP_CLASSPATH=/path/to/your/hadoop-lzo-lib.jar
>
> Question:I have copied hadoop-lzo-0.4.4.jar to $HADOOP_HOME/lib,
> ,should I do set this entry like this again? actually, after I add this:
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/hbase-0.20.4.jar:
> $HABSE_HOME/config:$ZOOKEEPER_HOME/zookeeper-3.3.1.jar:$HADOOP_HOME/lib
> /hadoop-lzo-0.4.4.jar,redo 1-4 as above,same problem as before,so:
>  how can I
> get hadoop to load hadoop-lzo-0.4.4.jar?)
>
>
>(2),export JAVA_LIBRARY_PATH=/path/to/hadoop-lzo-native-
> libs:/path/to/standard-hadoop-native-libs
>Note that there seems to be a bug in /path/to/hadoop/bin/hadoop; comment
> out the line
>(3)JAVA_LIBRARY_PATH=''
>
>
> Question:since native library got loaded successfully,aren't these
> operation(2)(3) needed?
>
>
> ---
> I am using hadoop 0.20.2
> core-site.xml
>
> -
> 
>
>fs.default.name
>hdfs://hadoop:8020
>
>
>hadoop.tmp.dir
>/

Re: common-user Digest 23 Aug 2010 21:21:26 -0000 Issue 1518

2010-08-23 Thread Sudhir Vallamkondu
Looking at the codebase, it seems to suggest that it ignores an edit log
storage directory if it encounters an error:

http://www.google.com/codesearch/p?hl=en#GLh8vwsjDqs/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java&q=namenode%20editlog&sa=N&cd=20&ct=rc

Check these lines:
Code at line 334
Comments at lines 387-390
Comments at lines 411-414
Comments at lines 433-436

The processIOError method is called throughout the code if it encounters an
IOException.  

A fatal error is only thrown if none of the storage directories is
accessible. Lines 394, 420

- Sudhir



On Aug/23/ 2:21 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Michael Segel 
> Date: Mon, 23 Aug 2010 14:05:05 -0500
> To: 
> Subject: RE: what will happen if a backup name node folder becomes
> unaccessible?
> 
> 
> Ok... 
> 
> Now you have me confused.
> Everything we've seen says that writing to both a local disk and to an NFS
> mounted disk would be the best way to prevent a problem.
> 
> Now you and Harsh J say that this could actually be problematic.
> 
> Which is it?
> Is this now a defect that should be addressed, or should we just not use an
> NFS mounted drive?
> 
> Thx
> 
> -Mike
> 
> 
>> Date: Mon, 23 Aug 2010 11:42:59 -0700
>> From: licht_ji...@yahoo.com
>> Subject: Re: what will happen if a backup name node folder becomes
>> unaccessible?
>> To: common-user@hadoop.apache.org
>> 
>> This makes a good argument. Actually, after seeing the previous reply, I
>> kindof convinced that I should go back to "sync" the meta data to a backup
>> location instead of using this feature, which as David mentioned, introduced
>> a 2nd single point of failure to hadoop, which degrades the availability of
>> hadoop. BTW, we are using cloudera package hadoop-0.20.2+228. Can someone
>> confirm whether a name node will shut down given that a backup folder listed
>> in "dfs.name.dir" becomes unavailable in this version?
>> 
>> Thanks,
>> 
>> Michael
>> 
>> --- On Sun, 8/22/10, David B. Ritch  wrote:
>> 
>> From: David B. Ritch 
>> Subject: Re: what will happen if a backup name node folder becomes
>> unaccessible?
>> To: common-user@hadoop.apache.org
>> Date: Sunday, August 22, 2010, 11:34 PM
>> 
>>  Which version of Hadoop was this?  The folks at Cloudera have assured
>> me that the namenode in CDH2 will continue as long as one of the
>> directories is still writable.
>> 
>> It *does* seem a bit of a waste if an availability feature - the ability
>> to write to multiple directories - actually reduces availability by
>> providing an additional single point of failure.
>> 
>> Thanks!
>> 
>> dbr
>> 
>> On 8/20/2010 5:27 PM, Harsh J wrote:
>>> Whee, lets try it out:
>>> 
>>> Start with both paths available. ... Starts fine.
>>> Store some files. ... Works.
>>> rm -r the second path. ... Ouch.
>>> Store some more files. ... Still Works. [Cuz the SNN hasn't sent us
>>> stuff back yet]
>>> Wait for checkpoint to hit.
>>> And ...
>>> Boom!
>>> 
>>> 2010-08-21 02:42:00,385 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
>>> from 127.0.0.1
>>> 2010-08-21 02:42:00,385 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>>> transactions: 37 Total time for transactions(ms): 6Number of
>>> transactions batched in Syncs: 0 Number of syncs: 26 SyncTimes(ms):
>>> 307 277
>>> 2010-08-21 02:42:00,439 FATAL
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All
>>> storage directories are inaccessible.
>>> 2010-08-21 02:42:00,440 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>>> /
>>> SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
>>> /
>>> 
>>> So yes, as Edward says - never let this happen!
>>> 
>>> On Sat, Aug 21, 2010 at 2:26 AM, jiang licht  wrote:
 Using nfs folder to back up dfs meta information as follows,
 
 
 dfs.name.dir
 /hadoop/dfs/name,/hadoop-backup/dfs/name
 
 
 where /hadoop-backup is on a backup machine and mounted on the master node.
 
 I have a question: if somehow, the backup folder becomes unavailable, will
 it freeze master node? That is, will write operation simply hang up on this
 condition on the master node? Or will master node log the problem and
 continues to work?
 
 Thanks,
 
 Michael
 
 
 
>>> 
>>> 
>> 
>> 
>> 
>> 
>>   






Re: Control the file splits size

2010-08-24 Thread Sudhir Vallamkondu
Have you tried the JobConf.setOutputKeyComparatorClass(Class)?

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/JobConf.html#setOutputKeyComparatorClass%28java.lang.Class%29


On Aug/24/ 5:08 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Teodor Macicas 
> Date: Mon, 23 Aug 2010 23:23:58 +0200
> To: 
> Subject: Re: Control the file splits size
> 
> Thanks guys for your replies.
> I seemed that my problem wasn't this. Using computeSplitSize() by
> overwriting the variable size forcing to be a multiple of my object size
> worked.
> 
> But now I have another question:. How can I handle the comparators used
> by the sorting algorithms ? I mean the sorting of the keys before a
> reducer starts. Since I have objects I want a custom comparator to
> distingush them.
> 
> Best,
> Teodor
> 
> On 08/23/2010 09:32 PM, Harsh J wrote:
>> > Ah yes I overlooked that part, sorry. I haven't tried out custom
>> > splits yet, so can't comment further on what may be going down.
>> >
>> > On Tue, Aug 24, 2010 at 12:44 AM, Michael Segel
>> >   wrote:
>> >
>>> >>
>>> >> Uhm...
>>> >>
>>> >> There may be more to the initial question.
>>> >>
>>> >> The OP indicated that this was a 'binary file' and that the records may
>>> not be based on an end-of-line.
>>> >> So he may want to look at how to handle different types of input too.
>>> >>
>>> >>
>>> >>  
 >>> From: qwertyman...@gmail.com
 >>> Date: Mon, 23 Aug 2010 18:39:48 +0530
 >>> Subject: Re: Control the file splits size
 >>> To: common-user@hadoop.apache.org
 >>>
 >>> 
 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/F
 ileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,
 >>> org.apache.hadoop.fs.Path)
 >>>
 >>> The isSplitable is the method you're looking for -- return false for
 >>> this in your custom input format (derived from FIF or etc.).
 >>>
 >>> On Mon, Aug 23, 2010 at 4:08 PM, Teodor Macicas
 wrote:
 >>>
>  Hi all,
> 
>  Can anyone please tell me how to control the splits size ? I have one
big
>  file which will be splitted by the number of maps. The input file is
> binary
>  and contains some objects. I do not want to split an object into 2
> separate
>  files, for sure.
>  I overwrite the computeSplitSize() file and I forced the size to be a
>  multiple of my objects size. It worked, but it seems that on certain
> points
>  of the output file objects are missing. And now I am thinking that
this
>  could be my problem.
>  
>> > Your output file is a result of MR if am correct? Can you verify at
>> > the input of your mapper if your objects are being read properly based
>> > on the split you've computed for it?
>> >
>  Have anyone faced this problem before ?
> 
>  Thank you.
>  Regards,
>  Teodor
> 
>  
 >>>
 >>>
 >>> --
 >>> Harsh J
 >>> www.harshj.com
 >>>
>>> >>  
>> >
>> >
>> >






Re: what will happen if a backup name node folder becomes unaccessible?

2010-08-24 Thread Sudhir Vallamkondu
darynamenode, logging to
> /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode
> -hadoop.out
> 
> starting jobtracker, logging to
> /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-hadoop
> .out
> 
> 
> localhost: starting tasktracker, logging to
> /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-hadoo
> p.out
> *hadoop 11:39:30 ~/.hadoop $* hadoop dfs -ls
> Found 6 items
> 
> -rw-r--r--   1 hadoop supergroup 411536 2010-08-18 15:50
> /user/hadoop/data
> drwxr-xr-x   - hadoop supergroup  0 2010-08-18 16:02
> /user/hadoop/dataout
> -rw-r--r--   1 hadoop supergroup497 2010-08-24 11:14
> /user/hadoop/profile1
> -rw-r--r--   1 hadoop supergroup497 2010-08-24 11:14
> /user/hadoop/profile2
> -rw-r--r--   1 hadoop supergroup497 2010-08-24 11:14
> /user/hadoop/profile3
> -rw-r--r--   1 hadoop supergroup497 2010-08-24 11:14
> /user/hadoop/profile4
> 
> 
> 
> On Tue, Aug 24, 2010 at 10:49 AM, Sudhir Vallamkondu <
> sudhir.vallamko...@icrossing.com> wrote:
>> > Looking at the codebase it seems to suggest that it ignores a editlog
>> > storage directory if it encounters an error
>> >
>> >
> http://www.google.com/codesearch/p?hl=en#GLh8vwsjDqs/trunk/src/hdfs/org/apac
>> >
> he/hadoop/hdfs/server/namenode/FSEditLog.java&q=namenode%20editlog&sa=N&cd=2
>> > 0&ct=rc
>> >
>> > Check lines:
>> > Code in line 334
>> > comment: 387 - 390
>> > comment: 411 - 414
>> > Comment: 433 - 436
>> >
>> > The processIOError method is called throughout the code if it encounters
> an
>> > IOException.
>> >
>> > A fatal error is only thrown if none of the storage directories is
>> > accessible. Lines 394, 420
>> >
>> > - Sudhir
>> >
>> >
>> >
>> > On Aug/23/ 2:21 PM, "common-user-digest-h...@hadoop.apache.org"
>> >  wrote:
>> >
>>> >> From: Michael Segel 
>>> >> Date: Mon, 23 Aug 2010 14:05:05 -0500
>>> >> To: 
>>> >> Subject: RE: what will happen if a backup name node folder becomes
>>> >> unaccessible?
>>> >>
>>> >>
>>> >> Ok...
>>> >>
>>> >> Now you have me confused.
>>> >> Everything we've seen says that writing to both a local disk and to an
> NFS
>>> >> mounted disk would be the best way to prevent a problem.
>>> >>
>>> >> Now you and Harsh J say that this could actually be problematic.
>>> >>
>>> >> Which is it?
>>> >> Is this now a defect that should be addressed, or should we just not use
> an
>>> >> NFS mounted drive?
>>> >>
>>> >> Thx
>>> >>
>>> >> -Mike
>>> >>
>>> >>
>>>> >>> Date: Mon, 23 Aug 2010 11:42:59 -0700
>>>> >>> From: licht_ji...@yahoo.com
>>>> >>> Subject: Re: what will happen if a backup name node folder becomes
>>>> >>> unaccessible?
>>>> >>> To: common-user@hadoop.apache.org
>>>> >>>
>>>> >>> This makes a good argument. Actually, after seeing the previous reply,
I
>>>> >>> kindof convinced that I should go back to "sync" the meta data to a
> backup
>>>> >>> location instead of using this feature, which as David mentioned,
> introduced
>>>> >>> a 2nd single point of failure to hadoop, which degrades the
>>>> availability
> of
>>>> >>> hadoop. BTW, we are using cloudera package hadoop-0.20.2+228. Can
> someone
>>>> >>> confirm whether a name node will shut down given that a backup folder
> listed
>>>> >>> in "dfs.name.dir" becomes unavailable in this version?
>>>> >>>
>>>> >>> Thanks,
>>>> >>>
>>>> >>> Michael
>>>> >>>
>>>> >>> --- On Sun, 8/22/10, David B. Ritch  wrote:
>>>> >>>
>>>> >>> From: David B. Ritch 
>>>> >>> Subject: Re: what will happen if a backup name node folder becomes
>>>> >>> unaccessible?
>>>> >>> To: common-user@hadoop.apache.org
&

RE: how to revert from a new version to an older one (CDH3)?

2010-08-24 Thread Sudhir Vallamkondu
More specifics on Michael's comment: you can use yum remove or apt-get purge
to remove the existing install.

For Red Hat systems, run this command:
# yum remove hadoop -y

For Debian systems, run this command:
# apt-get purge hadoop

Verify that you have no Hadoop packages installed on your cluster.

For Red Hat systems, run this command which should return no packages:
$ rpm -qa | grep hadoop

For Debian systems, run this command which should return no packages:
$ dpkg -l | grep hadoop

References:
https://docs.cloudera.com/display/DOC/Hadoop+Upgrade+from+CDH2+to+CDH3

On Aug/24/ 5:08 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Michael Segel 
> Date: Tue, 24 Aug 2010 06:21:30 -0500
> To: 
> Subject: RE: how to revert from a new version to an older one (CDH3)?
> 
> 
> Not sure if you got your question answered...
> 
> You need to delete the current version (via yum) and then specifically
> re-install the version you want by specifying the full name including version.
> 
> HTH 
> -Mike
> 
> 
>> > Date: Mon, 23 Aug 2010 15:00:39 -0700
>> > From: licht_ji...@yahoo.com
>> > Subject: how to revert from a new version to an older one (CDH3)?
>> > To: common-user@hadoop.apache.org
>> > 
>> > I want to replace a new CDH version 0.20.2+320 with an older one
>> 0.20.2+228. 
>> > 
>> > "yum downgrade" reports that version can only be upgraded. I also didn't
>> find a way to yum install the older version.
>> > 
>> > I guess I can download tar ball of the old version and extract it to where
>> the new version is installed and overwrite it. But seems not a good solution
>> because it might have negative impact on upgrading in the future.
>> > 
>> > So, what is the best way to do this?
>> > 
>> > Thanks,
>> > 
>> > Michael
>> > 
>> > 
>> >   
> 






Re: Hadoop sorting algorithm on equal keys

2010-08-24 Thread Sudhir Vallamkondu
You can specify a custom sort order using the JobConf.

One way to do this is to provide a custom class that implements
WritableComparable and use that as the key class. Another way is to specify
a custom comparator in the job configuration via the
setOutputKeyComparatorClass() method on the JobConf object.
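
A minimal sketch of the second approach (not the original poster's code; the
key layout "<number>#<extra-info>" is a made-up assumption for illustration):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Breaks ties between numerically equal keys using extra info packed into the key.
public class TieBreakingComparator extends WritableComparator {
    protected TieBreakingComparator() {
        super(Text.class, true); // true = instantiate keys so compare() sees real Text objects
    }

    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        // Hypothetical key layout: "<number>#<extra-info>"
        String[] ka = a.toString().split("#", 2);
        String[] kb = b.toString().split("#", 2);
        long na = Long.parseLong(ka[0]);
        long nb = Long.parseLong(kb[0]);
        if (na != nb) {
            return na < nb ? -1 : 1;     // primary order: the number
        }
        return ka[1].compareTo(kb[1]);   // tie-break: the extra info
    }
}

// Wiring it in (old mapred API):
// JobConf conf = new JobConf(MyJob.class);
// conf.setOutputKeyComparatorClass(TieBreakingComparator.class);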

- Sudhir

On Aug/24/ 5:08 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Teodor Macicas 
> Date: Tue, 24 Aug 2010 11:21:39 +0200
> To: "common-user@hadoop.apache.org" 
> Subject: Hadoop sorting algorithm on equal keys
> 
> Hello,
> 
> Let's say that we have two maps outputs which will be sorted before the
> reducer will start. Doesn't matter what {a,b0,b1,c} mean, but let's
> assume that b0=b1.
> Map output1 : a, b0
> Map output2:  c, b1
> In this case we can have 2 different sets of sorted data:
> 1. {a,b0,b1,c}  and
> 2. {a,b1,b0,c}  since b0=b1 .
> 
> In my particular problem I want to distingush between b0 and b1.
> Basically, they are numbers but I have extra-info on which my comparison
> will be made.
> Now, the question is: how can I change Hadoop default behaviour in order
> to control the sorting algorithm on equal keys ?
> 
> Thank you in advance.
> Best,
> Teodor






Re: what will happen if a backup name node folder becomes unaccessible?

2010-08-24 Thread Sudhir Vallamkondu
.
New Image Size: 4671

--- after 30 mins

had...@training-vm:~$ jps
12426 NameNode
12647 SecondaryNameNode
12730 JobTracker
16256 Jps
12535 DataNode
12826 TaskTracker








On Aug/24/ 11:05 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: jiang licht 
> Date: Tue, 24 Aug 2010 10:38:32 -0700 (PDT)
> To: 
> Subject: Re: what will happen if a backup name node folder becomes
> unaccessible?
> 
> Sudhir,
> 
> Look forward to your results, if possible with different CDH releases.
> 
> Thanks,
> 
> Michael
> 
> --- On Tue, 8/24/10, Sudhir Vallamkondu 
> wrote:
> 
> From: Sudhir Vallamkondu 
> Subject: Re: what will happen if a backup name node folder becomes
> unaccessible?
> To: common-user@hadoop.apache.org
> Date: Tuesday, August 24, 2010, 10:47 AM
> 
> Harsh
> 
> You seem to be getting an "all storage directories inaccessible" error.
> Strange, because as per the code that only gets thrown when all dirs are inaccessible.
> In any case, I will test it on the Cloudera distribution today and publish results.
> 
> - Sudhir
> 
> 
> On Aug/24/ 5:08 AM, "common-user-digest-h...@hadoop.apache.org"
>  wrote:
> 
>> From: Harsh J 
>> Date: Tue, 24 Aug 2010 11:41:48 +0530
>> To: 
>> Subject: Re: common-user Digest 23 Aug 2010 21:21:26 - Issue 1518
>> 
>> Hello Sudhir,
>> 
>> You're right about this, but I don't seem to be getting the warning for the
>> edit log IOException at all in the first place. Here's my steps to get to
>> what I described earlier (note that am just using two directories on the
>> same disk, not two different devices or nfs, etc.) Its my personal computer
>> so I don't mind doing this again for now (as the other directory remains
>> untouched).
>> 
>> *hadoop 11:13:00 ~/.hadoop $* jps
>> 
>> 4954 SecondaryNameNode
>> 
>> 5911 Jps
>> 
>> 5158 TaskTracker
>> 
>> 4592 NameNode
>> 
>> 5650 JobTracker
>> 
>> 4768 DataNode
>> 
>> *hadoop 11:13:02 ~/.hadoop $* hadoop dfs -ls
>> 
>> Found 2 items
>> 
>> -rw-r--r--   1 hadoop supergroup     411536 2010-08-18 15:50
>> /user/hadoop/data
>> drwxr-xr-x   - hadoop supergroup          0 2010-08-18 16:02
>> /user/hadoop/dataout
>> hadoop 11:13:07 ~/.hadoop $ tail -n 10 conf/hdfs-site.xml
>> 
>>   
>> 
>>     *dfs.name.dir*
>> 
>>     /home/hadoop/.dfs/name,*/home/hadoop/.dfs/testdir*
>> 
>>     true
>> 
>>   
>> 
>>   
>> 
>>     dfs.datanode.max.xcievers
>> 
>>     2047
>> 
>>   
>> 
>> 
>> 
>> *hadoop 11:13:25 ~/.hadoop $* ls ~/.dfs/
>> 
>> data  name  testdir
>> 
>> *hadoop 11:13:36 ~/.hadoop $ rm -r ~/.dfs/testdir  *
>> 
>> *hadoop 11:13:49 ~/.hadoop $* jps
>> 
>> 6135 Jps
>> 
>> 4954 SecondaryNameNode
>> 
>> 5158 TaskTracker
>> 
>> 4592 NameNode
>> 
>> 5650 JobTracker
>> 
>> 4768 DataNode
>> 
>> *hadoop 11:13:56 ~/.hadoop $* hadoop dfs -put /etc/profile profile1
>> 
>> *hadoop 11:14:10 ~/.hadoop $* hadoop dfs -put /etc/profile profile2
>> 
>> *hadoop 11:14:12 ~/.hadoop $* hadoop dfs -put /etc/profile profile3
>> 
>> *hadoop 11:14:15 ~/.hadoop $* hadoop dfs -put /etc/profile profile4
>> 
>> 
>> *hadoop 11:17:21 ~/.hadoop $* jps
>> 4954 SecondaryNameNode
>> 
>> 5158 TaskTracker
>> 
>> 4592 NameNode
>> 
>> 5650 JobTracker
>> 
>> 4768 DataNode
>> 
>> 6954 Jps
>> 
>> *hadoop 11:17:23 ~/.hadoop $* tail -f
>> hadoop-0.20.2/logs/hadoop-hadoop-namenode-hadoop.log
>> 2010-08-24 11:14:17,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>> NameSystem.allocateBlock: /user/hadoop/profile4. blk_28644972299224370_1019
>> 
>> 2010-08-24 11:14:17,709 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>> NameSystem.addStoredBlock: blockMap updated: 192.168.1.8:50010 is added to
>> blk_28644972299224370_1019 size 497
>> 2010-08-24 11:14:17,713 INFO org.apache.hadoop.hdfs.StateChange: DIR*
>> NameSystem.completeFile: file /user/hadoop/profile4 is closed by
>> DFSClient_-2054565417
>> 2010-08-24 11:17:31,187 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
>> 192.168.1.8
>> 
>> 2010-08-24 11:17:31,187 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
>> 19 Total time for transactions(ms): 4Number of transactions batched in
>> Syncs: 0 Number of syncs: 14 SyncTimes(ms): 183

Re: common-user Digest 25 Aug 2010 07:36:47 -0000 Issue 1523

2010-08-25 Thread Sudhir Vallamkondu
You should use the FileStatus API to access file metadata. See the example
below.

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration(); // takes default conf
FileSystem fs = FileSystem.get(conf);
Path dir = new Path("/dir");
FileStatus stat = fs.getFileStatus(dir);
stat.getPath().toUri().getPath(); // gives the path name
stat.isDir();                     // true if this is a directory
stat.getLen();                    // length in bytes
stat.getModificationTime();
stat.getReplication();
stat.getBlockSize();
stat.getOwner();
stat.getGroup();
stat.getPermission().toString();
  


> From: Denim Live 
> Date: Wed, 25 Aug 2010 07:36:11 + (GMT)
> To: 
> Subject: How to enumerate files in the directories?
> 
> Hello, how can one determine the names of the files in a particular hadoop
> directory, programmatically?






Re: How to enumerate files in the directories?

2010-08-25 Thread Sudhir Vallamkondu
You should use the FileStatus API to access file metadata. See below a
example. 

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration(); // takes default conf
FileSystem fs = FileSystem.get(conf);
Path dir = new Path("/dir");
FileStatus[] stats = fs.listStatus(dir); // one entry per file/subdirectory
for (FileStatus stat : stats) {
    stat.getPath().toUri().getPath(); // gives the file path name
    stat.getModificationTime();
    stat.getReplication();
    stat.getBlockSize();
    stat.getOwner();
    stat.getGroup();
    stat.getPermission().toString();
}
  


> From: Denim Live 
> Date: Wed, 25 Aug 2010 07:36:11 + (GMT)
> To: 
> Subject: How to enumerate files in the directories?
> 
> Hello, how can one determine the names of the files in a particular hadoop
> directory, programmatically?






Multiple dirs in mapred.local.dir property

2010-08-30 Thread Sudhir Vallamkondu
We are testing our cluster map-reduce performance by specifying one
dir vs multiple dirs in property “mapred.local.dir”. The documentation
for this property says “ The local directory where MapReduce stores
intermediate data files. May be a comma-separated list of directories
on different devices in order to spread disk i/o”. So I was expecting
a performance boost when specifying two local dirs vs one dir for
property “mapred.local.dir”. We did a sort test and saw the opposite.

- One dir config defaults to ${hadoop.tmp.dir}/mapred/local.
“hadoop.tmp.dir” is set to   "/var/lib/hadoop-0.20/cache/${user.name}"

- Two dir config explicitly sets "mapred.local.dir" in mapred-site.xml:

  <property>
    <name>mapred.local.dir</name>
    <value>/data1/hadoop/mapred/local,/data2/hadoop/mapred/local</value>
  </property>

/data1, /data2 are separate drives on each hadoop cluster instance box

$ df -h
FilesystemSize  Used Avail Use% Mounted on
/dev/sda3 330G  128G  186G  41% /
/dev/sda1 190M   18M  163M  10% /boot
tmpfs 7.8G 0  7.8G   0% /dev/shm
/dev/sdb1 2.2T  442G  1.8T  20% /data1
/dev/sdc1 2.2T  484G  1.8T  22% /data2

-- Sort test with one vs two dir 

1. For 40 GB Data :

a. Results when one dir was specified in “mapred.local.dir”  :
Time taken for random-data generation : 39 mins 3 sec
Time taken for random-data Sort :  1hrs, 8mins, 5sec
Time taken for sorted-data Validation : 3 mins 22 sec

b. Results when two dirs were specified in mapred.local.dir :
Time taken for random-data generation : 36mins, 51sec
Time taken for random-data Sort : 1hrs, 38mins, 33sec
Time taken for sorted-data Validation : 3mins, 31sec

2. For 100 GB Data :

a.  Results when one dir was specified in “mapred.local.dir”  :
Time taken for random-data generation : 1hrs, 33mins, 28sec
Time taken for random-data Sort : 3hrs, 50mins, 39sec
Time taken for sorted-data Validation : 7mins, 33sec

b. Results when two dirs were specified in mapred.local.dir :
Time taken for random-data generation : 1hrs, 27mins, 17sec
Time taken for random-data Sort : 6hrs, 35mins, 20sec
Time taken for sorted-data Validation : 8mins, 52sec

The random data generation time had a slight performance gain however
the sort job (which is also I/O intensive) almost doubled in both
instances. Any reason why this is happening?


Re: common-user Digest 26 Sep 2010 04:20:50 -0000 Issue 1548

2010-09-25 Thread Sudhir Vallamkondu
The exceptions below (before shutdown) are due to an inability to reach the
jobtracker and namenode. I am guessing the jobtracker and namenode are being
shut down before these tasktrackers and datanodes get shut down. The errors
below involve connecting to ports 9000 and 9001, which are the default ports
for the namenode and jobtracker (one of the default options):

http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
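
Since the startup log below also shows the tasktracker binding to
localhost/127.0.0.1, it may be worth confirming the new nodes' configs point
at the real master rather than localhost. A sketch of the two properties to
check (the hostname is a placeholder, not from the thread):

core-site.xml:
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master-host:9000</value>
  </property>

mapred-site.xml:
  <property>
    <name>mapred.job.tracker</name>
    <value>master-host:9001</value>
  </property>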

- Sudhir


On Sep/25/ 9:20 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: shangan 
> Date: Sun, 26 Sep 2010 11:08:22 +0800
> To: hadoop-user 
> Subject: datanode and tasktracker shutdown error
> 
> previously I have a cluster containing 8 nodes and it woks well. I add 24 new
> datanodes to the cluster, tasktracker and datanode deamons can start but when
> I shutdown the cluster I find those errors on these new added datanodes. Can
> anyone explain it?
> 
> log from tasktracker
> 
> 2010-09-26 09:52:21,672 INFO org.apache.hadoop.mapred.TaskTracker:
> STARTUP_MSG: 
> /
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = localhost/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707;
> compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> /
> 2010-09-26 09:52:21,876 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2010-09-26 09:52:22,006 INFO org.apache.hadoop.http.HttpServer: Port returned
> by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
> the listener on 50060
> 2010-09-26 09:52:22,014 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50060
> webServer.getConnectors()[0].getLocalPort() returned 50060
> 2010-09-26 09:52:22,014 INFO org.apache.hadoop.http.HttpServer: Jetty bound to
> port 50060
> 2010-09-26 09:52:22,014 INFO org.mortbay.log: jetty-6.1.14
> 2010-09-26 09:52:42,715 INFO org.mortbay.log: Started
> selectchannelconnec...@0.0.0.0:50060
> 2010-09-26 09:52:42,722 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=TaskTracker, sessionId=
> 2010-09-26 09:52:42,737 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=TaskTracker, port=28404
> 2010-09-26 09:52:42,793 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2010-09-26 09:52:42,793 INFO org.apache.hadoop.ipc.Server: IPC Server listener
> on 28404: starting
> 2010-09-26 09:52:42,794 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 0 on 28404: starting
> 2010-09-26 09:52:42,795 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 1 on 28404: starting
> 2010-09-26 09:52:42,795 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 3 on 28404: starting
> 2010-09-26 09:52:42,795 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 2 on 28404: starting
> 2010-09-26 09:52:42,795 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 4 on 28404: starting
> 2010-09-26 09:52:42,795 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 5 on 28404: starting
> 2010-09-26 09:52:42,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 6 on 28404: starting
> 2010-09-26 09:52:42,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 7 on 28404: starting
> 2010-09-26 09:52:42,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 8 on 28404: starting
> 2010-09-26 09:52:42,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 9 on 28404: starting
> 2010-09-26 09:52:42,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 10 on 28404: starting
> 2010-09-26 09:52:42,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 11 on 28404: starting
> 2010-09-26 09:52:42,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 12 on 28404: starting
> 2010-09-26 09:52:42,797 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 13 on 28404: starting
> 2010-09-26 09:52:42,797 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 14 on 28404: starting
> 2010-09-26 09:52:42,797 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 15 on 28404: starting
> 2010-09-26 09:52:42,797 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker
> up at: localhost/127.0.0.1:28404
> 2010-09-26 09:52:42,797 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> tracker tracker_localhost:localhost/127.0.0.1:28404
> 2010-09-26 09:52:55,025 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> thread: Map-events fetcher for all reduce tasks on
> tracker_localhost:localhost/127.0.0.1:28404
> 2010-09-26 09:52:55,027 INFO org.apache.hadoop.mapred.TaskTracker:  Using
> MemoryCalculatorPlugin :
> org.apache.hadoop.util.linuxmemorycalculatorplu...@2de12f6d
> 2010-09-26 09:52:55,031 WARN org.apache.hadoop.mapred.TaskTracker:
> TaskTracker's tot

Re: Total Space Available on Hadoop Cluster Or Hadoop version of "df"

2010-10-04 Thread Sudhir Vallamkondu
fs -du has a 'h' option for human-readable values, but it doesn't seem to
work. Instead you can use something like this to print sizes in gigabytes;
adjust the 1024 multipliers for other units.

hadoop fs -du / | awk '{print ($1/(1024*1024*1024))"g" "\t" $2}'
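
For a cluster-wide total (the "df"-style view the original question asks
about), hadoop dfsadmin -report also prints configured capacity, DFS used and
DFS remaining for the whole cluster and per datanode:

hadoop dfsadmin -report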



On 10/4/10 2:04 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Sandhya E 
> Date: Sat, 2 Oct 2010 23:36:55 +0530
> To: 
> Subject: Re: Total Space Available on Hadoop Cluster Or Hadoop version of
> "df".
> 
> There is a fs -du command that can be useful. Or the Hadoop DFS
> website shows the stats also.
> 
> On Sat, Oct 2, 2010 at 9:44 AM, rahul  wrote:
>> Hi,
>> 
>> I am using Hadoop 0.20.2 version for data processing by setting up Hadoop
>> Cluster on two nodes.
>> 
>> And I am continuously adding more space to the nodes.
>> 
>> Can some body let me know how to get the total space available on the hadoop
>> cluster using command line.
>> 
>>  or
>> 
>> Hadoop version "df", Unix command.
>> 
>> Any input is helpful.
>> 
>> Thanks
>> Rahul






Re: hadoop or nutch problem?

2010-10-04 Thread Sudhir Vallamkondu
Worth checking if this is caused by the open file limit issue:

http://sudhirvn.blogspot.com/2010/07/hadoop-error-logs-orgapachehadoophdfsse.html
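
A quick way to check the limit under which the daemons run (a sketch; on many
Linux distributions the default is 1024, and raising it is usually done per
user via an nofile entry in /etc/security/limits.conf):

ulimit -n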


On 10/4/10 2:04 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: AJ Chen 
> Date: Sat, 2 Oct 2010 10:28:29 -0700
> To: , nutch-user 
> Subject: Re: hadoop or nutch problem?
> 
> More observations: during hadoop job running, this "filesystem closed" error
> happens consistently.
> 2010-10-02 05:29:58,951 WARN  mapred.TaskTracker - Error running child
> java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
> at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:67)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:1678)
> at java.io.FilterInputStream.close(FilterInputStream.java:155)
> at
> org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1584)
> at
> org.apache.hadoop.mapred.SequenceFileRecordReader.close(SequenceFileRecordRead
> er.java:125)
> at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:198)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:362)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 2010-10-02 05:29:58,979 WARN  mapred.TaskRunner - Parent died.  Exiting
> attempt_201009301134_0006_m_74_1
> 
> Could this error turns on the safemode in hadoop? I suspect this because the
> next hadoop job is supposed to create a segment directory and write out
> segment results, but it does not create the directory.  Anything else could
> happen to hdfs?
> 
> thanks,
> -aj
> 
> On Tue, Sep 28, 2010 at 4:40 PM, AJ Chen  wrote:
> 
>> I'm doing web crawling using nutch, which runs on hadoop in distributed
>> mode. When the crawldb has tens of millions of urls, I have started to see
>> strange failure in generating new segment and updating crawldb.
>> For generating segment, the hadoop job for select is completed successfully
>> and generate-temp-1285641291765 is created. but it does not start the
>> partition job and the segment is not created in segments directory. I try to
>> understand where it fails. There is no error message except for a few WARN
>> messages about connection reset by peer. Hadoop fsck and dfsadmin show the
>> nodes and directories are healthy. Is this a hadoop problem or nutch
>> problem? I'll appreciate any suggestion for how to debug this fatal
>> problem.
>> 
>> Similar problem is seen for updatedb step, which creates the temp dir but
>> never actually update the crawldb.
>> 
>> thanks,
>> aj
>> --
>> AJ Chen, PhD
>> Chair, Semantic Web SIG, sdforum.org
>> web2express.org
>> twitter: @web2express
>> Palo Alto, CA, USA
>> 






Re: Help!!The problem about Hadoop

2010-10-05 Thread Sudhir Vallamkondu
You should try implementing some of the suggestions from this blog post:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/

In general, just google for tuning map/reduce programs and you will find some
good articles like this one:

http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
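
Since the job described below spawns roughly 500 short map tasks from many
small files, two of the knobs those articles cover are worth trying first (a
sketch, not tuned values): JVM reuse, and packing small files into fewer
splits with CombineFileInputFormat.

mapred-site.xml (or the per-job conf):
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <!-- -1 lets a JVM run an unlimited number of tasks for the same job -->
    <value>-1</value>
  </property>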






> From: Jander <442950...@163.com>
> Date: Tue, 5 Oct 2010 16:43:49 +0800 (CST)
> To: 
> Subject: Re:Re: Help!!The problem about Hadoop
> 
> Hi Jeff,
>
> Thank you very much for your reply, sincerely.
>
> I know Hadoop has overhead, but is it too large in my case?
>
> The 1GB text input has about 500 map tasks because the input is composed of
> little text files. And the time each map takes is from 8 seconds to 20
> seconds. I use compression like conf.setCompressMapOutput(true).
>
> Thanks,
> Jander
>
>
> At 2010-10-05 16:28:55, "Jeff Zhang" wrote:
>
>> Hi Jander,
>>
>> Hadoop has overhead compared to a single-machine solution. How many tasks
>> did you get when you ran your hadoop job? And what is the time consumed
>> by each map and reduce task?
>>
>> There are lots of tips for performance tuning of hadoop, such as
>> compression and jvm reuse.
>>
>> 2010/10/5 Jander <442950...@163.com>:
>>> Hi, all
>>> I do an application using hadoop.
>>> I take 1GB text data as input; the results are as follows:
>>> (1) the cluster of 3 PCs: the time consumed is 1020 seconds.
>>> (2) the cluster of 4 PCs: the time is about 680 seconds.
>>> But the application before I used Hadoop takes about 280 seconds, so at
>>> the speed above, I must use 8 PCs in order to have the same speed as
>>> before. Now the problem: is this correct?
>>>
>>> Jander,
>>> Thanks.
>>
>> --
>> Best Regards
>>
>> Jeff Zhang






Re: Datanode Registration DataXceiver java.io.EOFException

2010-10-05 Thread Sudhir Vallamkondu
We use Ganglia for monitoring our cluster, plus a Nagios plugin that
interfaces with the gmetad node to set up various rules around the number of
datanodes, missing/corrupted blocks, etc.

http://www.cloudera.com/blog/2009/03/hadoop-metrics/

http://exchange.nagios.org/directory/Plugins/Network-and-Systems-Management/Others/check_ganglia/details




> From: Arthur Caranta 
> Date: Mon, 04 Oct 2010 15:46:19 +0200
> To: 
> Subject: Re: Datanode Registration DataXceiver java.io.EOFException
> 
>  On 04/10/10 15:42, Steve Loughran wrote:
>> On 04/10/10 14:30, Arthur Caranta wrote:
>>>   Damn I found the answer to this problem, thanks to someone on the
>>> #hadoop IRC channel ...
>>> 
>>> It was a network check I added for our supervision ... therefore every 5
>>> minutes the supervision connects to the datanode port to check if it is
>>> alive and then disconnects ...
>>> 
>> 
>> why not just GET the various local pages and let your HTTP monitoring
>> tools do the work.
>> 
>> 
> True ... however the tcp method was the fastest to implement and script
> with our current supervision system.
> but I think I might be switching monitoring method.






Re: java.net.SocketException: Broken pipe

2010-10-18 Thread Sudhir Vallamkondu
Looks like a network error on the datanode during a checkDiskError operation.
Does your datanode use network mounts for storage? If yes, then it is worth
checking the mounts.


On 10/16/10 8:44 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: "Sharma, Avani" 
> Date: Sat, 16 Oct 2010 07:40:51 -0700
> To: "common-user@hadoop.apache.org" 
> Subject: java.net.SocketException: Broken pipe
> 
> I get the below error when dumping a 50G file on one of my Hadoop (0.20.2)
> clusters. It worked fine on another one though. I researched and this seems
> more like a network problem? I want to know how can I go about resolving this.
> What all should I look for on my cluster to debug this.
> 
> 2010-10-15 06:01:50,014 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder blk_-1170040697541244894_3431 1 Exception
> java.net.SocketExcept
> ion: Broken pipe
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTra
> nsferProtocol.java:132)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(Block
> Receiver.java:899)
> at java.lang.Thread.run(Thread.java:619)
> 
> 2010-10-15 06:01:50,016 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> IOException in BlockReceiver.run():
> java.net.SocketException: Broken pipe
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTra
> nsferProtocol.java:132)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(Block
> Receiver.java:1001)
> at java.lang.Thread.run(Thread.java:619)
> 2010-10-15 06:01:50,017 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> checkDiskError: exception:
> java.net.SocketException: Broken pipe
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTra
> nsferProtocol.java:132)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(Block
> Receiver.java:1001)
> at java.lang.Thread.run(Thread.java:619)
> 
> Thanks,
> Avani Sharma






Unable to access hdfs file system from command terminal

2010-10-19 Thread Sudhir Vallamkondu
You can use the hadoop command-line utility to access the file system. You
can specify the namenode to use via the -fs generic option to hadoop fs.

$ hadoop fs --help
-help: Unknown command
Usage: java FsShell
    [-ls <path>]
    [-lsr <path>]
    [-df [<path>]]
    [-du <path>]
    [-dus <path>]
    [-count[-q] <path>]
    [-mv <src> <dst>]
    [-cp <src> <dst>]
    [-rm [-skipTrash] <path>]
    [-rmr [-skipTrash] <path>]
    [-expunge]
    [-put <localsrc> ... <dst>]
    [-copyFromLocal <localsrc> ... <dst>]
    [-moveFromLocal <localsrc> ... <dst>]
    [-get [-ignoreCrc] [-crc] <src> <localdst>]
    [-getmerge <src> <localdst> [addnl]]
    [-cat <src>]
    [-text <src>]
    [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
    [-moveToLocal [-crc] <src> <localdst>]
    [-mkdir <path>]
    [-setrep [-R] [-w] <rep> <path/file>]
    [-touchz <path>]
    [-test -[ezd] <path>]
    [-stat [format] <path>]
    [-tail [-f] <file>]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-chgrp [-R] GROUP PATH...]
    [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to
be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files
to include in the classpath.
-archives <comma separated list of archives>    specify comma separated
archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Here are more details with examples:

http://hadoop.apache.org/common/docs/r0.18.3/hdfs_shell.html
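
For example, using the -fs generic option with the namenode URI from the
question below:

hadoop fs -fs hdfs://cs-sy-230.cse.iitkgp.ernet.in:54310 -ls /user/user/blog-hadoop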

Alternatively, you can look into using FUSE to mount HDFS as a standard file
system:

http://wiki.apache.org/hadoop/MountableHDFS

On 10/17/10 10:06 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: siddharth raghuvanshi 
> Date: Sat, 16 Oct 2010 21:46:49 +0530
> To: 
> Subject: Unable to access hdfs file system from command terminal
> 
> Hi,
> 
> Can I access the hadoop filesystem from the terminal like
>  hdfs://cs-sy-230.cse.iitkgp.ernet.in:54310/user/user/blog-hadoop
> 
> It should be noted that I am able to open the following link using firefox
> web browser
> http://cs-sy-230.cse.iitkgp.ernet.in:50075/browseDirectory.jsp?dir=%2Fuser%2Fu
> ser%2Fblog-hadoop&namenodeInfoPort=50070
> 
> Regards
> Siddharth






Re: Hadoop NameNode Startup Problem

2010-10-30 Thread Sudhir Vallamkondu
Do you run a secondary namenode? You can use the copy of the fsimage and the
edit log from the SNN to recover. Remember that it will be (roughly) an hour
old (the default checkpoint config for the SNN). The recovery process is to
copy the fsimage and edit log to a new machine, place them in the
dfs.name.dir/current directory, and start all the daemons. For cases like
these you should configure the namenode to write to multiple directories,
including one on a network filesystem or SAN, so that you always have a fresh
copy.
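
A sketch of that layout (the paths are examples, not the poster's actual
directories):

  <property>
    <name>dfs.name.dir</name>
    <value>/data/dfs/name,/mnt/nfs/namenode-backup/dfs/name</value>
  </property>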

- Sudhir



On 10/30/10 3:33 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Manish Nene 
> Date: Sat, 30 Oct 2010 21:07:58 +0530
> To: 
> Subject: Hadoop NameNode Startup Problem
> 
> Hi All,
> 
> I'm running a 3 node Cluster, the NameNode basically ran out of space &
> Cluster basically crashed. We freedup space & tried to start the NN, but it
> won't come up.
> 
> Its giving following exception while coming up .
> 
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.18.3
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250;
> compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> /
> 2010-10-30 12:07:57,626 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=54310
> 2010-10-30 12:07:57,632 INFO org.apache.hadoop.dfs.NameNode: Namenode up at:
> red.hoonur.com/192.168.100.122:54310
> 2010-10-30 12:07:57,635 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2010-10-30 12:07:57,651 INFO org.apache.hadoop.dfs.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2010-10-30 12:07:57,739 INFO org.apache.hadoop.fs.FSNamesystem:
> fsOwner=hadoop,hadoop
> 2010-10-30 12:07:57,740 INFO org.apache.hadoop.fs.FSNamesystem:
> supergroup=supergroup
> 2010-10-30 12:07:57,740 INFO org.apache.hadoop.fs.FSNamesystem:
> isPermissionEnabled=false
> 2010-10-30 12:07:57,755 INFO org.apache.hadoop.dfs.FSNamesystemMetrics:
> Initializing FSNamesystemMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2010-10-30 12:07:57,756 INFO org.apache.hadoop.fs.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2010-10-30 12:07:57,900 INFO org.apache.hadoop.dfs.Storage: Number of files
> = 2988433
> 2010-10-30 12:09:05,014 INFO org.apache.hadoop.dfs.Storage: Number of files
> under construction = 49
> 2010-10-30 12:09:05,315 INFO org.apache.hadoop.dfs.Storage: Image file of
> size 395864924 loaded in 67 seconds.
> 2010-10-30 12:09:05,351 INFO org.apache.hadoop.dfs.Storage: Edits file edits
> of size 22024 edits # 215 loaded in 0 seconds.
> 2010-10-30 12:09:05,379 ERROR org.apache.hadoop.fs.FSNamesystem:
> FSNamesystem initialization failed.
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.Text.readString(Text.java:412)
> at
> org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.j
> ava:84)
> at
> 
org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:98>
)
> at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:483)
> at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
> at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:273)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:193)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:179)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
> 2010-10-30 12:09:05,380 INFO org.apache.hadoop.ipc.Server: Stopping server
> on 54310
> 2010-10-30 12:09:05,384 ERROR org.apache.hadoop.dfs.NameNode:
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.Text.readString(Text.java:412)
> at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:84)
> at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:98)
> at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:483)
> at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:849)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
> at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
> at org.apache.hadoop.dfs.FSNamesystem.init

Re: what does it mean -- java.io.IOException: Filesystem closed

2010-11-03 Thread Sudhir Vallamkondu
Maybe an issue with the OS open file limit. Worth checking:

http://sudhirvn.blogspot.com/2010/07/hadoop-error-logs-orgapachehadoophdfsse.html
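
If you want to rule that out quickly, a minimal check could look like this
(assuming the daemons run as a "hadoop" user on Linux; the 32768 value is just
an illustration):

  # limit a new shell for the hadoop user would get
  su - hadoop -c 'ulimit -n'

  # limit the running datanode/tasktracker actually has (pid from jps)
  grep 'open files' /proc/<datanode-pid>/limits

  # to raise it persistently, add lines like these to /etc/security/limits.conf
  # and restart the daemons
  hadoop  soft  nofile  32768
  hadoop  hard  nofile  32768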


On 11/3/10 6:21 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Oleg Ruchovets 
> Date: Tue, 2 Nov 2010 20:38:28 +0200
> To: 
> Subject: what does it mean -- java.io.IOException: Filesystem closed
> 
> Hi ,
>   Running a hadoop job, from time to time I got such exception (from one of the
> reducers):
> 
> The questions are :
> 1) What does this exception means for the data integrity?
> 2) Does it mean that part of the data which reducer responsible for (and got
> exception) are lost?
> 3) What could cause for such exception?
> 
>   java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:222)
> at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:66)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2948)
> at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
> at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
> at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
> at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:99)
> at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at com.analytics.hbase.internals.MergeMapReduceHDFSInserter$MergeMapReduceHDFSInserterReducer.reduce(Unknown Source)
> at com.analytics.hbase.internals.MergeMapReduceHDFSInserter$MergeMapReduceHDFSInserterReducer.reduce(Unknown Source)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:563)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> Thanks in Advance
> Oleg.






Re: Hadoop partitions Problem

2010-11-08 Thread Sudhir Vallamkondu
Just curious as to why this would happen. There are other posts suggesting
that the datanode is responsible for enforcing a round-robin write strategy among
the various disks specified via the "dfs.data.dir" property (a sample entry is
sketched after the list below):

http://www.quora.com/Can-Hadoop-deal-with-dfs.data.dir-devices-of-different-sizes
http://hadoop.apache.org/common/docs/r0.20.1/hdfs-default.html

A couple of reasons I can think of:

- The other mount points were added later on, although as per the balancing
logic the datanode should eventually ensure that all disks are balanced.

- Mount points were unavailable at some point. The "dfs.data.dir" doc says
"Directories that do not exist are ignored". I am unsure whether unavailable
disks are only re-checked when the datanode starts or on every write.

- An older version of Hadoop?
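
For reference, a minimal sketch of what that setting looks like in hdfs-site.xml
(the /mnt... paths mirror the mounts mentioned in the question and are assumptions):

  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/hdfs/data,/mnt1/hdfs/data,/mnt2/hdfs/data,/mnt3/hdfs/data</value>
  </property>

New blocks should then round-robin across those directories; any directory that
is missing when the datanode starts is simply ignored.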
  


On 11/7/10 2:19 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Shavit Netzer 
> Date: Fri, 5 Nov 2010 10:12:16 -0700
> To: "common-user@hadoop.apache.org" 
> Cc: "common-user@hadoop.apache.org" 
> Subject: Re: Hadoop partitions Problem
> 
> Yes
> 
> Sent from my mobile
> 
> On 05/11/2010, at 19:09, "Harsh J"  wrote:
> 
>> Hi,
>> 
>> On Fri, Nov 5, 2010 at 9:03 PM, Shavit Netzer 
>> wrote:
>>> Hi,
>>> 
>>> I have hadoop cluster with 24 nodes.
>>> 
>>> Each node have 4 mount disks mnt - mnt3.
>> 
>> Just to confirm -- You've configured all DataNodes to utilize ALL
>> these mount points via the dfs.data.dir property, yes?
>> 
>> 
>> 
>> -- 
>> Harsh J
>> www.harshj.com






Re: Eclipse - MapReduce Documentation plugin

2010-11-08 Thread Sudhir Vallamkondu
I don't think it's related to the plugin. In general, for hover/F2 to work,
Eclipse needs to know the location of the Javadoc for the corresponding
entity. See the link below on how to set up the Javadoc location:

http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/reference/ref-dialog-javadoc-location.htm

Depending on the hadoop version you are using, you can link to one of the
below locations

http://hadoop.apache.org/common/docs/r0.20.1/api
http://hadoop.apache.org/common/docs/r0.20.2/api/
http://hadoop.apache.org/common/docs/r0.21.0/api/

- Sudhir



On 11/8/10 5:20 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: bichonfrise74 
> Date: Mon, 8 Nov 2010 12:15:18 -0800
> To: 
> Subject: Eclipse - MapReduce Documentation plugin
> 
> I have installed the Eclipse plugin for MapReduce by following this link:
> 
> http://code.google.com/edu/parallel/tools/hadoopvm/index.html
> 
> Typically on Eclipse, when I hover on a class or method, it will show me the
> relevant documentation. When I do the same with Hadoop classes / methods,
> nothing shows up. Is this related to the plugin?
> 
> Thanks.






Re: Problem identifying cause of a failed job

2010-11-16 Thread Sudhir Vallamkondu
Try upgrading to JVM 6.0_21. We have had JVM issues with 6.0.18 and Hadoop.


On 11/16/10 4:58 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Greg Langmead 
> Date: Tue, 16 Nov 2010 17:50:17 -0500
> To: 
> Subject: Problem identifying cause of a failed job
> 
> Newbie alert.
> 
> I have a Pig script I tested on small data and am now running it on a larger
> data set (85GB). My cluster is two machines right now, each with 16 cores
> and 32G of ram. I configured Hadoop to have 15 tasktrackers on each of these
> nodes. One of them is the namenode, one is the secondary name node. I¹m
> using Pig 0.7.0 and Hadoop 0.20.2 with Java 1.6.0_18 on Linux Fedora Core
> 12, 64-bit.
> 
> My Pig job starts, and eventually a reduce task fails. I'd like to find out
> why. Here's what I know:
> 
> The webUI lists the failed reduce tasks and indicates this error:
> 
> java.io.IOException: Task process exit with nonzero status of 134.
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
> 
> The userlog userlogs/attempt_201011151350_0001_r_63_0/stdout says this:
> 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7ff74158463c, pid=27109, tid=140699912791824
> #
> # JRE version: 6.0_18-b07
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
> linux-amd64 )
> [thread 140699484784400 also had an error]# Problematic frame:
> 
> # V  [libjvm.so+0x62263c]
> #
> # An error report file with more information is saved as:
> # 
> /tmp/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201011151350_0001/a
> ttempt_201011151350_0001_r_63_0/work/hs_err_pid27109.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> 
> My mapred-site.xml already includes this:
> 
> <property>
>   <name>keep.failed.task.files</name>
>   <value>true</value>
> </property>
> 
> So I was hoping that the file hs_err_pid27109.log would exist but it
> doesn't. I was sure to check the /tmp dir on both tasktrackers. In fact
> there is no dir  
> 
>   jobcache/job_201011151350_0001/attempt_201011151350_0001_r_63_0
> 
> only
> 
>   
> jobcache/job_201011151350_0001/attempt_201011151350_0001_r_63_0.cleanup
> 
> I'd like to find the source of the segfault, can anyone point me in the
> right direction? 
> 
> Of course let me know if you need more information!






Re: common-user Digest 8 Dec 2010 00:07:01 -0000 Issue 1611

2010-12-07 Thread Sudhir Vallamkondu
There is a proper decommissioning process to remove dead nodes. See the FAQ
link here:
http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
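
A minimal sketch of that process (the exclude-file path is an assumption):

  <!-- hdfs-site.xml on the namenode -->
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/excludes</value>
  </property>

Add the hostnames being retired to that file, then tell the namenode to re-read it:

  $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes

The namenode then re-replicates the node's blocks and marks it decommissioned.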

In fact, $HADOOP_HOME/conf/slaves is not used by the namenode to keep
track of datanodes/tasktrackers. It is merely used by the stop/start hadoop
scripts to know on which nodes to start the datanode / tasktracker services.
Similarly, there is often confusion about the $HADOOP_HOME/conf/masters file.
That file contains the machine where the secondary namenode runs, not the
namenode/jobtracker.

With regards to not all java/hadoop processes getting killed, this may be
happening due to hadoop losing track of its pid files. By default the pid files
are created in the /tmp directory. If these pid files get deleted, the
stop/start scripts cannot detect the running hadoop processes. I suggest
changing the pid file location to a persistent directory like /var/hadoop/.
The $HADOOP_HOME/conf/hadoop-env.sh file has details on configuring the pid
location; a minimal example is sketched below.
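
For example (the /var/hadoop/pids path is an assumption; any persistent
directory writable by the hadoop user works):

  # $HADOOP_HOME/conf/hadoop-env.sh on every node
  export HADOOP_PID_DIR=/var/hadoop/pids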

- Sudhir


On 12/7/10 5:07 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Tali K 
> Date: Tue, 7 Dec 2010 10:40:16 -0800
> To: 
> Subject: Help: 1) Hadoop processes still are running after we stopped
> hadoop.2)  How to exclude a dead node?
> 
> 
> 1)When I stopped hadoop, we checked all the nodes and found that 2 or 3
> java/hadoop processes were still running on each node.  So we went to each
> node and did a 'killall java' - in some cases I had to do 'killall -9 java'.
> My question : why is this happening and what would be recommendations , how
> to make sure that there is no hadoop processes running after I stopped hadoop
> with stop-all.sh?
>  
> 2) Also we have a dead node. We  removed this node  from
> $HADOOP_HOME/conf/slaves.  This file is supposed to tell the namenode
>  which machines are supposed to be datanodes/tasktrackers.
> We  started hadoop again, and were surprised to see a dead node in  hadoop
> 'report' ("$HADOOP_HOME/bin/hadoop dfsadmin -report|less")
> It is only after blocking a deadnode and restarting hadoop, deadnode no longer
> showed up in hreport.
> Any recommendations, how to deal with dead nodes?






Re: common-user Digest 8 Dec 2010 00:07:01 -0000 Issue 1611

2010-12-07 Thread Sudhir Vallamkondu
I second Ed's answer. Try uninstalling whatever you installed and start
fresh. Whenever I have seen this error while trying to install a native bridge,
this solution has always worked for me.


On 12/7/10 5:07 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Edward Capriolo 
> Date: Tue, 7 Dec 2010 17:22:03 -0500
> To: 
> Subject: Re: HDFS and libhfds
> 
> 2010/12/7 Petrucci Andreas :
>> 
>> hello there, im trying to compile libhdfs in order  but there are some
>> problems. According to http://wiki.apache.org/hadoop/MountableHDFS  i have
>> already installes fuse. With ant compile-c++-libhdfs -Dlibhdfs=1 the buils is
>> successful.
>> 
>> However when i try ant package -Djava5.home=... -Dforrest.home=... the build
>> fails and the output is the below :
>> 
>>  [exec]
>>     [exec] Exception in thread "main" java.lang.UnsupportedClassVersionError:
>> Bad version number in .class file
>>     [exec]     at java.lang.ClassLoader.defineClass1(Native Method)
>>     [exec]     at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
>>     [exec]     at
>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>>     [exec]     at
>> java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>>     [exec]     at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
>>     [exec]     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>>     [exec]     at java.security.AccessController.doPrivileged(Native Method)
>>     [exec]     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>     [exec]     at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
>>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>     [exec]     at
>> org.apache.avalon.excalibur.logger.DefaultLogTargetFactoryManager.configure(D
>> efaultLogTargetFactoryManager.java:113)
>>     [exec]     at
>> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.j
>> ava:201)
>>     [exec]     at
>> org.apache.avalon.excalibur.logger.LogKitLoggerManager.setupTargetFactoryMana
>> ger(LogKitLoggerManager.java:436)
>>     [exec]     at
>> org.apache.avalon.excalibur.logger.LogKitLoggerManager.configure(LogKitLogger
>> Manager.java:400)
>>     [exec]     at
>> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.j
>> ava:201)
>>     [exec]     at
>> org.apache.cocoon.core.CoreUtil.initLogger(CoreUtil.java:607)
>>     [exec]     at org.apache.cocoon.core.CoreUtil.init(CoreUtil.java:169)
>>     [exec]     at org.apache.cocoon.core.CoreUtil.(CoreUtil.java:115)
>>     [exec]     at
>> org.apache.cocoon.bean.CocoonWrapper.initialize(CocoonWrapper.java:128)
>>     [exec]     at
>> org.apache.cocoon.bean.CocoonBean.initialize(CocoonBean.java:97)
>>     [exec]     at org.apache.cocoon.Main.main(Main.java:310)
>>     [exec] Java Result: 1
>>     [exec]
>>     [exec]   Copying broken links file to site root.
>>     [exec]
>>     [exec]
>>     [exec] BUILD FAILED
>>     [exec] /apache-forrest-0.8/main/targets/site.xml:175: Warning: Could not
>> find file /hadoop-0.20.2/src/docs/build/tmp/brokenlinks.xml to copy.
>>     [exec]
>>     [exec] Total time: 4 seconds
>> 
>> BUILD FAILED
>> /hadoop-0.20.2/build.xml:867: exec returned: 1
>> 
>> 
>> any ideas what's wrong???
>> 
> 
> I never saw this usage:
> -Djava5.home
> Try
> export JAVA_HOME=/usr/java
> 
> " Bad version number in .class file " means you are mixing and
> matching java versions somehow.






RE: HDFS and libhfds

2010-12-07 Thread Sudhir Vallamkondu
Try this and see if it works

Open the build.xml file and add an env entry for JAVA_HOME to the
compile-core-native target, so the native compile picks up the right JDK.
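
A rough sketch of the change (the exec attributes and the JDK path here are
illustrative assumptions; the added line is the JAVA_HOME env element):

  <exec dir="${build.native}" executable="${make.cmd}" failonerror="true">
    <env key="OS_NAME" value="${os.name}"/>
    <env key="OS_ARCH" value="${os.arch}"/>
    <env key="JAVA_HOME" value="/usr/java/jdk1.6.0_21"/>  <!-- added -->
  </exec>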


On 12/7/10 5:07 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Petrucci Andreas 
> Date: Wed, 8 Dec 2010 02:06:26 +0200
> To: 
> Subject: RE: HDFS and libhfds
> 
> 
> yes, my JAVA_HOME is properly set. however in hadoop 0.20.2 that i'm using
> when i run from HADOOP_HOME the command ant compile-contrib -Dlibhdfs=1
> -Dcompile.c++=1  then the tail of the output is the following :
> 
> [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c: In
> function 'hdfsUtime':
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1488:
> error: 'JNIEnv' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1488:
> error: 'env' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1490:
> error: 'errno' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1494:
> error: 'jobject' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1494:
> error: expected ';' before 'jFS'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1497:
> error: expected ';' before 'jPath'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1498:
> error: 'jPath' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1503:
> error: 'jlong' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1503:
> error: expected ';' before 'jmtime'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1504:
> error: expected ';' before 'jatime'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1507:
> error: 'jthrowable' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1507:
> error: expected ';' before 'jExc'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1508:
> error: 'jExc' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1508:
> error: 'jFS' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1510:
> error: 'jmtime' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1510:
> error: 'jatime' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c: In
> function 'hdfsGetHosts':
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1533:
> error: 'JNIEnv' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1533:
> error: 'env' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1535:
> error: 'errno' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1539:
> error: 'jobject' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1539:
> error: expected ';' before 'jFS'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1542:
> error: expected ';' before 'jPath'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1543:
> error: 'jPath' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1547:
> error: 'jvalue' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1547:
> error: expected ';' before 'jFSVal'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1548:
> error: 'jthrowable' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1548:
> error: expected ';' before 'jFSExc'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1549:
> error: 'jFSVal' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1549:
> error: 'jFSExc' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1549:
> error: 'jFS' undeclared (first use in this function)
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1559:
> error: expected ';' before 'jFileStatus'
>  [exec] /home/hy59045/sfakiana/hadoop-0.20.2/src/c++/libhdfs/hdfs.c:1563:
> error: 'jobjectArray' undeclared (first use i

Re: Help: 1) Hadoop processes still are running after we stopped > hadoop.2) How to exclude a dead node?

2010-12-07 Thread Sudhir Vallamkondu
There is a proper decommissioning process to remove dead nodes. See the FAQ
link here:
http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F

In fact, $HADOOP_HOME/conf/slaves is not used by the namenode to keep
track of datanodes/tasktrackers. It is merely used by the stop/start hadoop
scripts to know on which nodes to start the datanode / tasktracker services.
Similarly, there is often confusion about the $HADOOP_HOME/conf/masters file.
That file contains the machine where the secondary namenode runs, not the
namenode/jobtracker.

With regards to not all java/hadoop processes getting killed, this may be
happening due to hadoop losing track of its pid files. By default the pid files
are created in the /tmp directory. If these pid files get deleted, the
stop/start scripts cannot detect the running hadoop processes. I suggest
changing the pid file location to a persistent directory like /var/hadoop/.
The $HADOOP_HOME/conf/hadoop-env.sh file has details on configuring the pid
location.

- Sudhir


On 12/7/10 5:07 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Tali K 
> Date: Tue, 7 Dec 2010 10:40:16 -0800
> To: 
> Subject: Help: 1) Hadoop processes still are running after we stopped
> hadoop.2)  How to exclude a dead node?
> 
> 
> 1)When I stopped hadoop, we checked all the nodes and found that 2 or 3
> java/hadoop processes were still running on each node.  So we went to each
> node and did a 'killall java' - in some cases I had to do 'killall -9 java'.
> My question : why is this happening and what would be recommendations , how
> to make sure that there is no hadoop processes running after I stopped hadoop
> with stop-all.sh?
>  
> 2) Also we have a dead node. We  removed this node  from
> $HADOOP_HOME/conf/slaves.  This file is supposed to tell the namenode
>  which machines are supposed to be datanodes/tasktrackers.
> We  started hadoop again, and were surprised to see a dead node in  hadoop
> 'report' ("$HADOOP_HOME/bin/hadoop dfsadmin -report|less")
> It is only after blocking a deadnode and restarting hadoop, deadnode no longer
> showed up in hreport.
> Any recommendations, how to deal with dead nodes?






Re: HDFS and libhfds

2010-12-07 Thread Sudhir Vallamkondu
I second Ed's answer. Try uninstalling whatever you installed and start
fresh. Whenever I have seen this error while trying to install a native bridge,
this solution has always worked for me.


On 12/7/10 5:07 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Edward Capriolo 
> Date: Tue, 7 Dec 2010 17:22:03 -0500
> To: 
> Subject: Re: HDFS and libhfds
> 
> 2010/12/7 Petrucci Andreas :
>> 
>> hello there, im trying to compile libhdfs in order  but there are some
>> problems. According to http://wiki.apache.org/hadoop/MountableHDFS  i have
>> already installes fuse. With ant compile-c++-libhdfs -Dlibhdfs=1 the buils is
>> successful.
>> 
>> However when i try ant package -Djava5.home=... -Dforrest.home=... the build
>> fails and the output is the below :
>> 
>>  [exec]
>>     [exec] Exception in thread "main" java.lang.UnsupportedClassVersionError:
>> Bad version number in .class file
>>     [exec]     at java.lang.ClassLoader.defineClass1(Native Method)
>>     [exec]     at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
>>     [exec]     at
>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>>     [exec]     at
>> java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>>     [exec]     at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
>>     [exec]     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>>     [exec]     at java.security.AccessController.doPrivileged(Native Method)
>>     [exec]     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>     [exec]     at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
>>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>     [exec]     at
>> org.apache.avalon.excalibur.logger.DefaultLogTargetFactoryManager.configure(D
>> efaultLogTargetFactoryManager.java:113)
>>     [exec]     at
>> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.j
>> ava:201)
>>     [exec]     at
>> org.apache.avalon.excalibur.logger.LogKitLoggerManager.setupTargetFactoryMana
>> ger(LogKitLoggerManager.java:436)
>>     [exec]     at
>> org.apache.avalon.excalibur.logger.LogKitLoggerManager.configure(LogKitLogger
>> Manager.java:400)
>>     [exec]     at
>> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.j
>> ava:201)
>>     [exec]     at
>> org.apache.cocoon.core.CoreUtil.initLogger(CoreUtil.java:607)
>>     [exec]     at org.apache.cocoon.core.CoreUtil.init(CoreUtil.java:169)
>>     [exec]     at org.apache.cocoon.core.CoreUtil.(CoreUtil.java:115)
>>     [exec]     at
>> org.apache.cocoon.bean.CocoonWrapper.initialize(CocoonWrapper.java:128)
>>     [exec]     at
>> org.apache.cocoon.bean.CocoonBean.initialize(CocoonBean.java:97)
>>     [exec]     at org.apache.cocoon.Main.main(Main.java:310)
>>     [exec] Java Result: 1
>>     [exec]
>>     [exec]   Copying broken links file to site root.
>>     [exec]
>>     [exec]
>>     [exec] BUILD FAILED
>>     [exec] /apache-forrest-0.8/main/targets/site.xml:175: Warning: Could not
>> find file /hadoop-0.20.2/src/docs/build/tmp/brokenlinks.xml to copy.
>>     [exec]
>>     [exec] Total time: 4 seconds
>> 
>> BUILD FAILED
>> /hadoop-0.20.2/build.xml:867: exec returned: 1
>> 
>> 
>> any ideas what's wrong???
>> 
> 
> I never saw this usage:
> -Djava5.home
> Try
> export JAVA_HOME=/usr/java
> 
> " Bad version number in .class file " means you are mixing and
> matching java versions somehow.






Re: Help: 1) Hadoop processes still are running after we stopped hadoop.2) How to exclude a dead node?

2010-12-08 Thread Sudhir Vallamkondu
Yes.

Reference: I couldn't find an Apache Hadoop page describing this, but see the
link below:
http://serverfault.com/questions/115148/hadoop-slaves-file-necessary


On 12/7/10 11:59 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: li ping 
> Date: Wed, 8 Dec 2010 14:17:40 +0800
> To: 
> Subject: Re: Help: 1) Hadoop processes still are running after we stopped >
> hadoop.2) How to exclude a dead node?
> 
> I am not sure I have fully understand your post.
> You mean the conf/slaves only be used for stop/start script to start or stop
> the datanode/tasktracker?
> And the conf/master only contains the information about the secondary
> namenode?
> 
> Thanks
> 
> On Wed, Dec 8, 2010 at 1:44 PM, Sudhir Vallamkondu <
> sudhir.vallamko...@icrossing.com> wrote:
> 
>> There is a proper decommissioning process to remove dead nodes. See the FAQ
>> link here:
>> 
>> http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_
>> taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
>> 
>> For a fact $HADOOP_HOME/conf/slaves is not used by the name node to keep
>> track of datanodes/tasktracker. It is merely used by the stop/start hadoop
>> scripts to know which nodes to start datanode / tasktracker services.
>> Similarly there is confusion regarding understanding the
>> $HADOOP_HOME/conf/master file. That file contains the details of the
>> machine
>> where secondary name node is running, not the name node/job tracker.
>> 
>> With regards to not all java/hadoop processes getting killed, this may be
>> happening due to hadoop loosing track of pid files. By default the pid
>> files
>> are configured to be created in the /tmp directory. If these pid files get
>> deleted then stop/start scripts cannot detect running hadoop processes. I
>> suggest changing location of pid files to a persistent location like
>> /var/hadoop/. The $HADOOP_HOME/conf/hadoop-env.sh file has details on
>> configuring the PID location
>> 
>> - Sudhir
>> 
>> 
>> On 12/7/10 5:07 PM, "common-user-digest-h...@hadoop.apache.org"
>>  wrote:
>> 
>>> From: Tali K 
>>> Date: Tue, 7 Dec 2010 10:40:16 -0800
>>> To: 
>>> Subject: Help: 1) Hadoop processes still are running after we stopped
>>> hadoop.2)  How to exclude a dead node?
>>> 
>>> 
>>> 1)When I stopped hadoop, we checked all the nodes and found that 2 or 3
>>> java/hadoop processes were still running on each node.  So we went to
>> each
>>> node and did a 'killall java' - in some cases I had to do 'killall -9
>> java'.
>>> My question : why is is this happening and what would be recommendations
>> , how
>>> to make sure that there is no hadoop processes running after I stopped
>> hadoop
>>> with stop-all.sh?
>>> 
>>> 2) Also we have a dead node. We  removed this node  from
>>> $HADOOP_HOME/conf/slaves.  This file is supposed to tell the namenode
>>>  which machines are supposed to be datanodes/tasktrackers.
>>> We  started hadoop again, and were surprised to see a dead node in
>>  hadoop
>>> 'report' ("$HADOOP_HOME/bin/hadoop dfsadmin -report|less")
>>> It is only after blocking a deadnode and restarting hadoop, deadnode no
>> longer
>>> showed up in hreport.
>>> Any recommendations, how to deal with dead nodes?
>> 






Re: Regarding decommission progress status for datanode

2010-12-17 Thread Sudhir Vallamkondu
You can write a simple scrape of the namenode web UI pages to get the results.
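
If you would rather avoid scraping, here is a minimal sketch using the 0.20-era
Java client API (the class name and target hostname are assumptions); it reads
the same datanode report that "hadoop dfsadmin -report" prints:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.hdfs.DistributedFileSystem;
  import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

  public class DecommissionCheck {
    public static void main(String[] args) throws Exception {
      String target = "dn1.example.com";          // node being decommissioned (placeholder)
      Configuration conf = new Configuration();   // picks up core-site.xml/hdfs-site.xml from the classpath
      // assumes fs.default.name points at HDFS, so the default FileSystem is a DistributedFileSystem
      DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
      for (DatanodeInfo node : dfs.getDataNodeStats()) {   // same data as 'hadoop dfsadmin -report'
        if (node.getHostName().startsWith(target)) {
          System.out.println(node.getHostName()
              + " decommissionInProgress=" + node.isDecommissionInProgress()
              + " decommissioned=" + node.isDecommissioned());
        }
      }
      dfs.close();
    }
  }

Poll that from your automation until isDecommissioned() turns true.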

On 12/17/10 9:32 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: sandeep 
> Date: Fri, 17 Dec 2010 11:35:52 +0530
> To: 
> Subject: Regarding decommission progress status for datanode
> 
> Hi All,
> 
>  
> 
> Is there anyway to know while decommission in progress  that a given
> datanode is decommissioned or not using java API .I need it because i want
> to automate this instead of manual intervention
> 
>  
> 
> Right now we are checking  manually in Namenode-UI  /LiveNodesLink  and
> using hadoop dfsadmin -report
> 
>  
> 
> Please let me know
> 
>  
> 
> Thanks
> 
> sandeep






Re: Hadoop/Elastic MR on AWS

2010-12-27 Thread Sudhir Vallamkondu
We recently crossed this bridge and here are some insights. We did an
extensive study comparing costs and benchmarking local vs EMR for our
current needs and future trend.

- The scalability you get with EMR is unmatched, although you need to look at
your requirements and decide whether this is something you need.

- When using EMR it's cheaper to use reserved instances vs. nodes on the fly.
You can always add more nodes when required. I suggest looking at your
current computing needs, reserving instances for a year or two, using
these to run EMR, and adding nodes at peak times. In your cost estimation you
will need to factor in data transfer time/costs unless you are dealing
with public datasets on S3.

- EMR fared similar to the local cluster on CPU benchmarks (we used MRBench to
benchmark map/reduce); however, IO benchmarks were slower on EMR (we used the
DFSIO benchmark). For IO-intensive jobs you will need to add more nodes to
compensate for this.

- When compared to a local cluster, you will need to factor in the time it takes
for the EMR cluster to set up when starting a job: things like data transfer
time, cluster replication time, etc.

- The EMR API is very flexible; however, you will need to build a custom interface
on top of it to suit your job management and monitoring needs.

- EMR bootstrap actions can satisfy most of your native lib needs, so no
drawbacks there (see the sketch below).
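
For example, with the elastic-mapreduce Ruby CLI of that era, a persistent
cluster with a bootstrap action was started roughly like this (the bucket and
script names are made up for illustration):

  ./elastic-mapreduce --create --alive \
    --name "analytics-cluster" \
    --num-instances 4 --instance-type m1.large \
    --bootstrap-action s3://my-bucket/bootstrap/install-native-libs.sh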


-- Sudhir


On 12/26/10 5:26 AM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Otis Gospodnetic 
> Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
> To: 
> Subject: Re: Hadoop/Elastic MR on AWS
> 
> Hello Amandeep,
> 
> 
> 
> - Original Message 
>> From: Amandeep Khurana 
>> To: common-user@hadoop.apache.org
>> Sent: Fri, December 10, 2010 1:14:45 AM
>> Subject: Re: Hadoop/Elastic MR on AWS
>> 
>> Mark,
>> 
>> Using EMR makes it very easy to start a cluster and add/reduce  capacity as
>> and when required. There are certain optimizations that make EMR  an
>> attractive choice as compared to building your own cluster out. Using  EMR
> 
> 
> Could you please point out what optimizations you are referring to?
> 
> Thanks,
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
>> also ensures you are using a production quality, stable system backed by  the
>> EMR engineers. You can always use bootstrap actions to put your own  tweaked
>> version of Hadoop in there if you want to do that.
>> 
>> Also, you  don't have to tear down your cluster after every job. You can set
>> the alive  option when you start your cluster and it will stay there even
>> after your  Hadoop job completes.
>> 
>> If you face any issues with EMR, send me a mail  offline and I'll be happy to
>> help.
>> 
>> -Amandeep
>> 
>> 
>> On Thu, Dec 9,  2010 at 9:47 PM, Mark   wrote:
>> 
>>> Does anyone have any thoughts/experiences on running Hadoop  in AWS? What
>>> are some pros/cons?
>>> 
>>> Are there any good  AMI's out there for this?
>>> 
>>> Thanks for any advice.
>>> 
>> 






Re: Hadoop/Elastic MR on AWS

2010-12-28 Thread Sudhir Vallamkondu
Unfortunately I can't publish the exact numbers; however, here are the various
things we considered.

First off, our data trends: we gathered our current data size and plotted a
future growth trend for the next few years. We then finalized on an archival
strategy to understand how much data needs to be on the cluster on a
rotating basis. We crunch our data often (meaning as we get them) so
computing power is not an issue and the cluster size was mainly driven by
our data size that needs to be readily available and replication strategy.
We factored in compression use on older rotating data.

Once we had the above numbers we could decide on our cluster infrastructure
size and type of hardware needed.

For local cluster we factored in hardware, warranty, regular networking
stuff for cluster that size, data center costs, support manpower. We also
factored in a NAS and bandwidth costs to replicate cluster data to another
data center for active replication.

For EMR costs we compared a reserved-instance cluster (nodes reserved for
3 years, with a hardware config and cluster size similar to the above) against
nodes brought up on the fly. We factored in S3 costs to store the calculated
rotating data and bandwidth costs for data coming in and going out. One
thing to note is that the Amazon EMR charge is added on top of the normal EC2
instance cost. For example, if you run a job on EMR with 4 nodes and the job
overall takes 1 hour, then the total cost (excluding any data transfer costs)
= 4 nodes * 1 hour * (EMR price per hour + EC2 price per hour).

I am sure I am missing a few things above, but that's the gist of it.
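
To make that concrete with purely illustrative prices (assumptions, not the
rates we actually paid): at $0.34/hour for the EC2 instance plus a $0.06/hour
EMR surcharge, a 10-node cluster running a 5-hour job would cost
10 * 5 * (0.34 + 0.06) = $20, before S3 storage and any data transfer.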

- Sudhir

  




On 12/27/10 9:22 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Dave Viner 
> Date: Mon, 27 Dec 2010 10:23:37 -0800
> To: 
> Subject: Re: Hadoop/Elastic MR on AWS
> 
> Hi Sudhir,
> 
> Can you publish your findings around pricing, and how you calculated the
> various aspects?
> 
> This is great information.
> 
> Thanks
> Dave Viner
> 
> 
> On Mon, Dec 27, 2010 at 10:17 AM, Sudhir Vallamkondu <
> sudhir.vallamko...@icrossing.com> wrote:
> 
>> We recently crossed this bridge and here are some insights. We did an
>> extensive study comparing costs and benchmarking local vs EMR for our
>> current needs and future trend.
>> 
>> - Scalability you get with EMR is unmatched although you need to look at
>> your requirement and decide this is something you need.
>> 
>> - When using EMR its cheaper to use reserved instances vs nodes on the fly.
>> You can always add more nodes when required. I suggest looking at your
>> current computing needs and reserve instances for a year or two and use
>> these to run EMR and add nodes at peak needs. In your cost estimation you
>> will need to factor in the data transfer time/costs unless you are dealing
>> with public datasets on S3
>> 
>> - EMR fared similar to local cluster on CPU benchmarks (we used MRBench to
>> benchmark map/reduce) however IO benchmarks were slow on EMR (used DFSIO
>> benchmark). For IO intensive jobs you will need to add more nodes to
>> compensate this.
>> 
>> - When compared to local cluster, you will need to factor the time it takes
>> for the EMR cluster to setup when starting a job. This like data transfer
>> time, cluster replication time etc
>> 
>> - EMR API is very flexible however you will need to build a custom
>> interface
>> on top of it to suit your job management and monitoring needs
>> 
>> - EMR bootstrap actions can satisfy most of your native lib needs so no
>> drawbacks there.
>> 
>> 
>> -- Sudhir
>> 
>> 
>> On 12/26/10 5:26 AM, "common-user-digest-h...@hadoop.apache.org"
>>  wrote:
>> 
>>> From: Otis Gospodnetic 
>>> Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
>>> To: 
>>> Subject: Re: Hadoop/Elastic MR on AWS
>>> 
>>> Hello Amandeep,
>>> 
>>> 
>>> 
>>> - Original Message 
>>>> From: Amandeep Khurana 
>>>> To: common-user@hadoop.apache.org
>>>> Sent: Fri, December 10, 2010 1:14:45 AM
>>>> Subject: Re: Hadoop/Elastic MR on AWS
>>>> 
>>>> Mark,
>>>> 
>>>> Using EMR makes it very easy to start a cluster and add/reduce  capacity
>> as
>>>> and when required. There are certain optimizations that make EMR  an
>>>> attractive choice as compared to building your own cluster out. Using
>>  EMR
>>> 
>>> 
>>> Could you please point out what optimizations you are referring to?
>>> 
>>> Thanks,
>>> Otis
>>> 
>>> Sematext :: http://sematext.

Re: UI doesn't work

2010-12-28 Thread Sudhir Vallamkondu
I recently had this issue. UI links were working only for some nodes: when
I went to the dfsHealth.jsp page and followed the cluster's datanode links, some
would work and some would show a 404 error.

I started tracing them all the way from the listening ports. The datanode port
is 50010, so run netstat on that port to find which process is listening.
Then check that process to see if it is actually the datanode. The issue I had
was that somehow, when I did the hadoop upgrade, I ended up with an older
instance and a new instance of the datanode running, and it was all messed up,
so I had to kill all hadoop processes and do a clean start.
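
For example, on Linux (root is needed for -p to show the owning process):

  # which process owns the datanode port
  netstat -nlp | grep 50010

  # cross-check against the Hadoop daemons the JVM reports
  jps -l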


On 12/27/10 9:22 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Harsh J 
> Date: Tue, 28 Dec 2010 09:51:11 +0530
> To: 
> Subject: Re: UI doesn't work
> 
> I remember facing such an issue with the JT (50030) once. None of the
> jsp pages would load, 'cept the index. It was some odd issue with the
> webapps not getting loaded right while startup. Don't quite remember
> how it got solved.
> 
> Did you do any ant operation on your release copy of Hadoop prior to
> starting it, by the way?
> 
> On Tue, Dec 28, 2010 at 5:15 AM, maha  wrote:
>> Hi,
>> 
>>  I get Error 404 when I try to use hadoop UI to monitor my job execution. I'm
>> using Hadoop-0.20.2 and the following are parts of my configuration files.
>> 
>>  in Core-site.xml:
>>    <property>
>>      <name>fs.default.name</name>
>>      <value>hdfs://speed.cs.ucsb.edu:9000</value>
>>    </property>
>> 
>> in mapred-site.xml:
>>    <property>
>>      <name>mapred.job.tracker</name>
>>      <value>speed.cs.ucsb.edu:9001</value>
>>    </property>
>> 
>> 
>> when I try to open:  http://speed.cs.ucsb.edu:50070/   I get the 404 Error.
>> 
>> 
>> Any ideas?
>> 
>>  Thank you,
>>     Maha
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
> www.harshj.com






Re: Hadoop/Elastic MR on AWS

2010-12-29 Thread Sudhir Vallamkondu
> Are there any independent sites that collect cloud uptime numbers?
Not that I know of.

If you look at the full post content, people have raised quite a few pros and
cons. We are analyzing the AWS CloudWatch API to see how we can leverage it
to monitor EMR. EMR is offered in many of their regions, and since we are
planning on using S3 as the raw data store, if one region is experiencing
problems we can always look into killing the job and starting it off in another
region. Just a thought.

http://lucene.472066.n3.nabble.com/Hadoop-Elastic-MR-on-AWS-td2058471.html


On 12/28/10 8:01 PM, "common-user-digest-h...@hadoop.apache.org"
 wrote:

> From: Lance Norskog 
> Date: Tue, 28 Dec 2010 18:50:14 -0800
> To: 
> Subject: Re: Hadoop/Elastic MR on AWS
> 
> Cloud providers have more uptime problems than dedicated servers. And
> it is impossible to benchmark: virtual server implementations do not
> apply quotas to I/O. I've seen the same 'instance size' have 5x deltas
> in disk bandwidth from one day to the next.
> 
> Are there any independent sites that collect cloud uptime numbers?
> 
> On Tue, Dec 28, 2010 at 5:41 PM, Sudhir Vallamkondu
>  wrote:
>> Unfortunately I can't publish the exact numbers however here are the various
>> things we considered
>> 
>> First off our data trends. We gathered our current data size and plotted a
>> future growth trend for the next few years. We then finalized on a archival
>> strategy to understand how much data needs to be on the cluster on a
>> rotating basis. We crunch our data often (meaning as we get them) so
>> computing power is not an issue and the cluster size was mainly driven by
>> our data size that needs to be readily available and replication strategy.
>> We factored in compression use on older rotating data.
>> 
>> Once we had the above numbers we could decide on our cluster infrastructure
>> size and type of hardware needed.
>> 
>> For local cluster we factored in hardware, warranty, regular networking
>> stuff for cluster that size, data center costs, support manpower. We also
>> factored in a NAS and bandwidth costs to replicate cluster data to another
>> data center for active replication.
>> 
>> For EMR costs we compared a reserved instance cluster (nodes reserved for
>> 3years with similar hardware config as above) with above cluster size vs
>> nodes on the fly. We factored in S3 costs to store the above calculated
>> rotating data and bandwidth costs for data coming in and coming out. One
>> thing to note is Amazon EMR costs are above normal EC2 instance costs. For
>> example if you run a job in EMR with 4 nodes and the job overall takes 1hr
>> then total EMR cost (excluding any data transfer costs) = 4*1*{EMR /hour} +
>> 4*1*EC2 /hour cost. Hopefully that makes sense.
>> 
>> I am sure missing a few things above but that's the jist of it.
>> 
>> - Sudhir
>> 
>> 
>> 
>> 
>> 
>> 
>> On 12/27/10 9:22 PM, "common-user-digest-h...@hadoop.apache.org"
>>  wrote:
>> 
>>> From: Dave Viner 
>>> Date: Mon, 27 Dec 2010 10:23:37 -0800
>>> To: 
>>> Subject: Re: Hadoop/Elastic MR on AWS
>>> 
>>> Hi Sudhir,
>>> 
>>> Can you publish your findings around pricing, and how you calculated the
>>> various aspects?
>>> 
>>> This is great information.
>>> 
>>> Thanks
>>> Dave Viner
>>> 
>>> 
>>> On Mon, Dec 27, 2010 at 10:17 AM, Sudhir Vallamkondu <
>>> sudhir.vallamko...@icrossing.com> wrote:
>>> 
>>>> We recently crossed this bridge and here are some insights. We did an
>>>> extensive study comparing costs and benchmarking local vs EMR for our
>>>> current needs and future trend.
>>>> 
>>>> - Scalability you get with EMR is unmatched although you need to look at
>>>> your requirement and decide this is something you need.
>>>> 
>>>> - When using EMR its cheaper to use reserved instances vs nodes on the fly.
>>>> You can always add more nodes when required. I suggest looking at your
>>>> current computing needs and reserve instances for a year or two and use
>>>> these to run EMR and add nodes at peak needs. In your cost estimation you
>>>> will need to factor in the data transfer time/costs unless you are dealing
>>>> with public datasets on S3
>>>> 
>>>> - EMR fared similar to local cluster on CPU benchmarks (we used MRBench to
>>>> benchmark map/reduce) however IO benchmarks were