Using mapred.max.split.size to increase number of mappers

2012-07-10 Thread madhu phatak
Hi,
  In the old API, the FileInputFormat code (org.apache.hadoop.mapred.FileInputFormat)
does not use the "mapred.max.split.size" parameter. Is there any other way to
change the number of mappers in the old API?
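
A minimal sketch of one workaround with the old API, assuming the job is driven through JobConf (class and path names are illustrative): the old FileInputFormat honours the requested number of map tasks and "mapred.min.split.size" when computing split sizes, so asking for more map tasks (and keeping the minimum split size small) is one way to get more mappers.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MoreMappersDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MoreMappersDriver.class);
    conf.setJobName("more-mappers-example");

    // The old-API FileInputFormat divides the total input size by this hint to
    // compute a goal split size, so a larger hint usually yields more (smaller) splits.
    conf.setNumMapTasks(100);
    // Keep the lower bound small so it does not override the goal split size.
    conf.setLong("mapred.min.split.size", 1);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);  // identity mapper/reducer defaults keep the sketch small
  }
}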


-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Hadoop on windows without cygwin

2012-07-04 Thread madhu phatak
Original blog post
http://vorlsblog.blogspot.com/2010/05/running-hadoop-on-windows-without.html


On Thu, Jul 5, 2012 at 6:57 AM, Ravi Shankar Nair <
ravishankar.n...@gmail.com> wrote:

>
> A document on installing Hadoop on Windows without installing CYGWIN is
> available here
>
> http://forourson.com/blogs/?p=63
>
> Best rgds,
> Ravion
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Using LinkedHashMap with MapWritable

2012-06-29 Thread madhu phatak
Hi,
 The current implementation of MapWritable only supports a HashMap backing store.
But for my application I need a LinkedHashMap, since the order of keys is
important to me. I am trying to customize MapWritable
to accommodate a custom implementation, but whenever I make a change to the
Writable, all the sequence files written prior to the change start to
give EOF exceptions. From what I have observed, in Hadoop it is not possible to
change Writables once they have been used to persist data into HDFS. Is there any
way of overcoming this limitation?
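
One workaround sketch, not from the thread: instead of modifying MapWritable itself, introduce a separate Writable with its own serialization, so sequence files written with the old class keep their old reader. The class name and the Text-only keys and values below are assumptions made for brevity.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Hypothetical order-preserving map Writable, limited to Text keys and values.
public class LinkedTextMapWritable implements Writable {

  private final LinkedHashMap<Text, Text> map = new LinkedHashMap<Text, Text>();

  public Map<Text, Text> getMap() {
    return map;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Entry count first, then key/value pairs in insertion order.
    out.writeInt(map.size());
    for (Map.Entry<Text, Text> entry : map.entrySet()) {
      entry.getKey().write(out);
      entry.getValue().write(out);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    map.clear();
    int size = in.readInt();
    for (int i = 0; i < size; i++) {
      Text key = new Text();
      Text value = new Text();
      key.readFields(in);
      value.readFields(in);
      map.put(key, value);
    }
  }
}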

-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Is it possible to implement transpose with PigLatin/any other MR language?

2012-06-21 Thread madhu phatak
Hi,
 It's possible in Map/Reduce. Look at the code here:
https://github.com/zinnia-phatak-dev/Nectar/tree/master/Nectar-regression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce
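
For reference, a minimal sketch of the usual transpose pattern, assuming a single whitespace-separated input file like the matrix in the quoted question below; class names are illustrative. The mapper keys each cell by its column index and carries the line's byte offset so the reducer can restore row order, and each reducer call emits one transposed row.

import java.io.IOException;
import java.util.StringTokenizer;
import java.util.TreeMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class TransposeExample {

  // Emits (columnIndex, "<byteOffset>:<cellValue>") for every cell of the line.
  public static class TransposeMapper
      extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      int column = 0;
      while (tokens.hasMoreTokens()) {
        context.write(new IntWritable(column++),
            new Text(offset.get() + ":" + tokens.nextToken()));
      }
    }
  }

  // Re-orders the cells of one column by their original byte offset (row order)
  // and writes them out as a single transposed row.
  public static class TransposeReducer
      extends Reducer<IntWritable, Text, Text, NullWritable> {
    @Override
    protected void reduce(IntWritable column, Iterable<Text> cells, Context context)
        throws IOException, InterruptedException {
      TreeMap<Long, String> row = new TreeMap<Long, String>();
      for (Text cell : cells) {
        String[] parts = cell.toString().split(":", 2);
        row.put(Long.valueOf(parts[0]), parts[1]);
      }
      StringBuilder out = new StringBuilder();
      for (String value : row.values()) {
        if (out.length() > 0) {
          out.append(' ');
        }
        out.append(value);
      }
      context.write(new Text(out.toString()), NullWritable.get());
    }
  }
}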



2012/6/21 Subir S 

> Hi,
>
> Is it possible to implement transpose operation of rows into columns and
> vice versa...
>
>
> i.e.
>
> col1 col2 col3
> col4 col5 col6
> col7 col8 col9
> col10 col11 col12
>
> can this be converted to
>
> col1 col4 col7 col10
> col2 col5 col8 col11
> col3 col6 col9 col12
>
> Is this even possible with map reduce? If yes, which language helps to
> achieve this faster?
>
> Thanks
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Web Administrator UI is not accessible

2012-06-18 Thread madhu phatak
Hi,
Maybe the namenode is down. Please look into the namenode logs.

On Thu, Jun 14, 2012 at 9:37 PM, Yongwei Xing  wrote:

> Hi all
>
> My hadoop is running well for some days. Suddenly, the
> http://localhost:50070 is not accessible. Give such message like below.
> HTTP ERROR 404
>
> Problem accessing /dfshealth.jsp. Reason:
>
>/dfshealth.jsp
>
> --
> *Powered by Jetty://*
> *
> *
> *Does anyone meet it?*
>
> --
> Welcome to my ET Blog http://www.jdxyw.com
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: job history log file

2012-06-18 Thread madhu phatak
Refer this
http://www.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/


On Fri, Jun 15, 2012 at 1:49 PM, cldo cldo  wrote:

> Where are hadoop job history log files ?
> thank.
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Calling hive from reducer

2012-06-12 Thread madhu phatak
Hi,
 I am trying to call a Hive query from a reducer, but I am getting the following error:

Exception in thread "Thread-10" java.lang.NoClassDefFoundError:
org/apache/thrift/TBase
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)

Caused by: java.lang.ClassNotFoundException: org.apache.thrift.TBase
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

The libthrift jar is in the classpath of the job (through the distributed
cache), but it is still not able to load the class. Can anyone please help me
resolve this issue?
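
One thing worth double-checking, as a sketch rather than a diagnosis: jars placed in the distributed cache only end up on the task classpath when they are added with addFileToClassPath (or passed via -libjars), not with addCacheFile. The HDFS path below is only an example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class ClasspathSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The jar must already be in HDFS; this path is illustrative.
    DistributedCache.addFileToClassPath(new Path("/libs/libthrift.jar"), conf);
    // ... job setup and submission with this Configuration would follow.
    // Alternatively, when the driver uses ToolRunner/GenericOptionsParser:
    //   hadoop jar myjob.jar MyDriver -libjars /local/path/libthrift.jar <args>
  }
}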



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Can anyone help me with large distributed cache files?

2012-06-11 Thread madhu phatak
Hi Sheng,
By default, the cache size is 10 GB, which means your file can be placed
in the distributed cache. If you want more space, configure
local.cache.size in mapred-site.xml with a bigger value.

On Tue, Jun 12, 2012 at 5:22 AM, Sheng Guo  wrote:

> Hi,
>
> Sorry to bother you all, this is my first question here in hadoop user
> mailing list.
> Can anyone help me with the memory configuration if distributed cache is
> very large and requires more memory? (2GB)
>
> And also in this case when distributed cache is very large, how do we
> handle this normally? By configure something to give more memory? or this
> should be avoided?
>
> Thanks
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Need some help for writing map reduce functions in hadoop-1.0.1 java

2012-05-22 Thread madhu phatak
Hi,
 You can go through the code of this project (
https://github.com/zinnia-phatak-dev/Nectar) to understand how the complex
algorithms are implemented using M/R.

On Fri, May 18, 2012 at 12:16 PM, Ravi Joshi  wrote:

> I am writing my own map and reduce method for implementing K Means
> algorithm in Hadoop-1.0.1 in java language. Although i got some example
> link of K Means algorithm in Hadoop over blogs but i don't want to copy
> their code, as a lerner i want to implement it my self. So i just need some
> ideas/clues for the same. Below is the work which i already done.
>
> I have Point and Cluster classes which are Writable, Point class have
> point x, point y and Cluster by whom this Point belongs. On the other hand
> my Cluster class has an ArrayList which stores all the Point objects which
> belongs to that Cluster. Cluseter class has an centroid variable also. Hope
> i am going correct (if not correct me please.)
>
> Now first of all my input (which is a file, containing some points
> coordinates) must be provided to Point Objects. I mean this input file must
> be mapped to all the Point. This should be done ONCE in map class (but
> how?). After assigning some value to each Point, some random Cluster must
> be chosen at the initial phase (This must be done only ONCE, but how). Now
> every Point must be mapped to all the cluster with the distance between
> that point and centroid. In the reduce method, every Point will be checked
> and assigned to that Cluster which is nearest to that Point (by comparing
> the distance). Now new centroid is calculated in each Cluster (Should map
> and reduce be called recursively? if yes then where all the initialization
> part would go. Here by saying initialization i mean providing input to
> Point objects (which must be done ONCE initially) and choosing some random
> centroid (Initially we have to choose random centroid ONCE) ).
> One more question, The value of parameter K(which will decide the total
> number of clusters should be assigned by user or hadoop will itself decide
> it?)
>
> Somebody please explain me, i don't need the code, i want to write it
> myself. I need a way. Thank you.
>
> -Ravi
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Where does Hadoop store its maps?

2012-05-22 Thread madhu phatak
Hi,
 Set "mapred.local.dir" in mapred-site.xml to point a directory on /mnt so
that it will not use ec2 instance EBS.

On Tue, May 22, 2012 at 6:58 PM, Mark Kerzner wrote:

> Hi,
>
> I am using a Hadoop cluster of my own construction on EC2, and I am running
> out of hard drive space with maps. If I knew which directories are used by
> Hadoop for map spill, I could use the large ephemeral drive on EC2 machines
> for that. Otherwise, I would have to keep increasing my available hard
> drive on root, and that's not very smart.
>
> Thank you. The error I get is below.
>
> Sincerely,
> Mark
>
>
>
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> any valid local directory for output/file.out
>at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:376)
>at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
>at
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
>at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1495)
>at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1180)
>at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:582)
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs
> java.io.IOException: Spill failed
>at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:886)
>at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:574)
>at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>at
> org.frd.main.ZipFileProcessor.emitAsMap(ZipFileProcessor.java:279)
>at
> org.frd.main.ZipFileProcessor.processWithTrueZip(ZipFileProcessor.java:107)
>at
> org.frd.main.ZipFileProcessor.process(ZipFileProcessor.java:55)
>at org.frd.main.Map.map(Map.java:70)
>at org.frd.main.Map.map(Map.java:24)
>at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:415)
>at org.apache.hadoop.security.UserGroupInformation.doAs(User
> java.io.IOException: Spill failed
>at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:886)
>at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:574)
>at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>at
> org.frd.main.ZipFileProcessor.emitAsMap(ZipFileProcessor.java:279)
>at
> org.frd.main.ZipFileProcessor.processWithTrueZip(ZipFileProcessor.java:107)
>at
> org.frd.main.ZipFileProcessor.process(ZipFileProcessor.java:55)
>at org.frd.main.Map.map(Map.java:70)
>at org.frd.main.Map.map(Map.java:24)
>at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:415)
>at org.apache.hadoop.security.UserGroupInformation.doAs(User
> org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: File
> exists
>at
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:178)
>at
> org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:292)
>at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:365)
>at org.apache.hadoop.mapred.Child$4.run(Child.java:272)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:415)
>at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: EEXIST: File exists
>at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
>at
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:172)
>... 7 more
>



-- 
https://github.com/zinnia-phatak-d

Re: Bad connect ack with firstBadLink

2012-05-07 Thread madhu phatak
Hi,
 Increasing the open file limit solved the issue. Thank you.

On Fri, May 4, 2012 at 9:39 PM, Mapred Learn  wrote:

> Check your number of blocks in the cluster.
>
> This also indicates that your datanodes are more full than they should be.
>
> Try deleting unnecessary blocks.
>
> On Fri, May 4, 2012 at 7:40 AM, Mohit Anchlia  >wrote:
>
> > Please see:
> >
> > http://hbase.apache.org/book.html#dfs.datanode.max.xcievers
> >
> > On Fri, May 4, 2012 at 5:46 AM, madhu phatak 
> wrote:
> >
> > > Hi,
> > > We are running a three node cluster . From two days whenever we copy
> file
> > > to hdfs , it is throwing  java.IO.Exception Bad connect ack with
> > > firstBadLink . I searched in net, but not able to resolve the issue.
> The
> > > following is the stack trace from datanode log
> > >
> > > 2012-05-04 18:08:08,868 INFO
> > > org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> > > blk_-7520371350112346377_50118 received exception
> > java.net.SocketException:
> > > Connection reset
> > > 2012-05-04 18:08:08,869 ERROR
> > > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> > > 172.23.208.17:50010,
> > > storageID=DS-1340171424-172.23.208.17-50010-1334672673051,
> > infoPort=50075,
> > > ipcPort=50020):DataXceiver
> > > java.net.SocketException: Connection reset
> > >at java.net.SocketInputStream.read(SocketInputStream.java:168)
> > >at
> java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> > >at
> java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> > >at java.io.DataInputStream.read(DataInputStream.java:132)
> > >at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
> > >at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
> > >at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
> > >at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
> > >at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
> > >at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
> > >at java.lang.Thread.run(Thread.java:662)
> > >
> > >
> > > It will be great if some one can point to the direction how to solve
> this
> > > problem.
> > >
> > > --
> > > https://github.com/zinnia-phatak-dev/Nectar
> > >
> >
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Bad connect ack with firstBadLink

2012-05-04 Thread madhu phatak
Hi,
We are running a three-node cluster. For the past two days, whenever we copy a file
to HDFS, it throws java.io.IOException: Bad connect ack with
firstBadLink. I searched the net but was not able to resolve the issue. The
following is the stack trace from the datanode log:

2012-05-04 18:08:08,868 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-7520371350112346377_50118 received exception java.net.SocketException:
Connection reset
2012-05-04 18:08:08,869 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
172.23.208.17:50010,
storageID=DS-1340171424-172.23.208.17-50010-1334672673051, infoPort=50075,
ipcPort=50020):DataXceiver
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:662)


It would be great if someone could point me in the direction of how to solve this
problem.

-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: EOFException

2012-05-01 Thread madhu phatak
Hi,
 In the write method, use writeInt() rather than write(). That should solve
your problem.
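
For reference, a sketch of the corrected write() for the UserTime class quoted below, so that it mirrors readFields(): DataOutput.write(int) emits a single byte, while readInt() expects four, which is what produces the EOFException.

@Override
public void write(DataOutput out) throws IOException {
  out.writeInt(id);
  out.writeInt(month);
  out.writeInt(day);
  out.writeInt(year);
  out.writeInt(hour);
  out.writeInt(min);
  out.writeInt(sec);
}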

On Mon, Apr 30, 2012 at 10:40 PM, Keith Thompson wrote:

> I have been running several MapReduce jobs on some input text files. They
> were working fine earlier and then I suddenly started getting EOFException
> every time. Even the jobs that ran fine before (on the exact same input
> files) aren't running now. I am a bit perplexed as to what is causing this
> error. Here is the error:
>
> 12/04/30 12:55:55 INFO mapred.JobClient: Task Id :
> attempt_201202240659_6328_m_01_1, Status : FAILED
> java.lang.RuntimeException: java.io.EOFException
>at
>
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
>at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:967)
>at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:30)
>at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:83)
>at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
>at
>
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1253)
>at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
>at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
>at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:396)
>at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.EOFException
>at java.io.DataInputStream.readInt(DataInputStream.java:375)
>at com.xerox.twitter.bin.UserTime.readFields(UserTime.java:31)
>at
>
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
>
> Since the compare function seems to be involved, here is my custom key
> class. Note: I did not include year in the key because all keys have the
> same year.
>
> public class UserTime implements WritableComparable<UserTime> {
>
> int id, month, day, year, hour, min, sec;
>  public UserTime() {
>
> }
>  public UserTime(int u, int mon, int d, int y, int h, int m, int s) {
> id = u;
> month = mon;
> day = d;
> year = y;
> hour = h;
> min = m;
> sec = s;
> }
>  @Override
> public void readFields(DataInput in) throws IOException {
> // TODO Auto-generated method stub
> id = in.readInt();
> month = in.readInt();
> day = in.readInt();
> year = in.readInt();
> hour = in.readInt();
> min = in.readInt();
> sec = in.readInt();
> }
>
> @Override
> public void write(DataOutput out) throws IOException {
> // TODO Auto-generated method stub
> out.write(id);
> out.write(month);
> out.write(day);
> out.write(year);
> out.write(hour);
> out.write(min);
> out.write(sec);
> }
>
> @Override
> public int compareTo(UserTime that) {
> // TODO Auto-generated method stub
> if(compareUser(that) == 0)
> return (compareTime(that));
> else if(compareUser(that) == 1)
> return 1;
> else return -1;
> }
>  private int compareUser(UserTime that) {
> if(id > that.id)
> return 1;
> else if(id == that.id)
> return 0;
> else return -1;
> }
>  //assumes all are from the same year
> private int compareTime(UserTime that) {
> if(month > that.month ||
> (month == that.month && day > that.day) ||
> (month == that.month && day == that.day && hour > that.hour) ||
> (month == that.month && day == that.day && hour == that.hour && min >
> that.min) ||
> (month == that.month && day == that.day && hour == that.hour && min ==
> that.min && sec > that.sec))
> return 1;
> else if(month == that.month && day == that.day && hour == that.hour && min
> == that.min && sec == that.sec)
> return 0;
> else return -1;
> }
>  public String toString() {
> String h, m, s;
> if(hour < 10)
> h = "0"+hour;
> else
> h = Integer.toString(hour);
> if(min < 10)
> m = "0"+min;
> else
> m = Integer.toString(hour);
> if(sec < 10)
> s = "0"+min;
> else
> s = Integer.toString(hour);
> return (id+"\t"+month+"/"+day+"/"+year+"\t"+h+":"+m+":"+s);
> }
> }
>
> Thanks for any help.
>
> Regards,
> Keith
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Hadoop and Ubuntu / Java

2012-04-19 Thread madhu phatak
As per Oracle, going forward OpenJDK will be the official Oracle JDK for Linux,
which means OpenJDK will be the same as the official one.

On Tue, Dec 20, 2011 at 9:12 PM, hadoopman  wrote:

>
> http://www.omgubuntu.co.uk/2011/12/java-to-be-removed-from-ubuntu-uninstalled-from-user-machines/
>
> I'm curious what this will mean for Hadoop on Ubuntu systems moving
> forward.  I've tried openJDK nearly two years ago with Hadoop.  Needless to
> say it was a real problem.
>
> Hopefully we can still download it from the Sun/Oracle web site and still
> use it.  Won't be the same though :/
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: getting UnknownHostException

2012-04-12 Thread madhu phatak
Please check the contents of /etc/hosts for the hostname-to-IP-address mapping.

On Thu, Apr 12, 2012 at 11:11 PM, Sujit Dhamale wrote:

> Hi Friends ,
> i am getting UnknownHostException while executing Hadoop Word count program
>
> getting below details from job tracker Web page
>
> *User:* sujit
> *Job Name:* word count
> *Job File:*
>
> hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204112234_0002/job.xml<
> http://localhost:50030/jobconf.jsp?jobid=job_201204112234_0002>
> *Submit Host:* sujit.(null)
> *Submit Host Address:* 127.0.1.1
> *Job-ACLs: All users are allowed*
> *Job Setup:*None
> *Status:* Failed
> *Failure Info:*Job initialization failed: java.net.UnknownHostException:
> sujit.(null) is not a valid Inet address at org.apache.hadoop.net.
> NetUtils.verifyHostnames(NetUtils.java:569) at
> org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:711) at
> org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4207) at
>
> org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> *Started at:* Wed Apr 11 22:36:46 IST 2012
> *Failed at:* Wed Apr 11 22:36:47 IST 2012
> *Failed in:* 0sec
> *Job Cleanup:*None
>
>
>
>
> Can some one help me how to resolve this issue .
> i tried with : http://wiki.apache.org/hadoop/UnknownHost
>
> but still not able to resolve issue ,
> please help me out .
>
>
> Hadoop Version: hadoop-1.0.1.tar.gz
> java version "1.6.0_30"
> Operating System : Ubuntu 11.10
>
>
> *Note *: All node were up before starting execution of Program
>
> Kind Regards
> Sujit Dhamale
> 
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Testing Map reduce code

2012-04-11 Thread madhu phatak
Hi,
 I am working on a Hadoop project where I want the automated build to
run M/R test cases on a real Hadoop cluster. As of now it seems we can only
unit test M/R through MiniDFSCluster/MiniMRCluster/MRUnit. None of these
runs the test cases on a Hadoop cluster. Is there any other framework or any
other way to make the test cases run on a Hadoop cluster?

Thanks in Advance

-- 
https://github.com/zinnia-phatak-dev/Nectar


Cross join/product in Map/Reduce

2012-04-03 Thread madhu phatak
 Hi,
I am using the following code to generate a cross product in Hadoop.

package com.example.hadoopexamples.joinnew;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class JoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {


private List<String> inputWords;
private String secondFilePath;
@Override
protected void setup(Context context) throws IOException,
InterruptedException {
// TODO Auto-generated method stub
secondFilePath = context.getConfiguration().get("secondFilePath");
inputWords = new ArrayList<String>();

}

@Override
protected void map(LongWritable key, Text value,
Context context)
throws IOException, InterruptedException {
// TODO Auto-generated method stub
List<String> inputWordList = getWords(value.toString());
inputWords.addAll(inputWordList);
}

@Override
protected void cleanup(Context context) throws IOException,
InterruptedException {
// TODO Auto-generated method stub
FileSystem fs = FileSystem.get(context.getConfiguration());
FSDataInputStream fsDataInputStream = fs.open(new Path(secondFilePath));
BufferedReader bufferedReader = new BufferedReader(new
InputStreamReader(fsDataInputStream));

String line;
while((line= bufferedReader.readLine())!=null)
{
System.out.println("inside while");
List<String> words = getWords(line);
for(String word : words)
{
System.out.println("inside first loop");

for(String inputWord : inputWords)
{
if(!inputWord.equals(word))
{
Text pair = new Text(word+","+inputWord);
context.write(pair, NullWritable.get());
}
}
}
}


}

private List<String> getWords(String inputLine)
{
List<String> words = new ArrayList<String>();
StringTokenizer stringTokenizer = new StringTokenizer(inputLine.toString());
while(stringTokenizer.hasMoreTokens())
{
String token = stringTokenizer.nextToken();
words.add(token);
}

return words;

}
}

Driver class

package com.example.hadoopexamples.joinnew;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class JoinTester
{
public static void main(String[] args) throws IOException,
InterruptedException, ClassNotFoundException
{
Configuration configuration = new Configuration();
configuration.set("secondFilePath", args[1]);
Job job=new Job(configuration);
job.setMapperClass(JoinMapper.class);
job.setJarByClass(JoinTester.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(Reducer.class);
//job.setOutputValueGroupingComparator(FirstComparator.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true);
 }

}

It streams the second data file from HDFS. I got this idea from this
thread:
http://search-hadoop.com/m/FNqzV1DrOEp/cross+product&subj=Re+Cross+Join. Is
this a best practice, or is there a better way of doing a cross product in
Hadoop?

-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Image Processing in Hadoop

2012-04-02 Thread madhu phatak
Hi Shreya,
 Image files are binary files. Use the SequenceFile format to store the images in
HDFS and SequenceFileInputFormat to read the bytes. You can use TwoDArrayWritable
to store the matrix for an image.
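
A minimal sketch of loading image files into a SequenceFile, assuming filenames as keys and the raw bytes as values; the class name and output path are illustrative.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImageLoader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/user/hadoop/images.seq");  // example output path

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
    try {
      for (String fileName : args) {
        File imageFile = new File(fileName);
        byte[] bytes = new byte[(int) imageFile.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(imageFile));
        try {
          in.readFully(bytes);  // whole image in memory; fine for small images
        } finally {
          in.close();
        }
        writer.append(new Text(imageFile.getName()), new BytesWritable(bytes));
      }
    } finally {
      writer.close();
    }
  }
}
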
On Mon, Apr 2, 2012 at 3:36 PM, Sujit Dhamale wrote:

> Shreya  can u please Explain your scenario .
>
>
> On Mon, Apr 2, 2012 at 3:02 PM,  wrote:
>
> >
> >
> > Hi,
> >
> >
> >
> > Can someone point me to some info on Image processing using Hadoop?
> >
> >
> >
> > Regards,
> >
> > Shreya
> >
> >
> > This e-mail and any files transmitted with it are for the sole use of the
> > intended recipient(s) and may contain confidential and privileged
> > information.
> > If you are not the intended recipient, please contact the sender by reply
> > e-mail and destroy all copies of the original message.
> > Any unauthorized review, use, disclosure, dissemination, forwarding,
> > printing or copying of this email or any action taken in reliance on this
> > e-mail is strictly prohibited and may be unlawful.
> >
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: 0 tasktrackers in jobtracker but all datanodes present

2012-04-01 Thread madhu phatak
Hi,
1. Stop the job tracker and task trackers.  - bin/stop-mapred.sh

 2. Disable namenode safemode - bin/hadoop dfsadmin -safemode leave

3. Start the job tracker and tasktrackers again - bin/start-mapred.sh

On Fri, Jan 13, 2012 at 5:20 AM, Ravi Prakash  wrote:

> Courtesy Kihwal and Bobby
>
> Have you tried increasing the max heap size with -Xmx? and make sure that
> you have swap enabled.
>
> On Wed, Jan 11, 2012 at 6:59 PM, Gaurav Bagga  wrote:
>
> > Hi
> >
> > hadoop-0.19
> > I have a working hadoop cluster which has been running perfectly for
> > months.
> > But today after restarting the cluster, at jobtracker UI its showing
> state
> > INITIALIZING for a long time and is staying on the same state.
> > The nodes in jobtracker are zero whereas all the nodes are present on the
> > dfs.
> > It says Safe mode is on.
> > grep'ed on slaves and I see the tasktrackers running.
> >
> > In namenode logs i get the following error
> >
> >
> > 2012-01-11 16:50:57,195 WARN  ipc.Server - Out of Memory in server select
> > java.lang.OutOfMemoryError: Java heap space
> >at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
> >at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
> >at
> > org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:804)
> >at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:400)
> >at org.apache.hadoop.ipc.Server$Listener.run(Server.java:309)
> >
> > Not sure why the cluster is not coming up
> > -G
> >
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Multiple linear Regression on Hadoop

2012-04-01 Thread madhu phatak
Hi ,
 Nectar already implemented  Multiple Linear Regression. You can look into
the code here  https://github.com/zinnia-phatak-dev/Nectar .

On Fri, Jan 13, 2012 at 11:24 AM, Saurabh Bajaj
wrote:

> Hi All,
>
> Could someone guide me how we can do a multiple linear regression on
> Hadoop.
> Mahout doesn't yet support Multiple Linear Regression.
>
> Saurabh Bajaj | Senior Business Analyst | +91 9986588089 |
> www.mu-sigma.com |
> ---Your problem isn't motivation, but execution - Peter Bregman---
>
>
> 
> This email message may contain proprietary, private and confidential
> information. The information transmitted is intended only for the person(s)
> or entities to which it is addressed. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon,
> this information by persons or entities other than the intended recipient
> is prohibited and may be illegal. If you received this in error, please
> contact the sender and delete the message from your system.
>
> Mu Sigma takes all reasonable steps to ensure that its electronic
> communications are free from viruses. However, given Internet
> accessibility, the Company cannot accept liability for any virus introduced
> by this e-mail or any attachment and you are advised to use up-to-date
> virus checking software.
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: input file order

2012-04-01 Thread madhu phatak
Hi,
 Mappers run in parallel, so without a reducer it is not possible to ensure the
ordering.
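
A sketch of one common workaround, not from the thread: keep the mapper's input byte offset as the key and run a single reducer, so the framework's sort restores the original line order for a single input file. The class name is illustrative; in the driver one would call job.setNumReduceTasks(1) and leave the default (identity) Reducer, and with many input files an explicit file index would still be needed.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (byte offset, line); with a single reducer, the sort on the offset key
// restores the original order of lines within one input file.
public class OrderKeepingMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Per-line processing would happen here before writing the result out.
    context.write(offset, line);
  }
}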

On Fri, Jan 20, 2012 at 2:32 AM, Mapred Learn wrote:

> This is my question too.
> What if I want output to be in same order as input without using reducers.
>
> Thanks,
> JJ
>
> Sent from my iPhone
>
> On Jan 19, 2012, at 12:19 PM, Ronald Petty  wrote:
>
> > Daniel,
> >
> > Can you provide a concrete example of what you mean by "output to be in
> an
> > orderly manner"?
> >
> > Also, what are the file sizes and types?
> >
> > Ron
> >
> > On Thu, Jan 19, 2012 at 11:19 AM, Daniel Yehdego
> > wrote:
> >
> >>
> >> Hi,
> >> I have 100 .txt input files and I want my mapper output to be in an
> >> orderly manner. I am not using any reducer.Any idea?
> >>
> >> Regards,
> >>
> >>
> >>
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Regarding Hadoop 1.0.0 release

2012-04-01 Thread madhu phatak
Hi ,
 The security features of 1.0.0 are the same as in the 0.20.203 version, so you
should be able to find the documentation under the 0.20.203 version.

On Fri, Jan 20, 2012 at 4:03 PM, renuka  wrote:

>
>
> Hadoop 1.0.0 is released in dec 2011. And its in Beta version.
>
> As per the below link, the security feature "Security (strong authentication via
> Kerberos authentication protocol)" is added in hadoop 1.0.0 release.
> http://www.infoq.com/news/2012/01/apache-hadoop-1.0.0
>
> But we didnt find any documentation related to this in 1.0.0 documentation.
>
> http://hadoop.apache.org/common/docs/r1.0.0/
>
> Is there any documentation reg security feature available in 1.0.0 release
> and how to configure and use the same. Any inputs on this is greatly
> appreciated.
>
> Thanks MRK
>
>
> --
> View this message in context:
> http://hadoop-common.472056.n3.nabble.com/Regarding-Hadoop-1-0-0-release-tp3675071p3675071.html
> Sent from the Users mailing list archive at Nabble.com.
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: setting sample text-files error

2012-04-01 Thread madhu phatak
Hi,
 Can you tell which version of Hadoop you are using? It seems like
duplicate jars are on the classpath.

2012/1/23 Aleksandar Hudić 

> Hi
>
> I am trying to setup node and test the word count and I have a problem with
> last few steps.
>
> after I pack  classes in the jar and follow to the next step
>
> Assuming that:
>
> /home/ahudic/wordcount/input - input directory in HDFS
> /home/ahudic/wordcount/output - output directory in HDFS
> Sample text-files as input:
>
> $ bin/hadoop dfs -ls /home/ahudic/wordcount/input/
> /home/ahudic/wordcount/input/file01
> /home/ahudic/wordcount/input/file02
>
> I get this error:
>
>
> Exception in thread "main" java.lang.NoSuchMethodError:
>
> org.apache.commons.cli.OptionBuilder.withArgPattern(Ljava/lang/String;I)Lorg/apache/commons/cli/OptionBuilder;
>at
>
> org.apache.hadoop.util.GenericOptionsParser.buildGeneralOptions(GenericOptionsParser.java:181)
>at
>
> org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:341)
>at
>
> org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:136)
>at
>
> org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:121)
>at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
>at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>at org.apache.hadoop.fs.FsShell.main(FsShell.java:1854)
>
> --
> Mit freundlichen Grüßen / With kind regards
>
> Aleksandar Hudić, MSc
> aleksandar.hu...@gmail.com
> Mobile: +43 680 3303 660
> +385 98 871 710
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Hadoop fs custom commands

2012-04-01 Thread madhu phatak
Hi,
 All commands invoke the FsShell.java code. As far as I know, you have to
change the source code and rebuild to support custom commands.
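
That said, for something like summarizing file sizes you may not need a new shell command at all; a small client built on the FileSystem API can do it. A sketch, with the class name chosen here and the path taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirSizes {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path root = new Path(args[0]);

    // Print the size of each immediate child, then the total for the directory.
    for (FileStatus status : fs.listStatus(root)) {
      ContentSummary summary = fs.getContentSummary(status.getPath());
      System.out.println(status.getPath() + "\t" + summary.getLength() + " bytes");
    }
    System.out.println("total\t" + fs.getContentSummary(root).getLength() + " bytes");
  }
}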

On Sun, Apr 1, 2012 at 2:11 PM, JAX  wrote:

> Hi guys : I wanted to make se custom Hadoop fs - commands.  Is this
> feasible/practical?  In particular. , I wanted to summarize file sizes and
> print some usefull estimated of things on the fly from My cluster.
>
> I'm not sure how
> The hadoop
> Shell commands are implemented... But I thought maybe there is a higher
> level
> Shell language or API which they might use that I can play with.?




-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Retail receipt analysis

2012-03-30 Thread madhu phatak
There is also Nectar. https://github.com/zinnia-phatak-dev/Nectar

On Sat, Feb 4, 2012 at 12:49 AM, praveenesh kumar wrote:

> You can also use R-hadoop package that allows you to run R statistical
> algos on hadoop.
>
> Thanks,
> Praveenesh
>
> On Fri, Feb 3, 2012 at 10:54 PM, Harsh J  wrote:
>
> > You may want to check out Apache Mahout: http://mahout.apache.org
> >
> > On Fri, Feb 3, 2012 at 10:31 PM, Fabio Pitzolu 
> > wrote:
> > > Hello everyone,
> > > I've been asked to prepare a small project for a client, which involves
> > the
> > > use of machine learning algorithms, correlation and clustering, in
> order
> > to
> > > analyse a big amount of text-format receipts.
> > > I wasn't able to find on the internet some examples of Hadoop
> > > implementation of these arguments, can you help me out?
> > >
> > > Thanks a lot!
> > >
> > > Fabio
> >
> >
> >
> > --
> > Harsh J
> > Customer Ops. Engineer
> > Cloudera | http://tiny.cloudera.com/about
> >
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: dynamic mapper?

2012-03-28 Thread madhu phatak
Hi,
 You can use the Java compiler APIs to compile custom Java code and create jars at
runtime. For example, look at this code from Sqoop:

/**
 * Licensed to Cloudera, Inc. under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  Cloudera, Inc. licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.cloudera.sqoop.orm;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapred.JobConf;

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.util.FileListing;

import com.cloudera.sqoop.util.Jars;

/**
 * Manages the compilation of a bunch of .java files into .class files
 * and eventually a jar.
 *
 * Also embeds this program's jar into the lib/ directory inside the compiled
 * jar to ensure that the job runs correctly.
 */
public class CompilationManager {

  /** If we cannot infer a jar name from a table name, etc., use this. */
  public static final String DEFAULT_CODEGEN_JAR_NAME =
  "sqoop-codegen-created.jar";

  public static final Log LOG = LogFactory.getLog(
  CompilationManager.class.getName());

  private SqoopOptions options;
  private List<String> sources;

  public CompilationManager(final SqoopOptions opts) {
options = opts;
sources = new ArrayList<String>();
  }

  public void addSourceFile(String sourceName) {
sources.add(sourceName);
  }

  /**
   * locate the hadoop-*-core.jar in $HADOOP_HOME or --hadoop-home.
   * If that doesn't work, check our classpath.
   * @return the filename of the hadoop-*-core.jar file.
   */
  private String findHadoopCoreJar() {
String hadoopHome = options.getHadoopHome();

if (null == hadoopHome) {
  LOG.info("$HADOOP_HOME is not set");
  return Jars.getJarPathForClass(JobConf.class);
}

if (!hadoopHome.endsWith(File.separator)) {
  hadoopHome = hadoopHome + File.separator;
}

File hadoopHomeFile = new File(hadoopHome);
LOG.info("HADOOP_HOME is " + hadoopHomeFile.getAbsolutePath());
File [] entries = hadoopHomeFile.listFiles();

if (null == entries) {
  LOG.warn("HADOOP_HOME appears empty or missing");
  return Jars.getJarPathForClass(JobConf.class);
}

for (File f : entries) {
  if (f.getName().startsWith("hadoop-")
  && f.getName().endsWith("-core.jar")) {
LOG.info("Found hadoop core jar at: " + f.getAbsolutePath());
return f.getAbsolutePath();
  }
}

return Jars.getJarPathForClass(JobConf.class);
  }

  /**
   * Compile the .java files into .class files via embedded javac call.
   * On success, move .java files to the code output dir.
   */
  public void compile() throws IOException {
List<String> args = new ArrayList<String>();

// ensure that the jar output dir exists.
String jarOutDir = options.getJarOutputDir();
File jarOutDirObj = new File(jarOutDir);
if (!jarOutDirObj.exists()) {
  boolean mkdirSuccess = jarOutDirObj.mkdirs();
  if (!mkdirSuccess) {
LOG.debug("Warning: Could not make directories for " + jarOutDir);
  }
} else if (LOG.isDebugEnabled()) {
  LOG.debug("Found existing " + jarOutDir);
}

// Make sure jarOutDir ends with a '/'.
if (!jarOutDir.endsWith(File.separator)) {
  jarOutDir = jarOutDir + File.separator;
}

// find hadoop-*-core.jar for classpath.
String coreJar = findHadoopCoreJar();
if (null == coreJar) {
  // Couldn't find a core jar to insert into the CP for compilation. If,
  // however, we're running this from a unit test, then the path to the
  // .class files might be set via the hadoop.alt.classpath property
  // instead. Check there first.
  String coreClassesPath = System.getProperty("hadoop.alt.classpath");
  if (null == coreClassesPath) {
// no -- we're out of options. Fail.
throw new IOException("Could not find hadoop core jar!");
  } else {
coreJar = coreClassesPath;
  }
}

// find sqoop jar for compilation

Re: Cannot renew lease for DFSClient_977492582. Name node is in safe mode in AWS

2012-03-28 Thread madhu phatak
Hi Mohit,
 HDFS is in safe mode, which is a read-only mode. Run the following command to
get out of safe mode:

 bin/hadoop dfsadmin -safemode leave.

On Thu, Mar 15, 2012 at 5:54 AM, Mohit Anchlia wrote:

>  When I run client to create files in amazon HDFS I get this error. Does
> anyone know what it really means and how to resolve this?
>
> ---
>
>
> 2012-03-14 23:16:21,414 INFO org.apache.hadoop.ipc.Server (IPC Server
> handler 46 on 9000): IPC Server handler 46 on 9000, call
> renewLease(DFSClient_977492582) from 10.70.150.119:47240: error:
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew
> lease for DFSClient_977492582. Name node is in safe mode.
>
> The ratio of reported blocks 1. has reached the threshold 0.9990. Safe
> mode will be turned off automatically in 0 seconds.
>
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew
> lease for DFSClient_977492582. Name node is in safe mode.
>
> The ratio of reported blocks 1. has reached the threshold 0.9990. Safe
> mode will be turned off automatically in 0 seconds.
>
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewLease(FSNamesystem.java:2296)
>
> at
>
> org.apache.hadoop.hdfs.server.namenode.NameNode.renewLease(NameNode.java:814)
>
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Using hftp inside a servelet container

2012-03-20 Thread madhu phatak
Hi,
 I am trying to access files in HDFS through hftp. When I run the following code
from Eclipse, it works fine:
FsUrlStreamHandlerFactory factory =
new org.apache.hadoop.fs.FsUrlStreamHandlerFactory();
java.net.URL.setURLStreamHandlerFactory(factory);

URL hdfs = new URL("hdfs:///user/hadoop/");
BufferedReader in = new BufferedReader(
new InputStreamReader(hdfs.openStream()));

String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();

But when I run the same code from a JSP page (Tomcat), I get the following error:

java.lang.Error: factory already defined
at java.net.URL.setURLStreamHandlerFactory(URL.java:1074)

Has anyone tried to run hftp from Tomcat? Any ideas on how to resolve
this issue?
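
One workaround sketch, an assumption on my part rather than something confirmed here: Tomcat has already installed its own URLStreamHandlerFactory and the JVM allows only one per process, so instead of registering FsUrlStreamHandlerFactory you can read through the FileSystem API directly, which needs no global factory. The namenode URI and file path are illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWithoutUrlFactory {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);

    BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(new Path("/user/hadoop/somefile.txt"))));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    } finally {
      in.close();
    }
  }
}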


-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Very strange Java Collection behavior in Hadoop

2012-03-20 Thread madhu phatak
Thanks a lot :)

On Tue, Mar 20, 2012 at 11:50 AM, Owen O'Malley  wrote:

> On Mon, Mar 19, 2012 at 11:05 PM, madhu phatak 
> wrote:
>
> > Hi Owen O'Malley,
> >  Thank you for that Instant reply. It's working now. Can you explain me
> > what you mean by "input to reducer is reused" in little detail?
>
>
> Each time the statement "Text value = values.next();" is executed it always
> returns the same Text object with the contents of that object changed. When
> you add the Text to the list, you are adding a pointer to the same Text
> object. At the end you have 6 copies of the same pointer instead of 6
> different Text objects.
>
> The reason that I said it is my fault, is because I added the optimization
> that causes it. If you are interested in Hadoop archeology, it was
> HADOOP-2399 that made the change. I also did HADOOP-3522 to improve the
> documentation in the area.
>
> -- Owen
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Very strange Java Collection behavior in Hadoop

2012-03-19 Thread madhu phatak
Hi Owen O'Malley,
 Thank you for the instant reply. It's working now. Can you explain
what you mean by "input to reducer is reused" in a little more detail?

On Tue, Mar 20, 2012 at 11:28 AM, Owen O'Malley  wrote:

> On Mon, Mar 19, 2012 at 10:52 PM, madhu phatak 
> wrote:
>
> > Hi All,
> >  I am using Hadoop 0.20.2 . I am observing a Strange behavior of Java
> > Collection's . I have following code in reducer
>
>
> That is my fault. *sigh* The input to the reducer is reused. Replace:
>
> list.add(value);
>
> with:
>
> list.add(new Text(value));
>
> and the problem will go away.
>
> -- Owen
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Very strange Java Collection behavior in Hadoop

2012-03-19 Thread madhu phatak
Hi All,
 I am using Hadoop 0.20.2. I am observing a strange behavior of Java
Collections. I have the following code in a reducer:

   public void reduce(Text text, Iterator<Text> values,
       OutputCollector<Text, Text> collector, Reporter reporter)
       throws IOException {
     // TODO Auto-generated method stub
     List<Text> list = new ArrayList<Text>();
     while (values.hasNext()) {
       Text value = values.next();
       list.add(value);
       System.out.println(value.toString());
     }

     for (Text value : list) {
       System.out.println(value.toString());
     }
   }

The first sysout prints the following:

4   5   6

1   2   3

But when I print from the List, it prints the following:

1   2   3
1   2   3

All the List values are getting replaced by the last added value.

I am not able to understand this behavior. Has anyone seen this behavior before?

Regards,
Madhukara Phatak

-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: EOFException

2012-03-19 Thread madhu phatak
Hi,
 It seems like HDFS is in safe mode.

On Fri, Mar 16, 2012 at 1:37 AM, Mohit Anchlia wrote:

> This is actually just hadoop job over HDFS. I am assuming you also know why
> this is erroring out?
>
> On Thu, Mar 15, 2012 at 1:02 PM, Gopal  wrote:
>
> >  On 03/15/2012 03:06 PM, Mohit Anchlia wrote:
> >
> >> When I start a job to read data from HDFS I start getting these errors.
> >> Does anyone know what this means and how to resolve it?
> >>
> >> 2012-03-15 10:41:31,402 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.204:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,402 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-6402969611996946639_11837
> >> 2012-03-15 10:41:31,403 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Excluding datanode 164.28.62.204:50010
> >> 2012-03-15 10:41:31,406 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.198:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,406 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-5442664108986165368_11838
> >> 2012-03-15 10:41:31,407 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.197:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,407 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-3373089616877234160_11838
> >> 2012-03-15 10:41:31,407 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Excluding datanode 164.28.62.198:50010
> >> 2012-03-15 10:41:31,409 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Excluding datanode 164.28.62.197:50010
> >> 2012-03-15 10:41:31,410 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.204:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,410 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_4481292025401332278_11838
> >> 2012-03-15 10:41:31,411 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Excluding datanode 164.28.62.204:50010
> >> 2012-03-15 10:41:31,412 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.200:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,412 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-5326771177080888701_11838
> >> 2012-03-15 10:41:31,413 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Excluding datanode 164.28.62.200:50010
> >> 2012-03-15 10:41:31,414 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.197:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,414 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-8073750683705518772_11839
> >> 2012-03-15 10:41:31,415 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Excluding datanode 164.28.62.197:50010
> >> 2012-03-15 10:41:31,416 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.199:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,416 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream 164.28.62.198:50010 java.io.EOFException
> >> 2012-03-15 10:41:31,416 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_441003866688859169_11838
> >> 2012-03-15 10:41:31,416 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Abandoning block blk_-466858474055876377_11839
> >> 2012-03-15 10:41:31,417 [Thread-5] INFO  org.apache.hadoop.hdfs.DFSClient - Excluding datanode 164.28.62.198:50010
> >> 2012-03-15 10:41:31,417 [Thread-5] WARN  org.apache.hadoop.hdfs.DFSClient -
> >>
> >>
> > Try shutting down and  restarting hbase.
> >
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Can I start a Hadoop job from an EJB?

2012-03-08 Thread madhu phatak
Yes, you can. Please make sure all the Hadoop jars and the conf directory are on the
classpath.
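
A minimal sketch of kicking a job off from server-side code, with the class, job name and paths below as placeholders for whatever the EJB/MDB would actually use. JobClient.runJob blocks until the job finishes, so from an MDB it would typically be run asynchronously.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ServerSideJobLauncher {

  // Called from the EJB/MDB; inputPath and outputPath are placeholders.
  public static void launch(String inputPath, String outputPath) throws Exception {
    JobConf conf = new JobConf(ServerSideJobLauncher.class);
    conf.setJobName("job-from-app-server");

    // Identity mapper/reducer defaults keep the sketch small; a real job would
    // set its own mapper, reducer and key/value classes here.
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(inputPath));
    FileOutputFormat.setOutputPath(conf, new Path(outputPath));

    // Blocks until completion; JobClient.submitJob(conf) is the fire-and-forget variant.
    RunningJob job = JobClient.runJob(conf);
    System.out.println("Job " + job.getID() + " successful: " + job.isSuccessful());
  }
}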

On Thu, Feb 9, 2012 at 7:02 AM, Sanjeev Verma wrote:

> This is based on my understanding and no real life experience, so going to
> go out on a limb here :-)...assuming that you are planning on kicking off
> this map-reduce job based on a event of sorts (a file arrived and is ready
> to be processed?), and no direct "user wait" is involved, then yes, I would
> imagine you should be able to do something like this from inside a MDB
> (asynchronous so no one is held up in queue). Some random thoughts:
>
> 1. The user under which the app server is running will need to be a setup
> as a hadoop client user - this is rather obvious, just wanted to list it
> for completeness.
> 2. Hadoop, AFAIK, does not support transactions, and no XA. I assume you
> have no need for any of that stuff either.
> 3. Your MDB could potentially log job start/end times, but that info is
> available from Hadoop's monitoring infrastructure also.
>
> I would be very interested in hearing what senior members on the list have
> to say...
>
> HTH
>
> Sanjeev
>
> On Wed, Feb 8, 2012 at 2:18 PM, Andy Doddington 
> wrote:
>
> > OK, I have a working Hadoop application that I would like to integrate
> > into an application
> > server environment. So, the question arises: can I do this? E.g. can I
> > create a JobClient
> > instance inside an EJB and run it in the normal way, or is something more
> > complex
> > required? In addition, are there any unpleasant interactions between the
> > application
> > server and the hadoop runtime?
> >
> > Thanks for any guidance.
> >
> >Andy D.
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Standalone operation - file permission, Pseudo-Distributed operation - no output

2012-03-08 Thread madhu phatak
Hi,
Just make sure both the tasktracker and the datanode are up. Go to localhost:50030
and see whether it shows the number of nodes equal to 1.

On Thu, Feb 9, 2012 at 9:18 AM, Kyong-Ho Min wrote:

> Hello,
>
> I am a hadoop newbie and I have 2 questions.
>
> I followed Hadoop standalone mode testing.
> I got error message from Cygwin terminal  like file permission error.
> I checked out mailing list and changed the part in RawLocalFileSystem.java
> but not working.
> Still I have file permission error in the directory:
> c:/tmp/hadoop../mapred/staging...
>
>
> I followed instruction about Pseudo-Distributed operation.
> Ssh is OK and namenode -format is OK.
> But it did not return any results and the processing is just halted.
> The Cygwin console scripts are
>
> -
> $ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
> 12/02/09 14:25:44 INFO mapred.FileInputFormat: Total input paths to
> process : 17
> 12/02/09 14:25:44 INFO mapred.JobClient: Running job: job_201202091423_0001
> 12/02/09 14:25:45 INFO mapred.JobClient:  map 0% reduce 0%
> -
>
> Any help pls.
> Thanks.
>
> Kyongho Min
>



-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Regression on Hadoop ?

2012-03-08 Thread madhu phatak
Hi,
 You can look here https://github.com/zinnia-phatak-dev/Nectar

On Thu, Feb 9, 2012 at 12:09 PM, praveenesh kumar wrote:

> Guys,
>
> Is there any regression API/tool that is developed on top of hadoop *(APART
> from mahout) *?
>
> Thanks,
> Praveenesh
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: How to verify all my master/slave name/data nodes have been configured correctly?

2012-03-08 Thread madhu phatak
Hi,
 Use the JobTracker web UI at master:50030 and the NameNode web UI at
master:50070.

On Fri, Feb 10, 2012 at 9:03 AM, Wq Az  wrote:

> Hi,
> Is there a quick way to check this?
> Thanks ahead,
> Will
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Custom Seq File Loader: ClassNotFoundException

2012-03-04 Thread madhu phatak
Hi,
 Please make sure that your CustomWritable has a default constructor.
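
For reference, a sketch of a minimal custom Writable with the required constructor; the field is illustrative. The SequenceFile reader creates value instances reflectively, so a public no-argument constructor must exist alongside any other constructors.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class CustomWritable implements Writable {

  private int count;  // illustrative field

  // Required: the SequenceFile reader instantiates this class reflectively.
  public CustomWritable() {
  }

  public CustomWritable(int count) {
    this.count = count;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(count);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    count = in.readInt();
  }
}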

On Sat, Mar 3, 2012 at 4:56 AM, Mark question  wrote:

> Hello,
>
>   I'm trying to debug my code through eclipse, which worked fine with
> given Hadoop applications (eg. wordcount), but as soon as I run it on my
> application with my custom sequence input file/types, I get:
> Java.lang.runtimeException.java.ioException (Writable name can't load
> class)
> SequenceFile$Reader.getValeClass(Sequence File.class)
>
> because my valueClass is customed. In other words, how can I add/build my
> CustomWritable class to be with hadoop LongWritable,IntegerWritable 
> etc.
>
> Did anyone used eclipse?
>
> Mark
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Custom data structures in Hadoop

2012-03-04 Thread madhu phatak
Hi,
 I have the following issue in Hadoop 0.20.2. When I try to use inheritance
with WritableComparables, the job fails. For example, if I create a base
writable called ShapeWritable:

  public abstract class ShapeWritable implements WritableComparable
  {

  }

 Then I extend it with a concrete class called CircleWritable. Now, if the
mapper output value class is set to ShapeWritable in the job configuration and
the map emits a CircleWritable via collect()/write(), the map fails with a
class mismatch. When I looked into the source code of MapTask.java, I saw the
following code:

  public synchronized void collect(K key, V value, int partition
                                   ) throws IOException {
    reporter.progress();
    if (key.getClass() != keyClass) {
      throw new IOException("Type mismatch in key from map: expected "
                            + keyClass.getName() + ", recieved "
                            + key.getClass().getName());
    }
    if (value.getClass() != valClass) {
      throw new IOException("Type mismatch in value from map: expected "
                            + valClass.getName() + ", recieved "
                            + value.getClass().getName());
    }
Here the classes are compared directly, which means inheritance doesn't work.
Can anyone tell me why it is implemented this way? Has this changed in newer
versions of Hadoop?


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: DFSIO

2012-03-01 Thread madhu phatak
Hi Harsha,
 Sorry, I read DFSIO as DFS Input/Output, which I took to mean reading and
writing through the HDFS API :)

On Fri, Mar 2, 2012 at 12:32 PM, Harsh J  wrote:

> Madhu,
>
> That is incorrect. TestDFSIO is a MapReduce job and you need HDFS+MR
> setup to use it.
>
> On Fri, Mar 2, 2012 at 11:07 AM, madhu phatak 
> wrote:
> > Hi,
> >  Only HDFS should be enough.
> >
> > On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do  wrote:
> >
> >> hi all,
> >>
> >> in order to run DFSIO in my cluster,
> >> do i need to run JobTracker, and TaskTracker,
> >> or just running HDFS is enough?
> >>
> >> Many thanks,
> >> Thanh
> >>
> >
> >
> >
> > --
> > Join me at http://hadoopworkshop.eventbrite.com/
>
>
>
> --
> Harsh J
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: DFSIO

2012-03-01 Thread madhu phatak
Hi,
 Only HDFS should be enough.

On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do  wrote:

> hi all,
>
> in order to run DFSIO in my cluster,
> do i need to run JobTracker, and TaskTracker,
> or just running HDFS is enough?
>
> Many thanks,
> Thanh
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Reducer NullPointerException

2012-03-01 Thread madhu phatak
Hi,
 It seems like you are trying to run only the reducer without a mapper. Can you
share the main() method code you are trying to run?

On Mon, Jan 23, 2012 at 11:43 AM, burakkk  wrote:

> Hello everyone,
> I have 3 server(1 master, 2 slave) and I installed cdh3u2 on each
> server. I execute simple wordcount example but reducer had a
> NullPointerException. How can i solve this problem?
>
> The error log is that:
> Error: java.lang.NullPointerException
>   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:
> 768)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2806)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.run(ReduceTask.java:2733)
>
> Error: java.lang.NullPointerException
>   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:
> 768)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2806)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.run(ReduceTask.java:2733)
>
> Error: java.lang.NullPointerException
>   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:
> 768)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2806)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.run(ReduceTask.java:2733)
>
> Error: java.lang.NullPointerException
>   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:
> 768)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2806)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> $GetMapEventsThread.run(ReduceTask.java:2733)
>
>
> Thanks
> Best Regards
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Where Is DataJoinMapperBase?

2012-03-01 Thread madhu phatak
Hi,
 Please look inside the $HADOOP_HOME/contrib/datajoin folder of the 0.20.2
release. You will find the jar there.

On Sat, Feb 11, 2012 at 1:09 AM, Bing Li  wrote:

> Hi, all,
>
> I am starting to learn advanced Map/Reduce. However, I cannot find the
> class DataJoinMapperBase in my downloaded Hadoop 1.0.0 and 0.20.2. So I
> searched on the Web and get the following link.
>
> http://www.java2s.com/Code/Jar/h/Downloadhadoop0201datajoinjar.htm
>
> From the link I got the package, hadoop-0.20.1-datajoin.jar. My question is
> why the package is not included in Hadoop 1.0.0 and 0.20.2? Is the correct
> way to get it?
>
> Thanks so much!
>
> Best regards,
> Bing
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: "Browse the filesystem" weblink broken after upgrade to 1.0.0: HTTP 404 "Problem accessing /browseDirectory.jsp"

2012-03-01 Thread madhu phatak
On Wed, Feb 29, 2012 at 11:34 PM, W.P. McNeill  wrote:

> I can do perform HDFS operations from the command line like "hadoop fs -ls
> /". Doesn't that meant that the datanode is up?
>

  No. That is just a metadata lookup, which is served by the Namenode. Try to
cat some file, like "hadoop fs -cat " . If you are able to get the data, the
datanode should be up. Also make sure that HDFS is not in safemode; to turn
safemode off, use the HDFS command "hadoop dfsadmin -safemode leave" and then
restart the jobtracker and tasktracker.



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: What determines task attempt list URLs?

2012-02-28 Thread madhu phatak
Hi,
 It is better to use hostnames rather than IP addresses. If you use hostnames,
the task attempt URL will contain the hostname rather than localhost.

On Fri, Feb 17, 2012 at 10:52 PM, Keith Wiley  wrote:

> What property or setup parameter determines the URLs displayed on the task
> attempts webpage of the job/task trackers?  My cluster seems to be
> configured such that all URLs for higher pages (the top cluster admin page,
> the individual job overview page, and the map/reduce task list page) show
> URLs by ip address, but the lowest page (the task attempt list for a single
> task) shows the URLs for the Machine and Task Logs columns by "localhost",
> not by ip address (although the Counters column still uses the ip address
> just like URLs on all the higher pages).
>
> The "localhost" links obviously don't work (the cluster is not on the
> local machine, it's on Tier 3)...unless I just happen to have a cluster
> also running on my local machine; then the links work but obviously they go
> to my local machine and thus describe a completely unrelated Hadoop
> cluster!!!  It goes without saying, that's ridiculous.
>
> So to get it to work, I have to manually copy/paste the ip address into
> the URLs every time I want to view those pages...which makes it incredibly
> tedious to view the task logs.
>
> I've asked this a few times now and have gotten no response.  Does no one
> have any idea how to properly configure Hadoop to get around this?  I've
> experimented with the mapred-site.xml mapred.job.tracker and
> mapred.task.tracker.http.address properties to no avail.
>
> What's going on here?
>
> 
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with
> isn't it, and what's it seems weird and scary to me."
>   --  Abe (Grandpa) Simpson
>
> 
>
>


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: namenode null pointer

2012-02-28 Thread madhu phatak
Hi,
 This may be an issue with the namenode not being formatted correctly.

On Sat, Feb 18, 2012 at 1:50 PM, Ben Cuthbert  wrote:

> All sometimes when I startup my hadoop I get the following error
>
> 12/02/17 10:29:56 INFO namenode.NameNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG: host =iMac.local/192.168.0.191
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.203.0
> STARTUP_MSG: build =
> http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203-r
>  1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
> /
> 12/02/17 10:29:56 WARN impl.MetricsSystemImpl: Metrics system not started:
> Cannot locate configuration: tried hadoop-metrics2-namenode.properties,
> hadoop-metrics2.properties
> 2012-02-17 10:29:56.994 java[4065:1903] Unable to load realm info from
> SCDynamicStore
> 12/02/17 10:29:57 INFO util.GSet: VM type = 64-bit
> 12/02/17 10:29:57 INFO util.GSet: 2% max memory = 17.77875 MB
> 12/02/17 10:29:57 INFO util.GSet: capacity = 2^21 = 2097152 entries
> 12/02/17 10:29:57 INFO util.GSet: recommended=2097152, actual=2097152
> 12/02/17 10:29:57 INFO namenode.FSNamesystem: fsOwner=scottsue
> 12/02/17 10:29:57 INFO namenode.FSNamesystem: supergroup=supergroup
> 12/02/17 10:29:57 INFO namenode.FSNamesystem: isPermissionEnabled=true
> 12/02/17 10:29:57 INFO namenode.FSNamesystem:
> dfs.block.invalidate.limit=100
> 12/02/17 10:29:57 INFO namenode.FSNamesystem: isAccessTokenEnabled=false
> accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
> 12/02/17 10:29:57 INFO namenode.FSNamesystem: Registered
> FSNamesystemStateMBean and NameNodeMXBean
> 12/02/17 10:29:57 INFO namenode.NameNode: Caching file names occuring more
> than 10 times
> 12/02/17 10:29:57 INFO common.Storage: Number of files = 190
> 12/02/17 10:29:57 INFO common.Storage: Number of files under construction
> = 0
> 12/02/17 10:29:57 INFO common.Storage: Image file of size 26377 loaded in
> 0 seconds.
> 12/02/17 10:29:57 ERROR namenode.NameNode: java.lang.NullPointerException
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1113)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1125)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1028)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:205)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:613)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1009)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:827)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:365)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:353)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:434)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)
>
> 12/02/17 10:29:57 INFO namenode.NameNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at iMac.local/192.168.0.191
> /




-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: "Browse the filesystem" weblink broken after upgrade to 1.0.0: HTTP 404 "Problem accessing /browseDirectory.jsp"

2012-02-28 Thread madhu phatak
Hi,
 Just make sure that the Datanode is up by looking into the datanode logs.

On Sun, Feb 19, 2012 at 10:52 PM, W.P. McNeill  wrote:

> I am running in pseudo-distributed on my Mac and just upgraded from
> 0.20.203.0 to 1.0.0. The web interface for HDFS which was working in
> 0.20.203.0 is broken in 1.0.0.
>
> HDFS itself appears to work: a command line like "hadoop fs -ls /" returns
> a result, and the namenode web interface at http://
> http://localhost:50070/dfshealth.jsp comes up. However, when I click on
> the
> "Browse the filesystem" link on this page I get a 404 Error. The error
> message displayed in the browser reads:
>
> Problem accessing /browseDirectory.jsp. Reason:
>/browseDirectory.jsp
>
> The URL in the browser bar at this point is "
> http://0.0.0.0:50070/browseDirectory.jsp?namenodeInfoPort=50070&dir=/";.
> The
> HTML source to the link on the main namenode page is  href="/nn_browsedfscontent.jsp">Browse the filesystem. If I change the
> server location from 0.0.0.0 to localhost in my browser bar I get the same
> error.
>
> I updated my configuration files in the new hadoop 1.0.0 conf directory to
> transfer over my settings from 0.20.203.0. My conf/slaves file consists of
> the line "localhost".  I ran "hadoop-daemon.sh start namenode -upgrade"
> once when prompted my errors in the namenode logs. After that all the
> namenode and datanode logs contain no errors.
>
> For what it's worth, I've verified that the bug occurs on Firefox, Chrome,
> and Safari.
>
> Any ideas on what is wrong or how I should go about further debugging it?
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: HDFS problem in hadoop 0.20.203

2012-02-28 Thread madhu phatak
Hi,
 Did you format the HDFS?

On Tue, Feb 21, 2012 at 7:40 PM, Shi Yu  wrote:

> Hi Hadoopers,
>
> We are experiencing a strange problem on Hadoop 0.20.203
>
> Our cluster has 58 nodes, everything is started from a fresh
> HDFS (we deleted all local folders on datanodes and
> reformatted the namenode).  After running some small jobs, the
> HDFS becomes behaving abnormally and the jobs become very
> slow.  The namenode log is crushed by Gigabytes of errors like
> is:
>
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_4524177823306792294 is added
> to invalidSet of 10.105.19.31:50010
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_4524177823306792294 is added
> to invalidSet of 10.105.19.18:50010
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_4524177823306792294 is added
> to invalidSet of 10.105.19.32:50010
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_2884522252507300332 is added
> to invalidSet of 10.105.19.35:50010
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_2884522252507300332 is added
> to invalidSet of 10.105.19.27:50010
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_2884522252507300332 is added
> to invalidSet of 10.105.19.33:50010
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated:
> 10.105.19.21:50010 is added to blk_-
> 6843171124277753504_2279882 size 124490
> 2012-02-21 00:00:38,632 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.allocateBlock:
> /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
> 043_0013_m_000313_0/result_stem-m-00313. blk_-
> 6379064588594672168_2279890
> 2012-02-21 00:00:38,633 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated:
> 10.105.19.26:50010 is added to blk_5338983375361999760_2279887
> size 1476
> 2012-02-21 00:00:38,633 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated:
> 10.105.19.29:50010 is added to blk_-977828927900581074_2279887
> size 13818
> 2012-02-21 00:00:38,633 INFO
> org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
> 043_0013_m_000364_0/result_stem-m-00364 is closed by
> DFSClient_attempt_201202202043_0013_m_000364_0
> 2012-02-21 00:00:38,633 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated:
> 10.105.19.23:50010 is added to blk_5338983375361999760_2279887
> size 1476
> 2012-02-21 00:00:38,633 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated:
> 10.105.19.20:50010 is added to blk_5338983375361999760_2279887
> size 1476
> 2012-02-21 00:00:38,633 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.allocateBlock:
> /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
> 043_0013_m_000364_0/result_suffix-m-00364.
> blk_1921685366929756336_2279890
> 2012-02-21 00:00:38,634 INFO
> org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
> 043_0013_m_000279_0/result_suffix-m-00279 is closed by
> DFSClient_attempt_201202202043_0013_m_000279_0
> 2012-02-21 00:00:38,635 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_495061820035691700 is added
> to invalidSet of 10.105.19.20:50010
> 2012-02-21 00:00:38,635 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_495061820035691700 is added
> to invalidSet of 10.105.19.25:50010
> 2012-02-21 00:00:38,635 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_495061820035691700 is added
> to invalidSet of 10.105.19.33:50010
> 2012-02-21 00:00:38,635 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.allocateBlock:
> /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
> 043_0013_m_000284_0/result_stem-m-00284.
> blk_8796188324642771330_2279891
> 2012-02-21 00:00:38,638 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated:
> 10.105.19.34:50010 is added to blk_-977828927900581074_2279887
> size 13818
> 2012-02-21 00:00:38,638 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.allocateBlock:
> /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
> 043_0013_m_000296_0/result_stem-m-00296. blk_-
> 6800409224007034579_2279891
> 2012-02-21 00:00:38,638 INFO
> org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated:
> 10.105.19.29:50010 is added to blk_192168536692975

Re: Difference between hdfs dfs and hdfs fs

2012-02-28 Thread madhu phatak
Hi Mohit,
 FS is a generic filesystem abstraction which can point to any file system,
such as LocalFileSystem, HDFS, etc., whereas dfs is specific to HDFS. So when
you use fs it can copy from the local file system to HDFS, but when you
specify dfs the source file has to be on HDFS.

On Tue, Feb 21, 2012 at 10:46 PM, Mohit Anchlia wrote:

> What's the different between hdfs dfs and hdfs fs commands? When I run hdfs
> dfs -copyFromLocal /assa . and use pig it can't find it but when I use hdfs
> fs pig is able to find the file.
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Setting eclipse for map reduce using maven

2012-02-28 Thread madhu phatak
Hi,
 You can find the Maven definitions for the Hadoop core jars here:
http://search.maven.org/#browse|-856937612

On Tue, Feb 21, 2012 at 10:48 PM, Mohit Anchlia wrote:

> I am trying to search for dependencies that would help me get started with
> developing map reduce in eclipse and I prefer to use maven for this.
>
> Could someone help me point to directions?
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: ClassNotFoundException: -libjars not working?

2012-02-28 Thread madhu phatak
Hi,
 -libjars doesn't always work. A better way is to create a runnable jar with
all dependencies (if the number of dependencies is small), or you have to put
the jars into Hadoop's lib folder on all machines.
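For what it's worth, one common reason -libjars appears to do nothing is that
the driver never passes its arguments through GenericOptionsParser. A rough
sketch of a Tool-based driver that does (the class and job names here are made
up, and the job setup is omitted):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
      // getConf() already carries whatever -libjars/-D options ToolRunner parsed.
      Job job = new Job(getConf(), "my-job");
      job.setJarByClass(MyDriver.class);
      // ... set mapper, reducer, input and output paths here ...
      return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
  }

Then a command like "hadoop jar myjob.jar MyDriver -libjars dep1.jar,dep2.jar in out"
should ship the extra jars with the job; if that still fails, the fat-jar
approach above is the safer fallback.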

On Wed, Feb 22, 2012 at 8:13 PM, Ioan Eugen Stan wrote:

> Hello,
>
> I'm trying to run a map-reduce job and I get ClassNotFoundException, but I
> have the class submitted with -libjars. What's wrong with how I do things?
> Please help.
>
> I'm running hadoop-0.20.2-cdh3u1, and I have everithing on the -libjars
> line. The job is submitted via a java app like:
>
>  exec /usr/lib/jvm/java-6-sun/bin/**java -Dproc_jar -Xmx200m -server
> -Dhadoop.log.dir=/opt/ui/var/**log/mailsearch
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/**hadoop
> -Dhadoop.id.str=hbase -Dhadoop.root.logger=INFO,**console
> -Dhadoop.policy.file=hadoop-**policy.xml -classpath
> '/usr/lib/hadoop/conf:/usr/**lib/jvm/java-6-sun/lib/tools.**
> jar:/usr/lib/hadoop:/usr/lib/**hadoop/hadoop-core-0.20.2-**
> cdh3u1.jar:/usr/lib/hadoop/**lib/ant-contrib-1.0b3.jar:/**
> usr/lib/hadoop/lib/apache-**log4j-extras-1.1.jar:/usr/lib/**
> hadoop/lib/aspectjrt-1.6.5.**jar:/usr/lib/hadoop/lib/**
> aspectjtools-1.6.5.jar:/usr/**lib/hadoop/lib/commons-cli-1.**
> 2.jar:/usr/lib/hadoop/lib/**commons-codec-1.4.jar:/usr/**
> lib/hadoop/lib/commons-daemon-**1.0.1.jar:/usr/lib/hadoop/lib/**
> commons-el-1.0.jar:/usr/lib/**hadoop/lib/commons-httpclient-**
> 3.0.1.jar:/usr/lib/hadoop/lib/**commons-logging-1.0.4.jar:/**
> usr/lib/hadoop/lib/commons-**logging-api-1.0.4.jar:/usr/**
> lib/hadoop/lib/commons-net-1.**4.1.jar:/usr/lib/hadoop/lib/**
> core-3.1.1.jar:/usr/lib/**hadoop/lib/hadoop-**fairscheduler-0.20.2-cdh3u1.
> **jar:/usr/lib/hadoop/lib/**hsqldb-1.8.0.10.jar:/usr/lib/**
> hadoop/lib/jackson-core-asl-1.**5.2.jar:/usr/lib/hadoop/lib/**
> jackson-mapper-asl-1.5.2.jar:/**usr/lib/hadoop/lib/jasper-**
> compiler-5.5.12.jar:/usr/lib/**hadoop/lib/jasper-runtime-5.5.**
> 12.jar:/usr/lib/hadoop/lib
> /jcl-over-slf4j-1.6.1.jar:/**usr/lib/hadoop/lib/jets3t-0.6.**
> 1.jar:/usr/lib/hadoop/lib/**jetty-6.1.26.jar:/usr/lib/**
> hadoop/lib/jetty-servlet-**tester-6.1.26.jar:/usr/lib/**
> hadoop/lib/jetty-util-6.1.26.**jar:/usr/lib/hadoop/lib/jsch-**
> 0.1.42.jar:/usr/lib/hadoop/**lib/junit-4.5.jar:/usr/lib/**
> hadoop/lib/kfs-0.2.2.jar:/usr/**lib/hadoop/lib/log4j-1.2.15.**
> jar:/usr/lib/hadoop/lib/**mockito-all-1.8.2.jar:/usr/**
> lib/hadoop/lib/oro-2.0.8.jar:/**usr/lib/hadoop/lib/servlet-**
> api-2.5-20081211.jar:/usr/lib/**hadoop/lib/servlet-api-2.5-6.**
> 1.14.jar:/usr/lib/hadoop/lib/**slf4j-api-1.6.1.jar:/usr/lib/**
> hadoop/lib/slf4j-log4j12-1.6.**1.jar:/usr/lib/hadoop/lib/**
> xmlenc-0.52.jar:/usr/lib/**hadoop/lib/jsp-2.1/jsp-2.1.**
> jar:/usr/lib/hadoop/lib/jsp-2.**1/jsp-api-2.1.jar:/usr/share/**
> mailbox-convertor/lib/*:/usr/**lib/hadoop/contrib/capacity-**
> scheduler/hadoop-capacity-**scheduler-0.20.2-cdh3u1.jar:/**
> usr/lib/hbase/lib/hadoop-lzo-**0.4.13.jar:/usr/lib/hbase/**
> hbase.jar:/etc/hbase/conf:/**usr/lib/hbase/lib:/usr/lib/**
> zookeeper/zookeeper.jar:/usr/**lib/hadoop/contrib
> /capacity-scheduler/hadoop-**capacity-scheduler-0.20.2-**
> cdh3u1.jar:/usr/lib/hbase/lib/**hadoop-lzo-0.4.13.jar:/usr/**
> lib/hbase/hbase.jar:/etc/**hbase/conf:/usr/lib/hbase/lib:**
> /usr/lib/zookeeper/zookeeper.**jar' org.apache.hadoop.util.RunJar
> /usr/share/mailbox-convertor/**mailbox-convertor-0.1-**SNAPSHOT.jar
> -libjars=/usr/share/mailbox-**convertor/lib/antlr-2.7.7.jar,**
> /usr/share/mailbox-convertor/**lib/aopalliance-1.0.jar,/usr/**
> share/mailbox-convertor/lib/**asm-3.1.jar,/usr/share/**
> mailbox-convertor/lib/**backport-util-concurrent-3.1.**
> jar,/usr/share/mailbox-**convertor/lib/cglib-2.2.jar,/**
> usr/share/mailbox-convertor/**lib/hadoop-ant-3.0-u1.pom,/**
> usr/share/mailbox-convertor/**lib/speed4j-0.9.jar,/usr/**
> share/mailbox-convertor/lib/**jamm-0.2.2.jar,/usr/share/**
> mailbox-convertor/lib/uuid-3.**2.0.jar,/usr/share/mailbox-**
> convertor/lib/high-scale-lib-**1.1.1.jar,/usr/share/mailbox-**
> convertor/lib/jsr305-1.3.9.**jar,/usr/share/mailbox-**
> convertor/lib/guava-11.0.1.**jar,/usr/share/mailbox-**
> convertor/lib/protobuf-java-2.**4.0a.jar,/usr/share/mailbox-**
> convertor/lib/**concurrentlinkedhashmap-lru-1.**1.jar,/usr/share/mailbox-*
> *convertor/lib/json-simple-1.1.**jar,/usr/share/mailbox-**
> convertor/lib/itext-2.1.5.jar,**/usr/share/mailbox-convertor/**
> lib/jmxtools-1.2.1.jar,/usr/**share/mailbox-convertor/lib/**
> jersey-client-1.4.jar,/usr/**share/mailbox-converto
> r/lib/jersey-core-1.4.jar,/**usr/share/mailbox-convertor/**
> lib/jersey-json-1.4.jar,/usr/**share/mailbox-convertor/lib/**
> jersey-server-1.4.jar,/usr/**share/mailbox-convertor/lib/**
> jmxri-1.2.1.jar,/usr/share/**mailbox-convertor/lib/jaxb-**
> impl-2.1.12.jar,/usr/share/**mailbox-convertor/lib/xstream-**
> 1.2.2.jar,/usr/share/mailbox-**convertor/lib/commons-metrics-**
> 1.3.jar,/usr/share/mailbox-**convertor/lib/commons-**
> monitoring-2.9.1.jar,/us

Re: dfs.block.size

2012-02-28 Thread madhu phatak
You can use FileSystem.getFileStatus(Path p), which gives you the block size
recorded for a specific file.
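A small sketch of that (the file to inspect is passed as the first argument):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      // args[0] is the path of the file you want to check.
      FileStatus status = fs.getFileStatus(new Path(args[0]));
      System.out.println("block size = " + status.getBlockSize() + " bytes");
    }
  }

Note this only reports the block size the file was created with; the fsck
command mentioned below lists the actual blocks.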

On Tue, Feb 28, 2012 at 2:50 AM, Kai Voigt  wrote:

> "hadoop fsck  -blocks" is something that I think of quickly.
>
> http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has
> more details
>
> Kai
>
> Am 28.02.2012 um 02:30 schrieb Mohit Anchlia:
>
> > How do I verify the block size of a given file? Is there a command?
> >
> > On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria 
> wrote:
> >
> >> dfs.block.size can be set per job.
> >>
> >> mapred.tasktracker.map.tasks.maximum is per tasktracker.
> >>
> >> -Joey
> >>
> >> On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia  >
> >> wrote:
> >>> Can someone please suggest if parameters like dfs.block.size,
> >>> mapred.tasktracker.map.tasks.maximum are only cluster wide settings or
> >> can
> >>> these be set per client job configuration?
> >>>
> >>> On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia  >>> wrote:
> >>>
>  If I want to change the block size then can I use Configuration in
>  mapreduce job and set it when writing to the sequence file or does it
> >> need
>  to be cluster wide setting in .xml files?
> 
>  Also, is there a way to check the block of a given file?
> 
> >>
> >>
> >>
> >> --
> >> Joseph Echeverria
> >> Cloudera, Inc.
> >> 443.305.9434
> >>
>
> --
> Kai Voigt
> k...@123.org
>
>
>
>
>


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Handling bad records

2012-02-28 Thread madhu phatak
Hi Mohit,
 A and B refer to two different output files (a multi named output). The file
names will be seq-A* and seq-B*. It is similar to the "r" in part-r-0
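In case it helps, a rough sketch of the driver-side setup that goes with that
snippet (old mapred API; the output format and key/value types here are just
an example, and the usual job settings are omitted):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.SequenceFileOutputFormat;
  import org.apache.hadoop.mapred.lib.MultipleOutputs;

  public class MultipleOutputsSetup {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(MultipleOutputsSetup.class);
      // ... usual mapper/reducer/input/output settings go here ...

      // Declares a multi named output called "seq"; the "A"/"B" part is chosen
      // at write time via mos.getCollector("seq", "A", reporter).
      MultipleOutputs.addMultiNamedOutput(conf, "seq",
          SequenceFileOutputFormat.class, Text.class, Text.class);

      JobClient.runJob(conf);
    }
  }

In the reducer you create the MultipleOutputs instance in configure(JobConf)
and call mos.close() in close(), otherwise the seq-A*/seq-B* files may come
out empty.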

On Tue, Feb 28, 2012 at 11:37 AM, Mohit Anchlia wrote:

> Thanks that's helpful. In that example what is "A" and "B" referring to? Is
> that the output file name?
>
> mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
> mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
>
>
> On Mon, Feb 27, 2012 at 9:53 PM, Harsh J  wrote:
>
> > Mohit,
> >
> > Use the MultipleOutputs API:
> >
> >
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
> > to have a named output of bad records. There is an example of use
> > detailed on the link.
> >
> > On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia 
> > wrote:
> > > What's the best way to write records to a different file? I am doing
> xml
> > > processing and during processing I might come accross invalid xml
> format.
> > > Current I have it under try catch block and writing to log4j. But I
> think
> > > it would be better to just write it to an output file that just
> contains
> > > errors.
> >
> >
> >
> > --
> > Harsh J
> >
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Dynamically adding nodes in Hadoop

2012-01-03 Thread madhu phatak
Thanks for all the input. I am trying to set up a cluster on EC2 but I am not
able to find out how to do the DNS updates centrally. If anyone knows how to
do this, please help me.

On Sat, Dec 17, 2011 at 8:10 PM, Michel Segel wrote:

> Actually I would recommend avoiding /etc/hosts and using DNS if this is
> going to be a production grade cluster...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Dec 17, 2011, at 5:40 AM, alo alt  wrote:
>
> > Hi,
> >
> > in the slave - file too. /etc/hosts is also recommend to avoid DNS
> > issues. After adding in slaves the new node has to be started and
> > should quickly appear in the web-ui. If you don't need the nodes all
> > time you can setup a exclude and refresh your cluster
> > (
> http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
> )
> >
> > - Alex
> >
> > On Sat, Dec 17, 2011 at 12:06 PM, madhu phatak 
> wrote:
> >> Hi,
> >>  I am trying to add nodes dynamically to a running hadoop cluster.I
> started
> >> tasktracker and datanode in the node. It works fine. But when some node
> >> try fetch values ( for reduce phase) it fails with unknown host
> exception.
> >> When i add a node to running cluster do i have to add its hostname to
> all
> >> nodes (slaves +master) /etc/hosts file? Or some other way is there?
> >>
> >>
> >> --
> >> Join me at http://hadoopworkshop.eventbrite.com/
> >
> >
> >
> > --
> > Alexander Lorenz
> > http://mapredit.blogspot.com
> >
> > P Think of the environment: please don't print this email unless you
> > really need to.
> >
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Dynamically adding nodes in Hadoop

2011-12-17 Thread madhu phatak
Hi,
 I am trying to add nodes dynamically to a running Hadoop cluster. I started
the tasktracker and datanode on the new node and it works fine. But when some
node tries to fetch values (for the reduce phase), it fails with an unknown
host exception. When I add a node to a running cluster, do I have to add its
hostname to the /etc/hosts file of all nodes (slaves + master), or is there
some other way?


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: hadoop on fedora 15

2011-08-05 Thread madhu phatak
Disable iptables and try again.

On Fri, Aug 5, 2011 at 2:20 PM, Manish  wrote:

> Hi,
>
> Has anybody been able to run hadoop standalone mode on fedora 15 ?
> I have installed it correctly. It runs till map but gets stuck in reduce.
> It fails with the error "mapred.JobClient Status : FAILED Too many
> fetch-failures". I read several articles on net for this problem, all of
> them
> say about the /etc/hosts and some say firewall issue.
> I enabled firewall for the port range and also checked my /etc/hosts file.
> its
> content is "localhost" and that is the only line in it.
>
> Is sun-java absolute necessary or open-jdk will work ?
>
> can someone give me some suggestion to get along with this problem ?
>
> Thanks & regard
>
> Manish
>
>


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: ivy download error while building mumak

2011-08-03 Thread madhu phatak
Maybe the build is not able to connect to the central Maven repository because of the proxy.

On Fri, Jul 29, 2011 at 2:54 PM, Arun K  wrote:

> Hi all !
>
>  I have downloaded hadoop-0.21.I am behind my college proxy.
>  I get the following error while building mumak :
>
> $cd /home/arun/Documents/hadoop-0.21.0/mapred
> $ant package
> Buildfile: build.xml
>
> clover.setup:
>
> clover.info:
> [echo]
> [echo]  Clover not found. Code coverage reports disabled.
> [echo]
>
> clover:
>
> ivy-download:
>  [get] Getting:
> http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-
> 2.1.0.jar
>  [get] To: /home/arun/Documents/hadoop-0.21.0/mapred/ivy/ivy-2.1.0.jar
>  [get] Error getting
> http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar to
> /home/arun/Documents/hadoop-0.21.0/mapred/ivy/ivy-2.1.0.jar
>
> Any help ?
>
> Thanks,
> Arun K
>
>


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Re: error:Type mismatch in value from map

2011-08-03 Thread madhu phatak
Sorry for the earlier reply. Is your combiner outputting Text,Text
key/value pairs?

On Wed, Aug 3, 2011 at 5:26 PM, madhu phatak  wrote:

> It should. Whats the input value class for reducer you are setting in Job?
>
> 2011/7/30 Daniel,Wu 
>
> Thanks Joey,
>>
>> It works, but one place I don't understand:
>>
>> 1: in the map
>>
>>  extends Mapper
>> so the output value is of type IntWritable
>> 2: in the reduce
>> extends Reducer
>> So input value is of type Text.
>>
>> type of map output should be the same as input type of reduce, correct?
>> but here
>> IntWritable<>Text
>>
>> And the code can run without any error, shouldn't it complain type
>> mismatch?
>>
>> At 2011-07-29 22:49:31,"Joey Echeverria"  wrote:
>> >If you want to use a combiner, your map has to output the same types
>> >as your combiner outputs. In your case, modify your map to look like
>> >this:
>> >
>> >  public static class TokenizerMapper
>> >   extends Mapper{
>> >public void map(Text key, Text value, Context context
>> >) throws IOException, InterruptedException {
>> >context.write(key, new IntWritable(1));
>> >}
>> >  }
>> >
>> >>  11/07/29 22:22:22 INFO mapred.JobClient: Task Id :
>> attempt_201107292131_0011_m_00_2, Status : FAILED
>> >> java.io.IOException: Type mismatch in value from map: expected
>> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>> >>
>> >> But I already set IntWritable in 2 places,
>> >> 1: Reducer
>> >> 2:job.setOutputValueClass(IntWritable.class);
>> >>
>> >> So where am I wrong?
>> >>
>> >> public class MyTest {
>> >>
>> >>  public static class TokenizerMapper
>> >>   extends Mapper{
>> >>public void map(Text key, Text value, Context context
>> >>) throws IOException, InterruptedException {
>> >>context.write(key, value);
>> >>}
>> >>  }
>> >>
>> >>  public static class IntSumReducer
>> >>   extends Reducer {
>> >>
>> >>public void reduce(Text key, Iterable values,
>> >>   Context context
>> >>   ) throws IOException, InterruptedException {
>> >>   int count = 0;
>> >>   for (Text iw:values) {
>> >>count++;
>> >>   }
>> >>  context.write(key, new IntWritable(count));
>> >> }
>> >>   }
>> >>
>> >>  public static void main(String[] args) throws Exception {
>> >>Configuration conf = new Configuration();
>> >> // the configure of seprator should be done in conf
>> >>conf.set("key.value.separator.in.input.line", ",");
>> >>String[] otherArgs = new GenericOptionsParser(conf,
>> args).getRemainingArgs();
>> >>if (otherArgs.length != 2) {
>> >>  System.err.println("Usage: wordcount  ");
>> >>  System.exit(2);
>> >>}
>> >>Job job = new Job(conf, "word count");
>> >>job.setJarByClass(WordCount.class);
>> >>job.setMapperClass(TokenizerMapper.class);
>> >>job.setCombinerClass(IntSumReducer.class);
>> >> //job.setReducerClass(IntSumReducer.class);
>> >>job.setInputFormatClass(KeyValueTextInputFormat.class);
>> >>// job.set("key.value.separator.in.input.line", ",");
>> >>job.setOutputKeyClass(Text.class);
>> >>job.setOutputValueClass(IntWritable.class);
>> >>FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
>> >>FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
>> >>System.exit(job.waitForCompletion(true) ? 0 : 1);
>> >>  }
>> >> }
>> >>
>> >
>> >
>> >
>> >--
>> >Joseph Echeverria
>> >Cloudera, Inc.
>> >443.305.9434
>>
>
>
>
> --
> Join me at http://hadoopworkshop.eventbrite.com/
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Re: error:Type mismatch in value from map

2011-08-03 Thread madhu phatak
It should. What input value class for the reducer are you setting in the Job?

2011/7/30 Daniel,Wu 

> Thanks Joey,
>
> It works, but one place I don't understand:
>
> 1: in the map
>
>  extends Mapper
> so the output value is of type IntWritable
> 2: in the reduce
> extends Reducer
> So input value is of type Text.
>
> type of map output should be the same as input type of reduce, correct? but
> here
> IntWritable<>Text
>
> And the code can run without any error, shouldn't it complain type
> mismatch?
>
> At 2011-07-29 22:49:31,"Joey Echeverria"  wrote:
> >If you want to use a combiner, your map has to output the same types
> >as your combiner outputs. In your case, modify your map to look like
> >this:
> >
> >  public static class TokenizerMapper
> >   extends Mapper{
> >public void map(Text key, Text value, Context context
> >) throws IOException, InterruptedException {
> >context.write(key, new IntWritable(1));
> >}
> >  }
> >
> >>  11/07/29 22:22:22 INFO mapred.JobClient: Task Id :
> attempt_201107292131_0011_m_00_2, Status : FAILED
> >> java.io.IOException: Type mismatch in value from map: expected
> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
> >>
> >> But I already set IntWritable in 2 places,
> >> 1: Reducer
> >> 2:job.setOutputValueClass(IntWritable.class);
> >>
> >> So where am I wrong?
> >>
> >> public class MyTest {
> >>
> >>  public static class TokenizerMapper
> >>   extends Mapper{
> >>public void map(Text key, Text value, Context context
> >>) throws IOException, InterruptedException {
> >>context.write(key, value);
> >>}
> >>  }
> >>
> >>  public static class IntSumReducer
> >>   extends Reducer {
> >>
> >>public void reduce(Text key, Iterable values,
> >>   Context context
> >>   ) throws IOException, InterruptedException {
> >>   int count = 0;
> >>   for (Text iw:values) {
> >>count++;
> >>   }
> >>  context.write(key, new IntWritable(count));
> >> }
> >>   }
> >>
> >>  public static void main(String[] args) throws Exception {
> >>Configuration conf = new Configuration();
> >> // the configure of seprator should be done in conf
> >>conf.set("key.value.separator.in.input.line", ",");
> >>String[] otherArgs = new GenericOptionsParser(conf,
> args).getRemainingArgs();
> >>if (otherArgs.length != 2) {
> >>  System.err.println("Usage: wordcount  ");
> >>  System.exit(2);
> >>}
> >>Job job = new Job(conf, "word count");
> >>job.setJarByClass(WordCount.class);
> >>job.setMapperClass(TokenizerMapper.class);
> >>job.setCombinerClass(IntSumReducer.class);
> >> //job.setReducerClass(IntSumReducer.class);
> >>job.setInputFormatClass(KeyValueTextInputFormat.class);
> >>// job.set("key.value.separator.in.input.line", ",");
> >>job.setOutputKeyClass(Text.class);
> >>job.setOutputValueClass(IntWritable.class);
> >>FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
> >>FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
> >>System.exit(job.waitForCompletion(true) ? 0 : 1);
> >>  }
> >> }
> >>
> >
> >
> >
> >--
> >Joseph Echeverria
> >Cloudera, Inc.
> >443.305.9434
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: How to access contents of a Map Reduce job's working directory

2011-08-03 Thread madhu phatak
Try the FileSystem.copyToLocalFile API to copy the files out of the setup
directory before the job's working directory is cleaned up.
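Something along these lines (both paths below are only placeholders for your
own locations):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CopySetupFolder {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      // Copies the whole "setup" directory out of HDFS onto the local disk.
      fs.copyToLocalFile(new Path("/user/shrish/job-output/setup"),
                         new Path("/tmp/setup"));
    }
  }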

On Tue, Aug 2, 2011 at 5:54 AM, Shrish Bajpai wrote:

> I have just started to explore Hadoop but I am stuck in a situation now.
>
> I want to run a MapReduce job in hadoop which needs to create a "setup"
> folder in working directory. During the execution the job will generate
> some additional text files within this "setup" folder. The problem is I
> dont know how to access or move this setup folder content to my local file
> system as at end of the job, the job directory will be cleaned up.
>
> It would be great if you can help.
>
> Regards
>
> Shrish
>
>


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread madhu phatak
I had issues using IP addresses in the XML files. You can try using hostnames
in place of the IP addresses.

On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh  wrote:

> Hi,
>
> I run Hadoop in 4 Ubuntu 11.04 on VirtualBox.
> On the master node (192.168.1.101), I configure fs.default.name = hdfs://
> 127.0.0.1:9000. Then i configure everything on 3 other node
> When i start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on the
> master node
> Everything is ok, but the slave can't connect to the master on 9000, 9001
> port.
> I manually telnet to 192.168.1.101 in 9000, 9001. And the result is
> "connection refused"
> Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The
> result is connected.
> But, on the master node, i telnet to 192.168.1.101:9000 => Connection
> Refused
>
> Can somebody help me?
>


Re: Submitting and running hadoop jobs Programmatically

2011-07-27 Thread madhu phatak
Thank you Harsha. I am able to run the jobs by ditching the *.

On Wed, Jul 27, 2011 at 11:41 AM, Harsh J  wrote:

> Madhu,
>
> Ditch the '*' in the classpath element that has the configuration
> directory. The directory ought to be on the classpath, not the files
> AFAIK.
>
> Try and let us know if it then picks up the proper config (right now,
> its using the local mode).
>
> On Wed, Jul 27, 2011 at 10:25 AM, madhu phatak 
> wrote:
> > Hi
> > I am submitting the job as follows
> >
> > java -cp
> >
>  
> Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/*
> > com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob
> input/book.csv
> > kkk11fffrrw 1
> >
> > I get the log in CLI as below
> >
> > 11/07/27 10:22:54 INFO security.Groups: Group mapping
> > impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> > cacheTimeout=30
> > 11/07/27 10:22:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> > processName=JobTracker, sessionId=
> > 11/07/27 10:22:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
> > processName=JobTracker, sessionId= - already initialized
> > 11/07/27 10:22:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser
> for
> > parsing the arguments. Applications should implement Tool for the same.
> > 11/07/27 10:22:54 INFO mapreduce.JobSubmitter: Cleaning up the staging
> area
> >
> file:/tmp/hadoop-hadoop/mapred/staging/hadoop-1331241340/.staging/job_local_0001
> >
> > It doesn't create any job in hadoop.
> >
> > On Tue, Jul 26, 2011 at 5:11 PM, Devaraj K  wrote:
> >
> >> Madhu,
> >>
> >>  Can you check the client logs, whether any error/exception is coming
> while
> >> submitting the job?
> >>
> >> Devaraj K
> >>
> >> -Original Message-
> >> From: Harsh J [mailto:ha...@cloudera.com]
> >> Sent: Tuesday, July 26, 2011 5:01 PM
> >> To: common-user@hadoop.apache.org
> >> Subject: Re: Submitting and running hadoop jobs Programmatically
> >>
> >> Yes. Internally, it calls regular submit APIs.
> >>
> >> On Tue, Jul 26, 2011 at 4:32 PM, madhu phatak 
> >> wrote:
> >> > I am using JobControl.add() to add a job and running job control in
> >> > a separate thread and using JobControl.allFinished() to see all jobs
> >> > completed or not . Is this work same as Job.submit()??
> >> >
> >> > On Tue, Jul 26, 2011 at 4:08 PM, Harsh J  wrote:
> >> >
> >> >> Madhu,
> >> >>
> >> >> Do you get a specific error message / stack trace? Could you also
> >> >> paste your JT logs?
> >> >>
> >> >> On Tue, Jul 26, 2011 at 4:05 PM, madhu phatak 
> >> >> wrote:
> >> >> > Hi
> >> >> >  I am using the same APIs but i am not able to run the jobs by just
> >> >> adding
> >> >> > the configuration files and jars . It never create a job in Hadoop
> ,
> >> it
> >> >> just
> >> >> > shows cleaning up staging area and fails.
> >> >> >
> >> >> > On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K 
> >> wrote:
> >> >> >
> >> >> >> Hi Madhu,
> >> >> >>
> >> >> >>   You can submit the jobs using the Job API's programmatically
> from
> >> any
> >> >> >> system. The job submission code can be written this way.
> >> >> >>
> >> >> >> // Create a new Job
> >> >> >> Job job = new Job(new Configuration());
> >> >> >> job.setJarByClass(MyJob.class);
> >> >> >>
> >> >> >> // Specify various job-specific parameters
> >> >> >> job.setJobName("myjob");
> >> >> >>
> >> >> >> job.setInputPath(new Path("in"));
> >> >> >> job.setOutputPath(new Path("out"));
> >> >> >>
> >> >> >> job.setMapperClass(MyJob.MyMapper.class);
> >> >> >> job.setReducerClass(MyJob.MyReducer.class);
> >> >> >>
> >> >> >> // Submit the job
> >> >> >> job.submit();
> >> >> >>
> >> >> >>
> >> >

Re: Does hadoop local mode support running multiple jobs in different threads?

2011-07-27 Thread madhu phatak
We have already tested a similar scenario where each thread runs a separate
Hadoop job. It works fine in local mode, but make sure that you don't run too
many parallel jobs, because local mode is sometimes not able to handle all the
jobs at once.
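Roughly the pattern we used, for what it's worth (the class is only a sketch
and the per-job setup is omitted): each thread builds its own Job object and
blocks on waitForCompletion(), and a small thread pool keeps the number of
concurrent jobs down.

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class ParallelJobs {
    public static void main(String[] args) throws Exception {
      // Keep the pool small; local mode copes badly with many concurrent jobs.
      ExecutorService pool = Executors.newFixedThreadPool(2);
      for (final String input : args) {
        pool.submit(new Runnable() {
          public void run() {
            try {
              Job job = new Job(new Configuration(), "job-for-" + input);
              // ... set jar, mapper, reducer and the input/output paths here ...
              job.waitForCompletion(true);   // each thread waits on its own job
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
    }
  }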

On Fri, Jul 1, 2011 at 6:12 PM, Yaozhen Pan  wrote:

> Hi,
>
> I am not sure if this question (as title) has been asked before, but I
> didn't find an answer by googling.
>
> I'd like to explain the scenario of my problem:
> My program launches several threads in the same time, while each thread
> will
> submit a hadoop job and wait for the job to complete.
> The unit tests were run in local mode, mini-cluster and the real hadoop
> cluster.
> I found the unit tests may fail in local mode, but they always succeeded in
> mini-cluster and real hadoop cluster.
> When unit test failed in local mode, the causes may be different (stack
> traces are posted at the end of mail).
>
> It seems multi-thead running multiple jobs is not supported in local mode,
> is it?
>
> Error 1:
> 2011-07-01 20:24:36,460 WARN  [Thread-38] mapred.LocalJobRunner
> (LocalJobRunner.java:run(256)) - job_local_0001
> java.io.FileNotFoundException: File
>
> build/test/tmp/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_00_0/output/spill0.out
> does not exist.
> at
>
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:253)
> at
>
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1447)
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
> at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> Error 2:
> 2011-07-01 19:00:25,546 INFO  [Thread-32] fs.FSInputChecker
> (FSInputChecker.java:readChecksumChunk(247)) - Found checksum error:
> b[3584,
>
> 4096]=696f6e69643c2f6e616d653e3c76616c75653e47302e4120636f696e636964656e63652047312e413c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6d61707265642e6a6f622e747261636b65722e706572736973742e6a6f627374617475732e6469723c2f6e616d653e3c76616c75653e2f6a6f62747261636b65722f6a6f6273496e666f3c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6d61707265642e6a61723c2f6e616d653e3c76616c75653e66696c653a2f686f6d652f70616e797a682f6861646f6f7063616c632f6275696c642f746573742f746d702f6d61707265642f73797374656d2f6a6f625f6c6f63616c5f303030332f6a6f622e6a61723c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e66732e73332e627565722e6469723c2f6e616d653e3c76616c75653e247b6861646f6f702e746d702e6469727d2f7c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6a6f622e656e642e72657472792e617474656d7074733c2f6e616d653e3c76616c75653e303c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e66732e66696c652e696d706c3c2f6e616d653e3c76616c75653e6f
> org.apache.hadoop.fs.ChecksumException: Checksum error:
>
> file:/home/hadoop-user/hadoop-proj/build/test/tmp/mapred/system/job_local_0003/job.xml
> at 3584
> at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
> at
>
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> at
>
> org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:61)
> at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1197)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:92)
> at
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
> at hadoop.GroupingRunnable.run(GroupingRunnable.java:126)
> at java.lang.Thread.run(Thread.java:619)
>


Re: Writing out a single file

2011-07-27 Thread madhu phatak
Map/Reduce jobs always write to HDFS, so you can use the
FileSystem.copyToLocalFile API to copy the result to the local file system.

On Tue, Jul 5, 2011 at 8:52 PM, James Seigel  wrote:

> Single reducer.
>
> On 2011-07-05, at 9:09 AM, Mark wrote:
>
> > Is there anyway I can write out the results of my mapreduce job into 1
> local file... ie the opposite of getmerge?
> >
> > Thanks
>
>


Re: Hadoop upgrade Java version

2011-07-27 Thread madhu phatak
We are running on _24 and _26 without any problem.

On Tue, Jul 19, 2011 at 6:33 PM, Isaac Dooley wrote:

> _24 seems to work fine on my cluster.
>


Re: error of loading logging class

2011-07-27 Thread madhu phatak
It is a problem of multiple versions of the same jar being on the classpath.

On Thu, Jul 21, 2011 at 5:15 PM, Steve Loughran  wrote:

> On 20/07/11 07:16, Juwei Shi wrote:
>
>> Hi,
>>
>> We faced a problem of loading logging class when start the name node.  It
>> seems that hadoop can not find commons-logging-*.jar
>>
>> We have tried other commons-logging-1.0.4.jar and
>> commons-logging-api-1.0.4.jar. It does not work!
>>
>> The following are error logs from starting console:
>>
>>
> I'd drop the -api file as it isn't needed, and as you say, avoid duplicate
> versions. Make sure that log4j is at the same point in the class hierarchy
> too (e.g in hadoop/lib)
>
> to debug commons logging, tell it to log to stderr. It's useful in
> emergencies
>
> -Dorg.apache.commons.logging.diagnostics.dest=STDERR
>


Re: Submitting and running hadoop jobs Programmatically

2011-07-27 Thread madhu phatak
Thank you. I will have a look at it.

On Wed, Jul 27, 2011 at 3:28 PM, Steve Loughran  wrote:

> On 27/07/11 05:55, madhu phatak wrote:
>
>> Hi
>> I am submitting the job as follows
>>
>> java -cp
>>  Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/*
>> com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob
>> input/book.csv
>> kkk11fffrrw 1
>>
>
> My code to submit jobs (via a declarative configuration) is up online
>
> http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/SubmitterImpl.java?revision=8590&view=markup
>
> It's LGPL, but ask nicely and I'll change the header to Apache.
>
> That code doesn't set up the classpath by pushing out more JARs (I'm
> planning to push out .groovy scripts instead), but it can also poll for job
> completion, take a timeout (useful in small test runs), and do other things.
> I currently mainly use it for testing
>
>


Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
Hi
I am submitting the job as follows

java -cp
 
Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/*
com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv
kkk11fffrrw 1

I get the log in CLI as below

11/07/27 10:22:54 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
cacheTimeout=30
11/07/27 10:22:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
11/07/27 10:22:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
11/07/27 10:22:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/07/27 10:22:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area
file:/tmp/hadoop-hadoop/mapred/staging/hadoop-1331241340/.staging/job_local_0001

It doesn't create any job in Hadoop.

On Tue, Jul 26, 2011 at 5:11 PM, Devaraj K  wrote:

> Madhu,
>
>  Can you check the client logs, whether any error/exception is coming while
> submitting the job?
>
> Devaraj K
>
> -Original Message-
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Tuesday, July 26, 2011 5:01 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Submitting and running hadoop jobs Programmatically
>
> Yes. Internally, it calls regular submit APIs.
>
> On Tue, Jul 26, 2011 at 4:32 PM, madhu phatak 
> wrote:
> > I am using JobControl.add() to add a job and running job control in
> > a separate thread and using JobControl.allFinished() to see all jobs
> > completed or not . Is this work same as Job.submit()??
> >
> > On Tue, Jul 26, 2011 at 4:08 PM, Harsh J  wrote:
> >
> >> Madhu,
> >>
> >> Do you get a specific error message / stack trace? Could you also
> >> paste your JT logs?
> >>
> >> On Tue, Jul 26, 2011 at 4:05 PM, madhu phatak 
> >> wrote:
> >> > Hi
> >> >  I am using the same APIs but i am not able to run the jobs by just
> >> adding
> >> > the configuration files and jars . It never create a job in Hadoop ,
> it
> >> just
> >> > shows cleaning up staging area and fails.
> >> >
> >> > On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K 
> wrote:
> >> >
> >> >> Hi Madhu,
> >> >>
> >> >>   You can submit the jobs using the Job API's programmatically from
> any
> >> >> system. The job submission code can be written this way.
> >> >>
> >> >> // Create a new Job
> >> >> Job job = new Job(new Configuration());
> >> >> job.setJarByClass(MyJob.class);
> >> >>
> >> >> // Specify various job-specific parameters
> >> >> job.setJobName("myjob");
> >> >>
> >> >> job.setInputPath(new Path("in"));
> >> >> job.setOutputPath(new Path("out"));
> >> >>
> >> >> job.setMapperClass(MyJob.MyMapper.class);
> >> >> job.setReducerClass(MyJob.MyReducer.class);
> >> >>
> >> >> // Submit the job
> >> >> job.submit();
> >> >>
> >> >>
> >> >>
> >> >> For submitting this, need to add the hadoop jar files and
> configuration
> >> >> files in the class path of the application from where you want to
> submit
> >> >> the
> >> >> job.
> >> >>
> >> >> You can refer this docs for more info on Job API's.
> >> >>
> >> >>
> >>
>
> http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred
> >> >> uce/Job.html
> >> >>
> >> >>
> >> >>
> >> >> Devaraj K
> >> >>
> >> >> -Original Message-
> >> >> From: madhu phatak [mailto:phatak@gmail.com]
> >> >> Sent: Tuesday, July 26, 2011 3:29 PM
> >> >> To: common-user@hadoop.apache.org
> >> >> Subject: Submitting and running hadoop jobs Programmatically
> >> >>
> >> >> Hi,
> >> >>  I am working on a open source project
> >> >> Nectar<https://github.com/zinnia-phatak-dev/Nectar> where
> >> >> i am trying to create the hadoop jobs depending upon the user input.
> I
> >> was
> >> >> using Java Process API to run the bin/hadoop shell script to submit
> the
> >> >> jobs. But it seems not good way because the process creation model is
> >> >> not consistent across different operating systems . Is there any
> better
> >> way
> >> >> to submit the jobs rather than invoking the shell script? I am using
> >> >> hadoop-0.21.0 version and i am running my program in the same user
> where
> >> >> hadoop is installed . Some of the older thread told if I add
> >> configuration
> >> >> files in path it will work fine . But i am not able to run in that
> way
> .
> >> So
> >> >> anyone tried this before? If So , please can you give detailed
> >> instruction
> >> >> how to achieve it . Advanced thanks for your help.
> >> >>
> >> >> Regards,
> >> >> Madhukara Phatak
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
> >
>
>
>
> --
> Harsh J
>
>


Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
I am using JobControl.add() to add a job, running the JobControl in a separate
thread, and using JobControl.allFinished() to check whether all jobs have
completed. Does this work the same way as Job.submit()?
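Roughly what I mean, as a sketch only (the per-job setup is omitted and the
group name is arbitrary; this uses the mapreduce.lib.jobcontrol classes from
0.21):

  import java.util.ArrayList;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
  import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

  public class JobControlSketch {
    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "myjob");
      // ... set jar, mapper, reducer and the input/output paths here ...

      JobControl control = new JobControl("nectar-jobs");
      control.addJob(new ControlledJob(job, new ArrayList<ControlledJob>()));

      new Thread(control).start();        // JobControl implements Runnable
      while (!control.allFinished()) {
        Thread.sleep(1000);
      }
      control.stop();
    }
  }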

On Tue, Jul 26, 2011 at 4:08 PM, Harsh J  wrote:

> Madhu,
>
> Do you get a specific error message / stack trace? Could you also
> paste your JT logs?
>
> On Tue, Jul 26, 2011 at 4:05 PM, madhu phatak 
> wrote:
> > Hi
> >  I am using the same APIs but i am not able to run the jobs by just
> adding
> > the configuration files and jars . It never create a job in Hadoop , it
> just
> > shows cleaning up staging area and fails.
> >
> > On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K  wrote:
> >
> >> Hi Madhu,
> >>
> >>   You can submit the jobs using the Job API's programmatically from any
> >> system. The job submission code can be written this way.
> >>
> >> // Create a new Job
> >> Job job = new Job(new Configuration());
> >> job.setJarByClass(MyJob.class);
> >>
> >> // Specify various job-specific parameters
> >> job.setJobName("myjob");
> >>
> >> job.setInputPath(new Path("in"));
> >> job.setOutputPath(new Path("out"));
> >>
> >> job.setMapperClass(MyJob.MyMapper.class);
> >> job.setReducerClass(MyJob.MyReducer.class);
> >>
> >> // Submit the job
> >> job.submit();
> >>
> >>
> >>
> >> For submitting this, need to add the hadoop jar files and configuration
> >> files in the class path of the application from where you want to submit
> >> the
> >> job.
> >>
> >> You can refer this docs for more info on Job API's.
> >>
> >>
> http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred
> >> uce/Job.html
> >>
> >>
> >>
> >> Devaraj K
> >>
> >> -Original Message-
> >> From: madhu phatak [mailto:phatak@gmail.com]
> >> Sent: Tuesday, July 26, 2011 3:29 PM
> >> To: common-user@hadoop.apache.org
> >> Subject: Submitting and running hadoop jobs Programmatically
> >>
> >> Hi,
> >>  I am working on a open source project
> >> Nectar<https://github.com/zinnia-phatak-dev/Nectar> where
> >> i am trying to create the hadoop jobs depending upon the user input. I
> was
> >> using Java Process API to run the bin/hadoop shell script to submit the
> >> jobs. But it seems not good way because the process creation model is
> >> not consistent across different operating systems . Is there any better
> way
> >> to submit the jobs rather than invoking the shell script? I am using
> >> hadoop-0.21.0 version and i am running my program in the same user where
> >> hadoop is installed . Some of the older thread told if I add
> configuration
> >> files in path it will work fine . But i am not able to run in that way .
> So
> >> anyone tried this before? If So , please can you give detailed
> instruction
> >> how to achieve it . Advanced thanks for your help.
> >>
> >> Regards,
> >> Madhukara Phatak
> >>
> >>
> >
>
>
>
> --
> Harsh J
>


Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
Hi,
 I am using the same APIs, but I am not able to run the jobs by just adding
the configuration files and jars. It never creates a job in Hadoop; it just
shows "cleaning up the staging area" and fails.

On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K  wrote:

> Hi Madhu,
>
>   You can submit the jobs using the Job API's programmatically from any
> system. The job submission code can be written this way.
>
> // Create a new Job
> Job job = new Job(new Configuration());
> job.setJarByClass(MyJob.class);
>
> // Specify various job-specific parameters
> job.setJobName("myjob");
>
> job.setInputPath(new Path("in"));
> job.setOutputPath(new Path("out"));
>
> job.setMapperClass(MyJob.MyMapper.class);
> job.setReducerClass(MyJob.MyReducer.class);
>
> // Submit the job
> job.submit();
>
>
>
> For submitting this, need to add the hadoop jar files and configuration
> files in the class path of the application from where you want to submit
> the
> job.
>
> You can refer this docs for more info on Job API's.
>
> http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred
> uce/Job.html
>
>
>
> Devaraj K
>
> -Original Message-
> From: madhu phatak [mailto:phatak@gmail.com]
> Sent: Tuesday, July 26, 2011 3:29 PM
> To: common-user@hadoop.apache.org
> Subject: Submitting and running hadoop jobs Programmatically
>
> Hi,
>  I am working on a open source project
> Nectar<https://github.com/zinnia-phatak-dev/Nectar> where
> i am trying to create the hadoop jobs depending upon the user input. I was
> using Java Process API to run the bin/hadoop shell script to submit the
> jobs. But it seems not good way because the process creation model is
> not consistent across different operating systems . Is there any better way
> to submit the jobs rather than invoking the shell script? I am using
> hadoop-0.21.0 version and i am running my program in the same user where
> hadoop is installed . Some of the older thread told if I add configuration
> files in path it will work fine . But i am not able to run in that way . So
> anyone tried this before? If So , please can you give detailed instruction
> how to achieve it . Advanced thanks for your help.
>
> Regards,
> Madhukara Phatak
>
>


Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
Hi,
  I am working on an open source project,
Nectar <https://github.com/zinnia-phatak-dev/Nectar>, where
I am trying to create Hadoop jobs based on user input. I was using the
Java Process API to run the bin/hadoop shell script to submit the jobs,
but that does not seem like a good approach because the process creation
model is not consistent across operating systems. Is there a better way
to submit jobs than invoking the shell script? I am using hadoop-0.21.0
and running my program as the same user under which Hadoop is installed.
Some older threads said that adding the configuration files to the
classpath would make it work, but I have not been able to get it running
that way. Has anyone tried this before? If so, could you give detailed
instructions on how to achieve it? Thanks in advance for your help.

Regards,
Madhukara Phatak


Re: First open source Predictive modeling framework on Apache hadoop

2011-07-23 Thread madhu phatak
The white paper associated with the framework can be found here:
http://zinniasystems.com/downloads/sample.jsp?fileName=Distributed_Computing_in_Business_Analytics.pdf

On Sun, Jul 24, 2011 at 11:49 AM, madhu phatak  wrote:

> thank you
>
>
> On Sun, Jul 24, 2011 at 11:47 AM, Mark Kerzner wrote:
>
>> Congratulations, looks very interesting.
>>
>> Mark
>>
>> On Sun, Jul 24, 2011 at 1:15 AM, madhu phatak 
>> wrote:
>>
>> > Hi,
>> >  We released Nectar,first open source predictive modeling on Apache
>> Hadoop.
>> > Please check it out.
>> >
>> > Info page
>> http://zinniasystems.com/zinnia.jsp?lookupPage=blogs/nectar.jsp
>> >
>> > Git Hub https://github.com/zinnia-phatak-dev/Nectar/downloads
>> >
>> > Reagards
>> > Madhukara Phatak,Zinnia Systems
>> >
>>
>
>


Re: First open source Predictive modeling framework on Apache hadoop

2011-07-23 Thread madhu phatak
thank you

On Sun, Jul 24, 2011 at 11:47 AM, Mark Kerzner wrote:

> Congratulations, looks very interesting.
>
> Mark
>
> On Sun, Jul 24, 2011 at 1:15 AM, madhu phatak 
> wrote:
>
> > Hi,
> >  We released Nectar,first open source predictive modeling on Apache
> Hadoop.
> > Please check it out.
> >
> > Info page
> http://zinniasystems.com/zinnia.jsp?lookupPage=blogs/nectar.jsp
> >
> > Git Hub https://github.com/zinnia-phatak-dev/Nectar/downloads
> >
> > Reagards
> > Madhukara Phatak,Zinnia Systems
> >
>


First open source Predictive modeling framework on Apache hadoop

2011-07-23 Thread madhu phatak
Hi,
 We released Nectar, the first open source predictive modeling framework
on Apache Hadoop.
Please check it out.

Info page http://zinniasystems.com/zinnia.jsp?lookupPage=blogs/nectar.jsp

Git Hub https://github.com/zinnia-phatak-dev/Nectar/downloads

Regards,
Madhukara Phatak, Zinnia Systems


Re: hadoop pipes

2011-07-20 Thread madhu phatak
When you launch a program using the bin/hadoop command, the full cluster
information (namenode, datanodes, etc.) is available to your program. Here
you are just submitting the binary; it is started by Hadoop rather than by
you running ./a.out.
On Jun 29, 2011 1:48 AM, "jitter"  wrote:
> hi i m confused about the execution of hadoop program;
> ahat happen when we write the hadoop pipe running command like bin/hadoop
> pipes -D pipie.java.record reader =true etc
>
> i don't know how the program run what does the control do;
> I know we compile the c++ program by g++ command and run it by ./a.out .
But
> in hadoop we dont use the ./a.out command than how this executable run ?
> what does the executable do in running command ?
> more ever icvhanged the program many time but every time sanre output .
> can any body tell me actually how pipes program work
>
> --
> View this message in context:
http://hadoop-common.472056.n3.nabble.com/hadoop-pipes-tp3117626p3117626.html
> Sent from the Users mailing list archive at Nabble.com.


Re: Api migration from 0.19.1 to 0.20.20

2011-07-20 Thread madhu phatak
Hadoop: The Definitive Guide also covers the migration.
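
For reference, a minimal sketch of what the move usually looks like: the old
org.apache.hadoop.mapred driver (JobConf + JobClient.runJob) becomes a driver
built around org.apache.hadoop.mapreduce.Job. Everything below (class name,
job name, identity mapper/reducer, argument paths) is illustrative only, not
code from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "myjob");        // replaces JobConf + JobClient.runJob(conf)
    job.setJarByClass(NewApiDriver.class);
    job.setMapperClass(Mapper.class);        // real code would extend the new
    job.setReducerClass(Reducer.class);      // mapreduce.Mapper / mapreduce.Reducer
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);  // replaces JobClient.runJob
  }
}
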
On Jun 28, 2011 8:31 PM, "Shi Yu"  wrote:
> On 6/28/2011 7:12 AM, Prashant Sharma wrote:
>> Hi ,
>>
>> I have my source code written in 0.19.1 Hadoop API and want to shift
>> it to newer API 0.20.20. Any clue on good documentation on migrating
>> from older version to newer version will be very helpful.
>>
>> Thanks.
>> Prashant
>>
>> 
>> This message was sent using IMP, the Internet Messaging Program.
>>
>>
> In the *Chuck Lam's* book *Hadoop in Action the* upgrade from 0.19.1 to
> 0.20.0 is mentioned. Also there are many pieces of information on web,
> but they are scattered around.
>


Re: measure the time taken to complete map and reduce phase

2011-07-04 Thread madhu phatak
The console output will tell you how much time the job took.
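
If you want the number programmatically rather than reading it off the
console, a rough sketch is to time the blocking call yourself. This measures
the whole job, not the map and reduce phases separately; per-phase timings
are what the console progress lines and the JobTracker web UI show. The
helper below assumes an already-configured new-API Job.

import org.apache.hadoop.mapreduce.Job;

public class JobTimer {
  /** Runs an already-configured Job and prints its wall-clock duration. */
  public static boolean runAndTime(Job job) throws Exception {
    long start = System.currentTimeMillis();
    boolean ok = job.waitForCompletion(true);     // blocks until the job finishes
    long elapsedMs = System.currentTimeMillis() - start;
    System.out.println("Job '" + job.getJobName() + "' "
        + (ok ? "succeeded" : "failed") + " after " + elapsedMs + " ms");
    return ok;
  }
}
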
On Jul 5, 2011 8:26 AM, "sangroya"  wrote:
> Hi,
>
> I am trying to monitor the time to complete a map phase and reduce
> phase in hadoop. Is there any way to measure the time taken to
> complete map and reduce phase in a cluster.
>
> Thanks,
> Amit
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/measure-the-time-taken-to-complete-map-and-reduce-phase-tp3136991p3136991.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: NullPointerException when running multiple reducers with Hadoop 0.22.0-SNAPSHOT

2011-06-30 Thread madhu phatak
Can you check the TaskTracker logs for the full stack trace?

On Thu, Jun 30, 2011 at 12:24 PM, Paolo Castagna <
castagna.li...@googlemail.com> wrote:

> Hi,
> I am using Apache Whirr to setup an Hadoop cluster on EC2 using Hadoop
> 0.22.0 SNAPSHOTs (nightly) builds from Jenkins. For details, see [1,2].
> (Is there a better place where I can get nightly builds for Hadoop?)
>
> I have a Reducer which does not emit any (key,value) pairs (it generates
> only side effect files). When I run using only one reducer everything seems
> fine. If I setup multiple reducers, each one generating different side
> effect
> files, I get a NullPointerException:
>
>  WARN  Exception running child : java.lang.NullPointerException
>   at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:96)
>   at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:239)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:225)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
>   at org.apache.hadoop.mapred.Child.main(Child.java:217)
>
> Have you ever seen this exception/stacktrace?
>
> My driver is here [3] and the reducer is here [4].
>
> Regards,
> Paolo
>
>  [1] https://github.com/castagna/tdbloader3/blob/master/hadoop-ec2.properties
>  [2] https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-22-Build/lastSuccessfulBuild/artifact/hadoop-0.22.0-SNAPSHOT.tar.gz
>  [3] https://github.com/castagna/tdbloader3/blob/master/src/main/java/com/talis/labs/tdb/tdbloader3/ThirdDriver.java
>  [4] https://github.com/castagna/tdbloader3/blob/master/src/main/java/com/talis/labs/tdb/tdbloader3/ThirdReducer.java
>
>


Re: Running Back to Back Map-reduce jobs

2011-06-21 Thread madhu phatak
You can use ControlledJob's addDependingJob() to handle dependencies
between multiple jobs.
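
A minimal sketch of that pattern, assuming the new-API job-control classes in
org.apache.hadoop.mapreduce.lib.jobcontrol (0.21+; 0.20 has equivalents under
org.apache.hadoop.mapred.jobcontrol). The two Job objects are assumed to be
fully configured elsewhere, with the second reading the first one's output.

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class ChainedJobs {
  public static void runChain(Job step1, Job step2) throws Exception {
    ControlledJob first = new ControlledJob(step1, null);
    ControlledJob second = new ControlledJob(step2, null);
    second.addDependingJob(first);            // second starts only after first succeeds

    JobControl control = new JobControl("chain");
    control.addJob(first);
    control.addJob(second);

    Thread runner = new Thread(control);      // JobControl implements Runnable
    runner.setDaemon(true);
    runner.start();
    while (!control.allFinished()) {          // poll until every job has completed or failed
      Thread.sleep(1000);
    }
    System.out.println("failed jobs: " + control.getFailedJobList().size());
    control.stop();
  }
}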

On Tue, Jun 7, 2011 at 4:15 PM, Adarsh Sharma wrote:

> Harsh J wrote:
>
>> Yes, I believe Oozie does have Pipes and Streaming action helpers as well.
>>
>> On Thu, Jun 2, 2011 at 5:05 PM, Adarsh Sharma 
>> wrote:
>>
>>
>>> Ok, Is it valid for running jobs through Hadoop Pipes too.
>>>
>>> Thanks
>>>
>>> Harsh J wrote:
>>>
>>>
 Oozie's workflow feature may exactly be what you're looking for. It
 can also do much more than just chain jobs.

 Check out additional features at: http://yahoo.github.com/oozie/

 On Thu, Jun 2, 2011 at 4:48 PM, Adarsh Sharma >>> >
 wrote:



>>> After following the below points, I am confused about the examples used
> in the documentation :
>
> http://yahoo.github.com/oozie/releases/3.0.0/WorkflowFunctionalSpec.html#a3.2.2.3_Pipes
>
> What I want to achieve is to terminate a job after my permission i.e if I
> want to run again a map-reduce job after the completion of one , it runs &
> then terminates after my code execution.
> I struggled to find a simple example that proves this concept. In the Oozie
> documentation, they r just setting parameters and use them.
>
> fore.g a simple Hadoop Pipes job is executed by :
>
> int main(int argc, char *argv[]) {
>   return HadoopPipes::runTask(HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>());
> }
>
> Now if I want to run another job after this on the reduced data in HDFS,
> how this could be possible. Do i need to add some code.
>
> Thanks
>
>
>
>
>
>  Dear all,
>
> I ran several map-reduce jobs in Hadoop Cluster of 4 nodes.
>
> Now this time I want a map-reduce job to be run again after one.
>
> Fore.g to clear my point, suppose a wordcount is run on gutenberg file
> in
> HDFS and after completion
>
> 11/06/02 15:14:35 WARN mapred.JobClient: No job jar file set.  User
> classes
> may not be found. See JobConf(Class) or JobConf#setJar(String).
> 11/06/02 15:14:35 INFO mapred.FileInputFormat: Total input paths to
> process
> : 3
> 11/06/02 15:14:36 INFO mapred.JobClient: Running job:
> job_201106021143_0030
> 11/06/02 15:14:37 INFO mapred.JobClient:  map 0% reduce 0%
> 11/06/02 15:14:50 INFO mapred.JobClient:  map 33% reduce 0%
> 11/06/02 15:14:59 INFO mapred.JobClient:  map 66% reduce 11%
> 11/06/02 15:15:08 INFO mapred.JobClient:  map 100% reduce 22%
> 11/06/02 15:15:17 INFO mapred.JobClient:  map 100% reduce 100%
> 11/06/02 15:15:25 INFO mapred.JobClient: Job complete:
> job_201106021143_0030
> 11/06/02 15:15:25 INFO mapred.JobClient: Counters: 18
>
>
>
> Again a map-reduce job is started on the output or original data say
> again
>
> 1/06/02 15:14:36 INFO mapred.JobClient: Running job:
> job_201106021143_0030
> 11/06/02 15:14:37 INFO mapred.JobClient:  map 0% reduce 0%
> 11/06/02 15:14:50 INFO mapred.JobClient:  map 33% reduce 0%
>
> Is it possible or any parameters to achieve it.
>
> Please guide .
>
> Thanks
>
>
>
>
>




>>>
>>>
>>
>>
>>
>>
>>
>
>


Re: one-to-many Map Side Join without reducer

2011-06-21 Thread madhu phatak
I think Hive is best suited for your use case: it gives you a SQL-based
interface on top of Hadoop for this kind of operation.

On Fri, Jun 10, 2011 at 2:39 AM, Shi Yu  wrote:

> Hi,
>
> I have two datasets: dataset 1 has the format:
>
> MasterKey1SubKey1SubKey2SubKey3
> MasterKey2Subkey4 Subkey5 Subkey6
> 
>
>
> dataset 2 has the format:
>
> SubKey1Value1
> SubKey2Value2
> ...
>
> I want to have one-to-many join based on the SubKey, and the final goal is
> to have an output like:
>
> MasterKey1Value1Value2Value3
> MasterKey2Value4Value5Value6
> ...
>
>
> After studying and experimenting some example code, I understand that it is
> doable if I transform the first data set as
>
> SubKey1MasterKey1
> SubKey2MasterKey1
> SubKey3MasterKey1
> SubKey4MasterKey2
> SubKey5MasterKey2
> SubKey6MasterKey2
>
> then using the inner join with the dataset 2 on SubKey. Then I probably
> need a reducer to perform secondary sort on MasterKey to get the result.
> However, the bottleneck is still on the reducer if each MasterKey has lots
> of SubKey.
> My question is, suppose that dataset2 contains all the Subkeys and never
> split, is it possible to join the key of dataset 2 with multiple values of
> dataset 1 at the Mapper Side? Any hint is highly appreciated.
>
> Shi
>
>
>


Re: Help with adjusting Hadoop configuration files

2011-06-21 Thread madhu phatak
Yes, it can improve performance by reducing the number of mappers and
letting a single mapper use more memory. The right value depends on the
application and the RAM available. For your use case I think 512 MB to
1 GB would be a better value.
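
For reference, a hedged sketch of where such values can be set from code. The
property names below are the 0.20-era ones ("dfs.block.size" and
"mapred.child.java.opts"); cluster-wide defaults would normally go into
hdfs-site.xml / mapred-site.xml rather than being set per job like this, and
the numbers are only examples.

import org.apache.hadoop.conf.Configuration;

public class TuningSketch {
  public static Configuration tunedConf() {
    Configuration conf = new Configuration();
    // block size (in bytes) used for files written with this configuration
    conf.setLong("dfs.block.size", 512L * 1024 * 1024);
    // heap for each map/reduce child JVM; the stock default is only about 200 MB,
    // so large-RAM datanodes stay underused unless this is raised
    conf.set("mapred.child.java.opts", "-Xmx1024m");
    return conf;
  }

  public static void main(String[] args) {
    Configuration conf = tunedConf();
    System.out.println("dfs.block.size = " + conf.get("dfs.block.size"));
    System.out.println("mapred.child.java.opts = " + conf.get("mapred.child.java.opts"));
  }
}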

On Tue, Jun 21, 2011 at 4:28 PM, Avi Vaknin  wrote:

> Hi,
> The block size is configured to 128MB, I've read that it is recommended to
> increase it in order to get better performance.
> What value do you recommend to set it ?
>
> Avi
>
> -Original Message-
> From: madhu phatak [mailto:phatak@gmail.com]
> Sent: Tuesday, June 21, 2011 12:54 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Help with adjusting Hadoop configuration files
>
> If u reduce the default block size of dfs(which is in the configuration
> file) and if u use default inputformat it creates more no of mappers at a
> time which may help you to effectively use the RAM.. Another way is create
> as many parallel jobs as possible at pro grammatically so that uses all
> available RAM.
>
> On Tue, Jun 21, 2011 at 3:17 PM, Avi Vaknin  wrote:
>
> > Hi Madhu,
> > First of all, thanks for the quick reply.
> > I've searched the net about the properties of the configuration files and
> I
> > specifically wanted to know if there is
> > a property that is related to memory tuning (as you can see I have 7.5
> RAM
> > on each datanode and I really want to use it properly).
> > Also, I've changed the mapred.tasktracker.reduce/map.tasks.maximum to 10
> > (number of cores on the datanodes) and unfortunately I haven't seen any
> > change on the performance or time duration of running jobs.
> >
> > Avi
> >
> > -Original Message-
> > From: madhu phatak [mailto:phatak@gmail.com]
> > Sent: Tuesday, June 21, 2011 12:33 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Help with adjusting Hadoop configuration files
> >
> > The utilization of cluster depends upon the no of jobs and no of mappers
> > and
> > reducers.The configuration files only help u set up the cluster by
> > specifying info .u can also specify some of details like block size and
> > replication in configuration files  which may help you in job
> > management.You
> > can read all the available configuration properties here
> > http://hadoop.apache.org/common/docs/current/cluster_setup.html
> >
> > On Tue, Jun 21, 2011 at 2:13 PM, Avi Vaknin 
> wrote:
> >
> > > Hi Everyone,
> > > We are a start-up company has been using the Hadoop Cluster platform
> > > (version 0.20.2) on Amazon EC2 environment.
> > > We tried to setup a cluster using two different forms:
> > > Cluster 1: includes 1 master (namenode) + 5 datanodes - all of the
> > machines
> > > are small EC2 instances (1.6 GB RAM)
> > > Cluster 2: includes 1 master (namenode) + 2 datanodes - the master is a
> > > small EC2 instance and the other two datanodes are large EC2 instances
> > (7.5
> > > GB RAM)
> > > We tried to make changes on the the configuration files (core-sit,
> > > hdfs-site
> > > and mapred-sit xml files) and we expected to see a significant
> > improvement
> > > on the performance of the cluster 2,
> > > unfortunately this has yet to happen.
> > >
> > > Are there any special parameters on the configuration files that we
> need
> > to
> > > change in order to adjust the Hadoop to a large hardware environment ?
> > > Are there any best practice you recommend?
> > >
> > > Thanks in advance.
> > >
> > > Avi
> > >
> > >
> > >
> > >
> >
> > -
> > No virus found in this message.
> > Checked by AVG - www.avg.com
> > Version: 10.0.1382 / Virus Database: 1513/3707 - Release Date: 06/16/11
> >
> >
>
> -
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1382 / Virus Database: 1513/3707 - Release Date: 06/16/11
>
>


Re: Append to Existing File

2011-06-21 Thread madhu phatak
Please refer to this discussion
http://search-hadoop.com/m/rnG0h1zCZcL1/Re%253A+HDFS+File+Appending+URGENT&subj=Fw+HDFS+File+Appending+URGENT

On Tue, Jun 21, 2011 at 4:23 PM, Eric Charles wrote:

> When you say "bugs pending", are your refering to HDFS-265 (which links to
> HDFS-1060, HADOOP-6239 and HDFS-744?
>
> Are there other issues related to append than the one above?
>
> Tks, Eric
>
> https://issues.apache.org/jira/browse/HDFS-265
>
>
>
> On 21/06/11 12:36, madhu phatak wrote:
>
>> Its not stable . There are some bugs pending . According one of the
>> disccusion till date the append is not ready for production.
>>
>> On Tue, Jun 14, 2011 at 12:19 AM, jagaran das**
>> wrote:
>>
>>  I am using hadoop-0.20.203.0 version.
>>> I have set
>>>
>>> dfs.support.append to true and then using append method
>>>
>>> It is working but need to know how stable it is to deploy and use in
>>> production
>>> clusters ?
>>>
>>> Regards,
>>> Jagaran
>>>
>>>
>>>
>>> __**__
>>> From: jagaran das
>>> To: common-user@hadoop.apache.org
>>> Sent: Mon, 13 June, 2011 11:07:57 AM
>>> Subject: Append to Existing File
>>>
>>> Hi All,
>>>
>>> Is append to an existing file is now supported in Hadoop for production
>>> clusters?
>>> If yes, please let me know which version and how
>>>
>>> Thanks
>>> Jagaran
>>>
>>>
>>
> --
> Eric
>


Re: Hadoop Runner

2011-06-21 Thread madhu phatak
Define your own custom RecordReader; that is the efficient way to do it.
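
For what it's worth, a minimal sketch of that approach with the new API: wrap
the stock LineRecordReader and rewrite each value before the mapper sees it.
The upper-casing is only a placeholder for whatever in-memory preprocessing
is actually needed, and the class names are made up.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class PreprocessingInputFormat extends TextInputFormat {
  @Override
  public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
      TaskAttemptContext context) {
    return new PreprocessingRecordReader();
  }

  /** Delegates to LineRecordReader and transforms each line before map() sees it. */
  public static class PreprocessingRecordReader extends RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate = new LineRecordReader();
    private final Text transformed = new Text();

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
        throws IOException, InterruptedException {
      delegate.initialize(split, context);
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
      if (!delegate.nextKeyValue()) {
        return false;
      }
      // placeholder transformation: replace with the real preprocessing
      transformed.set(delegate.getCurrentValue().toString().toUpperCase());
      return true;
    }

    @Override
    public LongWritable getCurrentKey() throws IOException, InterruptedException {
      return delegate.getCurrentKey();
    }

    @Override
    public Text getCurrentValue() {
      return transformed;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
      return delegate.getProgress();
    }

    @Override
    public void close() throws IOException {
      delegate.close();
    }
  }
}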

On Sun, Jun 12, 2011 at 10:12 AM, Harsh J  wrote:

> Mark,
>
> I may not have gotten your question exactly, but you can do further
> processing inside of your FileInputFormat derivative's RecordReader
> implementation (just before it loads the value for a next() form of
> call -- which the MapRunner would use to read).
>
> If you're looking to dig into Hadoop's source code to understand the
> flow yourself, MapTask.java is what you may be looking for (run*
> methods).
>
> On Sun, Jun 12, 2011 at 3:25 AM, Mark question 
> wrote:
> > Hi,
> >
> >  1) Where can I find the "main" class of hadoop? The one that calls the
> > InputFormat then the MapperRunner and ReducerRunner and others?
> >
> >This will help me understand what is in memory or still on disk ,
> exact
> > flow of data between split and mappers .
> >
> > My problem is, assuming I have a TextInputFormat and would like to modify
> > the input in memory before being read by RecordReader... where shall I do
> > that?
> >
> >InputFormat was my first guess, but unfortunately, it only defines the
> > logical splits ... So, the only way I can think of is use the
> recordReader
> > to read all the records in split into another variable (with the format I
> > want) then process that variable by map functions.
> >
> >   But is that efficient? So, to understand this,I hope someone can give
> an
> > answer to Q(1)
> >
> > Thank you,
> > Mark
> >
>
>
>
> --
> Harsh J
>


Re: nomenclature

2011-06-21 Thread madhu phatak
"Master" and "slaves" come from older distributed-systems terminology.
Since Hadoop is also a distributed system, it uses the same kind of
terminology :)

On Mon, Jun 13, 2011 at 6:36 AM, Mark Hedges  wrote:

>
> It is a common term, just one that I have seen distastefully
> used with an unpleasant subtext.  Just reading the manual...
> Hadoop is neat-o!  --mark--
>
>
> On Mon, 13 Jun 2011, Nan Zhu wrote:
>
> > For MapReduce, I always call them "JobTracker" and
> > "TaskTracker", for HDFS, "Namenode" and "DataNode"
> >
> > because of the name of classes in source code???:-)
> >
> > On Mon, Jun 13, 2011 at 8:25 AM, Mark Hedges 
> wrote:
> >
> > > Why don't you call them "directors" and "workers"
> > > instead of "masters" and "slaves" ?
>


Re: Append to Existing File

2011-06-21 Thread madhu phatak
It's not stable; there are still some open bugs. According to one of the
discussions, to date append is not ready for production.

On Tue, Jun 14, 2011 at 12:19 AM, jagaran das wrote:

> I am using hadoop-0.20.203.0 version.
> I have set
>
> dfs.support.append to true and then using append method
>
> It is working but need to know how stable it is to deploy and use in
> production
> clusters ?
>
> Regards,
> Jagaran
>
>
>
> 
> From: jagaran das 
> To: common-user@hadoop.apache.org
> Sent: Mon, 13 June, 2011 11:07:57 AM
> Subject: Append to Existing File
>
> Hi All,
>
> Is append to an existing file is now supported in Hadoop for production
> clusters?
> If yes, please let me know which version and how
>
> Thanks
> Jagaran
>


Re: Handling external jars in EMR

2011-06-21 Thread madhu phatak
It's better to merge the library with your code. Otherwise you have to
copy the library into Hadoop's lib folder on every node of the cluster.
-libjars is not working for me either. I used the maven-shade-plugin
(from Eclipse) to build the merged jar.

On Wed, Jun 15, 2011 at 12:20 AM, Mehmet Tepedelenlioglu <
mehmets...@gmail.com> wrote:

> I am using the Guava library in my hadoop code through a jar file. With
> hadoop one has the -libjars option (although I could not get that working on
> 0.2 for some reason). Are there any easy options with EMR short of using a
> utility like jarjar or bootstrapping magic. Or is that what I'll need to do?
>
> Thanks,
>
> Mehmet T.


Re: ClassNotFoundException while running quick start guide on Windows.

2011-06-21 Thread madhu phatak
I think the jar has an issue where the main class cannot be read from its
manifest. Try unpacking the jar, look up the main class in
META-INF/MANIFEST.MF, and then run it as follows, giving the main class
explicitly:

 bin/hadoop jar hadoop-*-examples.jar <main-class> grep input output 'dfs[a-z.]+'
On Thu, Jun 16, 2011 at 10:23 AM, Drew Gross  wrote:

> Hello,
>
> I'm trying to run the example from the quick start guide on Windows and I
> get this error:
>
> $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
> Exception in thread "main" java.lang.NoClassDefFoundError:
> Caused by: java.lang.ClassNotFoundException:
>at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> Could not find the main class: .  Program will exit.
> Exception in thread "main" java.lang.NoClassDefFoundError:
> Gross\Documents\Projects\discom\hadoop-0/21/0\logs
> Caused by: java.lang.ClassNotFoundException:
> Gross\Documents\Projects\discom\hadoop-0.21.0\logs
>at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> Could not find the main class:
> Gross\Documents\Projects\discom\hadoop-0.21.0\logs.  Program will exit.
>
> Does anyone know what I need to change?
>
> Thank you.
>
> From, Drew
>
> --
> Forget the environment. Print this e-mail immediately. Then burn it.
>


Re: automatic monitoring the utilization of slaves

2011-06-21 Thread madhu phatak
It's not possible directly, since a slave has no knowledge of the other
slaves; even though the master started the job, its tasks may be running
on other slaves. If you are submitting jobs programmatically through a
JobConf, you can use that handle to track the status of the job.
Otherwise you can dig into the JobTracker web UI code to see exactly how
it talks to the master to get that information.
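
A rough sketch of the second suggestion (tracking the job from the
submitting program with the old API), which could then drive whatever
start/stop hooks the monitoring needs; the JobConf is assumed to be
configured elsewhere, and the sar commands themselves are left out.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class JobWatcher {
  /** Submits the job, polls until it finishes, and returns true on success. */
  public static boolean submitAndWatch(JobConf conf) throws Exception {
    JobClient client = new JobClient(conf);
    RunningJob running = client.submitJob(conf);   // returns immediately
    // a "job started" trigger (e.g. start sar on the monitored hosts) could fire here
    while (!running.isComplete()) {
      System.out.printf("map %.0f%%  reduce %.0f%%%n",
          running.mapProgress() * 100, running.reduceProgress() * 100);
      Thread.sleep(5000);
    }
    // and the matching "job finished" trigger here
    return running.isSuccessful();
  }
}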

On Thu, Jun 16, 2011 at 12:27 PM, bikash sharma wrote:

> Hi -- Is there a way, by which a slave can get a trigger when a Hadoop jobs
> finished in master?
> The use case is as follows:
> I need to monitor the cpu, memory utilization utility automatically. For
> which, I need to know the timestamp to start and stop the sar utility
> corresponding to the start and finish of Hadoop job at master.
> Its simple to do at master, since the Hadoop job runs there, but how we do
> for slaves?
>
> Thanks.
> Bikash
>


Re: Heap Size is 27.25 MB/888.94 MB

2011-06-21 Thread madhu phatak
It's related to the amount of memory available to the Java virtual
machine that runs the Hadoop process, not to HDFS disk capacity.

On Fri, Jun 17, 2011 at 1:18 AM, Harsh J  wrote:

> The 'heap size' is a Java/program and memory (RAM) thing; unrelated to
> physical disk space that the HDFS may occupy (which can be seen in
> configured capacity).
>
> More reading on what a Java heap size is about:
> http://en.wikipedia.org/wiki/Java_Virtual_Machine#Heap
>
> On Fri, Jun 17, 2011 at 1:07 AM,   wrote:
> >
> > So its saying my heap size is  (Heap Size is 27.25 MB/888.94 MB)
> >
> >
> > but my configured capacity is 971GB (4 nodes)
> >
> >
> >
> > Is heap size on the main page just for the namenode or do I need to
> > increase it to include the datanodes
> >
> >
> >
> > Cheers -
> >
> >
> >
> > Jeffery Schmitz
> > Projects and Technology
> > 3737 Bellaire Blvd Houston, Texas 77001
> > Tel: +1-713-245-7326 Fax: +1 713 245 7678
> > Email: jeff.schm...@shell.com 
> >
> > "TK-421, why aren't you at your post?"
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Harsh J
>


Re: HDFS File Appending

2011-06-21 Thread madhu phatak
HDFS does not support appending, I think. I am not sure about Pig; if you
are using Hadoop directly, you could zip the small files together and use
the zip as the input to the jobs.
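
On the FileUtil.copyMerge suggestion quoted below, a small sketch of how it
is typically called; the paths are made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path srcDir = new Path("/user/demo/small-files");  // hypothetical input directory
    Path merged = new Path("/user/demo/merged.txt");   // hypothetical merged output file
    // concatenates every file under srcDir into one HDFS file, separated by "\n"
    boolean ok = FileUtil.copyMerge(fs, srcDir, fs, merged, false, conf, "\n");
    System.out.println("copyMerge " + (ok ? "succeeded" : "copied nothing"));
  }
}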

On Fri, Jun 17, 2011 at 6:56 AM, Xiaobo Gu  wrote:

> please refer to FileUtil.CopyMerge
>
> On Fri, Jun 17, 2011 at 8:33 AM, jagaran das 
> wrote:
> > Hi,
> >
> > We have a requirement where
> >
> >  There would be huge number of small files to be pushed to hdfs and then
> use pig
> > to do analysis.
> >  To get around the classic "Small File Issue" we merge the files and push
> a
> > bigger file in to HDFS.
> >  But we are loosing time in this merging process of our pipeline.
> >
> > But If we can directly append to an existing file in HDFS we can save
> this
> > "Merging Files" time.
> >
> > Can you please suggest if there a newer stable version of Hadoop where
> can go
> > for appending ?
> >
> > Thanks and Regards,
> > Jagaran
>


Re: Starting an HDFS node (standalone) programmatically by API

2011-06-21 Thread madhu phatak
HDFS should be available to the DataNodes in order to run jobs, and
bin/hdfs is just a wrapper the Hadoop jobs use to access HDFS on the
DataNodes. So if you want to read a file from HDFS inside a job, you have
to start the DataNodes when the cluster comes up.
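
On the original question of starting the daemons without the shell scripts:
the scripts mostly just build a classpath and launch the daemon classes, so
a rough sketch like the one below can work from Java, assuming the Hadoop
jars plus core-site.xml/hdfs-site.xml are on the classpath. For tests, the
MiniDFSCluster class in the Hadoop test jar is the more usual route.

import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.hdfs.server.namenode.NameNode;

public class EmbeddedHdfsSketch {
  public static void main(String[] args) throws Exception {
    // each daemon's main() blocks for the life of the daemon, so give it a thread
    Thread nn = new Thread(new Runnable() {
      public void run() {
        try { NameNode.main(new String[0]); } catch (Throwable t) { t.printStackTrace(); }
      }
    }, "namenode");
    nn.start();

    Thread.sleep(10000);  // crude wait for the namenode RPC port to come up

    Thread dn = new Thread(new Runnable() {
      public void run() {
        try { DataNode.main(new String[0]); } catch (Throwable t) { t.printStackTrace(); }
      }
    }, "datanode");
    dn.start();
  }
}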

On Fri, Jun 17, 2011 at 4:12 PM, punisher  wrote:

> Hi all,
>
> hdfs nodes can be started using the sh scripts provided with hadoop.
> I read that it's all based on script files
>
> is it possible to start an HDFS (standalone) from a java application by
> API?
>
> Thanks
>
> --
> View this message in context:
> http://hadoop-common.472056.n3.nabble.com/Starting-an-HDFS-node-standalone-programmatically-by-API-tp3075693p3075693.html
> Sent from the Users mailing list archive at Nabble.com.
>


Re: Help with adjusting Hadoop configuration files

2011-06-21 Thread madhu phatak
If you reduce the default DFS block size (which is set in the
configuration file) and use the default input format, more mappers are
created at a time, which may help you use the RAM more effectively.
Another way is to create as many parallel jobs as possible
programmatically so that all the available RAM is used.

On Tue, Jun 21, 2011 at 3:17 PM, Avi Vaknin  wrote:

> Hi Madhu,
> First of all, thanks for the quick reply.
> I've searched the net about the properties of the configuration files and I
> specifically wanted to know if there is
> a property that is related to memory tuning (as you can see I have 7.5 RAM
> on each datanode and I really want to use it properly).
> Also, I've changed the mapred.tasktracker.reduce/map.tasks.maximum to 10
> (number of cores on the datanodes) and unfortunately I haven't seen any
> change on the performance or time duration of running jobs.
>
> Avi
>
> -Original Message-
> From: madhu phatak [mailto:phatak@gmail.com]
> Sent: Tuesday, June 21, 2011 12:33 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Help with adjusting Hadoop configuration files
>
> The utilization of cluster depends upon the no of jobs and no of mappers
> and
> reducers.The configuration files only help u set up the cluster by
> specifying info .u can also specify some of details like block size and
> replication in configuration files  which may help you in job
> management.You
> can read all the available configuration properties here
> http://hadoop.apache.org/common/docs/current/cluster_setup.html
>
> On Tue, Jun 21, 2011 at 2:13 PM, Avi Vaknin  wrote:
>
> > Hi Everyone,
> > We are a start-up company has been using the Hadoop Cluster platform
> > (version 0.20.2) on Amazon EC2 environment.
> > We tried to setup a cluster using two different forms:
> > Cluster 1: includes 1 master (namenode) + 5 datanodes - all of the
> machines
> > are small EC2 instances (1.6 GB RAM)
> > Cluster 2: includes 1 master (namenode) + 2 datanodes - the master is a
> > small EC2 instance and the other two datanodes are large EC2 instances
> (7.5
> > GB RAM)
> > We tried to make changes on the the configuration files (core-sit,
> > hdfs-site
> > and mapred-sit xml files) and we expected to see a significant
> improvement
> > on the performance of the cluster 2,
> > unfortunately this has yet to happen.
> >
> > Are there any special parameters on the configuration files that we need
> to
> > change in order to adjust the Hadoop to a large hardware environment ?
> > Are there any best practice you recommend?
> >
> > Thanks in advance.
> >
> > Avi
> >
> >
> >
> >
>
> -
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1382 / Virus Database: 1513/3707 - Release Date: 06/16/11
>
>


Re: Setting up a Single Node Hadoop Cluster

2011-06-21 Thread madhu phatak
What is in the logs? They are the best place to see what is going wrong.
If you share the logs, it is easier to point out the problem.

On Tue, Jun 21, 2011 at 9:06 AM, Kumar Kandasami <
kumaravel.kandas...@gmail.com> wrote:

> Hi Ziyad:
>
>  Do you see any errors on the log file ?
>
> I have installed CDH3 in the past on Ubuntu machines using the two links
> below:
>
> https://ccp.cloudera.com/display/CDHDOC/Before+You+Install+CDH3+on+a+Single+Node
>
> https://ccp.cloudera.com/display/CDHDOC/Installing+CDH3+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode
>
> Also the blog link below explains how to install using tarball files that
> works on my Ubuntu too (even though it is explained for Mac).
>
>
> http://knowledgedonor.blogspot.com/2011/05/installing-cloudera-hadoop-hadoop-0202.html
>
>
> Hope these links help you proceed further.
>
> Kumar_/|\_
> www.saisk.com
> ku...@saisk.com
> "making a profound difference with knowledge and creativity..."
>
>
> On Mon, Jun 20, 2011 at 6:22 PM, Ziyad Mir  wrote:
>
> > Hi,
> >
> > I have been attempting to set up a single node Hadoop cluster (by
> following
> >
> >
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
> > )
> > on
> > my personal computer (running Ubuntu 10.10), however, I have run into
> some
> > roadblocks.
> >
> > Specifically, there appear to be issues starting the required Hadoop
> > processes after running 'bin/hadoop/start-all.sh' (jps only returns
> > itself).
> > In addition, if I run 'bin/hadoop/stop-all.sh', I often see 'no namenode
> to
> > stop, no jobtracker to stop'.
> >
> > I have attempted looking into the hadoop/log files, however, I'm not sure
> > what specifically I am looking for.
> >
> > Any suggestions would be much appreciated.
> >
> > Thanks,
> > Ziyad
> >
>


Re: Help with adjusting Hadoop configuration files

2011-06-21 Thread madhu phatak
The utilization of the cluster depends on the number of jobs and the
number of mappers and reducers. The configuration files only help you set
up the cluster by specifying that information. You can also specify
details such as block size and replication in the configuration files,
which may help with job management. You can read about all the available
configuration properties here:
http://hadoop.apache.org/common/docs/current/cluster_setup.html

On Tue, Jun 21, 2011 at 2:13 PM, Avi Vaknin  wrote:

> Hi Everyone,
> We are a start-up company has been using the Hadoop Cluster platform
> (version 0.20.2) on Amazon EC2 environment.
> We tried to setup a cluster using two different forms:
> Cluster 1: includes 1 master (namenode) + 5 datanodes - all of the machines
> are small EC2 instances (1.6 GB RAM)
> Cluster 2: includes 1 master (namenode) + 2 datanodes - the master is a
> small EC2 instance and the other two datanodes are large EC2 instances (7.5
> GB RAM)
> We tried to make changes on the the configuration files (core-sit,
> hdfs-site
> and mapred-sit xml files) and we expected to see a significant improvement
> on the performance of the cluster 2,
> unfortunately this has yet to happen.
>
> Are there any special parameters on the configuration files that we need to
> change in order to adjust the Hadoop to a large hardware environment ?
> Are there any best practice you recommend?
>
> Thanks in advance.
>
> Avi
>
>
>
>


Re: How do I get a JobStatus object?

2011-02-16 Thread madhu phatak
Rather than running jobs with waitForCompletion, you can use JobControl
to control the jobs. JobControl gives access to which jobs have
completed, which are running, which have failed, and so on.

On Thu, Feb 17, 2011 at 12:09 AM, Aaron Baff wrote:

> I'm submitting jobs via JobClient.submitJob(JobConf), and then waiting
> until it completes with RunningJob.waitForCompletion(). I then want to get
> how long the entire MR takes, which appears to need the JobStatus since
> RunningJob doesn't provide anything I can use for that. The only way I can
> see how to do it right now is JobClient.getAllJobs(), which gives me an
> array of all the jobs that are submitted (currently running? all previous?).
> Anyone know how I could go about doing this?
>
> --Aaron
>


Re: Hadoop in Real time applications

2011-02-16 Thread madhu phatak
Hadoop is not suited for real-time applications.

On Thu, Feb 17, 2011 at 9:47 AM, Karthik Kumar wrote:

> Can Hadoop be used for Real time Applications such as banking solutions...
>
> --
> With Regards,
> Karthik
>


Re: Map Task Fails.........

2011-02-16 Thread madhu phatak
tasktracker log *
On Wed, Feb 16, 2011 at 8:00 PM, madhu phatak  wrote:

> See the tasklog  of the slave to see why the task attempt is failing...



>
>
> On Wed, Feb 16, 2011 at 7:29 PM, Nitin Khandelwal <
> nitin.khandel...@germinait.com> wrote:
>
>> Hi,
>> I am using Hadoop 0.21.0. I am getting Exception as
>> java.lang.Throwable: Child Error at
>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:249) Caused by:
>> java.io.IOException: Task process exit with nonzero status of 1. at
>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:236)
>> when i am trying to run Map Red. This error comes in one of the slaves (
>> which is also master in my case) . Cam any body tell why i may be getting
>> this?
>> Thanks,
>>
>>
>> --
>>
>>
>> Nitin Khandelwal
>>
>
>


Re: Map Task Fails.........

2011-02-16 Thread madhu phatak
See the task log on the slave to find out why the task attempt is failing.

On Wed, Feb 16, 2011 at 7:29 PM, Nitin Khandelwal <
nitin.khandel...@germinait.com> wrote:

> Hi,
> I am using Hadoop 0.21.0. I am getting Exception as
> java.lang.Throwable: Child Error at
> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:249) Caused by:
> java.io.IOException: Task process exit with nonzero status of 1. at
> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:236)
> when i am trying to run Map Red. This error comes in one of the slaves (
> which is also master in my case) . Cam any body tell why i may be getting
> this?
> Thanks,
>
>
> --
>
>
> Nitin Khandelwal
>


Re: A problem with using 0 reducers in hadoop 0.20.2

2011-02-11 Thread madhu phatak
If you don't specify any reducer, things work fine, so there is no need
to specify the number of reducers.

On Friday, February 11, 2011, Sina Samangooei  wrote:
> Hi,
>
> I have a job that benefits many mappers, but the output of each of these 
> mappers needs no further work and can be outputed directly to the HDFS as 
> sequence files. I've set up a job to do this in java, specifying my mapper 
> and setting reducers to 0 using:
>
> job.setNumReduceTasks(0);
>
> The mapper i have written works correctly when run locally through eclipse. 
> However, when i submit my job to my hadoop cluster using:
>
> hadoop jar  my.jar
>
> I am finding some problems. The following exception is thrown whenever i emit 
> from one of my map tasks using the command:
>
> context.write(key, new BytesWritable(baos.toByteArray()));
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
> /data/quantised_features/ascii-sift-ukbench/_temporary/_attempt_201010211037_0140_m_00_0/part-m-0
>  File does not exist. Holder DFSClient_attempt_201010211037_0140_m_00_0 
> does not have any open files.
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1378)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1369)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1290)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
>         at sun.reflect.GeneratedMethodAccessor549.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:817)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>         at $Proxy1.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy1.addBlock(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2881)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
> This seemed quite strange in itself. I proceeded to do some testing. At the 
> setup() phase of the mapper, i can confirm that the output directory does not 
> exist on the HDFS using:
> Path destination = FileOutputFormat.getWorkOutputPath(context);
> destination.getFileSystem(context.getConfiguration()).exists(destination)
>
> Therefore i create the the output directory (for test purposes) at the setup 
> phase using the following command:
>
> destination.getFileSystem(context.getConfiguration()).mkdirs(destination);
>
> The output location then does exist, but only until the end of the setup() 
> call. When the map() function is reached the output directory is gone again!
>
> If i set my number of reducers to 1 (without setting the reducer class, i.e. 
> using the default reducer), this job works absolutely fine. The issue arises 
> only with 0 reducers.
>
> Can anyone help shine some light on this problem?
>
> Thanks
>
> - Sina


Re: no jobtracker to stop,no namenode to stop

2011-02-09 Thread madhu phatak
IP addresses will not work. You have to put the hostnames in every
configuration file.
On Wed, Feb 9, 2011 at 9:58 PM, madhu phatak  wrote:

>
> IP address with not work ..You have to put the hostnames in every
> configuration file.
>
> On Wed, Feb 9, 2011 at 2:01 PM, ursbrbalaji  wrote:
>
>>
>>
>> Hi Madhu,
>>
>> The jobtracker logs show the following exception.
>>
>> 2011-02-09 16:24:51,244 INFO org.apache.hadoop.mapred.JobTracker:
>> STARTUP_MSG:
>> /
>> STARTUP_MSG: Starting JobTracker
>> STARTUP_MSG:   host = BRBALAJI-PC/172.17.168.45
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.2
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
>> 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
>> /
>> 2011-02-09 16:24:51,357 INFO org.apache.hadoop.mapred.JobTracker:
>> Scheduler
>> configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
>> limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
>> 2011-02-09 16:24:51,421 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
>> Initializing RPC Metrics with hostName=JobTracker, port=54311
>> 2011-02-09 16:24:56,538 INFO org.mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 2011-02-09 16:24:56,703 INFO org.apache.hadoop.http.HttpServer: Port
>> returned by webServer.getConnectors()[0].getLocalPort() before open() is
>> -1.
>> Opening the listener on 50030
>> 2011-02-09 16:24:56,704 INFO org.apache.hadoop.http.HttpServer:
>> listener.getLocalPort() returned 50030
>> webServer.getConnectors()[0].getLocalPort() returned 50030
>> 2011-02-09 16:24:56,704 INFO org.apache.hadoop.http.HttpServer: Jetty
>> bound
>> to port 50030
>> 2011-02-09 16:24:56,704 INFO org.mortbay.log: jetty-6.1.14
>> 2011-02-09 16:24:57,394 INFO org.mortbay.log: Started
>> SelectChannelConnector@0.0.0.0:50030
>> 2011-02-09 16:24:57,395 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=JobTracker, sessionId=
>> 2011-02-09 16:24:57,396 INFO org.apache.hadoop.mapred.JobTracker:
>> JobTracker
>> up at: 54311
>> 2011-02-09 16:24:57,396 INFO org.apache.hadoop.mapred.JobTracker:
>> JobTracker
>> webserver: 50030
>> 2011-02-09 16:24:58,710 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
>> 2011-02-09 16:24:59,711 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
>> 2011-02-09 16:25:00,712 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
>> 2011-02-09 16:25:01,713 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
>> 2011-02-09 16:25:02,713 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
>> 2011-02-09 16:25:03,714 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 5 time(s).
>> 2011-02-09 16:25:04,715 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 6 time(s).
>> 2011-02-09 16:25:05,715 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 7 time(s).
>> 2011-02-09 16:25:06,716 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 8 time(s).
>> 2011-02-09 16:25:07,717 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect
>> to server: localhost/127.0.0.1:54310. Already tried 9 time(s).
>> 2011-02-09 16:25:07,722 INFO org.apache.hadoop.mapred.JobTracker: problem
>> cleaning system directory: null
>> java.net.ConnectException: Call to localhost/127.0.0.1:54310 failed on
>> connection exception: java.net.ConnectException: Connection refused
>>at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
>>at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>at $Proxy4.getProtocolVersion(Unknown Source)
>>at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>at
