What change is needed in OutputCollector to print a custom Writable object?
Hi, I am learning how to make a custom Writable work, so I have implemented a simple MyWritable class. I can use MyWritable objects within the map and reduce phases, but suppose the values in the reducer are of type MyWritable and I pass them to the OutputCollector to produce the final output. Since the value is a custom object, the output file contains only an object reference rather than the contents. What changes or additions do I need to make so that the custom Writable is printed into the file properly? Thanks and regards, -- Deepak Diwakar
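For later readers: the usual cause of this symptom is that TextOutputFormat prints each key and value by calling toString(), and the default Object.toString() yields a reference like MyWritable@1a2b3c. Overriding toString() on the Writable fixes the output. A minimal sketch against the old org.apache.hadoop.mapred-era API; the class name MyWritable and its fields are illustrative assumptions, not from the original post:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.Writable;

  public class MyWritable implements Writable {
      private int count;
      private long sum;

      // Serialization used between map and reduce phases.
      public void write(DataOutput out) throws IOException {
          out.writeInt(count);
          out.writeLong(sum);
      }

      public void readFields(DataInput in) throws IOException {
          count = in.readInt();
          sum = in.readLong();
      }

      // TextOutputFormat writes values via toString(); without this override
      // the output file shows the default object reference instead of the data.
      public String toString() {
          return count + "\t" + sum;
      }
  }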
Hadoop Error Message
Hi friends, could somebody tell me what the following quoted message means? 3154.42user 76.09system 44:47.21elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (15major+6092226minor)pagefaults 0swaps. The first part reports CPU usage, but what is the rest? Is it related to the heap size of the program? I am running a Hadoop task in standalone mode on almost 250 GB of compressed data. This message appears after the task finishes. Thanks in advance, -- Deepak Diwakar
Re: Hadoop Error Message
Thanks, friend. 2009/1/19 Miles Osborne mi...@inf.ed.ac.uk: that is a timing / space report. Miles [quoted original message trimmed] -- Deepak Diwakar
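For later readers: that line is the standard resource report printed by GNU time(1) (or a shell's time built-in) when the wrapped process exits; it is not a Hadoop error. A rough field-by-field annotation, based on the time(1) report format rather than anything Hadoop-specific:

  3154.42user        # CPU seconds spent in user mode
  76.09system        # CPU seconds spent in the kernel on the process's behalf
  44:47.21elapsed    # wall-clock time (minutes:seconds)
  120%CPU            # (user + system) / elapsed; over 100% means more than one core was busy
  (0avgtext+0avgdata 0maxresident)k
                     # average text/data and peak resident memory in KB
                     # (often reported as 0 on older kernels)
  0inputs+0outputs   # filesystem block reads and writes
  (15major+6092226minor)pagefaults
                     # major faults go to disk; minor faults are satisfied from memory
  0swaps             # times the process was swapped out

None of these fields indicate a heap-size problem by themselves.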
Re: MySQL in Hadoop
Thanks for the solutions you guys provided. Actually I found one solution: if we add the path of any external jar to the Hadoop classpath in $HADOOP_HOME/conf/hadoop-env.sh, it will work. Thanks for the solution. 2008/10/20 Gerardo Velez [EMAIL PROTECTED]: Hi! Actually I got the same problem, and temporarily I solved it by including the JDBC dependencies inside the main jar. Another solution I have found is that you can place all jar dependencies inside the hadoop/lib directory. Hope it helps. -- Gerardo. On Mon, Oct 20, 2008 at 9:43 AM, Deepak Diwakar [EMAIL PROTECTED] wrote: [quoted original message trimmed] -- Deepak Diwakar, Associate Software Eng., Pubmatic, Pune. Contact: +919960930405
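For reference, the hadoop-env.sh change described above is normally done through the HADOOP_CLASSPATH variable, which that file already carries a commented-out stub for. A sketch, with the connector path as a placeholder to adjust:

  # in $HADOOP_HOME/conf/hadoop-env.sh
  # /path/to/mysql-connector-java.jar is illustrative; point it at your JDBC jar
  export HADOOP_CLASSPATH=/path/to/mysql-connector-java.jar:$HADOOP_CLASSPATH

The alternative Gerardo mentions, dropping the jar into $HADOOP_HOME/lib/, works because everything in that directory is put on the classpath when Hadoop starts.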
MySQL in Hadoop
Hi all, I am sure someone must have tried a MySQL connection from Hadoop, but I am running into a problem. Basically, I cannot figure out how to include the JDBC connector jar on the classpath of the hadoop run command, or whether there is some other way to incorporate the JDBC connector jar into the main jar that we run using $HADOOP_HOME/bin/hadoop. Please help me. Thanks in advance,
runtime change of hadoop.tmp.dir
Hi, I am using Hadoop in standalone mode and want to change hadoop.tmp.dir at runtime. I found that there is a -D option on the command line for setting a configurable parameter, but I did not succeed with it. It would be really helpful if somebody could give the exact syntax for that. Thanks and regards, -- Deepak Diwakar
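For later readers, the generic-options syntax is shown below; note that -D is only honoured if the job's driver parses generic options, typically by running through ToolRunner. A sketch with placeholder jar and class names:

  # myjob.jar, MyJob, and the paths are placeholders for your own job
  bin/hadoop jar myjob.jar MyJob -D hadoop.tmp.dir=/some/other/tmp input output

If the driver does not use ToolRunner/GenericOptionsParser, the -D pair is handed to the program as an ordinary argument and silently ignored, which matches the symptom described above.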
Re: parallel hadoop process reading same input file
Good luck was on my side; I resolved the problem. To run more than one map task you need a separate Hadoop directory for each one. Go to $HADOOP_HOME/conf, copy the following property from hadoop-default.xml into hadoop-site.xml, and give the value a different setting in each Hadoop directory:

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>

Then there is no conflict over where the intermediate files of the different map tasks are kept. Thanks, Deepak. 2008/8/29 Deepak Diwakar [EMAIL PROTECTED]: [earlier messages in this thread quoted, trimmed; they appear below]
parallel hadoop process reading same input file
Hi, when I run two Hadoop processes in parallel and both processes have to read the same file, it fails. Of course, one solution is to keep a copy of the file in a different location so that simultaneous access causes no problem, but what if we do not want to do that because it costs extra space? Please suggest a suitable solution. Thanks and regards, Deepak
Re: parallel hadoop process reading same input file
I am running two different Hadoop map/reduce tasks in standalone mode on a single node, both reading the same folder. I found that Task 1 was not able to process those files which had been processed by Task 2, and vice versa; it gave an IO error. It seems that in standalone mode the map task locks a file internally while processing it (hopefully that is not the case in DFS mode). One more observation: two map tasks cannot run on a single task tracker or single node simultaneously, even if you set up two different Hadoop directories and run a map task from each. The possible reason I can think of is that Hadoop stores its intermediate map/reduce output in files under /tmp/, so if we run two map tasks simultaneously they conflict over keeping the intermediate files at the same location, and this results in the error. That is my interpretation; any feasible solution for standalone mode is appreciated. Thanks, Deepak. 2008/8/28 lohit [EMAIL PROTECTED]: Hi Deepak, can you explain what processes and what files they are trying to read? If you are talking about map/reduce tasks reading files on DFS, then yes, parallel reads are allowed; multiple writers are not. -Lohit. ----- Original Message From: Deepak Diwakar [EMAIL PROTECTED] To: core-user@hadoop.apache.org, Sent: Thursday, August 28, 2008 6:06:58 AM, Subject: parallel hadoop process reading same input file [quoted original message trimmed; it appears above]
input files
Hadoop usually takes either a single file or a folder as its input parameter. But is it possible to make it take a list of files (not a folder) as the input parameter? -- Deepak Diwakar
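For later readers: the old org.apache.hadoop.mapred API already supports this; each file can be added as its own input path. A minimal sketch against that era's API, where MyJob and the file paths are placeholders:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf(MyJob.class);
  // one addInputPath call per file; no enclosing folder required
  for (String file : new String[] {"/data/a.txt", "/data/b.txt", "/data/c.txt"}) {
      FileInputFormat.addInputPath(conf, new Path(file));
  }
  // or equivalently, a comma-separated list in one call:
  // FileInputFormat.setInputPaths(conf, "/data/a.txt,/data/b.txt,/data/c.txt");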
Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation
You need to delete the hadoop-root directory which was created by DFS; usually Hadoop creates it in /tmp/. After deleting the directory, just follow the instructions once again and it will work. 2008/7/9 Arun C Murthy [EMAIL PROTECTED]: # bin/hadoop dfs -put conf input 08/06/29 09:38:42 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead of 1 -- Looks like your datanode didn't come up, anything in the logs? http://wiki.apache.org/hadoop/Help Arun -- Deepak Diwakar, Associate Software Eng., Pubmatic, Pune. Contact: +919960930405
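In concrete terms, the reset described above usually amounts to the following sequence, assuming a single-node setup of that era and the default /tmp/hadoop-root location (adjust the path if you changed hadoop.tmp.dir):

  bin/stop-all.sh                 # stop any half-started daemons
  rm -rf /tmp/hadoop-root         # remove the stale DFS state
  bin/hadoop namenode -format     # re-format the namenode, as in the Quickstart
  bin/start-all.sh                # then retry the dfs -put from the guide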
parallel mapping on single server
Hi, I am pretty new to Hadoop. I ran a modification of wordcount on almost a TB of data on a single server, but found that it takes too much time. In fact, only one core is utilized at a time even though my server has 8 cores. I read that Hadoop speeds up computation in DFS mode, but how can I fully utilize a single server with multicore processors? Is there a pseudo-distributed mode in Hadoop? What changes are required in the config files? Please let me know in detail. Is there anything to do with hadoop-site.xml and mapred-default.xml? Thanks in advance. -- Deepak Diwakar, Associate Software Eng., Pubmatic, Pune. Contact: +919960930405
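For later readers: yes, the standalone (local) runner of that era executes map tasks serially in a single JVM, so only one core is used. Pseudo-distributed mode plus a per-node slot setting lets all cores work. A sketch of a hadoop-site.xml, assuming 0.18-era property names and illustrative port numbers; overrides belong in hadoop-site.xml, not in mapred-default.xml:

  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>
    <property>
      <!-- raise the per-tasktracker map slots from the default of 2 to one per core -->
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
  </configuration>

With this in place the single tasktracker can schedule up to 8 map tasks concurrently, one per core.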