What change needs to be made in OutputCollector to print a custom Writable object

2009-04-01 Thread Deepak Diwakar
Hi,

I am learning how to make a custom Writable work, so I have implemented a
simple MyWritable class.

I can use the MyWritable object within map/reduce, but suppose the values in
the reduce are of type MyWritable and I put them into the OutputCollector to
get the final output. Since the value is a custom object, the output file
contains only an object reference rather than the value itself.

What changes or additions do I have to make so that writing to the output
file handles the custom Writable object?
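For what it's worth, the usual fix is in the Writable class itself rather than
in OutputCollector: the default text output format prints values by calling
toString(), so overriding toString() in the custom class is normally enough. A
minimal sketch, assuming a hypothetical MyWritable with two made-up fields:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Sketch of a custom writable; the two fields are invented for illustration.
public class MyWritable implements Writable {
  private long count;
  private String tag = "";

  public void write(DataOutput out) throws IOException {
    out.writeLong(count);
    out.writeUTF(tag);
  }

  public void readFields(DataInput in) throws IOException {
    count = in.readLong();
    tag = in.readUTF();
  }

  // Without this override the output file shows the default object reference
  // (something like MyWritable@1a2b3c); with it, the text output format
  // writes a readable tab-separated value.
  public String toString() {
    return tag + "\t" + count;
  }
}

If a binary layout is wanted instead of text, switching the job's output
format to SequenceFileOutputFormat avoids the text conversion altogether.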

Thanks & regards,
-- 
- Deepak Diwakar,


Hadoop Error Message

2009-01-19 Thread Deepak Diwakar
Hi friends,

Could somebody tell me what the following quoted message means?

 3154.42user 76.09system 44:47.21elapsed 120%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (15major+6092226minor)pagefaults 0swaps

The first part tells about system usage, but what is the rest? Is it related
to the heap size of the program?

I am running a Hadoop task in standalone mode on almost 250 GB of compressed
data.

This message comes after the task finishes.
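For reference, this does not look like a Hadoop error at all; it appears to be
the resource-usage summary printed by the GNU time(1) utility (or a shell
wrapper) around the job. Assuming the default time output format, a rough
field-by-field reading:

3154.42user        CPU seconds spent in user mode
76.09system        CPU seconds spent in the kernel on behalf of the process
44:47.21elapsed    wall-clock time (minutes:seconds)
120%CPU            (user + system) / elapsed, i.e. average CPU utilisation
0avgtext+0avgdata 0maxresident)k     memory statistics; often reported as 0 by older kernels
0inputs+0outputs                     filesystem block reads/writes charged to the process
(15major+6092226minor)pagefaults     page faults that did / did not need disk I/O
0swaps                               times the process was swapped out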

Thanks in advance,
-- 
- Deepak Diwakar,


Re: Hadoop Error Message

2009-01-19 Thread Deepak Diwakar
Thanks friend.


2009/1/19 Miles Osborne mi...@inf.ed.ac.uk

 that is a timing / space report

 Miles

 2009/1/19 Deepak Diwakar ddeepa...@gmail.com:
  Hi friends,
 
  Could somebody tell me what the following quoted message means?
 
   3154.42user 76.09system 44:47.21elapsed 120%CPU (0avgtext+0avgdata
  0maxresident)k
  0inputs+0outputs (15major+6092226minor)pagefaults 0swaps
 
  The first part tells about system usage, but what is the rest? Is it
  related to the heap size of the program?
 
  I am running a Hadoop task in standalone mode on almost 250 GB of
  compressed data.
 
  This message comes after the task finishes.
 
  Thanks in advance,
  --
  - Deepak Diwakar,
 



 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.




-- 
- Deepak Diwakar,


Re: mysql in hadoop

2008-10-21 Thread Deepak Diwakar
Thanks for the solutions you guys provided.

Actually, I found one solution: if we add the path of an external jar to
Hadoop's classpath in $hadoop_home/hadoop-$version/conf/hadoop-env.sh, that
does it.
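For example, the relevant line in conf/hadoop-env.sh would look something like
the sketch below; the connector version and path are made up, so adjust them
to the actual jar:

# Extra Java CLASSPATH elements for Hadoop; path and version are illustrative.
export HADOOP_CLASSPATH=/path/to/mysql-connector-java-5.1.6-bin.jar

Dropping the jar into $hadoop_home/lib, as Gerardo suggested, achieves much
the same thing, since the bin/hadoop script adds everything under lib/ to the
classpath.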

Thanks for the help.

2008/10/20 Gerardo Velez [EMAIL PROTECTED]

 Hi!

 Actually I had the same problem, and temporarily I've solved it by including
 the JDBC dependencies inside the main jar.

 Another solution I've found is that you can place all the jar dependencies
 inside the hadoop/lib directory.


 Hope it helps.


 -- Gerardo


 On Mon, Oct 20, 2008 at 9:43 AM, Deepak Diwakar [EMAIL PROTECTED]
 wrote:

  Hi all,
 
  I am sure someone must have tried a MySQL connection from Hadoop, but I am
  having a problem. Basically, I don't see how to include the JDBC connector
  jar on the classpath in the Hadoop run command, or whether there is another
  way to incorporate the JDBC connector jar into the main jar which we run
  using $hadoop-home/bin/hadoop.
 
  Please help me.
 
 
  Thanks in advance,
 




-- 
- Deepak Diwakar,
Associate Software Eng.,
Pubmatic, Pune
Contact: +919960930405


mysql in hadoop

2008-10-20 Thread Deepak Diwakar
Hi all,

I am sure someone must have tried a MySQL connection from Hadoop, but I am
having a problem. Basically, I don't see how to include the JDBC connector
jar on the classpath in the Hadoop run command, or whether there is another
way to incorporate the JDBC connector jar into the main jar which we run
using $hadoop-home/bin/hadoop.

Please help me.

Thanks in advance,


runtime change of hadoop.tmp.dir

2008-09-04 Thread Deepak Diwakar
Hi

I am using Hadoop in standalone mode and want to change hadoop.tmp.dir at
runtime. I found that there is a -D command-line option to set a configurable
parameter, but I did not succeed with it.

It would be really helpful if somebody could give the exact syntax for that.
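A sketch of the syntax that is usually expected, assuming the job's driver
passes its arguments through ToolRunner/GenericOptionsParser (the jar name,
class name and paths below are placeholders):

# Generic options such as -D go after the main class and before the job's own
# arguments; they only take effect if the driver uses ToolRunner or
# GenericOptionsParser.
bin/hadoop jar myjob.jar MyJobDriver -D hadoop.tmp.dir=/data/hadoop-tmp input output

If the driver builds its JobConf without consulting the parsed generic
options, the -D value is silently ignored; in that case calling
conf.set("hadoop.tmp.dir", ...) inside the driver is the fallback.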

Thanks & regards,
-- 
- Deepak Diwakar,


Re: parallel hadoop process reading same input file

2008-08-29 Thread Deepak Diwakar
As luck would have it, I resolved the problem. To run more than one map task
you need a separate Hadoop directory for each one. Go to the conf/ directory
of each copy and copy the following property from hadoop-default.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

Paste it into hadoop-site.xml in each copy and change the value field to a
different path for each Hadoop directory (an example follows below). Then
there is no conflict over where the intermediate files of the different map
tasks are kept.
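For example, the hadoop-site.xml of the second copy might carry an override
like this (the path is just an illustration; as usual it goes inside the
<configuration> element):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}-job2</value>
  <description>Separate scratch space for the second standalone job.</description>
</property>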

Thanks
Deepak,

2008/8/29 Deepak Diwakar [EMAIL PROTECTED]

 I am running two different Hadoop map/reduce tasks in standalone mode on a
 single node, and both read the same folder. I found that Task 1 was not able
 to process files which had already been processed by Task 2, and vice versa;
 it gave an IO error. It seems that in standalone mode the map task locks the
 file internally while processing it (hopefully that should not be the case
 in DFS mode).

 One more observation: two map tasks can't be run on a single task tracker or
 single node simultaneously (even if you set up two different Hadoop
 directories and try to run a map task from each place). The possible reason
 I can think of is that Hadoop stores its intermediate map/reduce task output
 in files under /tmp/, so if we run two map tasks simultaneously they
 conflict over keeping the intermediate files in the same location, which
 results in an error.

 This is my interpretation.

 Any feasible solution for standalone mode is appreciated.

 Thanks
 Deepak



 2008/8/28 lohit [EMAIL PROTECTED]

 Hi Deepak,
 Can you explain what process and what files they are trying to read? If
 you are talking about map/reduce tasks reading files on DFS, then, yes
 parallel reads are allowed. Multiple writers are not.
 -Lohit



 - Original Message 
 From: Deepak Diwakar [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Thursday, August 28, 2008 6:06:58 AM
 Subject: parallel hadoop process reading same input file

 Hi,

 When I run two Hadoop processes in parallel and both processes have to read
 the same file, it fails.
 Of course, one solution is to keep a copy of the file in a different location
 so that simultaneous access would not cause any problem. But what if we don't
 want to do that because it costs extra space?
 Please suggest a suitable solution to this.

 Thanks & Regards,
 Deepak









parallel hadoop process reading same input file

2008-08-28 Thread Deepak Diwakar
Hi,

When I run two Hadoop processes in parallel and both processes have to read
the same file, it fails.
Of course, one solution is to keep a copy of the file in a different location
so that simultaneous access would not cause any problem. But what if we don't
want to do that because it costs extra space?
Please suggest a suitable solution to this.

Thanks & Regards,
Deepak


Re: parallel hadoop process reading same input file

2008-08-28 Thread Deepak Diwakar
I am running two different Hadoop map/reduce tasks in standalone mode on a
single node, and both read the same folder. I found that Task 1 was not able
to process files which had already been processed by Task 2, and vice versa;
it gave an IO error. It seems that in standalone mode the map task locks the
file internally while processing it (hopefully that should not be the case in
DFS mode).

One more observation: two map tasks can't be run on a single task tracker or
single node simultaneously (even if you set up two different Hadoop
directories and try to run a map task from each place). The possible reason I
can think of is that Hadoop stores its intermediate map/reduce task output in
files under /tmp/, so if we run two map tasks simultaneously they conflict
over keeping the intermediate files in the same location, which results in an
error.

This is my interpretation.

Any feasible solution for standalone mode is appreciated.

Thanks
Deepak



2008/8/28 lohit [EMAIL PROTECTED]

 Hi Deepak,
 Can you explain what process and what files they are trying to read? If you
 are talking about map/reduce tasks reading files on DFS, then, yes parallel
 reads are allowed. Multiple writers are not.
 -Lohit



 - Original Message 
 From: Deepak Diwakar [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Thursday, August 28, 2008 6:06:58 AM
 Subject: parallel hadoop process reading same input file

 Hi,

 When I run two Hadoop processes in parallel and both processes have to read
 the same file, it fails.
 Of course, one solution is to keep a copy of the file in a different location
 so that simultaneous access would not cause any problem. But what if we don't
 want to do that because it costs extra space?
 Please suggest a suitable solution to this.

 Thanks & Regards,
 Deepak




input files

2008-08-20 Thread Deepak Diwakar
Hadoop usually takes either a single file or a folder as its input parameter.
But is it possible to make it take a list of files (not a folder) as the
input parameter?
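One way that seems to work without modifying Hadoop itself is to add each file
as its own input path in the job driver (old org.apache.hadoop.mapred API; the
class name below is a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class MultiFileDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiFileDriver.class);
    // Each call appends one file (or directory) to the job's input list,
    // so an arbitrary list of files can be passed without first collecting
    // them into a single folder.
    for (String file : args) {
      FileInputFormat.addInputPath(conf, new Path(file));
    }
    // ... set mapper, reducer and output path as usual, then run the job.
  }
}

FileInputFormat.setInputPaths(conf, "a.txt,b.txt,c.txt") with a comma-separated
string appears to be an equivalent shortcut, and on older releases the same
thing is done with conf.addInputPath(new Path(file)) directly on JobConf.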


-- 
- Deepak Diwakar,


Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation

2008-07-08 Thread Deepak Diwakar
You need to delete the hadoop-root directory that was created for DFS; Hadoop
usually creates this directory under /tmp/. After deleting it, just follow
the instructions once again and it will work.
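Roughly, the reset looks like the sketch below (the directory name depends on
which user the daemons run as, so hadoop-root is only an example):

bin/stop-all.sh                 # stop any daemons that are still running
rm -rf /tmp/hadoop-root         # remove the stale DFS/temp data left in /tmp
bin/hadoop namenode -format     # re-format the namenode
bin/start-all.sh                # bring the daemons back up, then retry the quickstart steps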

2008/7/9 Arun C Murthy [EMAIL PROTECTED]:

 # bin/hadoop dfs -put conf input

 08/06/29 09:38:42 INFO dfs.DFSClient:
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead
 of 1



 Looks like your datanode didn't come up, anything in the logs?
 http://wiki.apache.org/hadoop/Help

 Arun





-- 
- Deepak Diwakar,
Associate Software Eng.,
Pubmatic, Pune
Contact: +919960930405


parallel mapping on single server

2008-07-07 Thread Deepak Diwakar
Hi,

I am pretty new to Hadoop. I ran a modification of WordCount on almost a TB
of data on a single server, but found that it takes too much time. I noticed
that only one core is utilized at a time, even though my server has 8 cores.
I read that Hadoop speeds up computation in DFS mode, but how can I make full
use of a single server with a multicore processor? Is there a pseudo-DFS mode
in Hadoop? What changes are required in the config files? Please let me know
in detail. Does it involve hadoop-site.xml and mapred-default.xml?
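As far as I understand it, standalone (local) mode runs tasks one at a time
through the LocalJobRunner, so the usual approach is to switch to the
pseudo-distributed setup from the quickstart guide and raise the
per-tasktracker task limits. A sketch of the overrides for
conf/hadoop-site.xml (property names as used in the 0.17/0.18 line; the value
8 is chosen only to match the core count mentioned above):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <description>Run up to 8 map tasks in parallel on this tasktracker.</description>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
  <description>Run up to 8 reduce tasks in parallel on this tasktracker.</description>
</property>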

Thanks in advance.
-- 
- Deepak Diwakar,
Associate Software Eng.,
Pubmatic, Pune
Contact: +919960930405