Re: Sharing Memory across Map tasks [multiple cores] running in the same machine

2008-09-05 Thread Owen O'Malley
On Fri, Sep 5, 2008 at 12:59 AM, Amit Kumar Singh
<[EMAIL PROTECTED]> wrote:

> Can we use something like RAM FS to share static data across map tasks?


As others have said, this won't work right. You should probably look at
MultithreadedMapRunner, which uses a thread pool to process the inputs. It is
typically used for crawling or other map methods that take a long time per
record, but if you have substantial work inside the map you can saturate the
CPUs that way too. The downside, of course, is that a single RecordReader
feeds you the inputs, so you are limited by the read speed of a single HDFS
client.
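A minimal sketch of that setup with the 0.18-era API (the loader, the
thread count, and the key/value types below are illustrative assumptions,
not code from this thread); the point is just that the map threads started
by MultithreadedMapRunner live in one JVM and can share a single copy of
the lookup data:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

public class SharedDataMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Loaded once per task JVM and treated as read-only, so every map
  // thread started by MultithreadedMapRunner can use the same copy.
  private static volatile Map<String, String> staticData;

  @Override
  public void configure(JobConf job) {
    synchronized (SharedDataMapper.class) {
      if (staticData == null) {
        staticData = loadStaticData(job);   // the ~2.7 GB of lookup data
      }
    }
  }

  // Hypothetical loader; in practice this would read the side data from
  // the DistributedCache or a local path named in the JobConf.
  private static Map<String, String> loadStaticData(JobConf job) {
    return new HashMap<String, String>();
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // CPU-intensive work against the shared lookup data goes here.
    String hit = staticData.get(value.toString());
    if (hit != null) {
      output.collect(value, new Text(hit));
    }
  }

  // Wire the multithreaded runner and the thread count into the job.
  public static void setUp(JobConf job) {
    job.setMapperClass(SharedDataMapper.class);
    job.setMapRunnerClass(MultithreadedMapRunner.class);
    job.setInt("mapred.map.multithreadedrunner.threads", 4); // one per core
  }
}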

-- Owen


Re: Sharing Memory across Map tasks [multiple cores] running in the same machine

2008-09-05 Thread Devaraj Das

Hadoop doesn't support this natively, so if you need this kind of
functionality you'd have to build it into your application yourself. But I am
worried about the race conditions in deciding which task should be the first
to create the ramfs and load the data.
If you can atomically check whether the ramfs has been created and the data
loaded, and perform the creation/load if it hasn't, then things should work.
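One way to get that atomicity on a node (just a sketch, assuming the ramfs,
e.g. a tmpfs mount at /mnt/ramfs, already exists; the paths and the loader
are illustrative) is to let the tasks race for a lock file and have the
losers wait for a done marker:

import java.io.File;
import java.io.IOException;

public class RamfsLoader {

  // Illustrative paths on an already-mounted ramfs.
  private static final File LOCK = new File("/mnt/ramfs/.loading");
  private static final File DONE = new File("/mnt/ramfs/.loaded");

  public static void ensureLoaded() throws IOException, InterruptedException {
    if (DONE.exists()) {
      return;                       // an earlier task already loaded the data
    }
    // createNewFile() is atomic, so exactly one task on the node wins.
    if (LOCK.createNewFile()) {
      loadDataIntoRamfs();          // copy the static data under /mnt/ramfs
      if (!DONE.createNewFile()) {
        throw new IOException("could not create done marker " + DONE);
      }
    } else {
      // Lost the race: wait until the winner has finished loading.
      while (!DONE.exists()) {
        Thread.sleep(1000);
      }
    }
  }

  // Hypothetical: whatever copies the ~2.7 GB of static data into the ramfs.
  private static void loadDataIntoRamfs() throws IOException {
  }
}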
If atomicity cannot be guaranteed, you might consider this (a rough sketch of
step 1 follows the list):
1) Run a map-only job that creates the ramfs and loads the data (if your
cluster is small you can do this manually). You can use the DistributedCache
to ship the data you want to load to each node.
2) Run your job that processes the data.
3) Run a third job to delete the ramfs.
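Here is what step 1 might look like with the DistributedCache (all paths and
class names below are assumptions; arranging for exactly one map per node is
the part you would still have to handle yourself, e.g. manually on a small
cluster):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class RamfsLoadJob {

  public static class LoadMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

    private JobConf conf;

    @Override
    public void configure(JobConf job) {
      this.conf = job;
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<NullWritable, NullWritable> output,
                    Reporter reporter) throws IOException {
      // Copy the locally cached static data into this node's ramfs.
      Path[] cached = DistributedCache.getLocalCacheFiles(conf);
      if (cached != null && cached.length > 0) {
        FileSystem localFs = FileSystem.getLocal(conf);
        FileUtil.copy(localFs, cached[0],
                      localFs, new Path("/mnt/ramfs/static-data"),
                      false, conf);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(RamfsLoadJob.class);
    job.setJobName("load-ramfs");
    job.setMapperClass(LoadMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputFormat(NullOutputFormat.class);
    // Ship the static data (sitting in HDFS) to every node's local cache.
    DistributedCache.addCacheFile(new URI("/data/static-lookup.dat"), job);
    // The input only exists to spread at least one map onto each node.
    FileInputFormat.setInputPaths(job, new Path("/data/loader-input"));
    JobClient.runJob(job);
  }
}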


On 9/5/08 1:29 PM, "Amit Kumar Singh" <[EMAIL PROTECTED]> wrote:

> Can we use something like RAM FS to share static data across map tasks?
> 
> Scenario:
> 1) Quad-core machine
> 2) Two 1-TB disks
> 3) 8 GB RAM
> 
> Now I need ~2.7 GB of RAM per map process to load some static data in
> memory, using which I would be processing the data (CPU-intensive jobs).
> 
> Can I share memory across mappers on the same machine so that the memory
> footprint is smaller and I can run more than 4 mappers simultaneously,
> utilizing all 4 cores?
> 
> Can we use something like RamFS?
> 




Sharing Memory across Map tasks [multiple cores] running in the same machine

2008-09-05 Thread Amit Kumar Singh
Can we use something like RAM FS to share static data across map tasks?

Scenario:
1) Quad-core machine
2) Two 1-TB disks
3) 8 GB RAM

Now I need ~2.7 GB of RAM per map process to load some static data in
memory, using which I would be processing the data (CPU-intensive jobs).

Can I share memory across mappers on the same machine so that the memory
footprint is smaller and I can run more than 4 mappers simultaneously,
utilizing all 4 cores?

Can we use something like RamFS?


