Owen O'Malley wrote:
On Mar 16, 2009, at 4:29 AM, Steve Loughran wrote:

I spoke with someone from the local university about their High Energy Physics problems last week; their single event files are about 2GB, so that's the only sensible block size to use when scheduling work. He'll be at ApacheCon next week to make his use cases known.

I don't follow. Not all files need to be one block long. If your files are 2GB, 1GB blocks should be fine, and I've personally tested them when I've wanted longer maps. (The block size of a dataset is the natural size of the input for each map.)
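Worth noting: block size is a per-file property in HDFS, not a cluster-wide constant, which is what makes this workable. A minimal sketch of writing a file with a 1GB block size via the FileSystem API; the path and replication factor here are hypothetical, purely for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteWithBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical path: a single 2GB event file, stored as two 1GB blocks.
            Path path = new Path("/data/hep/event-00001.dat");
            long blockSize = 1024L * 1024 * 1024;   // 1GB, per-file setting
            short replication = 3;
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);

            // create() takes the block size per file, overriding the cluster default.
            FSDataOutputStream out =
                fs.create(path, true, bufferSize, replication, blockSize);
            // ... write the event data ...
            out.close();
        }
    }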

Within a single 2GB event, data access is very random; you'd need all 2GB on a single machine, with efficient random access within it. The natural size for each map (and hence each block) really is 2GB.
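If the requirement is that each map sees a whole event, one way to get that regardless of block size is to mark the input as non-splittable, so the framework hands one complete file to one map; with a 2GB block size, the scheduler can then place that map on a node holding the entire file. A sketch against the 0.20-era mapreduce API (the class name is made up):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Hypothetical input format: refusing to split guarantees one map per
    // file, so each 2GB event is processed whole. Data locality is then
    // best when the file also fits in a single block on one node.
    public class WholeEventInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }
    }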
