Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-15 Thread Jordan Neumeyer
On Mon, Apr 12, 2010 at 11:35 AM,  wrote:

> >So you "tried" to do that? Does that mean you failed or is the system
> still processing it? :-P I kid, but that is a very large file... it's bigger
> than my current desktop's hard drive.
>
> Jordan,
>
> It would be a shame to leave those multicore 64-bit processors with
> multi-terabyte hard drives with nothing to challenge them.  Just think of me
> as part of the hardware entertainment committee. :-)
>
Good, they need to be kept busy. Don't want to bore them. ;-)

> The process ran overnight and finished; I just have not gotten to phase 2
> of that particular project, chopping the resulting image into overlapping
> DOQQ-size images.  I was going to go back and change the projection method
> slightly, hopefully to make it more accurate.  If you're looking for big
> images to play with, go to http://datagateway.nrcs.usda.gov/ and select a
> NAIP county mosaic of imagery.  The mosaic comes as a MrSID format file, but
> once you uncompress it (I suggest using gdal's nearblack utility) to ERDAS
> Imagine format, the resulting image will probably be in the multi-gigabyte
> range per county.
>
Well that's good. It only took overnight for 500 GB? That doesn't sound too
bad. I'm not looking for huge maps, but some decent-sized samples would be
good for testing. Thanks for the resource; I'll look into it if I need some
samples for testing. The warning at the bottom of the site is a bit unnerving,
though... I'm not a government official.

>  Doug
>
>
>
>
> Doug Newcomb
> USFWS
> Raleigh, NC
> 919-856-4520 ext. 14 doug_newc...@fws.gov
>
> -
> The opinions I express are my own and are not representative of the
> official policy of the U.S.Fish and Wildlife Service or Dept. of the
> Interior.   Life is too short for undocumented, proprietary data formats.
>
>
> Jordan Neumeyer
> Sent by: grass-dev-boun...@lists.osgeo.org
> Date: 04/08/2010 01:09 AM
> To: doug_newc...@fws.gov
> Cc: GRASS developers list
> Subject: Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules
>
> Thanks guys for putting that into perspective.
>
> On Mon, Apr 5, 2010 at 6:38 PM, <doug_newc...@fws.gov>
> wrote:
>
> Doug
>
> Doug Newcomb
> USFWS
> Raleigh, NC
> 919-856-4520 ext. 14 doug_newc...@fws.gov
>
> -
> The opinions I express are my own and are not representative of the
> official policy of the U.S.Fish and Wildlife Service or Dept. of the
> Interior.   Life is too short for undocumented, proprietary data formats.
>
> -grass-dev-boun...@lists.osgeo.org wrote: -
>
> To: Jordan Neumeyer <jordan.neume...@mines.sdsmt.edu>
> From: Markus Neteler <nete...@osgeo.org>
> Sent by: grass-dev-boun...@lists.osgeo.org
> Date: 04/05/2010 04:06AM
> cc: GRASS developers list <grass-...@lists.osgeo.org>
> Subject: Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules
>
>
> On Sun, Apr 4, 2010 at 3:22 AM, Jordan Neumeyer
> <jordan.neume...@mines.sdsmt.edu>
> wrote:
> > I didn't realize how big the data set could be. What's
> > the biggest map you've seen?
>
> Our provincial DEM is a 3.5GB GeoTIFF of 48800x58000 pixels.
> Another file which I recently had to import was a 4GB GeoTIFF with
> 21550 bands. Finally, in remote sensing, you can quickly generate
> files in the multi-GB range.
>
> Markus
> ___
> grass-dev mailing list
> grass-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/grass-dev
>
> ~Jordan
> ___
> grass-dev mailing list
> grass-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/grass-dev
>
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-14 Thread Yann Chemin
Hello,

I was wondering if the current raster library API could be extended to
read/write a given number n of contiguous lines into a buffer instead of
a single one? This may be of help for parallelization...
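
As a rough illustration of what I mean (this is not an existing GRASS
function; the helper name and the caller-supplied buffer layout are just
assumptions), such a wrapper could sit on top of the 7.0 single-row API:

#include <grass/gis.h>
#include <grass/raster.h>

/* Hypothetical helper: read nrows contiguous rows, starting at start_row,
 * into one caller-supplied buffer of nrows * Rast_window_cols() DCELLs.
 * Here it just loops over the existing single-row call; a real extension
 * could fetch the whole block at once and hand it to worker threads. */
static void get_d_rows(int fd, DCELL *buf, int start_row, int nrows)
{
    int cols = Rast_window_cols();
    int i;

    for (i = 0; i < nrows; i++)
        Rast_get_d_row(fd, buf + (size_t)i * cols, start_row + i);
}

A worker thread could then be handed one such block of contiguous rows to
process.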

Regards,
Yann



On 4 April 2010 11:22, Jordan Neumeyer  wrote:
>
> On Thu, Apr 1, 2010 at 3:24 PM, Glynn Clements 
> wrote:
>>
>> Jordan Neumeyer wrote:
>>
>> > > > Just kind of my thought process about how I would try to go about
>> > > > parallelizing a module.
>> > >
>> > > The main issue with parallelising raster input is that the library
>> > > keeps a copy of the current row's data, so that consecutive reads of
>> > > the same row (as happen when upsampling) don't re-read the data.
>> > >
>> > > For concurrent access to a single map, you would need to either keep
>> > > one row per thread, or abandon caching. Also, you would need to use
>> > > pread() rather than fseek()+read().
>> >
>> > It sounds like you're talking about parallelism in I/O from a file or
>> > database. Neither of which is my intent or goal for this project. I will
>> > parallelize things after they have already been read into memory, and
>> > tasks are processor intensive. I wouldn't want to parallelize any I/O,
>> > but if I were to optimize I/O, I would make all I/O operations
>> > asynchronous, which can mimic parallelism in a sense. Queuing up the
>> > chunks of data and then processing them as resources become available.
>>
>> Most GRASS raster modules process data row-by-row, rather than reading
>> entire maps into memory. Reading maps into memory is frowned upon, as
>> GRASS is regularly used with maps which are too large to fit into
>> memory. Where the algorithm cannot operate row-by-row, use of a tile
>> cache is the next best alternative; see e.g. r.proj.seg (renamed to
>> r.proj in 7.0).
>
>
> That makes more sense. So a row is like a chunk of the map data? Kind of
> like the first row of pixels from an image. So from the first pixel to the
> width of the image is one row, then width plus one starts the next, and so
> on and so forth. How large are the rows generally?
>
>>
>> Holding an entire map in memory is only considered acceptable if the
>> algorithm is inherently so slow that processing a gigabyte-sized map
>> simply wouldn't be feasible, or the access pattern is such that even a
>> tile-cache approach isn't feasible.
>>
>> In general, GRASS should be able to process multi-gigabyte maps even
>> on 32-bit systems, and work on multi-user systems where a process
>> cannot assume that it can use a significant proportion of the system's
>> total physical memory.
>
>
> Which is good. I didn't realize how big the data set could be. What's
> the biggest map you've seen?
>
>>
>> > > It's more straightforward to read multiple maps concurrently. In 7.0,
>> > > this case should be thread-safe.
>> > >
>> > > Alternatively, you could have one thread for reading, one for writing,
>> > > and multiple worker threads for the actual processing. However, unless
>> > > the processing is complex, I/O will be the bottleneck.
>> > >
>> >
>> > I/O is generally a bottleneck anyway. Something always tends to be
>> > waiting
>> > on another.
>>
>> When I refer to I/O, I'm referring not just to read() and write(), but
>> also the (de)compression, conversion and resampling, i.e. everything
>> performed by the get/put-row functions. For many GRASS modules, this
>> takes more time than the actual processing.
>
> I can see why, especially for big maps since it's doing that row-by-row.
> So when a GRASS module loads a map the basic algorithm looks something like:
> 1) Read row
> 2) get-row function does necessary preprocessing
> 3) row is cached or held in memory (or does the caching take place after
> step 4?)
> 4) row is processed
> 5) Display/write process? (Or is this after a couple of iterations, all of
> them?)
> 6) repeat (1)
>
> Would it be beneficial/practical to parallelize some of the preprocessing
> like conversion and resampling before the caching occurs?
>
>>
>> Finally, the thread title refers to libraries. Very little processing
>> occurs in the libraries; most of it is in the individual modules. So
>> there isn't much scope for "parallelising" the libraries. The main
>> issue for library functions is to ensure that they are thread-safe.
>> Most of the necessary work for the raster library has been done in
>> 7.0.
>
>
> I was trying to refer to all of the raster modules as a whole, but library
> is just what the modules share. I've changed the title from Parallelization
> of Raster and Vector libraries to Parallelization of Raster and Vector
> modules.
>
> Would I be working on GRASS 6.x or 7.x? Is there a minimum compiler version
> when using GCC/MinGW? Just curious because OpenMP tasks are only supported
> on GCC >= 4.2. They may or may not be useful, but they can be a valuable
> tool when you don't know how much data or how many "tasks" you have, like
> processing a linked list or binary tree.
>
>>
>> --
>> Glynn Clements 
>
> ~Jordan
>
>

Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-12 Thread Doug_Newcomb
>> Jordan,
>> I've dealt with ERDAS Imagine files larger than 10 GB on a regular basis.
>> I have occasionally tried to reproject and merge all of the 1 m NAIP
>> imagery tiles for North Carolina into 1 BigTIFF > 500GB with gdal.
>> Any parallelization work for open source geospatial tools would be welcome :-).

> So you "tried" to do that? Does that mean you failed or is the system
> still processing it? :-P I kid, but that is a very large file... it's
> bigger than my current desktop's hard drive.
Jordan,
It would be a shame to leave those multicore 64-bit processors with
multi-terabyte hard drives with nothing to challenge them.  Just think of
me as part of the hardware entertainment committee. :-)

The process ran overnight and finished; I just have not gotten to phase 2
of that particular project, chopping the resulting image into overlapping
DOQQ-size images.  I was going to go back and change the projection
method slightly, hopefully to make it more accurate.  If you're looking
for big images to play with, go to http://datagateway.nrcs.usda.gov/ and
select a NAIP county mosaic of imagery.  The mosaic comes as a MrSID
format file, but once you uncompress it (I suggest using gdal's nearblack
utility) to ERDAS Imagine format, the resulting image will probably be in
the multi-gigabyte range per county.

Doug
 

Doug Newcomb 
USFWS
Raleigh, NC
919-856-4520 ext. 14 doug_newc...@fws.gov
-
The opinions I express are my own and are not representative of the 
official policy of the U.S.Fish and Wildlife Service or Dept. of the 
Interior.   Life is too short for undocumented, proprietary data formats.



Jordan Neumeyer
Sent by: grass-dev-boun...@lists.osgeo.org
Date: 04/08/2010 01:09 AM
To: doug_newc...@fws.gov
Cc: GRASS developers list
Subject: Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

Thanks guys for putting that into perspective.

On Mon, Apr 5, 2010 at 6:38 PM,  wrote:

Doug

Doug Newcomb
USFWS
Raleigh, NC
919-856-4520 ext. 14 doug_newc...@fws.gov
-
The opinions I express are my own and are not representative of the 
official policy of the U.S.Fish and Wildlife Service or Dept. of the 
Interior.   Life is too short for undocumented, proprietary data formats.

-grass-dev-boun...@lists.osgeo.org wrote: -

To: Jordan Neumeyer 
From: Markus Neteler 
Sent by: grass-dev-boun...@lists.osgeo.org
Date: 04/05/2010 04:06AM
cc: GRASS developers list 
Subject: Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

On Sun, Apr 4, 2010 at 3:22 AM, Jordan Neumeyer
 wrote:
> I didn't realize how big the data set could be. What's
> the biggest map you've seen?

Our provincial DEM is a 3.5GB GeoTIFF of 48800x58000 pixels.
Another file which I recently had to import was a 4GB GeoTIFF with
21550 bands. Finally, in remote sensing, you can quickly generate
files in the multi-GB range.

Markus
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

~Jordan 
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-07 Thread Jordan Neumeyer
Thanks guys for putting that into perspective.

On Mon, Apr 5, 2010 at 6:38 PM,  wrote:

> Jordan,
> I've dealt with ERDAS Imagine files larger than 10 GB on a regular basis.
> I have occasionally tried to reproject and merge all of the 1 m NAIP
> imagery tiles for North Carolina into 1 BigTIFF > 500GB with gdal.  Any
> parallelization work for open source geospatial tools would be welcome :-).
>
So you "tried" to do that? Does that mean you failed or is the system still
processing it? :-P I kid, but that is a very large file... it's bigger than
my current desktop's hard drive.

>
> Doug
>
> Doug Newcomb
> USFWS
> Raleigh, NC
> 919-856-4520 ext. 14 doug_newc...@fws.gov
>
> -
> The opinions I express are my own and are not representative of the
> official policy of the U.S.Fish and Wildlife Service or Dept. of the
> Interior.   Life is too short for undocumented, proprietary data formats.
>
> -grass-dev-boun...@lists.osgeo.org wrote: -
>
> To: Jordan Neumeyer 
> From: Markus Neteler 
> Sent by: grass-dev-boun...@lists.osgeo.org
> Date: 04/05/2010 04:06AM
> cc: GRASS developers list 
> Subject: Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules
>
> On Sun, Apr 4, 2010 at 3:22 AM, Jordan Neumeyer
>  wrote:
> > I didn't realize how big the data set could be. What's
> > the biggest map you've seen?
>
> Our provincial DEM is a 3.5GB GeoTIFF of 48800x58000 pixels.
> Another file which I recently had to import was a 4GB GeoTIFF with
> 21550 bands. Finally, in remote sensing, you can quickly generate
> files in the multi-GB range.
>
> Markus
> ___
> grass-dev mailing list
> grass-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/grass-dev
>

~Jordan
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-05 Thread Doug_Newcomb

Jordan,
I've dealt with ERDAS Imagine files larger than 10 GB on a regular basis.
I have occasionally tried to reproject and merge all of the 1 m NAIP
imagery tiles for North Carolina into 1 BigTIFF > 500GB with gdal.  Any
parallelization work for open source geospatial tools would be welcome :-).

Doug

Doug Newcomb
USFWS
Raleigh, NC
919-856-4520 ext. 14 doug_newc...@fws.gov
-

The opinions I express are my own and are not representative of the
official policy of the U.S.Fish and Wildlife Service or Dept. of the
Interior.   Life is too short for undocumented, proprietary data formats.

-grass-dev-boun...@lists.osgeo.org wrote: -

To: Jordan Neumeyer 
From: Markus Neteler 
Sent by: grass-dev-boun...@lists.osgeo.org
Date: 04/05/2010 04:06AM
cc: GRASS developers list 
Subject: Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

On Sun, Apr 4, 2010 at 3:22 AM, Jordan Neumeyer
 wrote:
> I didn't realize how big the data set could be. What's
> the biggest map you've seen?

Our provincial DEM is a 3.5GB GeoTIFF of 48800x58000 pixels.
Another file which I recently had to import was a 4GB GeoTIFF with
21550 bands. Finally, in remote sensing, you can quickly generate
files in the multi-GB range.

Markus
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-05 Thread Yann Chemin
Yes indeed; even processing simple equations on multi-GB images takes some time.

On 5 April 2010 18:06, Markus Neteler  wrote:
> On Sun, Apr 4, 2010 at 3:22 AM, Jordan Neumeyer
>  wrote:
>> I didn't realize how big the data set could be. What's
>> the biggest map you've seen?
>
> Our provincial DEM is a 3.5GB GeoTIFF of 48800x58000 pixels.
> Another file which I recently had to import was a 4GB GeoTIFF with
> 21550 bands. Finally, in remote sensing, you can quickly generate
> files in the multi-GB range.
>
> Markus
> ___
> grass-dev mailing list
> grass-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/grass-dev
>



-- 
Yann Chemin
Senior Spatial Hydrologist
www.csu.edu.au/research/icwater
M +61-4-3740 7019
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-05 Thread Markus Neteler
On Sun, Apr 4, 2010 at 3:22 AM, Jordan Neumeyer
 wrote:
> I didn't realize how big the data set could be. What's
> the biggest map you've seen?

Our provincial DEM is a 3.5GB GeoTIFF of 48800x58000 pixels.
Another file which I recently had to import was a 4GB GeoTIFF with
21550 bands. Finally, in remote sensing, you can quickly generate
files in the multi-GB range.

Markus
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-04 Thread Glynn Clements

Jordan Neumeyer wrote:

> > Most GRASS raster modules process data row-by-row, rather than reading
> > entire maps into memory. Reading maps into memory is frowned upon, as
> > GRASS is regularly used with maps which are too large to fit into
> > memory. Where the algorithm cannot operate row-by-row, use of a tile
> > cache is the next best alternative; see e.g. r.proj.seg (renamed to
> > r.proj in 7.0).
> >
> 
> That makes more sense. So a row is like a chunk of the map data? Kind of
> like the first row of pixels from an image. So from the first pixel to the
> width of the image is one row, then width plus one starts the next, and so
> on and so forth. How large are the rows generally?

The width is determined by the current region. Input data is cropped,
padded with nulls and/or (nearest-neighbour) resampled to match the
current region.

The primary functions for reading raster data are:

6.x (and earlier):
int G_get_raster_row(int fd, void *buf, int row, RASTER_MAP_TYPE type);
int G_get_c_raster_row(int fd, CELL  *buf, int row);
int G_get_f_raster_row(int fd, FCELL *buf, int row);
int G_get_d_raster_row(int fd, DCELL *buf, int row);

7.0:
void Rast_get_row(int fd, void *buf, int row, RASTER_MAP_TYPE type);
void Rast_get_c_row(int fd, CELL  *buf, int row);
void Rast_get_f_row(int fd, FCELL *buf, int row);
void Rast_get_d_row(int fd, DCELL *buf, int row);

The buffer must contain space for G_window_cols() (Rast_window_cols() in
7.0) values. "row" must be non-negative and less than G_window_rows()
(Rast_window_rows() in 7.0); rows can be read in any order.
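
For example, a minimal allocation sketch using the 6.x names above
(G_malloc() and G_free() are GRASS's checked wrappers around malloc/free):

DCELL *buf = G_malloc(G_window_cols() * sizeof(DCELL));   /* one row */
/* ... read and process rows ... */
G_free(buf);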

For writing, the functions are:

6.x (and earlier):
int G_put_raster_row(int fd, const void *buf, RASTER_MAP_TYPE type);
int G_put_c_raster_row(int fd, const CELL *buf);
int G_put_f_raster_row(int fd, const FCELL *buf);
int G_put_d_raster_row(int fd, const DCELL *buf);

7.0:
void Rast_put_row(int fd, const void *buf, RASTER_MAP_TYPE type);
void Rast_put_c_row(int fd, const CELL *buf);
void Rast_put_f_row(int fd, const FCELL *buf);
void Rast_put_d_row(int fd, const DCELL *buf);

Exactly G_window_rows()/Rast_window_rows() rows must be written,
sequentially and top-to-bottom, hence the lack of a "row" parameter.

Where possible, modules keep as few rows in memory as possible. E.g. 
r.series keeps a single row from each input map and a single row of
output data. r.resamp.interp keeps 1 (nearest), 2 (bilinear), or 4
(bicubic) rows of input and one row of output. r.neighbors keeps as
many map rows as there are rows in the neighbourhood window (i.e. a
sliding window).
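
As a rough sketch of that sliding-window pattern (illustrative only, not
code from r.neighbors; "size" is the window height and "infd" an
already-opened raster file descriptor):

#include <grass/gis.h>

/* Keep `size` consecutive input rows in memory; win[0] is the oldest row
 * in the window and win[size-1] the newest (6.x function names). */
static void process_with_window(int infd, int size)
{
    int cols = G_window_cols();
    int rows = G_window_rows();
    DCELL **win = G_malloc(size * sizeof(DCELL *));
    int i, row;

    for (i = 0; i < size; i++) {
        win[i] = G_malloc(cols * sizeof(DCELL));
        G_get_d_raster_row(infd, win[i], i);   /* prime rows 0 .. size-1 */
    }

    for (row = 0; row + size <= rows; row++) {
        /* ... compute one output row from win[0] .. win[size-1] ... */

        if (row + size < rows) {
            /* slide down one row: recycle the oldest buffer for the new row */
            DCELL *oldest = win[0];

            for (i = 1; i < size; i++)
                win[i - 1] = win[i];
            win[size - 1] = oldest;
            G_get_d_raster_row(infd, win[size - 1], row + size);
        }
    }

    for (i = 0; i < size; i++)
        G_free(win[i]);
    G_free(win);
}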

The core of a simple raster module might look something like:

rows = G_window_rows();
cols = G_window_cols();

for (row = 0; row < rows; row++) {
    G_get_d_raster_row(infd, inbuf, row);   /* read (and preprocess) one row */
    ...
    for (col = 0; col < cols; col++) {
        ...                                 /* per-cell processing */
    }
    ...
    G_put_d_raster_row(outfd, outbuf);      /* write one output row */
}

> > Holding an entire map in memory is only considered acceptable if the
> > algorithm is inherently so slow that processing a gigabyte-sized map
> > simply wouldn't be feasible, or the access pattern is such that even a
> > tile-cache approach isn't feasible.
> >
> > In general, GRASS should be able to process multi-gigabyte maps even
> > on 32-bit systems, and work on multi-user systems where a process
> > cannot assume that it can use a significant proportion of the system's
> > total physical memory.
> 
> Which is good. I didn't realize how big the data set could be. What's
> the biggest map you've seen?

I'm a programmer who works on GRASS mainly out of interest, so I don't
have to actually deal with large datasets. But 1x1 doesn't
seem to be that uncommon here. The 2GiB limit on the (compressed) data
used to be a common problem until we added large file support. Using
r.series with 365 input maps isn't uncommon.

> > When I refer to I/O, I'm referring not just to read() and write(), but
> > also the (de)compression, conversion and resampling, i.e. everything
> > performed by the get/put-row functions. For many GRASS modules, this
> > takes more time than the actual processing.
> 
> I can see why, especially for big maps since it's doing that row-by-row.
> So when a GRASS module loads a map the basic algorithm looks something like:
> 1) Read row
> 2) get-row function does necessary preprocessing
> 3) row is cached or held in memory (or does the caching take place after
> step 4?)
> 4) row is processed
> 5) Display/write process? (Or is this after a couple of iterations, all of
> them?)
> 6) repeat (1)
> 
> Would it be beneficial/practical to parallelize some of the preprocessing
> like conversion and resampling before the caching occurs?

Reading involves:

1. read()
2. Decompress (RLE or zlib)
3. Convert from portable representation to internal representation
(e.g. by

Re: [GRASS-dev] [SoC] Parallelization of Raster and Vector modules

2010-04-03 Thread Jordan Neumeyer
On Thu, Apr 1, 2010 at 3:24 PM, Glynn Clements wrote:

>
> Jordan Neumeyer wrote:
>
> > > > Just kind of my thought process about how I would try to go about
> > > > parallelizing a module.
> > >
> > > The main issue with parallelising raster input is that the library
> > > keeps a copy of the current row's data, so that consecutive reads of
> > > the same row (as happen when upsampling) don't re-read the data.
> > >
> > > For concurrent access to a single map, you would need to either keep
> > > one row per thread, or abandon caching. Also, you would need to use
> > > pread() rather than fseek()+read().
> >
> > It sounds like you're talking about parallelism in I/O from a file or
> > database. Neither of which is my intent or goal for this project. I will
> > parallelize things after they have already been read into memory, and
> > tasks are processor intensive. I wouldn't want to parallelize any I/O,
> > but if I were to optimize I/O, I would make all I/O operations
> > asynchronous, which can mimic parallelism in a sense. Queuing up the
> > chunks of data and then processing them as resources become available.
>
> Most GRASS raster modules process data row-by-row, rather than reading
> entire maps into memory. Reading maps into memory is frowned upon, as
> GRASS is regularly used with maps which are too large to fit into
> memory. Where the algorithm cannot operate row-by-row, use of a tile
> cache is the next best alternative; see e.g. r.proj.seg (renamed to
> r.proj in 7.0).
>

That makes more sense. So a row is like a chunk of the map data? Kind of
like the first row of pixels from an image. So from the first pixel to the
width of the image is one row, then width plus one starts the next, and so
on and so forth. How large are the rows generally?


>
> Holding an entire map in memory is only considered acceptable if the
> algorithm is inherently so slow that processing a gigabyte-sized map
> simply wouldn't be feasible, or the access pattern is such that even a
> tile-cache approach isn't feasible.
>
> In general, GRASS should be able to process multi-gigabyte maps even
> on 32-bit systems, and work on multi-user systems where a process
> cannot assume that it can use a significant proportion of the system's
> total physical memory.
>

Which is good. I didn't realize how big the data set could be. What's
the biggest map you've seen?


> > > It's more straightforward to read multiple maps concurrently. In 7.0,
> > > this case should be thread-safe.
> > >
> > > Alternatively, you could have one thread for reading, one for writing,
> > > and multiple worker threads for the actual processing. However, unless
> > > the processing is complex, I/O will be the bottleneck.
> > >
> >
> > I/O is generally a bottleneck anyway. Something always tends to be
> waiting
> > on another.
>
> When I refer to I/O, I'm referring not just to read() and write(), but
> also the (de)compression, conversion and resampling, i.e. everything
> performed by the get/put-row functions. For many GRASS modules, this
> takes more time than the actual processing.
>

I can see why, especially for big maps since it's doing that row-by-row.
So when a GRASS module loads a map the basic algorithm looks something like:
1) Read row
2) get-row function does necessary preprocessing
3) row is cached or held in memory (or does the caching take place after
step 4?)
4) row is processed
5) Display/write process? (Or is this after a couple of iterations, all of
them?)
6) repeat (1)

Would it be beneficial/practical to parallelize some of the preprocessing
like conversion and resampling before the caching occurs?
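
As a rough illustration of the kind of thing I mean, assuming the per-cell
work is independent of neighbouring cells (compute_cell() below is just a
placeholder for the conversion/processing step):

#include <grass/gis.h>

/* Placeholder for the real per-cell conversion/processing. */
static DCELL compute_cell(DCELL v)
{
    return v;
}

/* Sketch: split one row's column loop across threads with OpenMP.
 * Only safe when each output cell depends on its input cell alone. */
static void process_row(const DCELL *inbuf, DCELL *outbuf, int cols)
{
    int col;

#pragma omp parallel for
    for (col = 0; col < cols; col++)
        outbuf[col] = compute_cell(inbuf[col]);
}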


> Finally, the thread title refers to libraries. Very little processing
> occurs in the libraries; most of it is in the individual modules. So
> there isn't much scope for "parallelising" the libraries. The main
> issue for library functions is to ensure that they are thread-safe.
> Most of the necessary work for the raster library has been done in
> 7.0.
>

I was trying to refer to all of the raster modules as a whole, but library
is just what the modules share. I've changed the title from Parallelization
of Raster and Vector libraries to Parallelization of Raster and Vector
modules.

Would I be working on GRASS 6.x or 7.x? Is there a minimum compiler version
when using GCC/MinGW? Just curious because OpenMP tasks are only supported
on GCC >= 4.2. They may or may not be useful, but they can be a valuable tool
when you don't know how much data or how many "tasks" you have, like
processing a linked list or binary tree.
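
For what it's worth, here is a minimal example of the OpenMP task pattern I
mean, walking a binary tree (the node type and do_work() are made up for the
example; compile with -fopenmp):

struct node {
    struct node *left, *right;
    int value;
};

/* Placeholder for the real per-node processing. */
static void do_work(struct node *n)
{
    (void)n;
}

/* Each recursive call spawns its subtrees as tasks, so the traversal can
 * keep all cores busy even though the shape of the tree is not known in
 * advance. */
static void visit(struct node *n)
{
    if (!n)
        return;
#pragma omp task
    visit(n->left);
#pragma omp task
    visit(n->right);
    do_work(n);
#pragma omp taskwait
}

/* One thread enters the region and creates the root task; the other
 * threads in the team pick up the generated tasks. */
static void traverse(struct node *root)
{
#pragma omp parallel
#pragma omp single
    visit(root);
}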


>  --
> Glynn Clements 
>

~Jordan
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev