Re: [GRASS-dev] [GRASS-user] r.to.vect stats

2017-05-04 Thread Markus Metz
On Fri, May 5, 2017 at 5:07 AM, Vaclav Petras wrote:
>
> On Wed, May 3, 2017 at 5:34 AM, Moritz Lennert <mlenn...@club.worldonline.be> wrote:
> >
> > On 02/05/17 15:53, Vaclav Petras wrote:
> >> I'm using pipe_command() which is just convenience function setting
> >> stdout=PIPE. Similarly feed_command() is just setting stdin=PIPE which
> >> I'm not using because I'm feeding the stdout of the other process
> >> directly (stdin=first_process.stdout). What I don't understand,
> >> regardless of using stdin=PIPE or stdin=first_process.stdout for the
> >> second process, is what should be next.
> >
> > Do you really need the in_process.communicate()? Here's what I used
> > in a local script and it works, without communicate(). Then again,
> > I don't think the data flowing through this pipe ever exceeded
> > available memory.
> >
> > pin = gscript.pipe_command('v.db.select',
> >                            map=firms_map,
> > ...
> > total_turnover_map = 'turnover_%s' % nace2
> > p = gscript.start_command('r.in.xyz',
> >                           input_='-',
> >                           stdin=pin.stdout,
> > ...
> > # compare the return code by value, not by identity
> > if p.wait() != 0:
> >     gscript.fatal("Error in r.in.xyz with nace %s" % nace2)
>
> The Popen.wait() documentation [1] says: "Warning: This will deadlock
> when using stdout=PIPE and/or stderr=PIPE and the child process generates
> enough output to a pipe such that it blocks waiting for the OS pipe buffer
> to accept more data. Use communicate() to avoid that."
>
> And since I'm using stdout=PIPE (pipe_command()), I use communicate().
> What troubles me is that Popen.communicate(input=None) documentation [2]
> says: "Note: The data read is buffered in memory, so do not use this method
> if the data size is large or unlimited."
>
> It says "data read", so it probably talks about stdout=PIPE when
> communicate() does not return None(s) but data (stdout=PIPE and communicate
> with the same process), i.e. it doesn't apply to this case and I don't have
> to be troubled. As for the wait(), I think that it may work (works most of
> the time), it is just not guaranteed to work with large data and it depends
> on how smart the OS will be.

Maybe it is safer to store the output of v.out.ascii in a temporary file,
then use that file as input for r.in.xyz. You can then not only check if
v.out.ascii finished successfully, but also use the percent option of
r.in.xyz to reduce memory consumption for large computational regions. The
percent option does not work when piping input to r.in.xyz.

Markus M
>
> Vaclav
>
> [1] https://docs.python.org/2/library/subprocess.html#subprocess.Popen.wait
> [2] https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
>
___
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev
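
A minimal sketch of the two-step approach Markus M describes above: export
to a temporary file first, then import that file so that the percent option
of r.in.xyz can be used. Only v.out.ascii, r.in.xyz and percent come from
the message; the function name points_to_raster, the map names, the pipe
separator, the point format and the injectable run_command argument are
assumptions made for illustration and should be checked against the module
manuals.

```python
import os
import tempfile


def points_to_raster(vector, raster, run_command=None, percent=50):
    """Export points to a temporary file, then import it with r.in.xyz.

    run_command defaults to grass.script.run_command (needs a running
    GRASS session). Passing it in is only a convenience so the flow can
    be tried outside GRASS; it is not part of any GRASS API.
    """
    if run_command is None:
        from grass.script import run_command

    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # the modules open the file themselves
    try:
        # Step 1: the export has to finish (or fail) before the import starts.
        # overwrite is set because mkstemp() has already created the file.
        run_command("v.out.ascii", input=vector, output=path,
                    format="point", separator="pipe", overwrite=True)
        # Step 2: with a real file as input, percent can limit memory use.
        run_command("r.in.xyz", input=path, output=raster,
                    separator="pipe", percent=percent)
    finally:
        os.remove(path)  # clean up even if one of the modules failed
```

Because each step is a separate call, a failure of the export surfaces
before r.in.xyz is started, which is the check the piped version cannot
easily provide.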

Re: [GRASS-dev] [GRASS-user] r.to.vect stats

2017-05-04 Thread Vaclav Petras
On Wed, May 3, 2017 at 5:34 AM, Moritz Lennert wrote:
>
> On 02/05/17 15:53, Vaclav Petras wrote:
>> I'm using pipe_command() which is just convenience function setting
>> stdout=PIPE. Similarly feed_command() is just setting stdin=PIPE which
>> I'm not using because I'm feeding the stdout of the other process
>> directly (stdin=first_process.stdout). What I don't understand,
>> regardless of using stdin=PIPE or stdin=first_process.stdout for the
>> second process, is what should be next.
>
> Do you really need the in_process.communicate()? Here's what I used
> in a local script and it works, without communicate(). Then again,
> I don't think the data flowing through this pipe ever exceeded available
> memory.
>
> pin = gscript.pipe_command('v.db.select',
>                            map=firms_map,
> ...
> total_turnover_map = 'turnover_%s' % nace2
> p = gscript.start_command('r.in.xyz',
>                           input_='-',
>                           stdin=pin.stdout,
> ...
> # compare the return code by value, not by identity
> if p.wait() != 0:
>     gscript.fatal("Error in r.in.xyz with nace %s" % nace2)

The Popen.wait() documentation [1] says: "Warning: This will deadlock when
using stdout=PIPE and/or stderr=PIPE and the child process generates enough
output to a pipe such that it blocks waiting for the OS pipe buffer to
accept more data. Use communicate() to avoid that."

And since I'm using stdout=PIPE (pipe_command()), I use communicate(). What
troubles me is that Popen.communicate(input=None) documentation [2] says:
"Note: The data read is buffered in memory, so do not use this method if
the data size is large or unlimited."

It says "data read", so it probably talks about stdout=PIPE when
communicate() does not return None(s) but data (stdout=PIPE and communicate
with the same process), i.e. it doesn't apply to this case and I don't have
to be troubled. As for the wait(), I think that it may work (works most of
the time), it is just not guaranteed to work with large data and it depends
on how smart the OS will be.

Vaclav

[1] https://docs.python.org/2/library/subprocess.html#subprocess.Popen.wait
[2] https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
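
The pipe arrangement discussed above can be sketched with plain subprocess
calls. This is only an illustration under assumptions: two small Python
child processes stand in for the GRASS modules, and the consumer's stdout
is captured just so the result can be inspected (in the case described in
the thread the second process has no stdout=PIPE, so communicate() would
return None values). Closing the parent's copy of the producer's stdout
follows the shell-pipeline recipe in the subprocess documentation.

```python
import subprocess
import sys

# Stand-ins for the two modules: the producer writes many lines,
# the consumer counts the lines it receives on stdin.
PRODUCER = ("import sys\n"
            "for i in range(200000): sys.stdout.write('%d|%d|1\\n' % (i, i))")
CONSUMER = "import sys\nprint(sum(1 for _ in sys.stdin))"


def run_pipeline():
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen([sys.executable, "-c", CONSUMER],
                                stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Drop the parent's copy of the pipe so the producer notices
    # (broken pipe) if the consumer exits early.
    producer.stdout.close()
    # Only the consumer's small output is buffered in memory here;
    # the bulk data flows directly between the two child processes.
    out, _ = consumer.communicate()
    # The consumer has drained the producer's pipe, so wait() cannot block
    # on a full pipe buffer at this point.
    producer_rc = producer.wait()
    return producer_rc, consumer.returncode, out.decode().strip()
```

Checking both return codes matters: the consumer can succeed on truncated
input if the producer failed halfway.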

Re: [GRASS-dev] r.binning

2017-05-04 Thread Vaclav Petras
On Wed, May 3, 2017 at 6:57 AM, Markus Neteler wrote:

> > The module name r.binning is rather non-descriptive. I would suggest
> > r.vect.stats because it can be regarded as the inverse of v.rast.stats,
> > and because it is similar to r.resamp.stats. All do statistical
> > aggregation.
>
> +1  r.vect.stats sounds more descriptive.


Done in r71021. Thanks for the feedback.

https://trac.osgeo.org/grass/changeset/71021

Re: [GRASS-dev] Spatial clustering of vector objects?

2017-05-04 Thread Anna Petrášová
On Thu, May 4, 2017 at 2:18 PM, Benjamin Ducke wrote:
> On 04/05/17 19:22, Markus Neteler wrote:
>> Hi,
>>
>> in order to parallelize some heavy computation I was wondering how to
>> do spatial clustering of vector objects, i.e. building footprints
>> (vector polygons).
>>
>> I have to perform zonal statistics on thousands of buildings and would
>> like to split them up into "tiles" and then run the computation in
>> parallel for each tile.
>>
>> The examples in v.cluster look somewhat promising
>> https://grass.osgeo.org/grass72/manuals/v.cluster.html
>>
>> but in the best case each "tile" would contain a similar number of
>> buildings in order to balance the computation across the CPUs.
>
> Hi,
>
> I think that you would need to partition
> space into overlapping tiles, with the
> amount of overlap depending on the maximum
> distance parameter of the clustering algorithm.
> Otherwise you would get a serious edge effect
> in each tile.
>
> Prior to spatial clustering, you could use a cluster
> algorithm that aims to produce clusters with
> (nearly) equal number of points for "tiling":
>
> https://stats.stackexchange.com/questions/8744/clustering-procedure-where-each-cluster-has-an-equal-number-of-points
>
> You would then select the points for each
> cluster, buffer their convex hull by the max
> distance of your spatial cluster algorithm
> and set the working region for each "tile" to
> be the bounding box of the buffered convex
> hull (don't forget to catch all points from
> all other clusters that fall within the "tile"
> and add them to the working region's set).
>
> If that works, please make it a GRASS add-on...
>
> Regarding building footprints, I guess another
> tricky part is how to represent them as
> points: Centroids? Outer edge vertices? Both?
>
> Oh, by the way: A fellow computer scientist
> who works a lot with concurrent processing
> once told me that the frequently used
>
> number of processes = number of CPUs/cores
>
> is actually not ideal! Apparently, modern
> CPU schedulers are optimized to handle many
> more processes than there are CPUs/cores,
> and if the two counts match, then you can
> get fringe situations where processes keep
> getting transferred between cores, which
> incurs a huge performance penalty. His
> recommendation was to use a factor of
> about 2.5 (times more processes than cores).
>
> I never got around to testing his theory,
> but if you have the time, I'd love to know!
>
> Best,
>
> Ben
>
>>
>> Any idea?
>>
>> thanks,
>> Markus
>
>
>
> --
> Dr. Benjamin Ducke
> {*} Geospatial Consultant
> {*} GIS Developer
>
> Spatial technology for the masses, not the classes:
> experience free and open source GIS at http://gvsigce.org

Not sure if it's applicable here, but you could also try the quadtree
segmentation in v.surf.rst; there is an output parameter, treeseg. You
need to postprocess it (v.category, v.type, v.centroids) to get areas.

Anna

Re: [GRASS-dev] Spatial clustering of vector objects?

2017-05-04 Thread Benjamin Ducke
On 04/05/17 19:22, Markus Neteler wrote:
> Hi,
> 
> in order to parallelize some heavy computation I was wondering how to
> do spatial clustering of vector objects, i.e. building footprints
> (vector polygons).
> 
> I have to perform zonal statistics on thousands of buildings and would
> like to split them up into "tiles" and then run the computation in
> parallel for each tile.
> 
> The examples in v.cluster look somewhat promising
> https://grass.osgeo.org/grass72/manuals/v.cluster.html
> 
> but in the best case each "tile" would contain a similar number of
> buildings in order to balance the computation across the CPUs.

Hi,

I think that you would need to partition
space into overlapping tiles, with the
amount of overlap depending on the maximum
distance parameter of the clustering algorithm.
Otherwise you would get a serious edge effect
in each tile.

Prior to spatial clustering, you could use a cluster
algorithm that aims to produce clusters with
(nearly) equal number of points for "tiling":

https://stats.stackexchange.com/questions/8744/clustering-procedure-where-each-cluster-has-an-equal-number-of-points

You would then select the points for each
cluster, buffer their convex hull by the max
distance of your spatial cluster algorithm
and set the working region for each "tile" to
be the bounding box of the buffered convex
hull (don't forget to catch all points from
all other clusters that fall within the "tile"
and add them to the working region's set).

If that works, please make it a GRASS add-on...

Regarding building footprints, I guess another
tricky part is how to represent them as
points: Centroids? Outer edge vertices? Both?

Oh, by the way: A fellow computer scientist
who works a lot with concurrent processing
once told me that the frequently used

number of processes = number of CPUs/cores

is actually not ideal! Apparently, modern
CPU schedulers are optimized to handle many
more processes than there are CPUs/cores,
and if the two counts match, then you can
get fringe situations where processes keep
getting transferred between cores, which
incurs a huge performance penalty. His
recommendation was to use a factor of
about 2.5 (times more processes than cores).

I never got around to testing his theory,
but if you have the time, I'd love to know!

Best,

Ben

> 
> Any idea?
> 
> thanks,
> Markus



-- 
Dr. Benjamin Ducke
{*} Geospatial Consultant
{*} GIS Developer

Spatial technology for the masses, not the classes:
experience free and open source GIS at http://gvsigce.org
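
One simple way to get "tiles" with a nearly equal number of points, as
asked for in this message, is recursive bisection along the longer axis.
This is a swapped-in technique chosen for brevity, not one of the
algorithms from the linked Stack Exchange question, and the name
balanced_tiles is hypothetical. The points would be whatever represents
the buildings (for example centroids); the overlap/buffering step described
above is not covered.

```python
def balanced_tiles(points, n_tiles):
    """Split (x, y) points into n_tiles groups of nearly equal size."""
    if n_tiles <= 1 or len(points) <= 1:
        return [list(points)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Cut across the longer side so the tiles stay reasonably compact.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    # Give each side a share of points proportional to its share of tiles.
    left_tiles = n_tiles // 2
    cut = len(ordered) * left_tiles // n_tiles
    return (balanced_tiles(ordered[:cut], left_tiles)
            + balanced_tiles(ordered[cut:], n_tiles - left_tiles))
```

The bounding box of each returned group could then serve as the starting
point for the buffered working region described above.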

Re: [GRASS-dev] [release] GRASS 7.2.1

2017-05-04 Thread Martin Landa
Hi,

2017-05-04 7:22 GMT+02:00 Markus Neteler:
> https://grass.osgeo.org/grass72/source/grass-7.2.1.tar.gz
> https://grass.osgeo.org/grass72/source/grass-7.2.1.md5sum

packages for Windows (standalone and osgeo4w) [1] + UbuntuGIS (zesty
users please use GRASS PPA) [2] available for testing.

Martin

[1] https://grass.osgeo.org/download/software/ms-windows/
[2] https://grass.osgeo.org/download/software/linux/

-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa

Re: [GRASS-dev] new addon: i.variance

2017-05-04 Thread Markus Neteler
Hi Moritz,

On Tue, Mar 21, 2017 at 1:49 PM, Moritz Lennert wrote:
> Hi all,
>
> FYI, I uploaded a simple new addon, i.variance [1], which calculates local
> variance in an image for successively degraded resolution in order to detect
> whether there are certain scales corresponding to specific, well-represented
> objects in that image. It is based on [2].


Very interesting!

Would you mind adding a note on how to interpret the "peak values of
variance" in the examples?
Say, what can be deduced from them about the size of detectable objects?

thanks
Markus

> Enjoy !
>
> Moritz
>
> [1] https://grass.osgeo.org/grass72/manuals/addons/i.variance.html
> [2] Curtis E. Woodcock, Alan H. Strahler, The factor of scale in remote
> sensing, Remote Sensing of Environment, Volume 21, Issue 3, April 1987,
> Pages 311-332, ISSN 0034-4257,
> http://dx.doi.org/10.1016/0034-4257(87)90015-0.
>
>



-- 
Markus Neteler, PhD
http://www.mundialis.de - free data with free software
http://grass.osgeo.org
http://courses.neteler.org/blog
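
As a toy illustration of the idea quoted above (local variance computed at
successively degraded resolution), here is a pure-Python sketch. It is not
the implementation of i.variance; the function names, the 3x3 window and
the block-averaging step are assumptions made for the example.

```python
def block_mean(grid, factor):
    """Degrade resolution by averaging factor x factor blocks."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    return [[sum(grid[r * factor + i][c * factor + j]
                 for i in range(factor)
                 for j in range(factor)) / float(factor ** 2)
             for c in range(cols)]
            for r in range(rows)]


def mean_local_variance(grid):
    """Average the variance of every complete 3x3 window."""
    total, count = 0.0, 0
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            window = [grid[r + i][c + j]
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
            mean = sum(window) / 9.0
            total += sum((v - mean) ** 2 for v in window) / 9.0
            count += 1
    return total / count if count else 0.0


def variance_by_scale(grid, factors):
    """Map each degradation factor to the mean local variance."""
    return {f: mean_local_variance(block_mean(grid, f)) for f in factors}
```

On a synthetic checkerboard whose squares are 4 cells wide, the mean local
variance stays high while the degraded cells are no larger than the squares
and drops to zero once one cell averages over several squares, which is the
kind of scale signal the question above is about.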

[GRASS-dev] Spatial clustering of vector objects?

2017-05-04 Thread Markus Neteler
Hi,

in order to parallelize some heavy computation I was wondering how to
do spatial clustering of vector objects, i.e. building footprints
(vector polygons).

I have to perform zonal statistics on thousands of buildings and would
like to split them up into "tiles" and then run the computation in
parallel for each tile.

The examples in v.cluster look somewhat promising
https://grass.osgeo.org/grass72/manuals/v.cluster.html

but in the best case each "tile" would contain a similar number of
buildings in order to balance the computation across the CPUs.

Any idea?

thanks,
Markus
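
Once the buildings are split into tiles, the per-tile runs could be
dispatched roughly as follows. This is a sketch under assumptions: run_tiles
and its factor argument are hypothetical names, and job stands for whatever
function launches the zonal statistics for one tile (for example by starting
a GRASS module as a separate process). The factor lets one experiment with
more workers than cores, as suggested elsewhere in this thread; that
suggestion is untested there and is not verified here either.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_tiles(job, tiles, factor=1.0):
    """Run job(tile) for every tile using factor * cores workers.

    In a real script each job would start an external process; the
    threads here only wait for those processes to finish.
    """
    cores = os.cpu_count() or 1
    workers = max(1, int(cores * factor))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the tiles.
        return list(pool.map(job, tiles))
```

Timing run_tiles() with factor=1.0 and factor=2.5 on the same set of tiles
would be one way to compare the two settings on a given machine.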

[GRASS-dev] [GRASS GIS] #3345: rsunlib.c: make it a core library

2017-05-04 Thread GRASS GIS
#3345: rsunlib.c: make it a core library
-------------------------+-------------------------
 Reporter:  neteler      |      Owner:  grass-dev@…
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  7.4.0
Component:  Raster       |    Version:  svn-trunk
 Keywords:  r.sun        |        CPU:  Unspecified
 Platform:  Unspecified  |
-------------------------+-------------------------
 Given the new set of r.sun related addons, it would be ideal to have

  raster/r.sun/rsunlib.c

 at library level. Future uses would be these addons

  * r.pv (recently submitted)
  * r.suntrack (recently submitted)
  * r.sunyear (recently submitted)
  * r.sun.mp
  * maybe more

 Through this the rsunlib could probably also be wrapped in Python.

--
Ticket URL: 
GRASS GIS 
