RE: [mapserver-users] ONE PASS QUERY (RFC 52) - FEATURE OR BUG?

2010-03-24 Thread Lime, Steve D (DNR)
I think that would probably eliminate the WFS issues we saw in 5.6, where OGC
filters didn't map cleanly to a core query, as we'd essentially be adding a new
and more flexible core query. I'm not so sure that it would address a common
use case for the old behavior. For example, it seems that folks like to start
with a core query against a number of layers (e.g. find me all restaurants,
bars, and coffee shops within a certain bbox) and then (through subsequent UI
interactions) augment that original set of results by adding or removing
features in a number of layers based on new queries, often a point query. I
think that would be very hard to manage through a single filter. It's easy to
do, however, via a list of feature IDs.
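
To make the feature ID approach concrete, here is a minimal Python sketch. It is not MapServer code: the helper functions, layer names and IDs are all invented for illustration. It keeps a cumulative selection as a set of (layer, feature ID) pairs that later queries add to or remove from.

```python
# Hypothetical sketch, not MapServer API: a cumulative selection held as
# (layer name, feature ID) pairs. All names and IDs below are made up.
selection = set()

def add_results(selection, layer, feature_ids):
    """Merge the hits of a new query into the running selection."""
    selection.update((layer, fid) for fid in feature_ids)

def remove_results(selection, layer, feature_ids):
    """Drop features from the selection, e.g. after a deselecting point query."""
    selection.difference_update((layer, fid) for fid in feature_ids)

# Initial bbox query against several layers.
add_results(selection, "restaurants", [3, 7, 12])
add_results(selection, "bars", [5])

# Later UI interactions adjust the original result one point query at a time.
add_results(selection, "coffee_shops", [41])
remove_results(selection, "restaurants", [7])

print(sorted(selection))
```

Because the selection only stores IDs, each adjustment is a plain set operation and does not depend on how the earlier queries were expressed.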

I still think adding a filterObj and a msLayerWhichShapesFiltered() is a good 
idea and want to see this in 6.0.

In addition, I was thinking that:

 1) we could use shapeindex to hold a global feature ID (OID, row id, etc...),
and tileindex (for non-tiled shapefile/raster layers) to hold a result set
specific row id
 2) driver-specific versions of msLayerNextShape(...) would set shapeindex and
optionally tileindex (for example, the shapefile driver would only set the
former, postgis would set both)
 3) driver-specific versions of msLayerGetShape(...) would be charged with
making the decision on doing either a random access query (select ... where
oid=x) or leveraging the existing result set based on the passed tileindex
(msLayerResultsGetShape(...) goes away)
 4) resurrect the old query file writer (in addition to the new one). That code
wrote the shapeindex and tileindex, but we'd only cache the tileindex for tiled
shapefile/raster layers
 5) resurrect the old query file reader; it would load a query in a state
that wouldn't have the tileindexes, so old, slow processing would result for
RDBMS layers
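
The dispatch described in points 1 to 3 could look roughly like the Python sketch below. It is only an illustration under assumptions: `Shape`, `FakeRdbmsLayer` and its methods are hypothetical stand-ins rather than MapServer structures, and the check that the cached row still matches the feature ID is an extra safeguard that the list above does not mention.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Shape:
    shapeindex: int                  # global feature ID (OID, row id, ...)
    tileindex: Optional[int] = None  # row position in a live result set, if known

class FakeRdbmsLayer:
    """Hypothetical stand-in for an RDBMS-backed layer (not a real driver)."""

    def __init__(self, table):
        self.table = table        # {oid: attributes}
        self.result_rows = []     # rows of the most recent query, in cursor order
        self.random_lookups = 0   # counts the slow "select ... where oid=x" path

    def which_shapes(self, predicate):
        self.result_rows = [(oid, rec) for oid, rec in self.table.items()
                            if predicate(rec)]

    def next_shapes(self):
        # A postgis-like driver would set both indexes.
        return [Shape(shapeindex=oid, tileindex=row)
                for row, (oid, _) in enumerate(self.result_rows)]

    def get_shape(self, shape):
        # Use the existing result set when a usable row index was passed in.
        if shape.tileindex is not None and shape.tileindex < len(self.result_rows):
            oid, rec = self.result_rows[shape.tileindex]
            if oid == shape.shapeindex:   # assumed safeguard against stale rows
                return rec
        # Otherwise fall back to random access by feature ID.
        self.random_lookups += 1
        return self.table[shape.shapeindex]

layer = FakeRdbmsLayer({10: "cafe", 11: "bar", 12: "cafe"})
layer.which_shapes(lambda rec: rec == "cafe")
fast = [layer.get_shape(s) for s in layer.next_shapes()]  # served from the result set
slow = layer.get_shape(Shape(shapeindex=11))              # no tileindex: random access
```

A shape loaded by the old query file reader would look like the last call: it carries only the feature ID, so it takes the slow path.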

Knowing when the result set processing (e.g. via WFS, templates and query maps)
should set things up for old vs. new is a bit tricky. Right now there's a flag
set in the result cache that I believe only the GML writer respects. It shows
that it's possible to support both worlds though. That flag could trigger
old/new file IO and result cache processing.

I need to resolve this with the parallel discussion Tamas and Frank were having.

My 2 cents anyway...

Steve


From: mapserver-users-boun...@lists.osgeo.org 
[mapserver-users-boun...@lists.osgeo.org] On Behalf Of Paul Ramsey 
[pram...@cleverelephant.ca]
Sent: Monday, March 22, 2010 11:25 PM
To: Lime, Steve D (DNR)
Cc: mapserver-users@lists.osgeo.org; BrainDrain
Subject: Re: [mapserver-users] ONE PASS QUERY (RFC 52) - FEATURE OR BUG?

Is it better to back out to the old behaviour or would defining a
filter object that allows complex query logic meet the need in a more
direct way? (Ie, is running multiple queries a feature or a workaround
for an even older limitation?)

P.

On Mon, Mar 22, 2010 at 9:21 PM, Lime, Steve D (DNR)
steve.l...@state.mn.us wrote:
 I think we're in need of an RFC 52a. Clearly the compound query handling the
 old approach afforded is of value to a group of users, and that wasn't
 accounted for in the initial RFC. The workaround Assefa had to do for WFS
 and a certain subset of OGC filters at the sprint is evidence that the
 approach was even used in the core code (I wasn't aware of that at the time).
 We (in 5.6.3) developed a workaround that retains two sets of indexes: one
 suitable for random access and one for a specific result set (e.g. cursor).
 It uses the already present tileindex property of a shapeObj to store the
 latter. I think we can have the best of both here by storing the two indexes,
 and potentially we can revert to a single getShape() function in MapScript
 and revive the old queryfile format as an option as well. It just needs to be
 planned for now that the full impacts are better understood.

 Steve
 
 From: mapserver-users-boun...@lists.osgeo.org 
 [mapserver-users-boun...@lists.osgeo.org] On Behalf Of Frank Warmerdam 
 [warmer...@pobox.com]
 Sent: Monday, March 22, 2010 10:16 PM
 To: Tamas Szekeres
 Cc: mapserver-users@lists.osgeo.org; BrainDrain
 Subject: Re: [mapserver-users] ONE PASS QUERY (RFC 52) - FEATURE OR BUG?

 Tamas Szekeres wrote:
 In my understanding with the original approach the driver should:

 1. Retain the result set of the queries at the layer (ie. in the
 layerinfo structure) until the layer is open and no subsequent
 whichShapes is called to 'invalidate' the query.

 Tamas,

 Your point here is that the query result should live until
 invalidated by another whichShapes, right?  I would agree with
 that, but draw on a layer does do a whichShapes, right?  So a
 draw is expected to invalidate a query, right?

 2. Provide such index in shapeObj which would allow to retrieve in a
 subsequent resultsGetShape within the result set.

 ok

 3. Retain the random access

Re: [mapserver-users] ONE PASS QUERY (RFC 52) - FEATURE OR BUG?

2010-03-23 Thread Tamas Szekeres
2010/3/23 Frank Warmerdam warmer...@pobox.com


 4. Preserve the behaviour of keeping separate set of results for separate
 layer instances. In this regard a query on one layer should not invalidate
 the results for a different layer instance of the same driver.


 This seems to be a lot to expect.  We go to significant effort with
 the connection pooling to allow reuse of a connection for different
 layers, and in effect in many drivers this connection also carries
 a bunch of context with it.  Certainly in the case of OGR an
 OGRLayer retains a concept of current query result, but it can
 be invalidated by lots of operations other than ResetReading() and
 GetNextFeature().  I would imagine this is true to a greater or
 lesser degree of other drivers that pool connections.


Frank,

It may be driver dependent, but I tend to think we should open up a new
connection/session/dataset, whatever, for those queries which would retain
the results at the driver, corresponding to a particular result set stored
at layer level.  This is not the case for those queries where the results
are not retained (like drawing the layers / background), and the connection
pool approach could continue to be used in these cases. That's why I suggested
an additional parameter in whichShapes to define the purpose of the query.




 5. Creating a clone of a layer should allow a separate query to be used
 (keeping the results intact on the original layer). This would be essential
 for msDrawQueryLayer to work when drawing the background before the
 highlighted features.


 This is also quite impractical for some implementations - certainly
 for OGR.


I'm quite unsure how msDrawQueryLayer would ever work with OGR then. In this
case, MS_HILITE would require drawing the entire layer first and then the
highlighted shapes from the result set. Drawing the entire layer (regardless
of whether it's happening on a clone) would reset the spatial filter on the
driver with the same connection, and a subsequent resultsGetShape would fail
to retrieve the same features.
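
The failure mode can be illustrated with a toy model in Python. This is only a sketch under assumptions: `SharedConnection` is a hypothetical class, not OGR or MapServer code, and it simply treats the current result set as state that lives on the pooled connection.

```python
class SharedConnection:
    """Hypothetical pooled connection whose current result set is shared state."""

    def __init__(self, features):
        self.features = features
        self.current = []          # result set of the most recent filter

    def set_filter(self, predicate):
        # Applying a new filter replaces whatever result set was there before.
        self.current = [f for f in self.features if predicate(f)]

    def get_by_result_index(self, index):
        # Index-based access is only meaningful for the current result set.
        return self.current[index]

conn = SharedConnection(["a1", "a2", "b1", "b2"])

conn.set_filter(lambda f: f.startswith("b"))   # the user's query
before = conn.get_by_result_index(0)

conn.set_filter(lambda f: True)                # background draw on the same connection
after = conn.get_by_result_index(0)            # same index, different feature

print(before, after)
```

The index that pointed at a query hit before the draw points at an unrelated feature afterwards, which is the situation described for the highlighted shapes.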




 In retrospect, I'm not all that confident that we really considered
 the impact of RFC 52 on use cases such as those you raise.  I
 certainly didn't understand these impacts.  What is less clear to
 me is where to go from here.  RFC 52 was put in place because the
 old approach was giving terrible performance in some cases.


This is really a good question ;-)  RFC 52 is out in the stable branch and
prevents a number of users from upgrading (we don't know the exact number
though). Leaving this version as it stands would get more people invested
in this version of the API, while we foresee some API change shortly.
I think our best chance would be to provide the original version at the
drivers in parallel with the current one, meaning that both getShape and
layerGetShape should work properly. This should probably be controlled by a
layer processing option at driver level. It would be reasonable to switch
the default to the 2 pass approach in the stable branch, while users
could still override this in their mapfiles.

 But if we put the expectations you list into place there is no
 way it can be made fast on OGR short of maintaining distinct
 OGRDataset instances for each query in addition to the one used to
 draw the layer.  This could cause various performance and
 resource problems.


While I don't exactly see the performance impacts, it may require a bit more
memory for sure. However, since we intend to reuse the results of a query, it
would definitely imply storing a reference to the corresponding OGRDataset
for as long as a subsequent access to the query may happen.

I'm also hesitant to think that the 1pass option is better for all OGR data
sources in all cases. Having a couple of test scripts to see the performance
difference of the same query with these 2 methods would be helpful.


Best regards,

Tamas
___
mapserver-users mailing list
mapserver-users@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/mapserver-users


Re: [mapserver-users] ONE PASS QUERY (RFC 52) - FEATURE OR BUG?

2010-03-22 Thread Tamas Szekeres
Hi,

I second these concerns, absolutely. With regards to RFC 52 there have
been a couple of breaking changes in MapServer 5.6 which prevent me from
upgrading to this version in my existing projects. I've already tried to
ring the bell on this topic with a couple of posts (see below), but it seems
the use case described here (keeping long term mapObj references) is not
widely used and falls outside of the general area of interest:

http://n2.nabble.com/Ready-for-5-6-2-td4743344.html#a4746772
http://n2.nabble.com/OGR-single-pass-query-issues-was-Ready-for-5-6-2-td4753764.html

Prompted by the issues you raise below, I've studied RFC 52
(http://mapserver.org/development/rfc/ms-rfc-52.html) again to see
the objectives, and it seems we are getting out of sync
with the current implementation at the drivers.

In my understanding with the original approach the driver should:

1. Retain the result set of the queries at the layer (i.e. in the layerinfo
structure) for as long as the layer is open and no subsequent whichShapes is
called to 'invalidate' the query.
2. Provide an index in shapeObj that allows the shape to be retrieved from
the result set by a subsequent resultsGetShape.
3. Retain the random access behaviour (getShape) for backward compatibility
in parallel to resultsGetShape.


Since the RFC doesn't explicitly state otherwise, the drivers
should also:

4. Preserve the behaviour of keeping a separate set of results for separate
layer instances. In this regard a query on one layer should not invalidate
the results for a different layer instance of the same driver.
5. Creating a clone of a layer should allow a separate query to be used
(keeping the results intact on the original layer). This would be essential
for msDrawQueryLayer to work when drawing the background before the
highlighted features.
6. Using a drawQuery should not invalidate the results of a previous query.
7. Drawing the map should not invalidate the results of a previous query.


Further notes: To provide correct implementations at the drivers it seems we
should give the driver a bit more information so that it can tell what a
query is for. For example, whichShapes should take an additional
parameter (querymode) with the following predefined values:

MS_QUERY_SEQUENTIAL:  The returned shapes will be retrieved by nextShape (no
subsequent (results)getShape will happen). This would be used by the normal
drawing operations.
MS_QUERY_RANDOM: The results would be retrieved by nextShape. The driver
would provide oid-s as feature indexes so that the features can be retrieved
with the original (2 pass) behaviour (getShape). No features are retained at
the driver for further access.
MS_QUERY_PRESERVE: The results would be retrieved by nextShape and
resultsGetShape. The drivers should store the result set at the layer for
further retrieval (1 pass).

The features retrieved by MS_QUERY_PRESERVE should be kept in a separate
location, either in layerinfo or in a separate structure (a queryinfo for
example). The latter would isolate the results from the layer's opened
state.
MS_QUERY_SEQUENTIAL or MS_QUERY_RANDOM should not invalidate the results in
queryinfo.


We should also provide the user with an option to select between the default
of MS_QUERY_RANDOM/MS_QUERY_PRESERVE at layer level. Probably a layer
processing option would be sufficient.
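
The proposal above can be sketched as follows. This is a minimal Python illustration under assumptions, not MapServer code: `QueryMode`, `SketchLayer` and its methods are hypothetical names that only mirror the MS_QUERY_* values and the queryinfo idea described in this message.

```python
from enum import Enum

class QueryMode(Enum):
    SEQUENTIAL = 1   # results read once via nextShape (normal drawing)
    RANDOM = 2       # only feature IDs are handed out; getShape re-reads by ID
    PRESERVE = 3     # result set is retained for later resultsGetShape calls

class SketchLayer:
    """Hypothetical layer that keeps retained results apart from its cursor."""

    def __init__(self, features):
        self.features = features   # {oid: value}
        self.cursor = []           # transient state of the latest whichShapes
        self.queryinfo = None      # retained results, only set by PRESERVE

    def which_shapes(self, predicate, mode):
        hits = [oid for oid, value in self.features.items() if predicate(value)]
        self.cursor = hits
        if mode is QueryMode.PRESERVE:
            self.queryinfo = [(oid, self.features[oid]) for oid in hits]
        # SEQUENTIAL and RANDOM leave queryinfo untouched.
        return hits

    def results_get_shape(self, row):
        if self.queryinfo is None:
            raise LookupError("no preserved query on this layer")
        return self.queryinfo[row]

layer = SketchLayer({1: "road", 2: "river", 3: "road"})
layer.which_shapes(lambda v: v == "road", QueryMode.PRESERVE)   # the user's query
layer.which_shapes(lambda v: True, QueryMode.SEQUENTIAL)        # a later map draw
print(layer.results_get_shape(1))
```

Keeping the preserved results in their own structure is what lets the later sequential pass run without disturbing them.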



Best regards,


Tamas








2010/3/21 BrainDrain paulborod...@gmail.com


 Please read carefully.
 'old style' (two pass) query
 advantages:
 - in mapscript (c#) layer's query methods are CUMULATIVE (relative to other
 layer's queries). Query result (success/failure) has no effect on other
 layers when I call map.savequery.
 - In this case (query file contains just oid's) - I CAN create COMPLEX
 queries by applying different parameters and mixing query types
 - Query result is oid (or row index in shapefile) - it's cool for creating
 advanced attr. postqueries
 and disadvantages (insignificant):
 - query binary (closed) format? no output to string (only to file)
 - query file sensitive to layer indexes (map file cleanups/refinements/some
 normalization can cause query file incompatibility)
 - if server data changed refreshing map image by old url to cgi with
 queryfile parameter doesn't perform requery (need custom http
 handler/module)
 AND ONE PASS QUERY (RFC 52)
 advantages:
 - open query file format
 - speedup (no random access)
 disadvantages (huge):
 - layer's query methods are NOT CUMULATIVE (relative to other layer's
 queries). I CAN'T create COMPLEX queries (by using different
 attribute/spatial queries (metadata driven) for different layers)!
 - query result - some (shape)indexes, and when I'm querying many layers in
 sequence I NEED TO PRESERVE QUERY FILE FOR EVERY LAYER (on success result)
 (!) and then on feature attributes (for some layer) demand (delayed request
 - it's normal behavior, e.g. I'm requesting only shape names for some
 layer which has results - to build results 

Re: [mapserver-users] ONE PASS QUERY (RFC 52) - FEATURE OR BUG?

2010-03-22 Thread Frank Warmerdam

Tamas Szekeres wrote:

In my understanding with the original approach the driver should:

1. Retain the result set of the queries at the layer (ie. in the 
layerinfo structure) until the layer is open and no subsequent 
whichShapes is called to 'invalidate' the query.


Tamas,

Your point here is that the query result should live until
invalidated by another whichShapes, right?  I would agree with
that, but draw on a layer does do a whichShapes, right?  So a
draw is expected to invalidate a query, right?

2. Provide such index in shapeObj which would allow to retrieve in a 
subsequent resultsGetShape within the result set.


ok

3. Retain the random access behaviour (getShape) for backward 
compatibility in parallel to resultsGetShape.


ok

Since the RFC doesn't contain explicit note about the opposite, the 
drivers should also:


4. Preserve the behaviour of keeping separate set of results for 
separate layer instances. In this regard a query on one layer should not 
invalidate the results for a different layer instance of the same driver.


This seems to be a lot to expect.  We go to significant effort with
the connection pooling to allow reuse of a connection for different
layers, and in effect in many drivers this connection also carries
a bunch of context with it.  Certainly in the case of OGR an
OGRLayer retains a concept of current query result, but it can
be invalidated by lots of operations other than ResetReading() and
GetNextFeature().  I would imagine this is true to a greater or
lesser degree of other drivers that pool connections.

5. Creating a clone of a layer should provide to use a separate query 
(by keeping the results intact on the original layer). This would be 
essential for msDrawQueryLayer to work when drawing the background 
before the highlighted features.


This is also quite impractical for some implementations - certainly
for OGR.


6. Using a drawQuery should not invalidate the results of a previous query.


I don't know much about drawQuery but it does seem plausible to ask
that drawQuery should not invalidate the query it is drawing.


7. Drawing the map should not invalidate the results of a previous query.


But drawing maps uses the feature access machinery like whichShapes
doesn't it?  How can we expect map drawing not to invalidate a query?

In retrospect, I'm not all that confident that we really considered
the impact of RFC 52 on use cases such as those you raise.  I
certainly didn't understand these impacts.  What is less clear to
me is where to go from here.  RFC 52 was put in place because the
old approach was giving terrible performance in some cases.
But if we put the expectations you list into place there is no
way it can be made fast on OGR short of maintaining distinct
OGRDataset instances for each query in addition to the one used to
draw the layer.  This could cause various performance and
resource problems.

Best regards,
--
---+--
I set the clouds in motion - turn up   | Frank Warmerdam, warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent



Re: [mapserver-users] ONE PASS QUERY (RFC 52) - FEATURE OR BUG?

2010-03-22 Thread Paul Ramsey
Is it better to back out to the old behaviour or would defining a
filter object that allows complex query logic meet the need in a more
direct way? (Ie, is running multiple queries a feature or a workaround
for an even older limitation?)

P.

On Mon, Mar 22, 2010 at 9:21 PM, Lime, Steve D (DNR)
steve.l...@state.mn.us wrote:
 I think we're in need of an RFC 52a. Clearly the compound query handling the
 old approach afforded is of value to a group of users, and that wasn't
 accounted for in the initial RFC. The workaround Assefa had to do for WFS
 and a certain subset of OGC filters at the sprint is evidence that the
 approach was even used in the core code (I wasn't aware of that at the time).
 We (in 5.6.3) developed a workaround that retains two sets of indexes: one
 suitable for random access and one for a specific result set (e.g. cursor).
 It uses the already present tileindex property of a shapeObj to store the
 latter. I think we can have the best of both here by storing the two indexes,
 and potentially we can revert to a single getShape() function in MapScript
 and revive the old queryfile format as an option as well. It just needs to be
 planned for now that the full impacts are better understood.

 Steve
 