Hi Stefano,

I think it is great you added that feature! A vast improvement to the work that I always had hoped would still be developed.

Out of curiosity, why did you choose the "where exists" clauses rather than the simple joins that I suggested? You think it has better performance (I would expect worse on first glance, but more probably the same because of optimiser),
or was it easier to construct the queries this way?

Kind Regards
Niels

On 11-09-16 16:56, Stefano Costa wrote:
Hi all,
I'm writing to solicit feedback from the community about an improvement to
App-Schema's joining implementation that we have been working on lately.

In short, we would like to introduce support for the SQL encoding of
filters on nested attributes, which, in the current implementation, end up
in the post-filter and are evaluated in memory, after all the features have
been loaded.
For those who might be unfamiliar with it, the inner workings of the
current joining implementation are clearly illustrated by Niels Charlier in
this presentation:
https://www.seegrid.csiro.au/wiki/pub/Infosrvices/GeoserverAppSchemaJoining/joining_presentation.pdf

The aforementioned presentation mentions the fact that filtering on nested
attributes is "currently done with post-filtering", and suggests a possible
strategy for translating them to SQL.

What we have implemented so far is a slight variation of the approach
suggested by Niels.

Let's take GeoSciML's MappedFeature (container) and GeologicUnit (nested)
features as example.
The following filter on the gml:description attribute of the nested
GeologicUnit type:

<ogc:PropertyIsLike>

<ogc:PropertyName>gsml:specification/gsml:GeologicUnit/gml:description</ogc:PropertyName>
   <ogc:Literal>*sedimentary*</ogc:Literal>
</ogc:PropertyIsLike>

would be ignored by the current joining implementation (i.e. it is regarded
as a post-filter):

SELECT ...
FROM "appschematest"."MAPPEDFEATURE"
ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC,
"appschematest"."MAPPEDFEATURE"."PKEY"

SELECT ...
FROM "appschematest"."GEOLOGICUNIT"
INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON (
"MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID")
ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC,
"appschematest"."GEOLOGICUNIT"."GML_ID" ASC,
"appschematest"."GEOLOGICUNIT"."PKEY"


but would be translated to the following SQL queries in the new
implementation:

SELECT ...
FROM "appschematest"."MAPPEDFEATURE"
INNER JOIN (
SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATURE"
*WHERE EXISTS (*
* SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT"
"chain_link_1" *
* WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%'  AND
"appschematest"."MAPPEDFEATURE"."GEOLOGIC_UNIT_ID" =
"chain_link_1"."GML_ID")*
) "temp_alias_used_for_filter" ON ( "MAPPEDFEATURE"."ID" =
"temp_alias_used_for_filter"."ID" )
ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC,
"appschematest"."MAPPEDFEATURE"."PKEY"

SELECT ...
FROM "appschematest"."GEOLOGICUNIT"
INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON (
"MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID")
INNER JOIN (
SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATUREPROPERTYFILE"
*WHERE EXISTS (*
* SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT"
"chain_link_1" *
* WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%'  AND
"appschematest"."MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" =
"chain_link_1"."GML_ID")*
) "temp_alias_used_for_filter" ON ( "MAPPEDFEATUREPROPERTYFILE"."ID" =
"temp_alias_used_for_filter"."ID" )
ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC,
"appschematest"."GEOLOGICUNIT"."GML_ID" ASC,
"appschematest"."GEOLOGICUNIT"."PKEY"


We have seen significant performance improvements, especially with the
total number of features in the DB is high (several thousands), but only a
few of them would satisfy the filter.

At present, we don't aim for a comprehensive implementation of nested
filters encoding; as a consequence, our implementation has the following
limitations:
1. only binary comparison, PropertyIsLike and PropertyIsNull filters are
translated to SQL
2. polymorphic mappings are not supported
3. filters involving multiple nested attributes are not supported

In case anybody wants to take a look before a formal PR, the code can be
found in this branch of my personal geotools fork:
https://github.com/ridethepenguin/geotools/tree/my_nested_filters_join

As this enhancement only applies when joining is enabled, unit tests have
been added in the gs-app-schema-test module of geoserver:
https://github.com/ridethepenguin/geoserver/tree/my_nested_filters_join_tests

Last but not least, I would like to point out a couple of fixes (I hope)
I've introduced as I went along. The first is in UnmappingFilterVisitor:
https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-de21f72fdbda25bc69ea142fad909c8eR821

I've introduced it to make encoding of joining for simple content possible,
but it seems to me to be a more general fix to avoid having nulls in the
encoded filter.

The second is in XPathUtil:
https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-11339740d8ff9ccff758df9400eecfc5R97

It seems to me the current implementation of startsWith is broken, as it
considers gml:name[2] and gml:name[3] paths to be the same, while they may
be mapped to different tables (e.g. see SimpleAttributeFeatureChainWfsTest)

Any feedback is higly appreciated :-)

Thanks!




------------------------------------------------------------------------------


_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel


------------------------------------------------------------------------------
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Reply via email to