Stefano,
Indeed, that makes sense, the optimiser should indeed do the work for
you. It's good to see there are vast performance improvements. Good work!
Regards
Niels
On 09/13/2016 10:04 PM, Stefano Costa wrote:
Hi Niels,
we went with the "where exists" subquery approach because it was
easier to translate filters to SQL that way. For example, let's think
of a filter which is actually a logical combination (OR / AND) of
multiple filters; each filter could refer to a completely different
feature chain, and thus, one would have to go trough all the filters
first to determine the global list of tables that should be joined,
and then logically combine the WHERE clauses. Using the subquery
approach felt more natural (at least from a human perspective), it
allowed us to translate each filter separately and then logically
combine the translated SQL fragments in a straightforward way.
Performance-wise, I don't expect a big difference, as optimizers
should be smart enough to translate a WHERE EXISTS subquery into a
join if needed; I have not done any real performance comparison
between the two approaches though.
Does this make sense to you?
On Tue, Sep 13, 2016 at 11:17 AM, Niels Charlier <ni...@scitus.be
<mailto:ni...@scitus.be>> wrote:
Hi Stefano,
I think it is great you added that feature! A vast improvement to
the work that I always had hoped would still be developed.
Out of curiosity, why did you choose the "where exists" clauses
rather than the simple joins that I suggested? You think it has
better performance (I would expect worse on first glance, but more
probably the same because of optimiser),
or was it easier to construct the queries this way?
Kind Regards
Niels
On 11-09-16 16:56, Stefano Costa wrote:
Hi all, I'm writing to solicit feedback from the community about
an improvement to App-Schema's joining implementation that we
have been working on lately. In short, we would like to introduce
support for the SQL encoding of filters on nested attributes,
which, in the current implementation, end up in the post-filter
and are evaluated in memory, after all the features have been
loaded. For those who might be unfamiliar with it, the inner
workings of the current joining implementation are clearly
illustrated by Niels Charlier in this presentation:
https://www.seegrid.csiro.au/wiki/pub/Infosrvices/GeoserverAppSchemaJoining/joining_presentation.pdf
<https://www.seegrid.csiro.au/wiki/pub/Infosrvices/GeoserverAppSchemaJoining/joining_presentation.pdf>
The aforementioned presentation mentions the fact that filtering
on nested attributes is "currently done with post-filtering", and
suggests a possible strategy for translating them to SQL. What we
have implemented so far is a slight variation of the approach
suggested by Niels. Let's take GeoSciML's MappedFeature
(container) and GeologicUnit (nested) features as example. The
following filter on the gml:description attribute of the nested
GeologicUnit type: <ogc:PropertyIsLike>
<ogc:PropertyName>gsml:specification/gsml:GeologicUnit/gml:description</ogc:PropertyName>
<ogc:Literal>*sedimentary*</ogc:Literal> </ogc:PropertyIsLike>
would be ignored by the current joining implementation (i.e. it
is regarded as a post-filter): SELECT ... FROM
"appschematest"."MAPPEDFEATURE" ORDER BY
"appschematest"."MAPPEDFEATURE"."ID" ASC,
"appschematest"."MAPPEDFEATURE"."PKEY" SELECT ... FROM
"appschematest"."GEOLOGICUNIT" INNER JOIN
"appschematest"."MAPPEDFEATUREPROPERTYFILE" ON (
"MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" =
"GEOLOGICUNIT"."GML_ID") ORDER BY
"appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC,
"appschematest"."GEOLOGICUNIT"."GML_ID" ASC,
"appschematest"."GEOLOGICUNIT"."PKEY" but would be translated to
the following SQL queries in the new implementation: SELECT ...
FROM "appschematest"."MAPPEDFEATURE" INNER JOIN ( SELECT DISTINCT
"ID" FROM "appschematest"."
MAPPEDFEATURE"
*WHERE EXISTS (*
* SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT"
"chain_link_1" *
* WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%' AND
"appschematest"."MAPPEDFEATURE"."GEOLOGIC_UNIT_ID" =
"chain_link_1"."GML_ID")*
) "temp_alias_used_for_filter" ON ( "MAPPEDFEATURE"."ID" =
"temp_alias_used_for_filter"."ID" )
ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC,
"appschematest"."MAPPEDFEATURE"."PKEY"
SELECT ...
FROM "appschematest"."GEOLOGICUNIT"
INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON (
"MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID")
INNER JOIN (
SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATUREPROPERTYFILE"
*WHERE EXISTS (*
* SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT"
"chain_link_1" *
* WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%' AND
"appschematest"."MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" =
"chain_link_1"."GML_ID")*
) "temp_alias_used_for_filter" ON ( "MAPPEDFEATUREPROPERTYFILE"."ID" =
"temp_alias_used_for_filter"."ID" )
ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC,
"appschematest"."GEOLOGICUNIT"."GML_ID" ASC,
"appschematest"."GEOLOGICUNIT"."PKEY"
We have seen significant performance improvements, especially with the
total number of features in the DB is high (several thousands), but only a
few of them would satisfy the filter.
At present, we don't aim for a comprehensive implementation of nested
filters encoding; as a consequence, our implementation has the following
limitations:
1. only binary comparison, PropertyIsLike and PropertyIsNull filters are
translated to SQL
2. polymorphic mappings are not supported
3. filters involving multiple nested attributes are not supported
In case anybody wants to take a look before a formal PR, the code can be
found in this branch of my personal geotools fork:
https://github.com/ridethepenguin/geotools/tree/my_nested_filters_join
<https://github.com/ridethepenguin/geotools/tree/my_nested_filters_join>
As this enhancement only applies when joining is enabled, unit
tests have been added in the gs-app-schema-test module of
geoserver:
https://github.com/ridethepenguin/geoserver/tree/my_nested_filters_join_tests
<https://github.com/ridethepenguin/geoserver/tree/my_nested_filters_join_tests>
Last but not least, I would like to point out a couple of fixes
(I hope) I've introduced as I went along. The first is in
UnmappingFilterVisitor:
https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-de21f72fdbda25bc69ea142fad909c8eR821
<https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-de21f72fdbda25bc69ea142fad909c8eR821>
I've introduced it to make encoding of joining for simple content
possible, but it seems to me to be a more general fix to avoid
having nulls in the encoded filter. The second is in XPathUtil:
https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-11339740d8ff9ccff758df9400eecfc5R97
<https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-11339740d8ff9ccff758df9400eecfc5R97>
It seems to me the current implementation of startsWith is
broken, as it considers gml:name[2] and gml:name[3] paths to be
the same, while they may be mapped to different tables (e.g. see
SimpleAttributeFeatureChainWfsTest) Any feedback is higly
appreciated :-) Thanks!
------------------------------------------------------------------------------
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
<mailto:GeoTools-Devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/geotools-devel
<https://lists.sourceforge.net/lists/listinfo/geotools-devel>
------------------------------------------------------------------------------
_______________________________________________ GeoTools-Devel
mailing list GeoTools-Devel@lists.sourceforge.net
<mailto:GeoTools-Devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/geotools-devel
<https://lists.sourceforge.net/lists/listinfo/geotools-devel>
--
Best regards, Stefano Costa == GeoServer Professional Services from
the experts! Visit http://goo.gl/it488V for more information. == Dott.
Stefano Costa Senior Software Engineer GeoSolutions S.A.S. Via di
Montramito 3/A55054 Massarosa (LU) Italy phone: +39 0584 962313 fax:
+39 0584 1660272 http://www.geo-solutions.it
http://twitter.com/geosolutions_it
------------------------------------------------------- AVVERTENZE AI
SENSI DEL D.Lgs. 196/2003 Le informazioni contenute in questo
messaggio di posta elettronica e/o nel/i file/s allegato/i sono da
considerarsi strettamente riservate. Il loro utilizzo è consentito
esclusivamente al destinatario del messaggio, per le finalità indicate
nel messaggio stesso. Qualora riceviate questo messaggio senza esserne
il destinatario, Vi preghiamo cortesemente di darcene notizia via
e-mail e di procedere alla distruzione del messaggio stesso,
cancellandolo dal Vostro sistema. Conservare il messaggio stesso,
divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo,
od utilizzarlo per finalità diverse, costituisce comportamento
contrario ai principi dettati dal D.Lgs. 196/2003. The information in
this message and/or attachments, is intended solely for the attention
and use of the named addressee(s) and may be confidential or
proprietary in nature or covered by the provisions of privacy act
(Legislative Decree June, 30 2003, no.196 - Italy's New Data
Protection Code).Any use not in accord with its purpose, any
disclosure, reproduction, copying, distribution, or either
dissemination, either whole or partial, is strictly forbidden except
previous formal approval of the named addressee(s). If you are not the
intended recipient, please contact immediately the sender by
telephone, fax or e-mail and delete the information in this message
that has been received in error. The sender does not give any warranty
or accept liability as the content, accuracy or completeness of sent
messages and accepts no responsibility for changes made after they
were sent or for other risks which arise as a result of e-mail
transmission, viruses, etc.
------------------------------------------------------------------------------
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel