Stefano,

that looks great to me. A solid piece of work that likely brings large 
performance improvements for some common use-cases. However, I do not 
claim to understand it all, and I would like to hear Niels' and Rini's 
opinion as they have much deeper understanding of joining.

I can confirm that, merged on master, a full build passes for GeoTools, 
GeoServer, and GeoServer app-schema online tests against PostGIS. That 
is, these changes do not appear to break anything at all, not even the 
reference data set tests, and you have provided a flag to turn off the 
new functionality just in case it breaks something.

My tests were with OpenJDK 8 1.8.0_102 and  postgresql 9.5.4-1 / postgis 
2.2.2+dfsg-5 on debian/unstable amd64.

Quick first-pass code review:

- Your GeoTools changes add many new tab characters and mix tabs and 
spaces; there are a few places where you add tabs where tabs are already 
used, and I would prefer if you expanded these for the entire enclosing 
method
- TODO block in AppSchemaDataAccess and unused temporary variable 
"expressions" can be removed
- Flag app-schema.encodeNestedFilters is enabled by default, needs 
documentation on the joining page
- Unnecessary copyright date updates in ComplexFilterSplitter, 
JoiningJDBCFeatureSource, FilterToSQL
- FeatureChainedAttributeVisitor has commented line: // chainedAttribute 
that can be removed
- Commented "throw new UnsupportedOperationException" in 
NestedAttributeExpression can be removed
- JoiningFieldEncoder made public, requires javadocs
- JoiningJDBCFeatureSource commented source code lines should be 
removed, especially the large code block in createNestedFilter
- The commented code blocks in the GeoServer tests that mention the 
EXISTS keyword *are* OK as they really are documentation  :-)

I look forward to seeing the pull requests.

Kind regards,
Ben.


On 12/09/16 02:56, Stefano Costa wrote:
> Hi all,
> I'm writing to solicit feedback from the community about an improvement to
> App-Schema's joining implementation that we have been working on lately.
>
> In short, we would like to introduce support for the SQL encoding of
> filters on nested attributes, which, in the current implementation, end up
> in the post-filter and are evaluated in memory, after all the features have
> been loaded.
> For those who might be unfamiliar with it, the inner workings of the
> current joining implementation are clearly illustrated by Niels Charlier in
> this presentation:
> https://www.seegrid.csiro.au/wiki/pub/Infosrvices/GeoserverAppSchemaJoining/joining_presentation.pdf
>
> The aforementioned presentation mentions the fact that filtering on nested
> attributes is "currently done with post-filtering", and suggests a possible
> strategy for translating them to SQL.
>
> What we have implemented so far is a slight variation of the approach
> suggested by Niels.
>
> Let's take GeoSciML's MappedFeature (container) and GeologicUnit (nested)
> features as example.
> The following filter on the gml:description attribute of the nested
> GeologicUnit type:
>
> <ogc:PropertyIsLike>
>
> <ogc:PropertyName>gsml:specification/gsml:GeologicUnit/gml:description</ogc:PropertyName>
>   <ogc:Literal>*sedimentary*</ogc:Literal>
> </ogc:PropertyIsLike>
>
> would be ignored by the current joining implementation (i.e. it is regarded
> as a post-filter):
>
> SELECT ...
> FROM "appschematest"."MAPPEDFEATURE"
> ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC,
> "appschematest"."MAPPEDFEATURE"."PKEY"
>
> SELECT ...
> FROM "appschematest"."GEOLOGICUNIT"
> INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON (
> "MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID")
> ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC,
> "appschematest"."GEOLOGICUNIT"."GML_ID" ASC,
> "appschematest"."GEOLOGICUNIT"."PKEY"
>
>
> but would be translated to the following SQL queries in the new
> implementation:
>
> SELECT ...
> FROM "appschematest"."MAPPEDFEATURE"
> INNER JOIN (
> SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATURE"
> *WHERE EXISTS (*
> * SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT"
> "chain_link_1" *
> * WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%'  AND
> "appschematest"."MAPPEDFEATURE"."GEOLOGIC_UNIT_ID" =
> "chain_link_1"."GML_ID")*
> ) "temp_alias_used_for_filter" ON ( "MAPPEDFEATURE"."ID" =
> "temp_alias_used_for_filter"."ID" )
> ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC,
> "appschematest"."MAPPEDFEATURE"."PKEY"
>
> SELECT ...
> FROM "appschematest"."GEOLOGICUNIT"
> INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON (
> "MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID")
> INNER JOIN (
> SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATUREPROPERTYFILE"
> *WHERE EXISTS (*
> * SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT"
> "chain_link_1" *
> * WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%'  AND
> "appschematest"."MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" =
> "chain_link_1"."GML_ID")*
> ) "temp_alias_used_for_filter" ON ( "MAPPEDFEATUREPROPERTYFILE"."ID" =
> "temp_alias_used_for_filter"."ID" )
> ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC,
> "appschematest"."GEOLOGICUNIT"."GML_ID" ASC,
> "appschematest"."GEOLOGICUNIT"."PKEY"
>
>
> We have seen significant performance improvements, especially with the
> total number of features in the DB is high (several thousands), but only a
> few of them would satisfy the filter.
>
> At present, we don't aim for a comprehensive implementation of nested
> filters encoding; as a consequence, our implementation has the following
> limitations:
> 1. only binary comparison, PropertyIsLike and PropertyIsNull filters are
> translated to SQL
> 2. polymorphic mappings are not supported
> 3. filters involving multiple nested attributes are not supported
>
> In case anybody wants to take a look before a formal PR, the code can be
> found in this branch of my personal geotools fork:
> https://github.com/ridethepenguin/geotools/tree/my_nested_filters_join
>
> As this enhancement only applies when joining is enabled, unit tests have
> been added in the gs-app-schema-test module of geoserver:
> https://github.com/ridethepenguin/geoserver/tree/my_nested_filters_join_tests
>
> Last but not least, I would like to point out a couple of fixes (I hope)
> I've introduced as I went along. The first is in UnmappingFilterVisitor:
> https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-de21f72fdbda25bc69ea142fad909c8eR821
>
> I've introduced it to make encoding of joining for simple content possible,
> but it seems to me to be a more general fix to avoid having nulls in the
> encoded filter.
>
> The second is in XPathUtil:
> https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-11339740d8ff9ccff758df9400eecfc5R97
>
> It seems to me the current implementation of startsWith is broken, as it
> considers gml:name[2] and gml:name[3] paths to be the same, while they may
> be mapped to different tables (e.g. see SimpleAttributeFeatureChainWfsTest)
>
> Any feedback is higly appreciated :-)
>
> Thanks!
>
>
>
>
> ------------------------------------------------------------------------------
>
>
>
> _______________________________________________
> GeoTools-Devel mailing list
> GeoTools-Devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>

-- 
Ben Caradoc-Davies <b...@transient.nz>
Director
Transient Software Limited <http://transient.nz/>
New Zealand

------------------------------------------------------------------------------
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Reply via email to