Stefano, that looks great to me. A solid piece of work that likely brings large performance improvements for some common use-cases. However, I do not claim to understand it all, and I would like to hear Niels' and Rini's opinion as they have much deeper understanding of joining.
I can confirm that, merged on master, a full build passes for GeoTools, GeoServer, and GeoServer app-schema online tests against PostGIS. That is, these changes do not appear to break anything at all, not even the reference data set tests, and you have provided a flag to turn off the new functionality just in case it breaks something. My tests were with OpenJDK 8 1.8.0_102 and postgresql 9.5.4-1 / postgis 2.2.2+dfsg-5 on debian/unstable amd64. Quick first-pass code review: - Your GeoTools changes add many new tab characters and mix tabs and spaces; there are a few places where you add tabs where tabs are already used, and I would prefer if you expanded these for the entire enclosing method - TODO block in AppSchemaDataAccess and unused temporary variable "expressions" can be removed - Flag app-schema.encodeNestedFilters is enabled by default, needs documentation on the joining page - Unnecessary copyright date updates in ComplexFilterSplitter, JoiningJDBCFeatureSource, FilterToSQL - FeatureChainedAttributeVisitor has commented line: // chainedAttribute that can be removed - Commented "throw new UnsupportedOperationException" in NestedAttributeExpression can be removed - JoiningFieldEncoder made public, requires javadocs - JoiningJDBCFeatureSource commented source code lines should be removed, especially the large code block in createNestedFilter - The commented code blocks in the GeoServer tests that mention the EXISTS keyword *are* OK as they really are documentation :-) I look forward to seeing the pull requests. Kind regards, Ben. On 12/09/16 02:56, Stefano Costa wrote: > Hi all, > I'm writing to solicit feedback from the community about an improvement to > App-Schema's joining implementation that we have been working on lately. > > In short, we would like to introduce support for the SQL encoding of > filters on nested attributes, which, in the current implementation, end up > in the post-filter and are evaluated in memory, after all the features have > been loaded. > For those who might be unfamiliar with it, the inner workings of the > current joining implementation are clearly illustrated by Niels Charlier in > this presentation: > https://www.seegrid.csiro.au/wiki/pub/Infosrvices/GeoserverAppSchemaJoining/joining_presentation.pdf > > The aforementioned presentation mentions the fact that filtering on nested > attributes is "currently done with post-filtering", and suggests a possible > strategy for translating them to SQL. > > What we have implemented so far is a slight variation of the approach > suggested by Niels. > > Let's take GeoSciML's MappedFeature (container) and GeologicUnit (nested) > features as example. > The following filter on the gml:description attribute of the nested > GeologicUnit type: > > <ogc:PropertyIsLike> > > <ogc:PropertyName>gsml:specification/gsml:GeologicUnit/gml:description</ogc:PropertyName> > <ogc:Literal>*sedimentary*</ogc:Literal> > </ogc:PropertyIsLike> > > would be ignored by the current joining implementation (i.e. it is regarded > as a post-filter): > > SELECT ... > FROM "appschematest"."MAPPEDFEATURE" > ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC, > "appschematest"."MAPPEDFEATURE"."PKEY" > > SELECT ... > FROM "appschematest"."GEOLOGICUNIT" > INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON ( > "MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID") > ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC, > "appschematest"."GEOLOGICUNIT"."GML_ID" ASC, > "appschematest"."GEOLOGICUNIT"."PKEY" > > > but would be translated to the following SQL queries in the new > implementation: > > SELECT ... > FROM "appschematest"."MAPPEDFEATURE" > INNER JOIN ( > SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATURE" > *WHERE EXISTS (* > * SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT" > "chain_link_1" * > * WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%' AND > "appschematest"."MAPPEDFEATURE"."GEOLOGIC_UNIT_ID" = > "chain_link_1"."GML_ID")* > ) "temp_alias_used_for_filter" ON ( "MAPPEDFEATURE"."ID" = > "temp_alias_used_for_filter"."ID" ) > ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC, > "appschematest"."MAPPEDFEATURE"."PKEY" > > SELECT ... > FROM "appschematest"."GEOLOGICUNIT" > INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON ( > "MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID") > INNER JOIN ( > SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATUREPROPERTYFILE" > *WHERE EXISTS (* > * SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT" > "chain_link_1" * > * WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%' AND > "appschematest"."MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = > "chain_link_1"."GML_ID")* > ) "temp_alias_used_for_filter" ON ( "MAPPEDFEATUREPROPERTYFILE"."ID" = > "temp_alias_used_for_filter"."ID" ) > ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC, > "appschematest"."GEOLOGICUNIT"."GML_ID" ASC, > "appschematest"."GEOLOGICUNIT"."PKEY" > > > We have seen significant performance improvements, especially with the > total number of features in the DB is high (several thousands), but only a > few of them would satisfy the filter. > > At present, we don't aim for a comprehensive implementation of nested > filters encoding; as a consequence, our implementation has the following > limitations: > 1. only binary comparison, PropertyIsLike and PropertyIsNull filters are > translated to SQL > 2. polymorphic mappings are not supported > 3. filters involving multiple nested attributes are not supported > > In case anybody wants to take a look before a formal PR, the code can be > found in this branch of my personal geotools fork: > https://github.com/ridethepenguin/geotools/tree/my_nested_filters_join > > As this enhancement only applies when joining is enabled, unit tests have > been added in the gs-app-schema-test module of geoserver: > https://github.com/ridethepenguin/geoserver/tree/my_nested_filters_join_tests > > Last but not least, I would like to point out a couple of fixes (I hope) > I've introduced as I went along. The first is in UnmappingFilterVisitor: > https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-de21f72fdbda25bc69ea142fad909c8eR821 > > I've introduced it to make encoding of joining for simple content possible, > but it seems to me to be a more general fix to avoid having nulls in the > encoded filter. > > The second is in XPathUtil: > https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-11339740d8ff9ccff758df9400eecfc5R97 > > It seems to me the current implementation of startsWith is broken, as it > considers gml:name[2] and gml:name[3] paths to be the same, while they may > be mapped to different tables (e.g. see SimpleAttributeFeatureChainWfsTest) > > Any feedback is higly appreciated :-) > > Thanks! > > > > > ------------------------------------------------------------------------------ > > > > _______________________________________________ > GeoTools-Devel mailing list > GeoTools-Devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/geotools-devel > -- Ben Caradoc-Davies <b...@transient.nz> Director Transient Software Limited <http://transient.nz/> New Zealand ------------------------------------------------------------------------------ _______________________________________________ GeoTools-Devel mailing list GeoTools-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/geotools-devel