Re: [Geotools-devel] geotools mongodb bugs in filtering

2022-07-15 Thread Hans Yperman
Dear Marco,

I made a pull request for this issue 
https://github.com/geotools/geotools/pull/3965 .

The code change is OK, but I made some mistakes while formatting the patch and 
ran out of time yesterday.  If it is not acceptable, I’ll retry sometime in the 
near future.


  *   My test now runs on the real mongodb.
  *   Using ECQL fixed all function bugs.
  *   Fixing the bug surfaced 3 broken unit tests.  I checked them; the 
behaviour encoded in them seems incorrect to me, so I adapted these too.
  *   I did not fix the other minor issues.




From: Marco Volpini 
Sent: Tuesday 12 July 2022 11:16
To: Hans Yperman 
Cc: geotools-devel@lists.sourceforge.net
Subject: Re: [Geotools-devel] geotools mongodb bugs in filtering

Dear Hans,
Please find below some replies to your questions:

  *   Tests: when running online tests, GeoTools uses a properties file with 
the connection parameters for a MongoDB instance. The file should be present 
in a ".geotools" directory in your user directory. If the directory with the 
.properties file is not present, GeoTools will create it along with an example 
.properties file to fill in with the proper connection parameters in order to 
run the tests. I would therefore avoid the change you proposed and try to run 
the tests with the properties file correctly configured.
  *   You could even test the splitting logic without an online test: make the 
splitting class a package-default class instead of an anonymous one, and then 
run tests on the visit methods by using a 
FilterFactory<https://github.com/geotools/geotools/blob/f66689b5305eaf381e21dc2feec8011138d4cb24/modules/library/opengis/src/main/java/org/opengis/filter/FilterFactory2.java#L48>
 to create the various visitable filters that you want to test.
  *   The JsonSelect function is meant for the integration of AppSchema with 
MongoDB and has no real use in normal CQL use cases. However, the filter you 
provide to the compiler seems fine, so if you are getting an exception it 
might be a bug. Try the XCQL.toFilter or ECQL.toFilter methods and see if you 
can reproduce it. The CQL compiler is stricter about filter types, which might 
be the reason for the compilation failure.
  *   I would not delete the MongoFilterSplitter class. Although its behaviour 
is faulty when visiting binary operators and Like filters, it is extended and 
overridden in GeoServer, and deleting it will cause issues. Instead, you might 
make your code changes to that class rather than to the anonymous class in 
MongoFeatureSource, and use it as a base to be extended there.
  *   Regarding the temporal operator test with Before: I don't think it makes 
much sense to test it in the splitting process, as it is currently not 
supported by the plugin. The FilterToMongo class will indeed throw an 
exception if it finds one. The currently supported time filters have the form 
dateTimeProp >= '2021-06-01T09:42:47.000+02:00', using the comparison 
operators <, >, <= and >=.
  *   Regarding the way to open pull requests: usually a pull request should 
focus on a single bug or piece of functionality. So if all of your 
modifications are for the faulty splitting logic, I guess you can just open 
one PR.
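
The fixture lookup described in the first point can be pictured with a small 
stand-alone sketch. It uses only the JDK and is not the GeoTools test code: 
the file name "mongodb.properties", the ".example" suffix and the "mongo.uri" 
key are assumptions made for illustration, since the thread does not name them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Properties;

final class FixtureLookup {

    // Hypothetical names: the real fixture file name and property keys are
    // defined by the GeoTools test setup and are not given in this thread.
    static final String FIXTURE_FILE = "mongodb.properties";
    static final String URI_KEY = "mongo.uri";

    /**
     * Returns the connection properties if the fixture file exists. Otherwise
     * writes an example template where the fixture is expected and returns
     * empty, so that the caller can skip the online test.
     */
    static Optional<Properties> loadFixture(Path userHome) throws IOException {
        Path dir = userHome.resolve(".geotools");
        Path fixture = dir.resolve(FIXTURE_FILE);
        if (Files.isRegularFile(fixture)) {
            Properties props = new Properties();
            try (InputStream in = Files.newInputStream(fixture)) {
                props.load(in);
            }
            return Optional.of(props);
        }
        Files.createDirectories(dir);
        Path example = dir.resolve(FIXTURE_FILE + ".example");
        if (!Files.exists(example)) {
            Properties template = new Properties();
            template.setProperty(URI_KEY, "mongodb://localhost:27017/test");
            try (OutputStream out = Files.newOutputStream(example)) {
                template.store(out, "Copy to " + FIXTURE_FILE + " and edit");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the user directory here.
        Path home = Files.createTempDirectory("fake-home");
        System.out.println("fixture present: " + loadFixture(home).isPresent());
    }
}
```

An online test would call something like loadFixture first and skip itself 
when the result is empty, which is why a missing file does not fail the build.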

Regards,

Marco Volpini

==
GeoServer Professional Services from the experts!

Visit http://bit.ly/gs-services-us for more information.
==

Marco Volpini

Software Engineer

GeoSolutions Group
phone: +39 0584 962313

fax: +39 0584 1660272


https://www.geosolutionsgroup.com/

http://twitter.com/geosolutions_it

---


With reference to the legislation on the processing of personal data (EU Reg. 
2016/679 - General Data Protection Regulation “GDPR”), please note that every 
circumstance relating to this email (its content, any attachments, etc.) is 
data whose knowledge is reserved to the recipient(s) indicated by the sender 
only. If the message has reached you by mistake, you are required to delete 
it; any other operation is unlawful. I would nevertheless be grateful if you 
could let me know.

This email is intended only for the person or entity to which it is addressed 
and may contain information that is privileged, confidential or otherwise 
protected from disclosure. We remind that - as provided by European Regulation 
2016/679 “GDPR” - copying, dissemination or use of this e-mail or the 
information herein by anyone other than the intended recipient is prohibited. 
If you have received this email by mistake, please notify us immediately by 
telephone or e-mail.



On Fri, Jul 8, 2022 at 1:35 PM Hans Yperman <hans.yper...@vliz.be> wrote:
Hi Marco,

Thanks for your quick reaction,

In this mail I’ll concentrate on the Filtering issue, as it is the most 
blocking for us.  The schemaless plugin is interesting and I’ll check it out. 
Fixing the filtering bug causes the dates to partially work.  So I’ll see what 
happens after the core bug is fixed.

I’ve investigated the code wedn

Re: [Geotools-devel] geotools mongodb bugs in filtering

2022-07-12 Thread Hans Yperman
Marco,

Thanks for your answers.  I’ll modify the code this evening.

  *   Tests: I’ll install mongo on my dev machine and have a try.
  *   Tests without mongo: I need more than just the splitter, I also need the 
filter capabilities, so I have to instantiate a MongoDataStore object without 
making a connection somehow.  I can add an extra non-public constructor that 
grants access to the necessary fields if you prefer that.
  *   I get ClassCastExceptions with every function I try, not only 
JSonSelect.  I’ll try XCQL this evening.
  *   I’ll not delete the MongoFilterSplitter class.  Do I understand correctly 
that other modules import the Mongo driver module and write code on top of it?  
If yes, I’d better test them too.
  *   The Before operation is there exactly because mongo does not support it.  
I need something that gets split off to force the splitter to do a split, and I 
can’t get functions to work, so there is not much left.
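
To make "something that gets split off" concrete, here is a minimal sketch of 
the pre/post split on a toy filter model. It is not the GeoTools Filter API or 
the MongoDB plugin code: the Filter, Leaf and And types and the 
backendSupported flag are invented for illustration. It only covers AND, and 
it shows why the default branch matters: a condition the backend cannot handle 
has to end up in the post-processing part instead of being dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitSketch {

    // Toy filter model for illustration only; NOT the GeoTools Filter API.
    interface Filter {}

    static final Filter INCLUDE = new Filter() {
        @Override public String toString() { return "INCLUDE"; }
    };

    static final class Leaf implements Filter {
        final String text;
        final boolean backendSupported; // hypothetical capability flag
        Leaf(String text, boolean backendSupported) {
            this.text = text;
            this.backendSupported = backendSupported;
        }
        @Override public String toString() { return text; }
    }

    static final class And implements Filter {
        final List<Filter> children;
        And(List<Filter> children) { this.children = children; }
        @Override public String toString() {
            List<String> parts = new ArrayList<>();
            for (Filter c : children) parts.add(c.toString());
            return String.join(" AND ", parts);
        }
    }

    /** Returns {pre, post}: pre goes to the database, post is evaluated in memory. */
    static Filter[] split(Filter f) {
        List<Filter> pre = new ArrayList<>();
        List<Filter> post = new ArrayList<>();
        collect(f, pre, post);
        return new Filter[] {combine(pre), combine(post)};
    }

    private static void collect(Filter f, List<Filter> pre, List<Filter> post) {
        if (f instanceof And) {
            for (Filter child : ((And) f).children) collect(child, pre, post);
        } else if (f instanceof Leaf && ((Leaf) f).backendSupported) {
            pre.add(f);
        } else {
            // Default branch: anything not recognised must still land
            // somewhere, otherwise the condition is silently lost.
            post.add(f);
        }
    }

    private static Filter combine(List<Filter> parts) {
        if (parts.isEmpty()) return INCLUDE;
        if (parts.size() == 1) return parts.get(0);
        return new And(parts);
    }

    public static void main(String[] args) {
        Filter f = new And(Arrays.<Filter>asList(
                new Leaf("a=1", true),
                new Leaf("b BEFORE 1980-09-03T00:00:00Z", false)));
        Filter[] parts = split(f);
        System.out.println("pre:  " + parts[0]);
        System.out.println("post: " + parts[1]);
    }
}
```

The unsupported leaf plays the role of the Before operation here: it is the 
only reason the post-processing part is anything other than INCLUDE.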


Re: [Geotools-devel] geotools mongodb bugs in filtering

2022-07-12 Thread Marco Volpini
protected instead of private
>2. Add a class MongoSchemaMemoryStore, comparable to
>MongoSchemaFileStore, backed by a ConcurrentHashMap.  The MongoDB
>schemaless plugin might have made this obsolete.
>
>
>
> I’d like your idea about both dirty changes.
>
>
>
> At this point, the problematic cases become testable.  I created some
> infra and a checkSplitter method that checks if the splitter creates the
> correct output:
>
>
>
> package org.geotools.data.mongodb;
>
> import com.mongodb.MongoClientURI;
> import org.geotools.data.store.ContentEntry;
> import org.geotools.feature.NameImpl;
> import org.geotools.filter.text.cql2.CQL;
> import org.geotools.filter.text.cql2.CQLException;
> import org.junit.Assert;
> import org.junit.Before;
> import org.junit.Test;
> import org.opengis.filter.Filter;
>
> public class FilterSplitterTest {
>
>     private MongoFeatureSource source;
>
>     @Before
>     public void setup() {
>         // Dummy data store; the version check is overridden, presumably so
>         // that no running MongoDB server is needed.
>         MongoDataStore ds =
>                 new MongoDataStore("mongodb://dummy/dum", "mem:") {
>                     @Override
>                     protected boolean isMongoVersionLessThan2_6(
>                             MongoClientURI dataStoreClientURI) {
>                         return false;
>                     }
>                 };
>         ContentEntry entry = new ContentEntry(ds, new NameImpl("dummyEntry"));
>         this.source = new MongoFeatureSource(entry, null, null);
>     }
>
>     @Test
>     public void testProperty() throws Exception {
>         checkSplitter("a=1", "a=1", "INCLUDE");
>     }
>
>     @Test
>     public void testAndMongoOnly() throws Exception {
>         checkSplitter("a=1 AND b=1", "a=1 AND b=1", "INCLUDE");
>     }
>
>     @Test
>     public void testAndBoth() throws Exception {
>         // FIXME: CQL.toFilter(before) has wrong date format
>         checkSplitter(
>                 "a=1 AND b BEFORE 1980-09-03T00:00:00",
>                 "a=1",
>                 "b BEFORE 1980-09-03T00:00:00");
>     }
>
>     @Test
>     public void testJSonSelect() throws Exception {
>         // FIXME this makes no sense, but nothing I try makes more sense
>         // checkSplitter("a=jsonSelect('a')", "a=1 AND b=1", "INCLUDE");
>
>         // FIXME need a working example of a function.
>         // CQL.toFilter("strConcat(NAME, 'suffix')"); does not work even if
>         // its javadoc claims it should
>     }
>
>     /**
>      * @param beforeSplit The input to the splitter
>      * @param toMongo The part of the filter given to mongodb
>      * @param toPostprocess The part of the filter done by postprocessing
>      * @throws CQLException if one of the CQL strings cannot be parsed
>      */
>     private void checkSplitter(String beforeSplit, String toMongo, String toPostprocess)
>             throws CQLException {
>         // FIXME getCountInternal ignores the toPostprocess, is this correct?
>         Filter[] split = source.splitFilter(CQL.toFilter(beforeSplit));
>         Assert.assertEquals(CQL.toFilter(toMongo), split[0]);
>         Assert.assertEquals(CQL.toFilter(toPostprocess), split[1]);
>     }
> }
>
>
>
> These tests demonstrate some interesting points:
>
>    - testProperty / testAndMongoOnly trigger the bug, as they should.
>    - testAndBoth does not trigger the bug.  As it happens, the AND filter
>    can paper over part of the impact if the stacks are not balanced.
>    - Bonus bug?  testAndBoth demonstrates that BeforeImpl.toString() uses
>    the default Java date format and not ISO.
>    - Bonus bug?  testJSonSelect: as the JSonSelect patch is the root cause
>    of all these troubles, I tried to write a test that finds out whether
>    this feature is damaged by my bugfix.  Unfortunately, I found out that
>    I can’t.  Every usage I try crashes with a ClassCastException while
>    parsing.  Even the strConcat demo mentioned in the CQL Javadoc crashes.
>    I saw these same crashes on our dev geoserver instance.  But maybe it’s
>    just me not knowing what CQL is supposed to do.
>
>
>
> If you can comment on the 2 bonus bugs and the supposed way to use the
> JSonSelect function, I would be grateful.
>
>
>
> All of this is preparation for the real bugfix.  I agree with your
> assessment of the bug, but with some caveats:
>
>    - There is a class MongoFilterSplitter that duplicates the defective
>    filter, with slightly different but also faulty behaviour.  I propose to
>    delete it.
>    - MongoFeatureSource.splitFilter is the core of the bug.  Both
>    BinaryComparisonOperator and PropertyIsLike seem impacted and, as you
>    say, should defer to their parent implementation.
>    - But:  What if e.g. there is a JSonSelectFunction but no Literal.
>  

Re: [Geotools-devel] geotools mongodb bugs in filtering

2022-07-08 Thread Hans Yperman
mplementation.
  *   But:  What if e.g. there is a JSonSelectFunction  but no Literal. The 
behaviour from super.visit…() needs to be verified for this function, it might 
not be as intended.  But I need clarification on the 2nd bonus bug to write 
some unit tests that demonstrate correct behaviour before

If I optimistically assume that you agree with all of the above and can fill 
me in on the 2nd bonus bug, I can provide a PR.  Do you want 1 PR containing 
everything, or would you prefer me to split it up into multiple PRs for ease of 
code review?

Hans


From: Marco Volpini 
Sent: Friday 8 July 2022 9:14
To: Hans Yperman 
Cc: geotools-devel@lists.sourceforge.net
Subject: Re: [Geotools-devel] geotools mongodb bugs in filtering

Dear Hans,
all improvements and bug fixes are welcome on both GeoTools and GeoServer. You 
can either provide the bug fix/improvement yourself by opening a pull request 
on the GeoServer or GeoTools repo, or have someone else do it for you (see the 
commercial support page<https://geoserver.org/support/>).
In terms of help with fixing the issues that you have found, I can give you 
some code pointers (see below) and review the pull requests once they are 
opened.
Please see my replies to your points below:


·  MongoDB is schemaless, but it seems geoserver needs 2 schema 
definitions: 1 defining the attributes for layers, a 2nd specific for mongodb 
cached on disk and invisible if you don't know about its existence.  If an 
attribute is added to mongodb, we'll need to delete/adapt the second, and 
restart the geoserver.

This is due to the fact that GeoTools and GeoServer are schema driven, based on 
the GML standard, which always needs a schema upfront in order to handle the 
data properly. Moreover, the gt mongodb data store manages SimpleFeatures, and 
the Simple Feature specs don't allow nested objects and arrays to be encoded. 
The additional internal JSON schema is indeed also used to flatten any nested 
structure to make it SimpleFeature compatible. However, if you are interested 
in serving features only as WFS GeoJSON and as WMS, you might try the MongoDB 
schemaless 
plugin<https://docs.geoserver.org/latest/en/user/community/schemaless-features/schemaless-mongo/index.html>,
 which instead serves mongodb documents as features as they are in the db, 
without flattening properties and without the need to provide a schema upfront. 
Mind that GML output will not be available in this case.


·  The mongodb password needs to be hardcoded in the URL.  This means 
geoserver can't hide/encrypt it.  It is readable for everyone who can see the 
datastore.

Yes, currently the mongodb password can only be provided in plain text in the 
connection string input field, although GeoServer allows the connection string 
to be externalized using environment 
properties<https://docs.geoserver.org/master/en/user/datadirectory/configtemplate.html>.
 Feel free to improve the connection string configuration of the MongoDB 
DataStore. Here are some code pointers that you might find useful: 
MongoDataStoreFactory<https://github.com/geotools/geotools/blob/1fed12f88ab0c9d4ef92aa9cf73af8f94d5c1291/modules/plugin/mongodb/src/main/java/org/geotools/data/mongodb/MongoDataStoreFactory.java#L30>
 and 
MongoDataStore<https://github.com/geotools/geotools/blob/1e036c73a4e420beeeb86c381c797a898a68c8a7/modules/plugin/mongodb/src/main/java/org/geotools/data/mongodb/MongoDataStore.java#L74>.

·  I can't get Filtering to work. e.g.  I add cql_filter=datatype%3D'C' 
which adds a filter on an attribute datatype='C'.   Depending on the attribute, 
I get the whole mongodb collection or nothing.  I debugged it to a function 
splitFilter that basically just throws my filter away.  Issue GEOT-5911 has the 
same conclusions as I have and a partial fix, but it seems dead.

This is indeed a bug. I've reproduced it with the mongodb data store (while on 
the mongodb schemaless plugin the same filter works fine). I only had a quick 
look, so I'm not 100% sure, but to fix it I believe that a call to the 
superclass visit(BinaryComparisonOperator filter) should be added in an else 
statement at the end of this visit method in the PostPreFilter 
Splitter<https://github.com/geotools/geotools/blob/1e036c73a4e420beeeb86c381c797a898a68c8a7/modules/plugin/mongodb/src/main/java/org/geotools/data/mongodb/MongoFeatureSource.java#L340>.

·  I have the impression filtering on dates/times is either not 
implemented (throw new UnsupportedOperationException() ) or done by geoserver 
instead of mongo (costing me the mongodb indexes).  Can't test this because of 
the issue above so I might be wrong here.

Currently you can filter by dates/times only if the date time value is stored 
as a string in ISO format on MongoDB. In a CQL filter you can then specify a 
time filter using the >, <, >= and <= operators, as in the following example: 
cql_filter= mydateproperty >'2021-03-15T09

Re: [Geotools-devel] geotools mongodb bugs in filtering

2022-07-08 Thread Marco Volpini
Dear Hans,
all improvements and bug fixes are welcome on both GeoTools and GeoServer.
You can either provide the bug fix/improvement yourself by opening a pull
request on the GeoServer or GeoTools repo, or have someone else do it for
you (see the commercial support page
<https://geoserver.org/support/>).
In terms of help with fixing the issues that you have found, I can give you
some code pointers (see below) and review the pull requests once they are
opened.
Please see my replies to your points below:


   - MongoDB is schemaless, but it seems geoserver needs 2 schema
definitions: 1 defining the attributes for layers, a 2nd specific for
mongodb cached on disk and invisible if you don't know about its
existence.  If an attribute is added to mongodb, we'll need to
delete/adapt the second, and restart the geoserver.


This is due to the fact that GeoTools and GeoServer are schema driven, based
on the GML standard, which always needs a schema upfront in order to handle
the data properly. Moreover, the gt mongodb data store manages
SimpleFeatures, and the Simple Feature specs don't allow nested objects and
arrays to be encoded. The additional internal JSON schema is indeed also
used to flatten any nested structure to make it SimpleFeature compatible.
However, if you are interested in serving features only as WFS GeoJSON and
as WMS, you might try the MongoDB schemaless plugin
<https://docs.geoserver.org/latest/en/user/community/schemaless-features/schemaless-mongo/index.html>,
which instead serves mongodb documents as features as they are in the db,
without flattening properties and without the need to provide a schema
upfront. Mind that GML output will not be available in this case.


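   - The mongodb password needs to be hardcoded in the URL.  This means
geoserver can't hide/encrypt it.  It is readable for everyone who can
see the datastore.


Yes, currently the mongodb password can only be provided in plain text in
the connection string input field, although GeoServer allows the connection
string to be externalized using environment properties
<https://docs.geoserver.org/master/en/user/datadirectory/configtemplate.html>.
Feel free to improve the connection string configuration of the MongoDB
DataStore. Here are some code pointers that you might find useful:
MongoDataStoreFactory
<https://github.com/geotools/geotools/blob/1fed12f88ab0c9d4ef92aa9cf73af8f94d5c1291/modules/plugin/mongodb/src/main/java/org/geotools/data/mongodb/MongoDataStoreFactory.java#L30>
and MongoDataStore
<https://github.com/geotools/geotools/blob/1e036c73a4e420beeeb86c381c797a898a68c8a7/modules/plugin/mongodb/src/main/java/org/geotools/data/mongodb/MongoDataStore.java#L74>.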


   - I can't get Filtering to work. e.g.  I add
cql_filter=datatype%3D'C' which adds a filter on an attribute
datatype='C'.   Depending on the attribute, I get the whole mongodb
collection or nothing.  I debugged it to a function splitFilter that
basically just throws my filter away.  Issue GEOT-5911 has the same
conclusions as I have and a partial fix, but it seems dead.

This is indeed a bug. I've reproduced it with the mongodb data store
(while on the mongodb schemaless plugin the same filter works fine). I
only had a quick look, so I'm not 100% sure, but to fix it I believe that
a call to the superclass visit(BinaryComparisonOperator filter) should be
added in an else statement at the end of this visit method in the
PostPreFilter Splitter
<https://github.com/geotools/geotools/blob/1e036c73a4e420beeeb86c381c797a898a68c8a7/modules/plugin/mongodb/src/main/java/org/geotools/data/mongodb/MongoFeatureSource.java#L340>.

   - I have the impression filtering on dates/times is either not
implemented (throw new UnsupportedOperationException() ) or done by
geoserver instead of mongo (costing me the mongodb indexes).  Can't
test this because of the issue above so I might be wrong here.

Currently you can filter by dates/times only if the date time value is
stored as a string in ISO format on MongoDB. In a CQL filter you can then
specify a time filter using the >, <, >= and <= operators, as in the
following example: cql_filter= mydateproperty
>'2021-03-15T09:54:59.000Z'. The MongoDB Date type is not currently
supported, as the translation of CQL temporal operators to MongoDB
filters is not supported.
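
As a rough illustration of why ISO strings can be filtered with plain
comparison operators, the sketch below compares timestamps as strings. It
uses only the JDK and assumes that the comparison on the stored values behaves
like an ordinary lexicographic string comparison; the class and method names
are made up for the example. It also shows a caveat: the string order only
matches the time order when all values use the same format and offset.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class IsoStringOrder {

    // Fixed-width UTC format, so that all strings have the same layout.
    private static final DateTimeFormatter FIXED_UTC =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSS'Z'")
                    .withZone(ZoneOffset.UTC);

    static String toIso(Instant t) {
        return FIXED_UTC.format(t);
    }

    /** Plain lexicographic comparison, as for any other string value. */
    static boolean isAfter(String stored, String literal) {
        return stored.compareTo(literal) > 0;
    }

    public static void main(String[] args) {
        String march = toIso(Instant.parse("2021-03-15T09:54:59Z"));
        String june = toIso(Instant.parse("2021-06-01T07:42:47Z"));
        System.out.println(june + " > " + march + " : " + isAfter(june, march));

        // Mixed offsets: the string order no longer matches the time order.
        String plusTwo = "2021-06-01T09:42:47.000+02:00"; // 07:42:47 UTC
        String utc = "2021-06-01T08:00:00.000Z";
        System.out.println("as strings:  " + isAfter(plusTwo, utc));
        System.out.println("as instants: "
                + OffsetDateTime.parse(plusTwo).isAfter(OffsetDateTime.parse(utc)));
    }
}
```

Under that assumption, storing every timestamp in one fixed format (for
example always UTC with milliseconds) keeps such filters consistent.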

Regards,

Marco Volpini
