Re: [Geotools-devel] geotools mongodb bugs in filtering

2022-07-08 Thread Hans Yperman
Hi Marco,

Thanks for your quick reaction,

In this mail I’ll concentrate on the Filtering issue, as it is the most 
blocking for us.  The schemaless plugin is interesting and I’ll check it out. 
Fixing the filtering bug causes the dates to partially work.  So I’ll see what 
happens after the core bug is fixed.

I investigated the code on Wednesday evening, and this is what I came up with:

A lot of the unit tests simply don’t run when no database is attached.  In 
fact, I see a lot of tests without a @Test annotation, so maybe they’re from the 
JUnit 3 era and no longer run at all.  If you have an easy way to run them, e.g. 
on a GeoServer build machine, I’d be interested.

Because of this, I tried to create JUnit tests that do not require MongoDB.  
To make this possible, I had to make 2 dirty changes in the src/main/java 
code:

  1.  In MongoDataStore, declare the method isMongoVersionLessThan2_6  
protected instead of private
  2.  Add a class MongoSchemaMemoryStore, comparable to MongoSchemaFileStore, 
backed by a ConcurrentHashMap.  The MongoDB schemaless plugin might have made 
this obsolete.

I’d like your opinion on both dirty changes.
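
For reference, change 2 could be sketched roughly as below. This is a simplified, self-contained illustration and not the actual patch: the SchemaStore interface and its methods are hypothetical stand-ins for the real schema store contract, and schemas are represented as plain strings instead of feature types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified contract; the real schema store works on feature types.
interface SchemaStore {
    void storeSchema(String typeName, String schema);
    String retrieveSchema(String typeName);
    void deleteSchema(String typeName);
    List<String> typeNames();
}

// In-memory variant: nothing touches the disk, so unit tests need no cleanup.
class MemorySchemaStore implements SchemaStore {
    private final Map<String, String> schemas = new ConcurrentHashMap<>();

    @Override
    public void storeSchema(String typeName, String schema) {
        schemas.put(typeName, schema);
    }

    @Override
    public String retrieveSchema(String typeName) {
        return schemas.get(typeName); // null when the type is unknown
    }

    @Override
    public void deleteSchema(String typeName) {
        schemas.remove(typeName);
    }

    @Override
    public List<String> typeNames() {
        return new ArrayList<>(schemas.keySet());
    }
}
```

The point of the design is only that the store has the same lifetime as the data store object, so a test can create and discard it freely.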

At this point, the problematic cases become testable.  I created some test 
infrastructure and a checkSplitter method that checks whether the splitter 
produces the correct output:

// Assumption: this test lives in the same package as MongoFeatureSource
// (org.geotools.data.mongodb), so only classes outside it are imported.
import com.mongodb.MongoClientURI;
import org.geotools.data.store.ContentEntry;
import org.geotools.feature.NameImpl;
import org.geotools.filter.text.cql2.CQL;
import org.geotools.filter.text.cql2.CQLException;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.opengis.filter.Filter;

public class FilterSplitterTest {
    private MongoFeatureSource source;

    @Before
    public void setup() {
        // Override the version check so that no MongoDB connection is needed
        MongoDataStore ds =
                new MongoDataStore("mongodb://dummy/dum", "mem:") {
                    @Override
                    protected boolean isMongoVersionLessThan2_6(
                            MongoClientURI dataStoreClientURI) {
                        return false;
                    }
                };
        ContentEntry entry = new ContentEntry(ds, new NameImpl("dummyEntry"));
        this.source = new MongoFeatureSource(entry, null, null);
    }

    @Test
    public void testProperty() throws Exception {
        checkSplitter("a=1", "a=1", "INCLUDE");
    }

    @Test
    public void testAndMongoOnly() throws Exception {
        checkSplitter("a=1 AND b=1", "a=1 AND b=1", "INCLUDE");
    }

    @Test
    public void testAndBoth() throws Exception {
        // FIXME: CQL.toFilter(before) has wrong date format
        checkSplitter(
                "a=1 AND b BEFORE 1980-09-03T00:00:00",
                "a=1",
                "c before 1980-09-03T00:00:00");
    }

    @Test
    public void testJSonSelect() throws Exception {
        // FIXME this makes no sense, but nothing I try makes more sense
        // checkSplitter("a=jsonSelect('a')", "a=1 AND b=1", "INCLUDE");

        // FIXME need a working example of a function.
        // CQL.toFilter("strConcat(NAME, 'suffix')"); does not work even if
        // the javadoc claims it should
    }

    /**
     * @param beforeSplit The input to the splitter
     * @param toMongo The part of the filter given to mongodb
     * @param toPostprocess The part of the filter done by postprocessing
     * @throws CQLException
     */
    private void checkSplitter(String beforeSplit, String toMongo, String toPostprocess)
            throws CQLException {
        // FIXME getCountInternal ignores the toPostprocess, is this correct?
        Filter[] split = source.splitFilter(CQL.toFilter(beforeSplit));
        Assert.assertEquals(CQL.toFilter(toMongo), split[0]);
        Assert.assertEquals(CQL.toFilter(toPostprocess), split[1]);
    }
}

These tests demonstrate some interesting points:

  *   testProperty/testAndMongoOnly trigger the bug, as they should.
  *   testAndBoth does not trigger the bug.  As it happens, the AND filter can 
paper over part of the impact if the stacks are not balanced.
  *   Bonus bug?  testAndBoth demonstrates that BeforeImpl.toString() uses the 
default Java date format and not ISO.
  *   Bonus bug?  testJSonSelect: as the JSonSelect patch is the root cause of 
all these troubles, I tried to write a test that checks whether this feature is 
damaged by my bugfix.  Unfortunately, I found out that I can’t.  Every usage I 
try crashes with a ClassCastException while parsing.  Even the strConcat demo 
mentioned in the CQL Javadoc crashes.  I saw these same crashes on our dev 
GeoServer instance.  But maybe it’s just me not knowing what CQL is supposed to 
do.

If you can comment on the 2 bonus bugs and the supposed way to use the 
JSonSelect function, I would be grateful.
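
The date-format point can be reproduced without GeoTools. The sketch below assumes that the filter's toString() ends up printing a java.util.Date through Date.toString(); that is an assumption, not something confirmed from the GeoTools source. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class DateFormatDemo {
    // Legacy rendering, something like "Wed Sep 03 00:00:00 UTC 1980";
    // the exact text depends on the JVM's default time zone.
    static String legacy(Date d) {
        return d.toString();
    }

    // ISO-8601 rendering in UTC, independent of the JVM's default time zone.
    static String iso(Date d) {
        return DateTimeFormatter.ISO_INSTANT.format(d.toInstant());
    }
}
```

A string produced by legacy() cannot be parsed back as a CQL date literal, which would explain why the expected filter in testAndBoth cannot simply be derived from the toString() output.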

All of this is preparation for the real bugfix.  I agree with your assessment 
of the bug, but with some caveats:

  *   There is a class MongoFilterSplitter that duplicates the defective 
filter, with slightly different but also faulty behaviour.  I propose to delete 
it.
  *   MongoFeatureSource.splitFilter is the core of the bug.  Both 
BinaryComparisonOperator and PropertyIsLike seem impacted, and as you say, 
should defer to their parent implementation.
  *   But:  What if e.g. there is a JSonSelectFunction  but no Literal. The 
behaviour from super.visit…() needs to be verified for this function, it migh

Re: [Geotools-devel] geotools mongodb bugs in filtering

2022-07-08 Thread Marco Volpini
Dear Hans,
all improvements and bug fixes are welcome on both GeoTools and GeoServer.
You can either provide the bug fix/improvement yourself by opening a
pull request on the GeoServer or GeoTools repo, or have someone
else do it for you (see the commercial support page).
In terms of help that can be provided to fix the issues that you have
found, I can give you some code pointers (see below) and review the pull
requests once they are opened.
Please see my replies below to your points:


   - MongoDB is schemaless, but it seems geoserver needs 2 schema
definitions: 1 defining the attributes for layers, a 2nd specific for
mongodb cached on disk and invisible if you don't know about its
existence.  If an attribute is added to mongodb, we'll need to
delete/adapt the second, and restart the geoserver.


This is due to the fact that GeoTools and GeoServer are schema driven, based on
the GML standard, which always needs a schema upfront in order to properly
handle the data. Moreover, the gt mongodb data store manages SimpleFeatures.
The Simple Feature specification doesn't allow nested objects and arrays to be
encoded. The additional internal JSON schema is also used to flatten any
nested structure to make it SimpleFeature compatible. However, if you are
interested in serving features only as WFS GeoJSON and as WMS, you might
try the MongoDB schemaless plugin, which
instead serves MongoDB documents as features as they are in the db,
without flattening properties and without the need to provide a schema
upfront. Mind that GML output will not be available in this case.


   - The mongodb password needs to be hardcoded in the URL.  This means
geoserver can't hide/encrypt it.  It is readable for everyone who can
see the datastore.


Yes, currently the MongoDB password can only be provided in plain text in the
connection string input field, although GeoServer allows you to externalize the
connection string using environment properties.
Feel free to improve the connection string configuration of the
MongoDB DataStore. Here are some code pointers that you might find useful:
MongoDataStoreFactory and MongoDataStore.


   - I can't get Filtering to work. e.g.  I add
cql_filter=datatype%3D'C' which adds a filter on an attribute
datatype='C'.   Depending on the attribute, I get the whole mongodb
collection or nothing.  I debugged it to a function splitFilter that
basically just throws my filter away.  Issue GEOT-5911 has the same
conclusions as I have and a partial fix, but it seems dead.

This is indeed a bug. I've reproduced it with the MongoDB data store
(while on the MongoDB schemaless plugin the same filter works fine). I
only had a quick look, so I'm not 100% sure, but to fix it I believe
that at the end of this visit method in the PostPreFilter Splitter,
a call to the superclass visit(BinaryComparisonOperator filter)
should be added in an else statement.
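
The proposed change can be illustrated with a stripped-down model of the splitter. Everything in this sketch is hypothetical (the class names, the two lists, the usesJsonSelect flag) and much simpler than the real visitor; it only shows why a missing else branch makes a plain comparison disappear from both halves of the split, and how deferring to the superclass restores it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a binary comparison such as datatype = 'C'.
class Comparison {
    final String property;
    final Object value;
    final boolean usesJsonSelect;

    Comparison(String property, Object value, boolean usesJsonSelect) {
        this.property = property;
        this.value = value;
        this.usesJsonSelect = usesJsonSelect;
    }
}

// Hypothetical base splitter: by default a comparison goes to the database.
class SplittingVisitor {
    final List<Comparison> pre = new ArrayList<>();  // evaluated by the database
    final List<Comparison> post = new ArrayList<>(); // evaluated in memory

    void visit(Comparison c) {
        pre.add(c);
    }
}

// Override that handles only the special case and drops everything else.
class BuggyVisitor extends SplittingVisitor {
    @Override
    void visit(Comparison c) {
        if (c.usesJsonSelect) {
            pre.add(c);
        }
        // no else branch: a plain comparison lands in neither list
    }
}

// Same override, but falling back to the parent implementation.
class FixedVisitor extends SplittingVisitor {
    @Override
    void visit(Comparison c) {
        if (c.usesJsonSelect) {
            pre.add(c);
        } else {
            super.visit(c);
        }
    }
}
```

With the buggy variant, visiting a plain comparison leaves both lists empty, which matches the symptom of the filter being thrown away.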

   -  I have the impression filtering on dates/times is either not
implemented (throw new UnsupportedOperationException() ) or done by
geoserver instead of mongo (costing me the mongodb indexes).  Can't
test this because of the issue above so I might be wrong here.

Currently you can filter by dates/times only if the date time value is
stored as a string in ISO format in MongoDB. In a CQL filter you
can then specify a time filter using the >, <, >=, <= operators, as in the
following example: cql_filter=mydateproperty>'2021-03-15T09:54:59.000Z'.
The MongoDB Date type is not currently
supported, because the translation of CQL temporal
operators to MongoDB filters is not supported.
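
The string comparison works because timestamps written in the same ISO-8601 layout (same time zone designator, same precision) sort lexicographically in chronological order. A minimal check of that property, with illustrative class and method names:

```java
import java.time.Instant;

class IsoStringOrder {
    // Plain string comparison, which is what a string-typed field gets.
    static boolean afterAsString(String a, String b) {
        return a.compareTo(b) > 0;
    }

    // Reference result: parse both values and compare them as instants.
    static boolean afterAsInstant(String a, String b) {
        return Instant.parse(a).isAfter(Instant.parse(b));
    }
}
```

The two methods agree only while every stored value uses the same layout; mixing, for example, values with and without milliseconds can break the ordering.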

Regards,

Marco Volpini
