Thanks Niels, some comments inline. I assume this is for GSIP-132
<https://github.com/geoserver/geoserver/wiki/GSIP-132> (unless that
is completed already).
On 7 October 2015 at 05:28, Niels Charlier wrote:
Hi Jody, Gabriel, Kevin
I have been porting all modules to use the resources system
consistently and only use files when necessary (usually external
library). I still stumbled upon two minor questions/issues I
wanted to discuss.
1. Usage of the "data" directory. At the moment the import from
data directory -> jdbc store ignores the "data" directory. In a
clustered environment, this directory thus remains instance
specific, and it would be up to the user to refer to shared files.
Are you talking about GEOSERVER_DATA_DIRECTORY/data? If so that is
only a convention, I have made data directories that used "raster"
and "vector" folders for example.
For storing spatial data (GeoTIFF, Shapefile, Image Mosaic here) I
had the idea of doing something like JNDI but for referencing an
external folder used for this purpose. This could both provide an
"ignore" list (so "data" was not hard coded) and allow for a cluster
with RAID storage mapped to a specific mount.
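To make the idea a little more concrete, here is a minimal sketch of a JNDI-style lookup for external spatial data folders with an ignore list. Everything in it is hypothetical: `SpatialDataLocations`, `bind`, `ignore`, `lookup` and `isIgnored` are illustration names, not existing GeoServer API, and the mount path in the usage note is made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry mapping logical names to external folders, JNDI-style. */
class SpatialDataLocations {
    private final Map<String, Path> mounts = new HashMap<>();
    private final Set<String> ignored = new HashSet<>();

    /** Bind a logical name (for example "rasters") to a folder such as a shared mount. */
    void bind(String name, Path folder) {
        mounts.put(name, folder.normalize());
    }

    /** Mark a top-level data directory folder as one the import should skip. */
    void ignore(String topLevelFolder) {
        ignored.add(topLevelFolder);
    }

    /** Resolve a relative path against the folder bound to the given name. */
    Path lookup(String name, String relativePath) {
        Path root = mounts.get(name);
        if (root == null) {
            throw new IllegalArgumentException("No location bound for: " + name);
        }
        Path resolved = root.resolve(relativePath).normalize();
        // Reject paths such as "../secret" that would leave the bound folder.
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes location: " + relativePath);
        }
        return resolved;
    }

    /** True when the first segment of a data directory path is on the ignore list. */
    boolean isIgnored(String dataDirPath) {
        String firstSegment = dataDirPath.split("/", 2)[0];
        return ignored.contains(firstSegment);
    }
}
```

Each cluster node could bind "rasters" to its own mount point for the shared storage, while `ignore("data")` (or "raster", "vector") keeps the folder name out of the code.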
At this moment, there is no reason why we couldn't include the
data dir in the jdbcstore and cache it before loading the
geotools datastore. This is actually what my modified version of
the rest service already does because it uses resources everywhere.
For configuration files this is what we want.
Another idea was to program the jdbcstore to return file-based
resources only when the "data" directory is used, so that it
definitely will never store those files in the database
unnecessarily.
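A rough sketch of that routing idea, under the assumption that the decision can be made purely from the resource path. `SketchResource` and `PrefixRoutingStore` are hypothetical stand-ins and do not reflect the actual jdbcstore or Resource interfaces.

```java
/** Hypothetical stand-in for a resource handle; not the GeoServer Resource interface. */
final class SketchResource {
    final String path;
    /** True when the resource lives only on the local file system, never in the database. */
    final boolean fileBased;

    SketchResource(String path, boolean fileBased) {
        this.path = path;
        this.fileBased = fileBased;
    }
}

/** Hypothetical store that keeps one folder on disk and everything else in the database. */
final class PrefixRoutingStore {
    private final String fileOnlyFolder;

    PrefixRoutingStore(String fileOnlyFolder) {
        this.fileOnlyFolder = fileOnlyFolder;
    }

    SketchResource get(String path) {
        // Match the folder itself or anything below it; "database/x" must not match "data".
        boolean fileBased = path.equals(fileOnlyFolder)
                || path.startsWith(fileOnlyFolder + "/");
        return new SketchResource(path, fileBased);
    }
}
```

With `new PrefixRoutingStore("data")`, a path like `data/roads.shp` would be served from disk while `workspaces/default.xml` would go through the database.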
Okay pretty sure you are talking about GEOSERVER_DATA_DIRECTORY/data now.
Q: Is it worth removing the files that have been imported into
JDBCConfig from GEOSERVER_DATA_DIRECTORY? This would prevent
confusion, and allow GEOSERVER_DATA_DIRECTORY to work strictly as a
cache (for the few things that require a file to be unpacked on to disk).
2. In the jdbcstore, should the children of a directory be cached
when dir() is called?
Caching happens on import (so yes). Should the resources be unpacked
(staged) to the file system when dir() is called? Yes.
The DataDirectory class uses the dir() method to know the root of
the data directory, causing the whole data directory to be cached
at once multiple times unnecessarily, since the root dir is
usually requested just to know the path for some reason (all code
that actually needs files in the data dir has been replaced
by resources).
This is a bug, such logic should be replaced. There is another method
to get the root of the GEOSERVER_DATA_DIRECTORY. While we may hard
code some things now it would be wise to have an extension point for
modules (such as geowebcache) to mark off working directories that
should not be cached.
Using dir() to determine the root of GEOSERVER_DATA_DIRECTORY is a
bad idea: in addition to breaking the design of dir(), we are trying
to avoid duplicating code that contains data directory structure logic.
We now always want to use resources as long as possible, only
calling file() at the last moment if necessary. As a consequence
the dir() method is actually hardly used for the purpose of
getting all the files inside that dir. I would suggest that calling
dir() only creates the dir if it doesn't exist yet and does not
cache its children. There is only one part of code left where
that would pose a problem, the community module "validation",
which passes on a whole dir to its geotools counterpart. This
however could be changed in the geotools module to pass on a
collection of files instead.
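The suggested behaviour can be sketched roughly as follows: dir() only creates the directory in the local cache, while file() stages a single resource at the moment a real file is needed. `LazyStagingStore` is hypothetical, and its in-memory map merely stands in for the database; this is not the jdbcstore implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store: contents live in a map standing in for the database. */
final class LazyStagingStore {
    private final Map<String, byte[]> database = new HashMap<>();
    private final Path cacheRoot;
    private int stagedFiles = 0;

    LazyStagingStore(Path cacheRoot) {
        this.cacheRoot = cacheRoot;
    }

    void put(String path, byte[] content) {
        database.put(path, content);
    }

    /** Creates the directory in the cache if it is missing; children are NOT staged. */
    Path dir(String path) {
        try {
            return Files.createDirectories(cacheRoot.resolve(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stages exactly one resource to disk, at the moment a real file is needed. */
    Path file(String path) {
        byte[] content = database.get(path);
        if (content == null) {
            throw new IllegalArgumentException("No such resource: " + path);
        }
        try {
            Path target = cacheRoot.resolve(path);
            Files.createDirectories(target.getParent());
            Files.write(target, content);
            stagedFiles++;
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of resources written to the cache so far. */
    int stagedFiles() {
        return stagedFiles;
    }
}
```

Under this sketch, asking for the root dir just to learn its path costs one directory creation at most, instead of unpacking the whole data directory.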
Validation only needed one "validation" folder, so that code could be
changed to use resource("validation").dir().
After this change, I wonder if we should make a doc page on the
proper practices of using the Resource API in order to be
clustering-safe.
Yep, could add to the developers guide under "file access".