Re: [Geoserver-devel] Resources port

Niels Charlier Tue, 27 Oct 2015 06:30:51 -0700

Jody,

Small addition. With respect to point (1), I know about GeoServerResourceLoader.lookupGeoServerDataDirectory(servletContext), but then we are bypassing the ResourceStore API and losing some of its generic purpose. The point is: if we're making a GeoServerResourceLoader from a ResourceStore, shouldn't it take the baseDirectory from that store somehow?
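One way to picture the question: a store that is backed by a directory could advertise that directory itself, so a loader built from the store never has to call dir() on the root just to learn a path. This is a minimal sketch under simplifying assumptions; `Store`, `BaseDirectoryAware`, `FileStore` and `Loader` are hypothetical stand-ins, not the actual GeoServer ResourceStore or GeoServerResourceLoader API.

```java
import java.io.File;
import java.util.Optional;

/** Hypothetical, much-simplified stand-in for a resource store. */
interface Store {
    /** Returns the content of the resource at the given path, or null. */
    byte[] get(String path);
}

/** Hypothetical opt-in interface: stores that know their base directory expose it. */
interface BaseDirectoryAware {
    File baseDirectory();
}

/** Hypothetical file-system store: the base directory is a plain field, no dir() call needed. */
class FileStore implements Store, BaseDirectoryAware {
    private final File base;

    FileStore(File base) {
        this.base = base;
    }

    @Override
    public byte[] get(String path) {
        return null; // reading is irrelevant to this sketch
    }

    @Override
    public File baseDirectory() {
        return base;
    }
}

/** Hypothetical loader built from a store, in the spirit of GeoServerResourceLoader(ResourceStore). */
class Loader {
    private final Store store;

    Loader(Store store) {
        this.store = store;
    }

    /** Asks the store for its base directory instead of staging the root via dir(). */
    Optional<File> baseDirectory() {
        if (store instanceof BaseDirectoryAware) {
            return Optional.of(((BaseDirectoryAware) store).baseDirectory());
        }
        return Optional.empty(); // the store does not advertise a base directory
    }
}

class Demo {
    public static void main(String[] args) {
        Loader fromFiles = new Loader(new FileStore(new File("/var/geoserver/data_dir")));
        Loader fromOther = new Loader(path -> null); // a store without a base directory
        System.out.println(fromFiles.baseDirectory());
        System.out.println(fromOther.baseDirectory());
    }
}
```

Making the capability an optional interface keeps stores that have no natural root free of a meaningless method; whether the real API should do this, or add a method to ResourceStore itself, is exactly the open question above.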


Niels

On 27-10-15 14:21, Niels Charlier wrote:

Hello Jody,
Thanks for your email! That clarifies at least which direction we should be going with some of these issues. A few remaining important points:
1. Can you fill me in on a way to get the path to the DataDirectory without calling dir()? I'll have to make a patch for that then, but I really did not see a way to do that in the current API if you are working with a ResourceStore. See the constructor GeoServerResourceLoader(ResourceStore resourceStore). We'll have to change the ResourceStore API to make this possible, no?
2. The problem with the GEOSERVER_DATA_DIRECTORY/data directory, or any other raster/vector data, is slightly more complicated than you think.
   * The REST API uploads both configuration files and data files, and it uses the same methods for both. I converted the whole module to use resources instead of files. This results (for now) in data files being uploaded to the database and then cached when the store is created.
   * The distinction is not always simple to make: app-schema has configuration files (usually located in the workspaces dir) that are treated by GeoServer in the same way as data files, and they are read by GeoTools.
Is there a reason why using the database to store and distribute the data files is not recommended? Is it a matter of performance/space?
Otherwise, indeed I would recommend letting the user specify in the jdbcstore configuration file which dirs to ignore. The jdbcstore would skip them on import and return a file-based resource when these folders are queried. Does that sound good?
3. I like the idea of deleting the data directory after import. But then point (1) _absolutely_ needs to be resolved, because otherwise the data directory will immediately be cached completely, repeatedly.
4. In my opinion, dir() should _always_ be avoided. I would recommend using resources as much as possible and for as long as possible, and only caching when absolutely necessary (usually for a 3rd-party lib), which means dir() is rarely necessary and file() is sufficient. The issue with using dir() is that it could encourage people to use the file system directly, forgetting that changes to the file system have no lasting effect when using the jdbcstore!
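The ignore-list proposal in point 2 might look roughly like this: resources whose first path segment is a configured directory (such as `data`) are never imported into the database and are resolved to plain files, while everything else goes to the database. This is only a sketch; `RoutingStore`, `Backend`, the ignore set and the in-memory map standing in for the database are all hypothetical, and the real jdbcstore configuration may look different.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Where a resource lives: in the shared database or on the local disk. */
enum Backend { DATABASE, FILE_SYSTEM }

/** Hypothetical jdbcstore wrapper that honours a configurable ignore list. */
class RoutingStore {
    private final Path dataDir;                             // local data directory
    private final Set<String> ignoredDirs;                  // e.g. read from the jdbcstore config file
    private final Map<String, byte[]> db = new HashMap<>(); // stand-in for the database

    RoutingStore(Path dataDir, Set<String> ignoredDirs) {
        this.dataDir = dataDir;
        this.ignoredDirs = ignoredDirs;
    }

    /** Decides the backend from the first path segment, e.g. "data/roads.shp" -> "data". */
    Backend backendFor(String path) {
        String first = path.split("/", 2)[0];
        return ignoredDirs.contains(first) ? Backend.FILE_SYSTEM : Backend.DATABASE;
    }

    /** Import step: only resources outside the ignored directories are copied into the database. */
    boolean importResource(String path, byte[] content) {
        if (backendFor(path) == Backend.FILE_SYSTEM) {
            return false; // skipped; stays on the local (or shared) disk
        }
        db.put(path, content);
        return true;
    }

    /** For ignored paths, resolve to a plain file instead of a database entry. */
    Path fileFor(String path) {
        if (backendFor(path) != Backend.FILE_SYSTEM) {
            throw new IllegalArgumentException(path + " is stored in the database");
        }
        return dataDir.resolve(path);
    }

    boolean inDatabase(String path) {
        return db.containsKey(path);
    }
}

class Demo {
    public static void main(String[] args) {
        RoutingStore store = new RoutingStore(
                Paths.get("/var/geoserver/data_dir"), Collections.singleton("data"));
        store.importResource("workspaces/topp/workspace.xml", new byte[0]);
        store.importResource("data/roads.shp", new byte[0]);
        System.out.println(store.inDatabase("workspaces/topp/workspace.xml")); // true
        System.out.println(store.inDatabase("data/roads.shp"));                // false
        System.out.println(store.fileFor("data/roads.shp"));
    }
}
```

Routing on the first path segment keeps the rule simple and predictable; anything finer-grained (per-store or per-workspace exclusions) would need a richer configuration format.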
Kind Regards
Niels


On 26-10-15 22:28, Jody Garnett wrote:
Thanks Niels, some comments inline. I assume this is for GSIP-132 <https://github.com/geoserver/geoserver/wiki/GSIP-132> (unless that is completed already).
On 7 October 2015 at 05:28, Niels Charlier <[email protected]> wrote:
    Hi Jody, Gabriel, Kevin

    I have been porting all modules to use the resources system
    consistently and only use files when necessary (usually external
    library). I still stumbled upon two minor questions/issues I
    wanted to discuss.

    1. Usage of the "data" directory. At the moment the import from
    data directory -> jdbc store ignores the "data" directory. In a
    clustered environment, this directory thus remains instance
    specific, and it would be up to the user to refer to shared files.
Are you talking about GEOSERVER_DATA_DIRECTORY/data? If so, that is only a convention; I have made data directories that used "raster" and "vector" folders, for example.
For storing spatial data (GeoTIFF, Shapefile, Image Mosaic here) I had the idea of doing something like JNDI, but for referencing an external folder used for this purpose. This could both provide an "ignore" list (so "data" would not be hard-coded) and allow for a cluster with RAID storage mapped to a specific mount.
    At this moment, there is no reason why we couldn't include the
    data dir in the jdbcstore and cache it before loading the
    geotools datastore. This is actually what my modified version of
    the REST service already does, because it uses resources everywhere.
For configuration files this is what we want.

    Another idea was to program the jdbcstore to return file-based
    resources only when the "data" directory is used, so that it
    definitely will never store those files in the database
    unnecessarily.


Okay pretty sure you are talking about GEOSERVER_DATA_DIRECTORY/data now.
Q: Is it worth removing the files that have been imported into JDBCConfig from GEOSERVER_DATA_DIRECTORY? This would prevent confusion and allow GEOSERVER_DATA_DIRECTORY to work strictly as a cache (for the few things that require a file to be unpacked onto disk).
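Removing imported files amounts to an import loop that deletes each local file only after its content is stored, so the directory is left behind as a cache. A minimal sketch, assuming a `Map` stands in for the JDBC-backed store and that top-level directories listed in `ignored` (such as `data`) are neither imported nor deleted; `ImportAndClean` is a hypothetical name, and empty directories are simply left in place.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical one-off import that turns the data directory into a cache. */
class ImportAndClean {

    /**
     * Copies every regular file under dataDir into db (keyed by its relative,
     * '/'-separated path), then deletes the local copy. Files whose first path
     * segment is in 'ignored' are skipped. Returns the number of files imported.
     */
    static int run(Path dataDir, Set<String> ignored, Map<String, byte[]> db) {
        List<Path> files;
        // Collect first, so we never delete while the directory walk is still open.
        try (Stream<Path> walk = Files.walk(dataDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int imported = 0;
        for (Path file : files) {
            Path rel = dataDir.relativize(file);
            if (ignored.contains(rel.getName(0).toString())) {
                continue; // e.g. raster/vector data stays on disk
            }
            String key = rel.toString().replace(file.getFileSystem().getSeparator(), "/");
            try {
                db.put(key, Files.readAllBytes(file));
                Files.delete(file); // only after the content is safely stored
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            imported++;
        }
        return imported;
    }
}

class Demo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("datadir");
        Files.createDirectories(dir.resolve("styles"));
        Files.createDirectories(dir.resolve("data"));
        Files.write(dir.resolve("styles/point.sld"), "<sld/>".getBytes());
        Files.write(dir.resolve("data/roads.shp"), new byte[] {1, 2, 3});
        Map<String, byte[]> db = new HashMap<>();
        int n = ImportAndClean.run(dir, Collections.singleton("data"), db);
        System.out.println(n + " imported: " + db.keySet());            // 1 imported: [styles/point.sld]
        System.out.println(Files.exists(dir.resolve("data/roads.shp"))); // true
    }
}
```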
    2. In the jdbcstore, should the children of a directory be cached
    when dir() is called?
Caching happens on import (so yes). Should the resources be unpacked (staged) to the file system when dir() is called? Yes.
    The DataDirectory class uses the dir() method to know the root of
    the data directory, causing the whole data directory to be cached
    at once multiple times unnecessarily, since the root dir is
    usually requested just to know the path for some reason (all code
    that actually needs files in the data dir has been replaced
    by resources).
This is a bug; such logic should be replaced. There is another method to get the root of the GEOSERVER_DATA_DIRECTORY. While we may hard-code some things now, it would be wise to have an extension point for modules (such as geowebcache) to mark off working directories that should not be cached.
Using dir() to determine the root of GEOSERVER_DATA_DIRECTORY is a bad idea: in addition to breaking the design of dir(), we are trying to avoid duplicating code that contains data directory structure logic.
    We now always want to use resources as long as possible, only
    calling file() at the last moment if necessary. As a consequence
    the dir() method is actually hardly used for the purpose of
    getting all the files inside that dir. I would suggest calling
    dir() only to create the dir if it doesn't exist yet and not
    cache its children. There is only one part of code left where
    that would pose a problem, the community module "validation",
    which passes on a whole dir to its geotools counterpart. This
    however could be changed in the geotools module to pass on a
    collection of files instead.
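The proposed dir() behaviour can be sketched as follows: dir() only makes sure the local cache directory exists, and file() stages a single resource at the last moment, so asking for the root no longer copies the whole data directory. `CachingStore`, its in-memory `db` map and the `staged` counter are hypothetical stand-ins for the jdbcstore, its database and its cache, not the real implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical database-backed store with a local cache directory. */
class CachingStore {
    final Map<String, byte[]> db = new HashMap<>(); // stand-in for the database
    private final Path cacheDir;
    int staged = 0; // number of resources copied to disk so far

    CachingStore(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /** Proposed behaviour: create the directory if needed, but do not copy its children. */
    File dir(String path) {
        try {
            return Files.createDirectories(cacheDir.resolve(path)).toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stages exactly one resource, for code (usually a 3rd-party lib) that needs a File. */
    File file(String path) {
        byte[] content = db.get(path);
        if (content == null) {
            throw new IllegalArgumentException("no such resource: " + path);
        }
        Path target = cacheDir.resolve(path);
        try {
            Files.createDirectories(target.getParent());
            Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        staged++;
        return target.toFile();
    }
}

class Demo {
    public static void main(String[] args) throws IOException {
        CachingStore store = new CachingStore(Files.createTempDirectory("cache"));
        store.db.put("styles/point.sld", "<sld/>".getBytes());
        store.db.put("workspaces/topp/workspace.xml", "<workspace/>".getBytes());
        File root = store.dir("");                  // only the path; nothing is copied
        System.out.println(root.isDirectory());     // true
        System.out.println(store.staged);           // 0
        File sld = store.file("styles/point.sld");  // only this one resource is staged
        System.out.println(sld.getName());          // point.sld
        System.out.println(store.staged);           // 1
    }
}
```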
Validation only needed one "validation" folder, so that code could be changed to use resource("validation").dir().
    After this change, I wonder if we should make a doc page on the
    proper practices of using the Resource API in order to be
    clustering-safe.


Yep, could add to the developers guide under "file access".
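A "file access" entry could illustrate the pitfall from point 4 in Niels's message: edits made to a staged copy returned by file() stay on one node, while writes through the resource's output stream reach the shared store. This is a minimal sketch; `SharedStore`, `out()`, `file()` and `read()` are hypothetical simplifications, and the exact signatures of the real Resource API should be checked before documenting them.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shared store (think: the database every cluster node sees). */
class SharedStore {
    final Map<String, byte[]> content = new HashMap<>();
    private final Path localCache; // this node's private staging area

    SharedStore(Path localCache) {
        this.localCache = localCache;
    }

    /** Output stream whose bytes are committed to the shared store when it is closed. */
    OutputStream out(String path) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                content.put(path, toByteArray());
            }
        };
    }

    /** Stages a node-local copy; changes to this file are NOT written back. */
    File file(String path) {
        Path target = localCache.resolve(path);
        try {
            Files.createDirectories(target.getParent());
            Files.write(target, content.getOrDefault(path, new byte[0]));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.toFile();
    }

    String read(String path) {
        return new String(content.getOrDefault(path, new byte[0]), StandardCharsets.UTF_8);
    }
}

class Demo {
    public static void main(String[] args) throws IOException {
        SharedStore store = new SharedStore(Files.createTempDirectory("node1"));
        // Wrong: editing the staged file only changes this node's disk.
        Files.write(store.file("global.xml").toPath(), "edited".getBytes(StandardCharsets.UTF_8));
        System.out.println("[" + store.read("global.xml") + "]"); // []
        // Right: write through the resource so every node sees the change.
        try (OutputStream out = store.out("global.xml")) {
            out.write("edited".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("[" + store.read("global.xml") + "]"); // [edited]
    }
}
```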

------------------------------------------------------------------------------

_______________________________________________
Geoserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geoserver-devel