On Wed, Nov 9, 2011 at 3:29 PM, Gabriel Roldan <[email protected]> wrote:
> Hey Andrea,
>
> interesting issue.
>
> Just as a heads up, so that some of the comments below make more
> sense, I'm starting to investigate what it would take for a
> scalable catalog/configuration backend, possibly with HA and on a
> cluster environment.

Interesting, and very much welcomed.

This work of ours is unfortunately on a short deadline, and needs
to land on the stable series.
This is why I was trying to suggest something low in complexity
and risk.
The deadline is something I cannot move, so the other option is to
keep an internal fork of GeoServer for this project until there is enough time
to let the GS API evolve enough to allow this use case to be handled
by a NoSQL solution (we have no requirement to contribute back the
work in this project).

I have the impression your work is going to be longer term and trunk
only, but I may be wrong.

> The only thing I know for sure now is that I want to leverage the
> infrastructure already in place in terms of catalog objects
> marshaling/unmarshaling.
> That is, the back end, either an RDBMS or a key/value db, would just
> need to be able to store the serialized catalog objects the same
> way we store them now, in some sort of clob, and provide indexes for
> the common queries (id and name mostly, given the current queries the
> catalog performs).
> That is because the hibernate catalog strikes me as overkill
> complexity for a rather simple issue, and there's so much
> knowledge and effort put into the xstream persistence that I think it
> would be nice to leverage it.

I can relate to that; indeed a relational DBMS seems overkill for the
task at hand, and an XML db or NoSQL db could work as well, I guess.
However, in some places a relational DBMS (thinking Oracle here)
is the only thing allowed to store data, by
policy, so we should also leave the door open for that option to
be developed too, in the future of course.
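Just to make the idea concrete, a minimal sketch of what such a backend
contract might look like: the serialized payload stays opaque, and only id
and name are indexed, so the same interface could sit on a CLOB table, an
XML db or a key/value store alike. All names here (CatalogBlobStore,
MemoryBlobStore) are made up for illustration, nothing in GeoServer today:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: the catalog objects are stored as opaque
// serialized blobs, with lookups only by id and by name.
interface CatalogBlobStore {
    void put(String id, String name, String serializedXml); // upsert by id
    String getById(String id);
    String getByName(String name);
    void remove(String id);
}

// Trivial in-memory implementation, just to show the contract in action.
class MemoryBlobStore implements CatalogBlobStore {
    private final Map<String, String> byId = new HashMap<>();
    private final Map<String, String> idByName = new HashMap<>();

    public synchronized void put(String id, String name, String serializedXml) {
        byId.put(id, serializedXml);
        idByName.put(name, id);
    }

    public synchronized String getById(String id) {
        return byId.get(id);
    }

    public synchronized String getByName(String name) {
        String id = idByName.get(name);
        return id == null ? null : byId.get(id);
    }

    public synchronized void remove(String id) {
        byId.remove(id);
        idByName.values().remove(id);
    }
}
```

A real backend would replace the two maps with whatever the store offers
(a table with two indexed columns, a couple of key spaces, etc.).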

> So my idea would be to encapsulate all the locking logic inside the
> catalog, so that client code doesn't need to worry about acquiring a
> read lock, but any operation that does write lock would inherently
> lock read operations until done. Internally it could have a queue of
> commands, and the client code that needs to do multiple operations
> like in the examples above, would need to issue a command on the
> catalog. The call would be synchronous, but the catalog would put the
> command on the queue and wait for it to finish before returning.
>
> I think this model would also make it easier to implement any message
> passing needed when on a clustered environment, like in acquiring a
> cluster wide write lock, or notifying other nodes of a configuration
> change so that they release stuff from their own resource pool, etc.

Sounds like a reasonable approach. So the command would be a transaction,
and the code calling it would have to build one for each of its specific needs
and send it down.
It seems it would make for a 180-degree shift in how the catalog API works though,
with calling code having to be redone to use commands any time a transaction
is needed.
However... a typical command pattern would allow for just a write lock, letting
reads flow uncontrolled.
For example, during a reload I'm not sure we want the GUI or REST
config to access
the catalog, even read-only, as they would get inconsistent information.
I guess we could attach a lock type to the command?
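Something along these lines is what I have in mind, a rough sketch only
(CatalogCommand, LockType and the facade are all invented names): the caller
hands the catalog a command tagged with the lock it needs, the catalog takes
the shared or exclusive side of a read/write lock and runs the command
synchronously. A fair ReentrantReadWriteLock already queues writers FIFO, so
the explicit command queue could come later:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical lock type attached to each command.
enum LockType { READ, WRITE }

// Hypothetical command: the unit of work plus the lock it requires.
interface CatalogCommand<T> {
    LockType lockType();
    T execute();
}

class LockingCatalogFacade {
    // fair = true so waiting writers are not starved by a stream of readers
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Runs the command synchronously under the lock it asked for.
    public <T> T run(CatalogCommand<T> command) {
        Lock l = command.lockType() == LockType.WRITE
                ? lock.writeLock() : lock.readLock();
        l.lock();
        try {
            return command.execute();
        } finally {
            l.unlock();
        }
    }
}
```

A reload would then be a single WRITE command, blocking GUI/REST reads until
it completes, which addresses the inconsistent-information concern above.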

> agreed. The lru expiration being a separate issue, I think it's
> generally fair to let an ongoing operation fail in the face of a
> configuration change that affects the resource(s) it's using (like in
> a wfs request serving a feature type that's removed meanwhile).
> Otherwise we would be making too much effort for a use case that we
> don't even know is the _correct_ one. Perhaps the admin shuts down
> that resource exactly due to client abuse and wants to stop any usage
> of it immediately, but we would be preventing that. Anyway, just
> thinking out loud in this regard.

Yep, I agree.

>>
>> Btw, backup wise we'd like to back up the entire data directory (data, fonts,
>> styling icons and whatnot included), and restore
>> would wipe it out, replace it with the new contents, and force a reload.
>>
>> GWC is actually putting a wrench in this plan since it keeps a database open,
>> wondering if there is any way to have it give up on the db connections
>> for some time?
> There could be, just lets sync up on what's needed, what the workflow
> would be, and lets make it possible.

I guess I would just need a way to tell GWC to release its database
connections for a while, and pick them up again once the restore is done.

> Would you be backing up the tile caches too?

I don't think so, they are often just too big and can be rebuilt.
However, that idea of having a configurable set of subdirs to be backed
up would allow complete or partial tile cache backup (who knows, maybe
the admin is aware that a particular layer is extremely expensive to re-create).

>> As an alternative we could have the backup and restore tools work on
>> selected directories
>> and files, configurable in the request (which could be a post) and
>> with some defaults
>> that only include the gwc xml config file, I guess GWC would
>> just reconfigure itself after the reload and wipe out the caches, right?
> oh I see, makes sense. Yeah, it should reconfigure itself.
>
>>
>> Anyways... opinions? Other things that might go pear shaped that we
>> did not account for?
>
> I think it's an issue worth taking the time to get it right.

I would agree in a perfect world; unfortunately we work in one where
time is in short supply.

> Fortunately we do have the funding to do so now, starting firmly two
> weeks from now as I'll be on vacation for the next two weeks, so I'd
> be glad to run the whole process, exploratory prototyping included, by
> the community.
>
> Do you have a rush to get something working asap? Or would you be ok
> going for a longer but steady devel cycle?

Unfortunately not, I have a week at most. Maybe the locking could be
handled at a very high level, so that it's not obtrusive and can be easily
removed later.
For example, at the request filter level: if a request is GUI and the page is
reserved to admins, just take a write lock; if it's REST, take a read lock if
the request is a GET, a write lock otherwise.
Actually, this way I could avoid a full-blown fork, and just use standard plus
custom plugins of sorts.
The downside is that the same problem is afflicting the user base, especially
heavy users of the REST config, or simply setups where multiple people use
the admin GUI, and the solution would not be provided to anyone
else but our customer...
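The filter-level routing I have in mind would be little more than this; a
sketch only, with invented class names and hypothetical URL prefixes (the
callback signatures are not any real dispatcher API):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical request filter: admin GUI pages and mutating REST calls
// take the write lock, everything else (OGC requests, REST GETs) shares
// the read lock and flows in parallel.
class ConfigLockFilter {
    // fair = true so a pending write is not starved by incoming reads
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Routing rule: which side of the lock does this request need?
    boolean needsWriteLock(String path, String method) {
        if (path.startsWith("/web") && path.contains("admin")) {
            return true;                  // admin GUI page: assume it may write
        }
        if (path.startsWith("/rest")) {
            return !"GET".equals(method); // REST: GET reads, anything else writes
        }
        return false;                     // OGC and other requests: read only
    }

    void onRequestStart(String path, String method) {
        if (needsWriteLock(path, method)) lock.writeLock().lock();
        else lock.readLock().lock();
    }

    void onRequestEnd(String path, String method) {
        if (needsWriteLock(path, method)) lock.writeLock().unlock();
        else lock.readLock().unlock();
    }
}
```

Being a single class hooked at the request boundary, it could be dropped in
as a plugin and removed later without touching the catalog API at all.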

Cheers
Andrea

-- 
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy

phone: +39 0584 962313
fax:      +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

_______________________________________________
Geoserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geoserver-devel
