Re: Where to store BLOBs in clerezza?

Reto Bachmann-Gmuer Fri, 11 Mar 2011 09:30:52 -0800

Hi Tsuy

On Thu, Mar 10, 2011 at 10:21 AM, Tsuyoshi Ito <[email protected]> wrote:

> Currently we are storing BLOBs in graphs as base64Binary literal by
> default. I am not sure if this is the way to go. I am wondering what
> other users/developers think about this.
>
Yes, the focus of Clerezza is more on having all the data integrated in a very
consistent fashion. Obviously this makes certain things harder that are
easier with other systems. Some recent improvements have, I believe, made
things easier, and further improvements could cover many cases where one would
currently recommend using an external system for BLOBs (such as Hadoop).

>
> i have the following concerns:
>
> a) back up graphs (export as turtle) and restoring graphs (PUT rdf+xml
> or turtle) is cumbersome (takes a long time and consumes a lot of
> resources), could also lead to out of memory exception (see Andy
> Seaborne thread concerning tdb)
>
When using rdf.storage.externalizer (see resolution of
CLEREZZA-286<https://issues.apache.org/jira/browse/CLEREZZA-286>),
you can back up the externalized-literals graph and the folder containing the
literals separately. Clearly things should be improved to allow fast backups
in any situation, maybe by supporting provider-specific native backup formats
(such as a copy of the tdb-data directory for the tdb provider).
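
To illustrate why the graph and the literal folder can be backed up
separately, here is a minimal sketch of the general externalizing idea. It is
not the rdf.storage.externalizer code: the class LiteralExternalizerSketch,
its methods, and the choice of a SHA-256 hash as file name are all
assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: large literal values live as files, the graph keeps only a short key. */
final class LiteralExternalizerSketch {
    private final Path folder;

    LiteralExternalizerSketch(Path folder) {
        this.folder = folder;
    }

    /** Writes the content to the folder and returns the key to store in the graph. */
    String externalize(byte[] content) {
        String key = sha256Hex(content);
        Path target = folder.resolve(key);
        try {
            if (!Files.exists(target)) {
                Files.write(target, content); // identical content is stored only once
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }

    /** Reads the content back for a key previously returned by externalize. */
    byte[] resolve(String key) {
        try {
            return Files.readAllBytes(folder.resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With such a layout the graph export only contains the short keys, and the
folder can be copied with ordinary file tools.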

> b) filtering, adding and removing triples containing BLOBs (large
> literals) is slow and can lead to out of memory exception
>
Are you filtering with a large literal as value? If the value is null, then the
size of the literals in the graph shouldn't play a role (certainly not when using
the externalizer, and the other implementations probably behave similarly).
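
A minimal sketch of what filtering with null means, assuming the usual
convention that a null position in the pattern matches anything. GraphSketch
and TripleSketch are hypothetical stand-ins written for this example, not
Clerezza classes; the point is only that a null value pattern never compares
against the stored literal values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal triple; all three positions are plain strings here. */
final class TripleSketch {
    final String subject;
    final String predicate;
    final String object;

    TripleSketch(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical in-memory graph with a null-as-wildcard filter. */
final class GraphSketch {
    private final List<TripleSketch> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new TripleSketch(subject, predicate, object));
    }

    /** Returns all triples matching the pattern; null in a position matches anything. */
    List<TripleSketch> filter(String subject, String predicate, String object) {
        List<TripleSketch> result = new ArrayList<>();
        for (TripleSketch t : triples) {
            if (matches(subject, t.subject)
                    && matches(predicate, t.predicate)
                    && matches(object, t.object)) {
                result.add(t);
            }
        }
        return result;
    }

    private static boolean matches(String pattern, String value) {
        // With a null pattern the (possibly very large) value is never compared.
        return pattern == null || pattern.equals(value);
    }
}
```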

> c) when requesting BLOBs via web service literals (BLOBs) have to be
> converted to byte arrays
>

CLEREZZA-423 will give storage providers the possibility to store byte[]
directly, so that the lexical form only gets created when really needed (by
using a specific handling of those literals by LiteralFactory).
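
The idea of creating the lexical form only when needed can be sketched as
follows. This is not the CLEREZZA-423 implementation; LazyBase64Literal and
its methods are hypothetical names chosen for this example.

```java
import java.util.Base64;

/** Hypothetical literal that keeps the raw bytes and derives its lexical form lazily. */
final class LazyBase64Literal {
    private final byte[] bytes;
    private String lexicalForm; // stays null until somebody asks for it

    LazyBase64Literal(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    /** Direct access for callers that want the binary content, e.g. a web service. */
    byte[] getBytes() {
        return bytes.clone();
    }

    /** The base64 lexical form, computed on the first call and cached afterwards. */
    synchronized String getLexicalForm() {
        if (lexicalForm == null) {
            lexicalForm = Base64.getEncoder().encodeToString(bytes);
        }
        return lexicalForm;
    }

    /** True once the lexical form has been created. */
    synchronized boolean isLexicalFormMaterialized() {
        return lexicalForm != null;
    }
}
```

A caller serving the BLOB over HTTP would use getBytes() and never trigger
the base64 encoding.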

> ...
> d) webpublisher who develops  js and css have to update the graphs in
> order to update the js and css (this is often done by trial and error
> for IE compatibility).
>

CSS and JS should be placed in CLEREZZA-INF/web-resources; bundles with a
higher start level will override resources provided by bundles with a lower
start level. One approach to updating the resources when they change on disk
is to use source bundles. Source bundles ignore Java files, so if your bundle
is implemented in Java you need to install the bundle normally (which is
unnecessary for pure Scala bundles) and set the source folder as a source
bundle. As a result you'll have two bundles in the OSGi environment: one
containing the Java classes (which you update as you normally would) and one
providing the overriding web resources. That way, changes to the files are
automatically deployed on save (and available approx. 5 seconds after
pressing save).
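
The start-level rule can be sketched like this. It is only an illustration
of "the provider with the higher start level wins"; ResourceBundleSketch and
WebResourceResolverSketch are hypothetical classes, not part of Clerezza or
OSGi.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a bundle that provides web resources (path to content). */
final class ResourceBundleSketch {
    final int startLevel;
    final Map<String, String> resources;

    ResourceBundleSketch(int startLevel, Map<String, String> resources) {
        this.startLevel = startLevel;
        this.resources = resources;
    }
}

/** Hypothetical resolver: for each path, the bundle with the highest start level wins. */
final class WebResourceResolverSketch {
    private final List<ResourceBundleSketch> bundles = new ArrayList<>();

    void install(ResourceBundleSketch bundle) {
        bundles.add(bundle);
    }

    Optional<String> resolve(String path) {
        ResourceBundleSketch best = null;
        for (ResourceBundleSketch bundle : bundles) {
            boolean provides = bundle.resources.containsKey(path);
            if (provides && (best == null || bundle.startLevel > best.startLevel)) {
                best = bundle;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.resources.get(path));
    }
}
```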

It would be possible (and quite easy) to implement a feature for overriding
web resources from a filesystem folder (without the indirection of the
bundle created on the fly). That way the deployment time could be reduced to
about 1 second (it could be made even faster by actually checking the
filesystem on request, but this would be harder to implement consistently
with the architecture).

Cheers,
Reto
