Hi Tsuy
On Thu, Mar 10, 2011 at 10:21 AM, Tsuyoshi Ito <[email protected]> wrote:

> Currently we are storing BLOBs in graphs as base64Binary literals by
> default. I am not sure if this is the way to go. I am wondering what
> other users/developers think about this.

Yes, Clerezza's focus is on having all the data integrated in a very consistent fashion. Obviously this makes certain things harder that are easier with other systems. I believe some recent improvements have made things easier. Further improvements could cover many of the cases where one would currently recommend an external system for BLOBs (such as Hadoop).

> I have the following concerns:
>
> a) backing up graphs (export as Turtle) and restoring graphs (PUT rdf+xml
> or Turtle) is cumbersome (takes a long time and consumes a lot of
> resources), and could also lead to an out of memory exception (see Andy
> Seaborne's thread concerning TDB)

When using rdf.storage.externalizer (see the resolution of CLEREZZA-286 <https://issues.apache.org/jira/browse/CLEREZZA-286>), you can back up the externalized-literals graph and the folder containing the literals separately. Clearly things should be improved to allow fast backups in any situation. This could include support for provider-specific native backup formats (such as a copy of the tdb-data directory for the TDB provider).

> b) filtering, adding and removing triples containing BLOBs (large
> literals) is slow and can lead to an out of memory exception

Are you filtering with a large literal as the object value? If the object in the filter is null, the size of the literals in the graph shouldn't play a role. This is certainly the case when using the externalizer, and the other implementations probably behave similarly.

> c) when requesting BLOBs via web service, literals (BLOBs) have to be
> converted to byte arrays

CLEREZZA-423 will let storage providers store byte[] directly. The lexical form will then only be created when it is really needed, through specific handling of those literals by the LiteralFactory.
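The idea behind c) is to keep the raw bytes and defer creating the base64 lexical form. The sketch below illustrates this in plain Java under stated assumptions. It is not Clerezza's API and not the actual CLEREZZA-423 implementation. `ByteArrayLiteral` is a hypothetical class that does not implement Clerezza's literal interfaces, and the sketch assumes Java 8+ for `java.util.Base64`. A web service can read `getBytes()` directly without a base64 round trip. Only a caller that needs the xsd:base64Binary lexical form pays for the encoding, and it pays once.

```java
import java.util.Arrays;
import java.util.Base64;

/**
 * Hypothetical sketch (not a Clerezza API): a typed literal that stores the raw
 * bytes and builds the xsd:base64Binary lexical form only when it is requested.
 */
final class ByteArrayLiteral {
    static final String DATATYPE = "http://www.w3.org/2001/XMLSchema#base64Binary";

    private final byte[] bytes;
    private String lexicalForm;          // cached; created lazily on first request
    private int lexicalFormComputations; // only here to make the laziness observable

    ByteArrayLiteral(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the literal immutable
    }

    /** Direct access for consumers (e.g. a web service) that just need the bytes. */
    byte[] getBytes() {
        return bytes.clone();
    }

    /** Encodes to base64 on the first call only, then returns the cached string. */
    synchronized String getLexicalForm() {
        if (lexicalForm == null) {
            lexicalForm = Base64.getEncoder().encodeToString(bytes);
            lexicalFormComputations++;
        }
        return lexicalForm;
    }

    synchronized int lexicalFormComputations() {
        return lexicalFormComputations;
    }

    String getDataType() {
        return DATATYPE;
    }
}

public class Main {
    public static void main(String[] args) {
        ByteArrayLiteral blob = new ByteArrayLiteral(new byte[] {1, 2, 3});
        // Serving the BLOB needs no base64 conversion at all.
        System.out.println(Arrays.toString(blob.getBytes()));
        System.out.println(blob.lexicalFormComputations());
        // The lexical form is built on demand and then reused.
        System.out.println(blob.getLexicalForm());
        System.out.println(blob.getLexicalForm());
        System.out.println(blob.lexicalFormComputations());
    }
}
```

In a real provider the same principle would apply at the storage layer, so that the bytes never have to pass through a string at all.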
> ...
> d) web publishers who develop JS and CSS have to update the graphs in
> order to update the JS and CSS (this is often done by trial and error
> for IE compatibility).

CSS and JS should be placed in CLEREZZA-INF/web-resources. Bundles with a higher start level override resources provided by bundles with a lower start level.

One way to update a resource when it changes on disk is to use source bundles. Source bundles ignore Java files. If your bundle is implemented in Java, you therefore need to install the bundle normally (this step is unnecessary for pure Scala bundles) and also set the source folder as a source bundle. You then have two bundles in the OSGi environment. One contains the Java classes, which you update as you normally would. The other provides the overriding web resources. With this setup, changes to the files are deployed automatically on save and become available approx. 5 seconds after pressing save.

It would be possible, and quite easy, to implement a feature that overrides web resources from a filesystem folder without the indirection of a bundle created on the fly. This could reduce the deployment time to about 1 second. Checking the filesystem on each request would make it even faster, but that would be harder to implement consistently with the architecture.

Cheers,
Reto
