Hi,

I'm trying to ingest a large number of data files into the Fedora Repository.
The API-M method addDatastream has a lot of parameters, one of them being
'String dsLocation: Location of managed, redirect, or external referenced
datastream content.' The usual way, as I understand it, is to first call the
upload(File) method of FedoraClient, which returns a temporary id such as
'uploaded://123', and then to pass this temporary id as the dsLocation. That
works fine. Another way is to supply a real URL, starting with "http://" etc.,
as the value of the dsLocation parameter. That works equally well.

However, as the data files already reside on the same server where Fedora is
running, I see no reason to use the presumably slow upload or http
methods. I want Fedora to read the datastream content directly from file.
There seems to be a way to do this. The commit method of
fedora.server.storage.DefaultDOManager contains, somewhere:

    } else if (dmc.DSLocation.startsWith(DatastreamManagedContent.TEMP_SCHEME)) {

and then starts to read the content from disc. So if I give
"temp:///myDirectory/myFile.txt" (mind the three slashes) as the value of the
dsLocation parameter, everything should work fine... Not so! Before we get
to that location in DefaultDOManager, we pass through DefaultManagement, which
says on line 520:

    if (dsLocation != null) {
        ValidationUtility.validateURL(dsLocation);

and the exact implementation of this ValidationUtility method reads:

    public static void validateURL(String url) throws ValidationException {
        try {
            URL candidate = new URL(url);
            if (candidate.getProtocol().equals("file")) {
                throw new ValidationException("Malformed URL: invalid protocol: " + url);
            }
        } catch (MalformedURLException e) {
            if (url.startsWith(DatastreamManagedContent.UPLOADED_SCHEME)) {
                return;
            }
            throw new ValidationException("Malformed URL: " + url, e);
        }
    }

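Clearly there is no way to get "temp:///myDirectory/myFile.txt" past this
method without a ValidationException being thrown at me: "temp" is not a
protocol that java.net.URL knows, so the constructor throws a
MalformedURLException, and the catch clause only lets the uploaded scheme
through.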

So if I hack this ValidationUtility class and add the "temp://" scheme to
the if-statement in the catch clause:

    if (url.startsWith(DatastreamManagedContent.UPLOADED_SCHEME)
            || url.startsWith(DatastreamManagedContent.TEMP_SCHEME)) {
        return;
    }

then everything works as expected. The file is picked up from disc and added
as managed content to the repository.
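
The validation behaviour can be reproduced outside Fedora with a small
standalone sketch. It is not the Fedora code: the class name
DsLocationCheckSketch, the method passesValidation and the allowTemp flag are
made up for illustration, and the two scheme constants are assumed to be
"uploaded://" and "temp://", based only on the example locations used in this
mail. It also assumes a plain JVM with no custom URL handler registered for
those schemes.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Standalone illustration only; this is not part of Fedora.
class DsLocationCheckSketch {

    // Assumed values, inferred from 'uploaded://123' and "temp:///..." above.
    static final String UPLOADED_SCHEME = "uploaded://";
    static final String TEMP_SCHEME = "temp://";

    /**
     * Mirrors the structure of the validateURL method quoted above, but
     * returns a boolean instead of throwing. With allowTemp == true it
     * mimics the hacked version that also lets the temp scheme through.
     */
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated on recent JDKs
    static boolean passesValidation(String url, boolean allowTemp) {
        try {
            URL candidate = new URL(url);
            // A parsable URL is rejected only if it uses the file protocol.
            return !candidate.getProtocol().equals("file");
        } catch (MalformedURLException e) {
            // Unknown protocols such as "uploaded" and "temp" end up here.
            if (url.startsWith(UPLOADED_SCHEME)) {
                return true;
            }
            return allowTemp && url.startsWith(TEMP_SCHEME);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.org/myFile.txt",
            "uploaded://123",
            "file:///myDirectory/myFile.txt",
            "temp:///myDirectory/myFile.txt"
        };
        for (String s : samples) {
            System.out.println(s
                    + "  original=" + passesValidation(s, false)
                    + "  hacked=" + passesValidation(s, true));
        }
    }
}
```

Under these assumptions only the "temp://" location changes its outcome
between the original and the hacked check; "file://" locations are rejected
in both cases.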

Questions:
1. Is there a legitimate way to let Fedora read managed content from disc
(other than the one described above)?
2. Is this ValidationUtility class out of sync with the rest of the code
base? Does it accidentally prevent something that is legal routine
elsewhere?
3. And if so, when is it going to be repaired?
4. Are there any reasons not to use the hacked ValidationUtility class
(apart from its being a hack)? Will using it corrupt something somewhere else?

If anyone could answer some or all of these questions, I would be much
obliged.


 sincerely,

henk van den berg,
software developer at DANS - Data Archives and Networked Services
http://www.dans.knaw.nl


_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
