There are some open issues related to your problem:

https://fedora-commons.org/jira/browse/FCREPO-453
https://fedora-commons.org/jira/browse/FCREPO-485
If you have an account on the issue tracker, please vote for them!

With regard to your questions:

> Is there a legal way to let Fedora read managed content from disc? (other
> than the one described above)

I don't believe there's an out-of-the-box way to specify file:// URLs (at
least, there wasn't last time I looked at the release code). You could
probably sort out a way to copy your files into the temp directory and
provide the uploaded:// urls yourself, but that seems like a frail solution.

> Is this ValidationUtility class out of sync with the rest of the code body?
> Does it accidentally prevent something that elsewhere is legal routine?
> And if so, when is it going to be repaired?

The temp and uploaded protocols are supposed to be internal, and the
ValidationUtility is supposed to deal with externally supplied content. In
that sense, things aren't broken, but the aforementioned issues reflect a
desire for this to change.

> Are there any reasons why not to use the hacked ValidationUtility class
> (besides from being a hack); will using it corrupt something somewhere else?

Once your objects/updates are ingested, Fedora will be tracking the location
of the managed content internally, so you should be safe.

Regards,
Benjamin

On Wed, May 27, 2009 at 9:52 AM, henk van den berg <[email protected]> wrote:
> Hi,
>
> I’m trying to ingest a huge amount of data files to the Fedora Repository.
> The API-M method addDatastream has a lot of parameters, one of them is the
> ‘String dsLocation Location of managed, redirect, or external referenced
> datastream content.’ The usual way, I understand, is to first use the
> upload(File) method from FedoraClient, returning a temporary Id such as
> ‘uploaded://123’, and using this temporary Id as the dsLocation. That works
> fine. Another way is to have a real URL, starting with “http://” etc. and
> setting that as the value for the dsLocation-parameter. That works equally
> fine.
>
> However, as the data files already reside on the same server where Fedora is
> running, I see no reason to use the presumably slow upload or http
> methods. I want Fedora to read the datastream content directly from file.
> There seems to be a way to do this. The
> fedora.server.storage.DefaultDOManager in its commit method has somewhere:
>
>     } else if (dmc.DSLocation.startsWith(DatastreamManagedContent.TEMP_SCHEME)) {
>
> and then starts to read the content from disc. So if I give
> “temp:///myDirectory/myFile.txt” (mind the three slashes) as the value of
> parameter dsLocation, everything should work fine... Not so! Before we get
> to that location in DefaultDOManager, we pass through DefaultManagement and
> it says on line 520:
>
>     if (dsLocation != null) {
>         ValidationUtility.validateURL(dsLocation);
>
> and the exact implementation of this ValidationUtility method reads:
>
>     public static void validateURL(String url) throws ValidationException {
>         try {
>             URL candidate = new URL(url);
>             if (candidate.getProtocol().equals("file")) {
>                 throw new ValidationException("Malformed URL: invalid protocol: " + url);
>             }
>         } catch (MalformedURLException e) {
>             if (url.startsWith(DatastreamManagedContent.UPLOADED_SCHEME)) {
>                 return;
>             }
>             throw new ValidationException("Malformed URL: " + url, e);
>         }
>     }
>
> And it is obvious, there is no way to get “temp:///myDirectory/myFile.txt”
> past this method without a ValidationException being thrown at me.
>
> So if I hack this ValidationUtility class and add the “temp://” protocol to
> the if-statement in the catch clause:
>
>     if (url.startsWith(DatastreamManagedContent.UPLOADED_SCHEME)
>             || url.startsWith(DatastreamManagedContent.TEMP_SCHEME)) {
>         return;
>
> Then everything works as expected. The file is picked up from disc and added
> as managed content to the repository.
>
> Questions:
>
> Is there a legal way to let Fedora read managed content from disc?
> (other than the one described above)
> Is this ValidationUtility class out of sync with the rest of the code body?
> Does it accidentally prevent something that elsewhere is legal routine?
> And if so, when is it going to be repaired?
> Are there any reasons why not to use the hacked ValidationUtility class
> (besides from being a hack); will using it corrupt something somewhere else?
>
> If anyone could answer some or all of these questions, I would be much
> obliged.
>
>
> sincerely,
>
> henk van den berg,
> software developer at DANS - Data Archives and Networked Services
> http://www.dans.knaw.nl
>
>
> ------------------------------------------------------------------------------
> Register Now for Creativity and Technology (CaT), June 3rd, NYC. CaT
> is a gathering of tech-side developers & brand creativity professionals.
> Meet the minds behind Google Creative Lab, Visual Complexity, Processing, &
> iPhoneDevCamp as they present alongside digital heavyweights like Barbarian
> Group, R/GA, & Big Spaceship. http://p.sf.net/sfu/creativitycat-com
> _______________________________________________
> Fedora-commons-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
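
For anyone who wants to try the behaviour discussed in this thread without a
Fedora build at hand, the quoted validateURL logic can be exercised on its
own. What follows is only a minimal sketch, not the actual Fedora source: the
class UrlValidationSketch, the helper isAccepted, and the nested
ValidationException are hypothetical stand-ins, and the scheme constants are
assumed to be "uploaded://" and "temp://" based on the examples in the
message above. It includes the temp:// change from the quoted hack.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlValidationSketch {

    // Assumed values for DatastreamManagedContent.UPLOADED_SCHEME and
    // DatastreamManagedContent.TEMP_SCHEME (not taken from the Fedora source).
    static final String UPLOADED_SCHEME = "uploaded://";
    static final String TEMP_SCHEME = "temp://";

    // Hypothetical stand-in for Fedora's ValidationException.
    static class ValidationException extends Exception {
        ValidationException(String message) {
            super(message);
        }

        ValidationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Same shape as the method quoted in the thread, with the temp:// check added.
    @SuppressWarnings("deprecation") // new URL(String) is deprecated in recent JDKs
    static void validateURL(String url) throws ValidationException {
        try {
            URL candidate = new URL(url);
            // file: parses as a regular URL, so it is rejected explicitly.
            if (candidate.getProtocol().equals("file")) {
                throw new ValidationException("Malformed URL: invalid protocol: " + url);
            }
        } catch (MalformedURLException e) {
            // java.net.URL has no handler for the internal schemes, so they
            // arrive here and are let through by prefix.
            if (url.startsWith(UPLOADED_SCHEME) || url.startsWith(TEMP_SCHEME)) {
                return;
            }
            throw new ValidationException("Malformed URL: " + url, e);
        }
    }

    // Convenience wrapper (hypothetical): true if validateURL does not throw.
    static boolean isAccepted(String url) {
        try {
            validateURL(url);
            return true;
        } catch (ValidationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.org/data.txt",
            "uploaded://123",
            "temp:///myDirectory/myFile.txt",
            "file:///myDirectory/myFile.txt"
        };
        for (String url : samples) {
            System.out.println(url + " -> " + (isAccepted(url) ? "accepted" : "rejected"));
        }
    }
}
```

Under these assumptions the http, uploaded and temp locations pass, while the
file:// location is still rejected, which matches the behaviour described in
the thread. Whether letting temp:// through is appropriate for externally
supplied input is exactly what the two JIRA issues are about.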
