Good questions, though they probably only have provisional answers where Akubra is concerned.

In our setup we patched Fedora to support the irods: pseudo-protocol for external content. For managed datastreams we have a pre-Akubra low-level storage (LLS) module for iRODS. Our object and datastream folder structures are timestamp driven, just like the default Fedora ones. Also like the default LLS module, we index and store the map of PIDs and DSIDs to iRODS paths in the Fedora db. We were able to optimize the reindex() part of the LLS module to use an iRODS query instead of a folder crawl.

With a PID-based folder structure for objects and datastreams, you might be able to do away with the Fedora db tables, as every path would be predictable. Also, I suppose you could put the Fedora DS or object token in an iRODS AVU and use the ICAT as the index.
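
As a rough illustration of the predictable-path idea, here is a minimal Java sketch. The class name, the method names and the base collection path are hypothetical and are not part of Fedora or iRODS. Percent-encoding the PID is an assumption made for the sketch, so that the colon in a PID such as test:1 does not end up in a collection name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: derives a storage path from PID, DSID and version ID alone. */
final class PidPathSketch {
    private PidPathSketch() {}

    /** Joins base collection, PID, DSID and version ID into one predictable path. */
    static String datastreamPath(String base, String pid, String dsId, String versionId) {
        return String.join("/", trimSlash(base), encode(pid), encode(dsId), encode(versionId));
    }

    /** Percent-encodes characters such as ':' (assumed unwanted in path segments). */
    static String encode(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Drops a single trailing slash so the joined path has no empty segment. */
    static String trimSlash(String base) {
        return base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
    }
}
```

Because the path is a pure function of the identifiers, no lookup table is needed to find a datastream version: calling datastreamPath with the same base, PID, DSID and version ID always gives the same location.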

One potential advantage that could come with Akubra is optimized transfer between irods: external and irods managed locations. Here we use irods: external locations to stage files. It would be significant if we could move these staged files to their archival location without streaming them in their entirety through Fedora, via the iRODS parallel transfer protocols or a rule-driven replication. The same potential exists for other protocols I guess.

I'm finding more and more that our bottleneck is transferring the same data many times. We probably want to be able to ingest 10,000 objects with data in an hour or two at some point. With staging in advance and iRODS protocols I think that bottleneck is reduced. One could probably go even further by defining datastream paths with a UUID that is permanently assigned to a staged file. Then ingest could register the existing file to a Fedora object and trigger asynchronous replication to archival storage. That would eliminate transfers at ingest time, sort of a modified version of the upload: pseudo-protocol.

Apologies to the list for all the iRODS chatter. I think the discussion applies to other schemes too.
Greg

On 12/03/2010 05:43 AM, Jörg Panzer wrote:
Hello Greg,

thank you for the attachment.

What do you think about using the DAVIS WebDAV component? That way we could get iRODS objects over HTTP. In the "external reference" and "redirected" datastreams the location is set with an HTTP URI. This doesn't need any change to the fcrepo-server. The additional component is clearly a drawback.

Yes, I use an Akubra-iRODS module. With "managed" datastreams it works fine. Because fcrepo only supports the http scheme, I only use "managed" datastreams so far. But my current approach is neither optimal nor intuitive, because I derive the location of an iRODS file from its Fedora object ID or datastream ID. So I get something like "datastreamStore/object-id/DS1/DS1.0". I don't like putting the current datastream and its versions in the same folder. Moreover, I force this structure on anyone doing direct ingest through iRODS. What is your approach to this?


Best wishes,
jörg

On 02.12.2010 at 15:56, Greg Jansen wrote:

Hey Jörg,
We are doing something very similar to your case. I have an ExternalContentManager that works with iRODS, but it also required patching two other classes in fcrepo-server. I've pasted a patch below and attached the ExternalContentManager class. We just support the iRODS protocol so that we can ingest from an iRODS staging area. I think your project is also pursuing an Akubra-iRODS module? Will you still be pursuing managed datastreams in iRODS?

I hope this helps.

thanks,
Greg Jansen
UNC Chapel Hill

THE PATCH:
Index: org/fcrepo/server/storage/ContentManagerParams.java
===================================================================
--- org/fcrepo/server/storage/ContentManagerParams.java (revision 8829)
+++ org/fcrepo/server/storage/ContentManagerParams.java    (working copy)
@@ -1,5 +1,5 @@
 /* The contents of this file are subject to the license and copyright terms
- * detailed in the license directory at the root of the source tree (also
+ * detailed in the license directory at the root of the source tree (also
  * available online at http://fedora-commons.org/license/).
  */
 package org.fcrepo.server.storage;
@@ -11,10 +11,10 @@


 /**
- * Simple data transfer object for the content manager.
+ * Simple data transfer object for the content manager.
  * This should avoid breaking the content manager interface every
- * time the parameters change.
- *
+ * time the parameters change.
+ *
  * @version $Id$
  *
  */
@@ -26,11 +26,11 @@
     private String protocol;
     private boolean bypassBackend = false;
     private Context context;
-
-
+
+
     public ContentManagerParams(){
     }
-
+
     public ContentManagerParams(String url, String mimeType, String username, String password){
         setUrl(url);
         this.mimeType = mimeType;
@@ -41,11 +41,11 @@
     public ContentManagerParams(String url){
         setUrl(url);
     }
-
+
     public String getProtocol() {
         return protocol;
     }
-
+
     public String getUrl() {
         return url;
     }
@@ -54,6 +54,9 @@
         try {
             this.protocol = new URL(url).getProtocol();
         } catch (MalformedURLException e) {
+            if (url.startsWith("irods://")) {
+                return;
+            }
             throw new RuntimeException(e);
         }
     }
@@ -79,7 +82,7 @@
     public void setBypassBackend(boolean b) {
         bypassBackend = b;
     }
-
+
     public boolean isBypassBackend() {
         return bypassBackend;
     }
Index: org/fcrepo/server/validation/ValidationUtility.java
===================================================================
--- org/fcrepo/server/validation/ValidationUtility.java (revision 8829)
+++ org/fcrepo/server/validation/ValidationUtility.java    (working copy)
@@ -50,7 +50,7 @@
      * @param controlGroup
      *            The control group of the datastream the URL belongs to.
      *
-     * @throws ValidationException
+     * @throws ValidationExcept
      *             if the URL is malformed.
      */
     public static void validateURL(String url, String controlGroup)
@@ -65,6 +65,8 @@
         } catch (MalformedURLException e) {
             if (url.startsWith(DatastreamManagedContent.UPLOADED_SCHEME)) {
                 return;
+            } else if (url.startsWith("irods://")) {
+                return;
             }
             throw new ValidationException("Malformed URL: " + url, e);
         }
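
Both hunks above work around the same thing: java.net.URL rejects a scheme it has no protocol handler for, so "irods://..." raises MalformedURLException. java.net.URI parses any syntactically valid scheme. Note that in the ContentManagerParams hunk the early return leaves the protocol field unset for irods URLs. The following is only a sketch of one way to record the scheme in that case. The class and method names are hypothetical, and this is not the patch that was applied.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper: reports the protocol of a location, tolerating irods:// URLs. */
final class SchemeProbe {
    private SchemeProbe() {}

    /** Returns the protocol; accepts irods even though java.net.URL has no handler for it. */
    static String protocolOf(String url) {
        try {
            return new URL(url).getProtocol();
        } catch (MalformedURLException e) {
            if (hasScheme(url, "irods")) {
                return "irods";
            }
            throw new IllegalArgumentException("Malformed URL: " + url, e);
        }
    }

    /** True if the string parses as a URI whose scheme matches, ignoring case. */
    static boolean hasScheme(String url, String expected) {
        try {
            return expected.equalsIgnoreCase(new URI(url).getScheme());
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Parsing with URI rather than testing startsWith("irods://") also rejects strings that begin with the right prefix but are not otherwise valid URIs. Whether that stricter check is wanted is a design choice.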


On 12/02/2010 06:38 AM, Jörg Panzer wrote:
Hello Steve,

we plan to use Fedora as a front end for iRODS. So, we will allow direct ingest into iRODS. The ingested files are subsequently registered in Fedora by a callback mechanism.

The idea was to do this with an "External Referenced" datastream:

<foxml:datastream ID="DS5" STATE="A" CONTROL_GROUP="E" VERSIONABLE="true">
  <foxml:datastreamVersion ID="DS5.0" LABEL="ds5 label" CREATED="2010-12-02T10:45:17.785Z" MIMETYPE="application/pdf">
    <foxml:contentLocation TYPE="URL" REF="irods://zone/home/user/datastreams/info:fedora/test:1/DS5/DS5.0"/>
  </foxml:datastreamVersion>
</foxml:datastream>

So I will take a look at ExternalContentManager and DefaultExternalContentManager.

Thanks,
Jörg

--


On 02.12.2010 at 10:31, Steve Bayliss wrote:

Hi Jörg

Currently only http and file protocols are supported.

External content is managed by an ExternalContentManager - the only one
implemented currently is the DefaultExternalContentManager.  This is
specified in fedora.fcfg so theoretically it's possible to provide an
alternative to manage other URI schemes (or indeed extend the existing
content manager to do this; perhaps some configuration information to
specify how different protocols should be resolved).

For "R" datastreams Fedora simply issues a temporary redirect with the URI to
redirect to, so it would be the browser (or client) responsibility to
handle the resolution for non-http schemes.

This sounds to me as if "R" datastreams could use other schemes, but we get a MalformedURLException on ingest.

For "E" datastreams the resolution is via the ExternalContentManager, so
implementation of additional URL schemes would be an option for these.
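
The idea of resolving different protocols through configuration can be pictured as a registry keyed by URI scheme. The Java below is only a sketch of that pattern. The class and its methods are hypothetical, and it does not reproduce Fedora's ExternalContentManager interface.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry that picks a resolver by the scheme of a content location. */
final class SchemeDispatchSketch {
    private final Map<String, Function<URI, String>> resolvers = new HashMap<>();

    /** Registers a resolver for one scheme, e.g. "http" or "irods". */
    void register(String scheme, Function<URI, String> resolver) {
        resolvers.put(scheme.toLowerCase(Locale.ROOT), resolver);
    }

    /** Parses the location and hands it to the resolver registered for its scheme. */
    String resolve(String location) {
        URI uri = URI.create(location); // throws IllegalArgumentException on bad syntax
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in: " + location);
        }
        Function<URI, String> resolver = resolvers.get(scheme.toLowerCase(Locale.ROOT));
        if (resolver == null) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        return resolver.apply(uri);
    }
}
```

In a real content manager the resolver would return a stream or a content object rather than a String. A String keeps the sketch self-contained.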

What URL schemes would you like to see supported?

Regards
Steve



-----Original Message-----
From: Jörg Panzer [mailto:[email protected]]
Sent: 02 December 2010 09:17
To: Support and info exchange list for Fedora users.
Subject: [fcrepo-user] Control Group - supported URL schemes


Hello,

can someone tell me, if there are other URL schemes supported for the
control groups "Redirect" and "External Referenced" in addition to "http"?

Regards,
Jörg

---
Jörg-H. Panzer

Georg-August-Universitaet Goettingen
State and University Library
----------------------------------------------------------------------------
--
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users



--
___
Gregory N. Jansen
Developer - Carolina Digital Repository
UNC Chapel Hill Libraries
<IrodsExternalContentManager.java>
