Hi Phil,

We also have this file size issue, potentially to an even greater degree, as we intend future ingests to include audio and video material. We have reached a similar conclusion: we will store our content files externally as referenced datastreams. We have also found that storing content this way markedly improves upload times.
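For anyone following the thread, here is a minimal sketch of what such a referenced datastream might look like in FOXML, built with Python's standard library. It assumes the FOXML 1.1 schema used by Fedora 3.x, where CONTROL_GROUP="E" marks externally referenced content. The function name, datastream ID, label, and example.org URL are hypothetical and not taken from any repository mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# FOXML namespace (assumed: FOXML 1.1 as used by Fedora 3.x).
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
ET.register_namespace("foxml", FOXML_NS)


def referenced_datastream(ds_id, label, mime_type, url):
    """Build a FOXML datastream element whose content stays outside Fedora.

    Hypothetical helper for illustration only.
    """
    # CONTROL_GROUP "E" = externally referenced content ("M" would be managed).
    ds = ET.Element(
        f"{{{FOXML_NS}}}datastream",
        {"ID": ds_id, "STATE": "A", "CONTROL_GROUP": "E", "VERSIONABLE": "true"},
    )
    version = ET.SubElement(
        ds,
        f"{{{FOXML_NS}}}datastreamVersion",
        {"ID": f"{ds_id}.0", "LABEL": label, "MIMETYPE": mime_type},
    )
    # Only the URL is recorded; the bytes are never embedded in the FOXML.
    ET.SubElement(
        version,
        f"{{{FOXML_NS}}}contentLocation",
        {"TYPE": "URL", "REF": url},
    )
    return ds


if __name__ == "__main__":
    element = referenced_datastream(
        "VIDEO",
        "Lecture recording",
        "video/mp4",
        "http://media.example.org/lectures/0001.mp4",  # hypothetical URL
    )
    print(ET.tostring(element, encoding="unicode"))
```

The resulting fragment would sit inside a foxml:digitalObject alongside the metadata datastreams. Because the FOXML carries only a URL, the ingest file stays small regardless of the size of the audio or video file it points to.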
Peri Stracchino
Digital Library Team
University of York
ext 4082

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 11 November 2008 19:42
To: [email protected]
Subject: Fedora-commons-users Digest, Vol 21, Issue 8

Send Fedora-commons-users mailing list submissions to
[email protected]

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
or, via email, send a message with subject or body 'help' to
[EMAIL PROTECTED]

You can reach the person managing the list at
[EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific than "Re: Contents of Fedora-commons-users digest..."

Today's Topics:

1. Re: Sizes of managed content datastreams in running repositories (Phil Cryer)
2. Datastream size attribute (Bill Tantzen)
3. Re: Sizes of managed content datastreams in running repositories (Uwe Klosa)
4. Re: fedora/activemq messaging - can add stomp transport? (Bill Branan)

----------------------------------------------------------------------

Message: 1
Date: Tue, 11 Nov 2008 11:16:15 -0600
From: Phil Cryer <[EMAIL PROTECTED]>
Subject: Re: [Fedora-commons-users] Sizes of managed content datastreams in running repositories
To: fedora-commons <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain

On Tue, 2008-11-11 at 11:13 -0600, Phil Cryer wrote:
> On Tue, 2008-11-11 at 17:18 +0100, Posthumus, Etienne wrote:
> > We are in the process of migrating several hundred gigabytes of
> > repository content from a CMS to a Fedora 3.x installation.
> > One of the issues that we have is the decision whether to store the
> > assets (mostly PDF files at the moment) as managed or external
> > content.
> > Some of the PDF files can be several hundred megabytes in size.
> >
> > The strategy for the conversion (until now) was to create FOXML
> > on-disk with several datastreams embedded, and then do ingest using
> > the client command-line scripts. With the large PDF files embedded
> > as datastreams, the Java client crashes with out of memory errors,
> > even when I increase the heap size to seemingly sufficient sizes
> > (-Xms512m -Xmx640m)
>
> This is similar to what I did with our Tropicos Images collection - I
> didn't want to bring in all of the images, they amounted to over a TB,
> so instead I use a link to the image that I ingest to fedora as a
> referenced datastream, then I have a script that creates a thumbnail
> of the image (if one is accessible) and then ingest that thumbnail as
> a managed datastream.
>
> You could consider making a thumbnail of the pdf as the managed, and a
> link to the 'real' one on the filesystem or url as a referenced one.

Also, another thing I considered was having the 'data' directory under Fedora be mounted to a SAN so that storage wouldn't be an issue. This way it would all be managed via Fedora (unsure if this would be a good or bad thing, I didn't test it out, just food for thought)

P

> >
> > So I wonder, what kind of content are other users storing? What are
> > the maximum sizes of stored datastreams observed? And do you ingest
> > them with FOXML in one go, or use something like an API-M call to
> > add the datastream after the object has already been created?
> >
> > Any thoughts appreciated.
> >
> > Etienne Posthumus
> > resident propellerhead
> > TU Delft Library
> > Netherlands
> > ---
> > http://www.library.tudeflt.nl/
> >
> > -------------------------------------------------------------------------
> > This SF.Net email is sponsored by the Moblin Your Move
> > Developer's challenge Build the coolest Linux based applications
> > with Moblin SDK & win great prizes Grand prize is a trip for two to
> > an Open Source event anywhere in the world
> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
> > _______________________________________________
> > Fedora-commons-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

--
Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer

------------------------------

Message: 2
Date: Tue, 11 Nov 2008 11:56:08 -0600
From: "Bill Tantzen" <[EMAIL PROTECTED]>
Subject: [Fedora-commons-users] Datastream size attribute
To: <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="us-ascii"

Am I doing something wrong? Or?

On ingest, the SIZE attribute of my CONTROL_GROUP=X datastreams is set correctly, but when I call modifyDatastreamByValue, the SIZE is always set to 0.

Should this value be computed automatically? Am I making the call incorrectly?

This is fedora 3.0, btw...

Cheers!
Bill

Bill Tantzen
University of Minnesota Libraries
[EMAIL PROTECTED]
612-626-9949 (office)
612-325-1777 (cell)
________________________________________________________________
Penny for your thoughts now, my boy Bill.
-- Bruce Springsteen

------------------------------

Message: 3
Date: Tue, 11 Nov 2008 19:25:56 +0100
From: "Uwe Klosa" <[EMAIL PROTECTED]>
Subject: Re: [Fedora-commons-users] Sizes of managed content datastreams in running repositories
To: "Posthumus, Etienne" <[EMAIL PROTECTED]>
Cc: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

In DiVA we're storing PDF as managed content. The biggest file we have had so far was 150MB. When we migrated the content from our old repository we used a quite different approach. We created FOXML files on disk containing only two XML content streams, which held all the metadata, pointers to the files, and names and types for the datastreams. We created a client which uses the API from the fedora-client.jar. The client uses the upload function of Fedora and the addDatastream method. We did not have any OutOfMemoryErrors with that approach.

I think if you embed such large datastreams you should use -Xmx2G -Xms2G.

Uwe

On Tue, Nov 11, 2008 at 5:18 PM, Posthumus, Etienne <[EMAIL PROTECTED]> wrote:
> We are in the process of migrating several hundred gigabytes of
> repository content from a CMS to a Fedora 3.x installation.
> One of the issues that we have is the decision whether to store the assets
> (mostly PDF files at the moment) as managed or external content.
> Some of the PDF files can be several hundred megabytes in size.
>
> The strategy for the conversion (until now) was to create FOXML on-disk
> with several datastreams embedded, and then do ingest using the client
> command-line scripts. With the large PDF files embedded as datastreams, the
> Java client crashes with out of memory errors, even when I increase the heap
> size to seemingly sufficient sizes ( -Xms512m -Xmx640m)
>
> So I wonder, what kind of content are other users storing? What are the
> maximum sizes of stored datastreams observed?
> And do you ingest them with FOXML in one go, or use something like an
> API-M call to add the datastream after the object has already been created?
>
> Any thoughts appreciated.
>
> Etienne Posthumus
> resident propellerhead
> TU Delft Library
> Netherlands
> ---
> http://www.library.tudeflt.nl/

------------------------------

Message: 4
Date: Tue, 11 Nov 2008 10:01:03 -0500
From: "Bill Branan" <[EMAIL PROTECTED]>
Subject: Re: [Fedora-commons-users] fedora/activemq messaging - can add stomp transport?
To: [EMAIL PROTECTED]
Cc: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

The ActiveMQ documentation on stomp (http://activemq.apache.org/stomp.html) includes an example xml configuration file which can be used to start both the tcp and stomp transports. You would need to make sure that the config file is on your classpath and use the XBean URI syntax (http://activemq.apache.org/broker-xbean-uri.html) when specifying the provider url in the fedora.fcfg.

A couple other ActiveMQ links which may be helpful:
http://activemq.apache.org/run-broker.html
http://activemq.apache.org/broker-configuration-uri.html

Of course, you can also run ActiveMQ outside of Fedora where you'd be free to configure it however you'd like.

Bill

On Tue, Nov 11, 2008 at 8:59 AM, <[EMAIL PROTECTED]> wrote:
> Hi.
> Please excuse my ignorance, since I am not familiar with ActiveMQ or its
> configurations.
>
> I would like to register a client as a fedora messaging subscriber using
> the Ruby stomp bindings.
>
> Is it possible to add the stomp transport to the embedded activemq broker?
>
> I tried this without success:
>
> in fedora.fcfg:
>
> <param name="java.naming.provider.url"
> value="vm:(broker:(tcp://localhost:61616,stomp://localhost:61613))"/>
>
> This gives a java class loader error.
>
> Any help or insight would be appreciated. Thanks.
>
> Juan
>
> *******Juan C. Rodriguez*******
> Sr. Programmer Analyst
> LAN Administrator, Department of Surgery
> LAN Administrator, Department of Neurosurgery
> Memorial Sloan-Kettering Cancer Center
> Rodriguj at mskcc dot org

------------------------------

End of Fedora-commons-users Digest, Vol 21, Issue 8
***************************************************
