On Tue, 2008-11-11 at 17:18 +0100, Posthumus, Etienne wrote:
> We are in the process of migrating several hundred gigabytes of
> repository content from a CMS to a Fedora 3.x installation.
> One of the issues that we have is the decision whether to store the
> assets (mostly PDF files at the moment) as managed or external
> content.
> Some of the PDF files can be several hundred megabytes in size.
>
> The strategy for the conversion (until now) was to create FOXML
> on-disk with several datastreams embedded, and then do ingest using
> the client command-line scripts. With the large PDF files embedded as
> datastreams, the Java client crashes with out of memory errors, even
> when I increase the heap size to seemingly sufficient sizes (-Xms512m
> -Xmx640m)
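A rough back-of-the-envelope sketch of why embedding can exhaust a 640 MB heap. It rests on an assumption not confirmed in this thread: that the client holds the whole FOXML document, including the base64 text inside `foxml:binaryContent`, in memory as a Java String. Base64 turns every 3 bytes into 4 characters, and pre-Java 9 Strings store 2 bytes per character. The 300 MB size below is a hypothetical example within the "several hundred megabytes" mentioned above.

```python
MIB = 1024 * 1024


def base64_length(n_bytes: int) -> int:
    """Characters produced by padded base64 for n_bytes of input (no line breaks)."""
    # Every 3 input bytes (rounded up to a full group) become 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


def java_char_bytes(n_chars: int) -> int:
    """Heap bytes for the character data of a pre-Java 9 String (UTF-16, 2 bytes/char).

    Ignores object headers and any temporary copies made while parsing.
    """
    return 2 * n_chars


if __name__ == "__main__":
    pdf_bytes = 300 * MIB  # hypothetical example size
    chars = base64_length(pdf_bytes)
    print(f"base64 text: {chars / MIB:.0f} MiB")                      # base64 text: 400 MiB
    print(f"as a Java String: {java_char_bytes(chars) / MIB:.0f} MiB")  # as a Java String: 800 MiB
```

Under that assumption, raising -Xmx only moves the limit, and the next larger PDF fails again. Referencing the PDFs by URL instead of embedding them keeps the FOXML small regardless of file size.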
This is similar to what I did with our Tropicos Images collection. I didn't want to bring in all of the images, since they amounted to over a TB, so instead I ingest a link to each image into Fedora as a referenced datastream. A script then creates a thumbnail of the image (if the image is accessible) and ingests that thumbnail as a managed datastream. You could do the same here: make a thumbnail of the PDF the managed datastream, and store a link to the 'real' PDF on the filesystem or at a URL as a referenced one.

P

> So I wonder, what kind of content are other users storing? What are
> the maximum sizes of stored datastreams observed? And do you ingest
> them with FOXML in one go, or use something like an API-M call to add
> the datastream after the object has already been created?
>
> Any thoughts appreciated.
>
> Etienne Posthumus
> resident propellerhead
> TU Delft Library
> Netherlands
> ---
> http://www.library.tudeflt.nl/
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

-- 
Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer
