On Tue, 2008-11-11 at 17:18 +0100, Posthumus, Etienne wrote:
> We are in the process of migrating several hundred gigabytes of
> repository content from a CMS to a Fedora 3.x installation.
> One of the issues that we have is the decision whether to store the
> assets (mostly PDF files at the moment) as managed or external
> content.
> Some of the PDF files can be several hundred megabytes in size.
>  
> The strategy for the conversion (until now) was to create FOXML
> on-disk with several datastreams embedded, and then do ingest using
> the client command-line scripts. With the large PDF files embedded as
> datastreams, the Java client crashes with out-of-memory errors, even
> when I increase the heap size to seemingly sufficient sizes (-Xms512m
> -Xmx640m).

This is similar to what I did with our Tropicos Images collection. I
didn't want to bring in all of the images, since they amounted to over a
TB, so instead I ingest a link to each image into Fedora as a referenced
datastream. A script then creates a thumbnail of the image (if the image
is accessible) and ingests that thumbnail as a managed datastream.
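For the "add the datastream after the object exists" route, here is a minimal sketch. It assumes a Fedora 3.x release that exposes the REST flavor of API-M with its addDatastream call (POST /objects/{pid}/datastreams/{dsID}); check that your 3.x version includes it. The function only builds the request URL. FEDORA_BASE, the PID, the dsIDs and the image URLs are hypothetical placeholders. controlGroup=E stores only the link, while controlGroup=M with a dsLocation asks Fedora to fetch the content and keep its own copy.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; adjust to your own Fedora 3.x installation.
FEDORA_BASE = "http://localhost:8080/fedora"


def add_datastream_url(pid, ds_id, control_group, ds_location, mime_type, label):
    """Build the URL for a REST API-M addDatastream request (send it as HTTP POST).

    control_group: "E" = externally referenced content (Fedora stores only the link),
                   "M" = managed content (Fedora fetches dsLocation and keeps a copy).
    """
    params = {
        "controlGroup": control_group,
        "dsLocation": ds_location,
        "mimeType": mime_type,
        "dsLabel": label,
    }
    # Keep the ':' of the PID readable; escape anything else unusual in the path.
    path = "/objects/%s/datastreams/%s" % (quote(pid, safe=":"), quote(ds_id, safe=""))
    return FEDORA_BASE + path + "?" + urlencode(params)


# Referenced full-size image plus a managed thumbnail (all identifiers hypothetical).
print(add_datastream_url("demo:100", "IMAGE", "E",
                         "http://images.example.org/100.jpg", "image/jpeg", "Full image"))
print(add_datastream_url("demo:100", "THUMB", "M",
                         "http://thumbs.example.org/100.jpg", "image/jpeg", "Thumbnail"))
```

Actually sending the POST (for example with urllib.request and HTTP basic auth) is left out, since authentication setup varies per installation.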

You could consider storing a thumbnail of the PDF as a managed datastream,
and a link to the 'real' one on the filesystem or at a URL as a
referenced one.
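Applied to the FOXML-on-disk conversion, a hedged sketch: assuming the FOXML 1.1 format used by Fedora 3.x, the conversion script could describe each large PDF with a foxml:contentLocation URL instead of embedding it as base64 binaryContent, so the ingest client never has to hold the PDF in memory. CONTROL_GROUP "E" makes it a referenced datastream; as I understand it, "M" with a contentLocation has Fedora pull the content in at ingest time. The function name, dsIDs and URLs are hypothetical, and the fragments still need to be placed inside a foxml:digitalObject.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag):
    """Qualify a tag name with the FOXML namespace."""
    return "{%s}%s" % (FOXML_NS, tag)


def located_datastream(ds_id, control_group, url, mime_type, label):
    """Build a foxml:datastream whose content is given by URL instead of inline bytes."""
    ds = ET.Element(q("datastream"), ID=ds_id, CONTROL_GROUP=control_group, STATE="A")
    version = ET.SubElement(ds, q("datastreamVersion"),
                            ID=ds_id + ".0", MIMETYPE=mime_type, LABEL=label)
    # contentLocation replaces a (potentially huge) base64 binaryContent element.
    ET.SubElement(version, q("contentLocation"), TYPE="URL", REF=url)
    return ds


# Referenced original PDF and a managed thumbnail (hypothetical locations).
pdf = located_datastream("PDF", "E", "http://files.example.org/pdf/report-123.pdf",
                         "application/pdf", "Original PDF")
thumb = located_datastream("THUMB", "M", "http://files.example.org/thumbs/report-123.png",
                           "image/png", "First-page thumbnail")
print(ET.tostring(pdf, encoding="unicode"))
```

Because each datastream is only a few hundred bytes of XML, the FOXML file size stays small regardless of how large the PDF is.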

P

>  
> So I wonder, what kind of content are other users storing? What are
> the maximum sizes of stored datastreams observed? And do you ingest
> them with FOXML in one go, or use something like an API-M call to add
> the datastream after the object has already been created?
>  
> Any thoughts appreciated.
>  
> Etienne Posthumus
> resident propellerhead
> TU Delft Library
> Netherlands
> ---
> http://www.library.tudeflt.nl/
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
-- 
Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer

