On 2 January 2014 13:59, Hilton Gibson <[email protected]> wrote:

> PDF/A-3 makes only a single, fairly monumental change. In the PDF/A-2
> specification users were allowed to embed files, but only PDF/A files.
> PDF/A-3 now allows the embedding of any arbitrary file format, including
> XML, CSV, CAD, images and any others.
>
> At first glance this sounds like a gigantic betrayal of everything that
> the format has stood for. Why define a subset of PDF attributes to ensure
> the long-term comprehension of the file if you’re going to turn around and
> allow the kitchen sink to be embedded within it? (You can follow some of
> the original discussion of this change here.)
>
>
> http://blogs.loc.gov/digitalpreservation/2012/11/all-in-embedded-files-in-pdfa/?loclr=blogsig
>
> This is very bad news for digital preservation because it is now
> possible to "hide" proprietary digital formats inside the PDF/A container.
> What will future researchers think when they stumble upon these "hidden"
> closed formats that they will not be able to use?
>
> What were they thinking??
>

There are probably inventive ways to abuse this, for example a proprietary
application that uses the format as a container but keeps all the meat of
what it's doing in embedded files - although that wouldn't really be usable
as a PDF/A in the standard way anyway. But taking a step back, the
alternative to allowing arbitrary embedded files is that all of that data
must be held separately. Yes, that means you can easily perform
preservation activities around those files. But it also increases the
likelihood that someone will get the PDF/A file and not the additional
files that belong with it.

Given the choice between not having the files at all and having them
embedded in the PDF/A - albeit possibly in a 'dead' format - many people
will find having the files a clear winner. Dead formats can generally
still be resurrected by some means (get an emulator, run a file conversion,
etc.). A file in a dead format is still more useful than no file.

If you are actively involved in preserving PDF/A files, then the "static
readable" component remains the same regardless. You've just got the
possibility of extra, arbitrary files inside the PDF/A - in which case,
treat it like an archive (zip, tar, etc.): index the embedded files,
extract them, run preservation tasks against them as necessary, and then
create new PDF/A-3 bundles from the results.
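
As a rough sketch of the "index" step, here is roughly what I have in mind.
It assumes the embedded files have already been pulled out of the PDF/A by
whatever PDF library you use and handed over as a name -> bytes mapping; the
function names and the tiny magic-byte table are made up for illustration,
and a real workflow would use a proper format identification tool.

```python
import hashlib

# Hypothetical, deliberately short signature table (an assumption for this
# sketch, not a real format registry).
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-based"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"<?xml", "xml"),
]


def guess_format(data: bytes) -> str:
    """Return a coarse format label based on the leading bytes."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return "unknown"


def index_embedded(files: dict[str, bytes]) -> list[dict]:
    """Build one inventory record per embedded file, sorted by name."""
    records = []
    for name in sorted(files):
        data = files[name]
        records.append({
            "name": name,
            "size": len(data),
            # Fixity value, so later migrations can be checked against it.
            "sha256": hashlib.sha256(data).hexdigest(),
            "format": guess_format(data),
        })
    return records


def needs_attention(records: list[dict]) -> list[str]:
    """Names of files whose format could not be identified."""
    return [r["name"] for r in records if r["format"] == "unknown"]
```

Anything that comes back from needs_attention() is a candidate for the
"possibly dead format" pile, to be investigated before the next migration.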

At no point have you degraded what is comprehensible about the PDF/A -
you've just added stuff that might not be.

Rule No 1 in digital preservation - capture everything. If you don't
capture it, you can't preserve it. To that end, this should be a good thing
for preservation. We just need to be aware of an extra hoop that we can /
should jump through for format migration.

G