Fwd: EPUB infrastructure (was: Re: handle file in zip with ArchiveAPI)

Jason Duell Tue, 16 Oct 2012 12:06:36 -0700

forwarding from [email protected]: reply to that list...


Jason


-------- Original Message --------
Subject:        EPUB infrastructure (was: Re: handle file in zip with 
ArchiveAPI)
Date:   Tue, 16 Oct 2012 17:59:09 +0300
From:   Henri Sivonen <[email protected]>
To:     [email protected]



On Sat, Oct 13, 2012 at 12:41 AM, Jonas Sicking <[email protected]> wrote:

On Fri, Oct 12, 2012 at 8:08 AM, Jacky Chun <[email protected]> wrote:

Mozilla has just released the ArchiveAPI in Firefox 17 beta. It allows us to
access files inside .zip and zip-based formats such as EPUB, and it is an
answer to Google's FileSystem API.


The ArchiveReader API is absolutely not a replacement for the
FileSystem API. The goal of the ArchiveReader API is simply to allow
reading the contents of .zip files, for example to allow downloading a
set of resources which comprise a game level using a single network
request, which also compresses any resources that can be compressed.


In particular, ArchiveReader is not an ideal match for the EPUB case,
because you need hierarchical URLs to resolve in such a way that
fetching them results in reading from the zip file.

Additionally Gecko has for a long time supported a non-standard jar:
protocol which allows reading contents from .zip files directly. This
is likely something that is worth looking at standardizing.
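
For readers unfamiliar with the jar: syntax: such a URL wraps the URL of
the archive and appends "!/" plus the path of the entry inside it. A
minimal sketch of that split follows; the function name and the example
URL are made up for illustration, and nested jar: URLs are not handled.

```python
from typing import Tuple


def split_jar_url(url: str) -> Tuple[str, str]:
    """Split 'jar:<archive-url>!/<entry>' into (archive_url, entry_path)."""
    if not url.startswith("jar:"):
        raise ValueError("not a jar: URL")
    # Everything before the first '!/' names the archive, the rest the entry.
    archive_url, separator, entry_path = url[len("jar:"):].partition("!/")
    if not separator:
        raise ValueError("missing '!/' separator")
    return archive_url, entry_path


archive, entry = split_jar_url("jar:http://example.com/book.zip!/OEBPS/ch1.xhtml")
```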


(Disclaimer: I don't know if an EPUB reader is a B2G v1 requirement.
What I say below doesn't take B2G v1 schedule into account. Instead,
I'm outlining what I think would be a sensible implementation when
designing without the constraints of the v1 schedule.)

I think it would make sense to push OCF handling into Necko and to
introduce a companion non-jar: URL scheme called widget:. Compared to
merely reading from a zip file, reading from an EPUB zip file (OCF)
involves the following additional things:

* If there is an OPF manifest entry for the resource, the content type
of the resulting nsIChannel should come from the manifest entry. (It’s
necessary to support non-manifest resources, though. Perhaps as
application/octet-stream. At least Adobe products of the 2009 vintage
[or thereabouts] don’t create manifest entries for fonts.)
* If the manifest entry for the resource specifies a content type that
Gecko does not support, there should be a redirect to the fallback
resource for that resource.
* If there is a font mangling entry in META-INF/encryption.xml for the
resource, the nsIChannel for the resource should expose the unmangled
and inflated resource. (Mangled resources are "stored" on the zip
level and deflated independently of the zip container. Developer tools
should probably not offer to facilitate the Saving As of mangled fonts
to avoid upsetting font vendors. We should probably support Adobe
mangling in addition to IDPF mangling, since Adobe products of 2009
vintage created books with Adobe mangling. The schemes are almost
identical.)
* Attempts to dereference the root URL of a book should redirect to
the first item in the spine that does not have linear="no".
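
The manifest-related items in the list above (content type lookup,
fallback redirect, root redirect) can be sketched roughly as follows.
This is only an illustration in Python, not Necko code: the Package
class, SUPPORTED_TYPES and the sample OPF are all made up, hrefs are
assumed to be plain relative paths, and font unmangling is left out.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical set of media types the host can render natively.
SUPPORTED_TYPES = {"application/xhtml+xml", "image/png", "image/jpeg", "text/css"}

SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="cover" href="cover.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="text/ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="fig" href="img/fig.bin" media-type="image/x-unsupported" fallback="figpng"/>
    <item id="figpng" href="img/fig.png" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="cover" linear="no"/>
    <itemref idref="ch1"/>
  </spine>
</package>"""


class Package:
    def __init__(self, opf_xml, opf_path="OEBPS/content.opf"):
        root = ET.fromstring(opf_xml)
        base = posixpath.dirname(opf_path)
        self.items = {}    # manifest id -> (zip path, media type, fallback id)
        self.by_path = {}  # zip path -> manifest id
        for item in root.findall("opf:manifest/opf:item", OPF_NS):
            path = posixpath.normpath(posixpath.join(base, item.get("href")))
            self.items[item.get("id")] = (path, item.get("media-type"), item.get("fallback"))
            self.by_path[path] = item.get("id")
        self.spine = [(ref.get("idref"), ref.get("linear", "yes"))
                      for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]

    def content_type(self, path):
        """Media type from the manifest, or a generic type for unlisted files."""
        item_id = self.by_path.get(path)
        if item_id is None:
            return "application/octet-stream"
        return self.items[item_id][1]

    def resolve(self, path):
        """Follow the fallback chain until a supported media type is found."""
        item_id = self.by_path.get(path)
        seen = set()  # guards against fallback cycles
        while item_id in self.items and item_id not in seen:
            seen.add(item_id)
            item_path, media_type, fallback = self.items[item_id]
            if media_type in SUPPORTED_TYPES:
                return item_path
            item_id = fallback
        return path  # not in the manifest, or no usable fallback

    def start_path(self):
        """Zip path of the first spine item that is not linear="no"."""
        for idref, linear in self.spine:
            if linear != "no" and idref in self.items:
                return self.items[idref][0]
        return None


pkg = Package(SAMPLE_OPF)
```

With the sample above, pkg.resolve("OEBPS/img/fig.bin") follows the
fallback attribute to the PNG, and pkg.start_path() skips the cover
because it is marked linear="no".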

Potentially, the navigation between spine items when the user tries to
move to the next page from the last page of a spine item could be
C++-assisted, too, to the same extent as moving between pages within
one spine item.

I'd use the widget URL scheme (http://www.w3.org/TR/widgets-uri/) or a
renamed-for-bikeshed version thereof for addressing into OCF. I’d
introduce a JavaScript API for registering an EPUB file with Necko.
Necko would assign an unguessable random (but persistent) value as the
authority component of the widget: URL for the book. When
dereferencing a widget: URL, Necko would first match the authority
component to the EPUB file that was registered with Necko when Necko
generated the authority component value and then match hierarchical
part relative to the zip file root with the OCF-specific extra
processing described in the bulleted list above.
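
A rough sketch of the registration and lookup described above, again in
Python purely for illustration. BookRegistry and its methods are
hypothetical names, the mapping lives only in memory here (a real
implementation would have to persist it), and the OCF-specific
processing is omitted.

```python
import posixpath
import secrets
from urllib.parse import unquote, urlsplit


class BookRegistry:
    def __init__(self):
        self._books = {}  # authority -> path of the registered EPUB file

    def register(self, epub_path):
        """Return the widget: authority for a book, creating one if needed."""
        for authority, path in self._books.items():
            if path == epub_path:
                return authority  # reuse, so existing URLs keep working
        authority = secrets.token_hex(16)  # 128 random bits, unguessable
        self._books[authority] = epub_path
        return authority

    def resolve(self, url):
        """Map a widget: URL to (EPUB file, entry path relative to zip root)."""
        parts = urlsplit(url)
        if parts.scheme != "widget":
            raise ValueError("not a widget: URL")
        epub_path = self._books.get(parts.netloc)
        if epub_path is None:
            raise KeyError("no book registered for this authority")
        # Normalizing an absolute path drops any '..' that would climb
        # above the zip root.
        entry = posixpath.normpath(unquote(parts.path) or "/").lstrip("/")
        return epub_path, entry


registry = BookRegistry()
authority = registry.register("/books/example.epub")
location = registry.resolve("widget://" + authority + "/OEBPS/text/ch1.xhtml")
```

The book never sees "/books/example.epub"; it only ever sees URLs under
its own random authority.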

This way, each book would get its own origin and the books themselves
couldn't choose their origin or discover their location in the file
system that hosts them. (If the origin were generated from the UUID of
the book, a malicious book could use the UUID of another book and
potentially read its localStorage or the like.) The randomly-generated
authority part in the widget: URL should be persistent to allow the
use of bookmarks, etc., with the URLs that point into books. (Even
better if the mappings between the randomly-assigned widget: authority
parts could be synced across different devices to make bookmarks sync
useful with books when the user has the same book on multiple
devices.)

In addition to the usual same-origin policy, it would probably be
appropriate to make loads for embedded content fail from outside a
given book's origin. That is, a book probably shouldn't be allowed to
include images from outside its own OCF even though Web sites are
allowed to include different-origin images.
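
Expressed as a check, the stricter rule might look like the sketch
below. The function names are invented for illustration, and the
origin is taken to be just the scheme plus the authority component.

```python
from urllib.parse import urlsplit


def origin(url):
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc)


def may_load_embedded(document_url, resource_url):
    """Decide whether a document may embed a resource (image, font, ...)."""
    if urlsplit(document_url).scheme != "widget":
        return True  # ordinary Web content keeps the usual rules
    # Content inside a book may only embed resources from the same book.
    return origin(document_url) == origin(resource_url)
```

For example, a page at widget://abc/OEBPS/ch1.xhtml could embed
widget://abc/OEBPS/img/fig.png but not http://example.com/fig.png or
widget://def/OEBPS/img/fig.png.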

I think it would make sense to handle OPF <guide> & NCX (EPUB2) or the
extraction of the corresponding data from the Navigation Document
(EPUB3) in the JavaScript app.
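
To give an idea of how little is involved on the EPUB2 side, here is a
sketch that flattens an NCX navMap into (depth, label, target) tuples.
It is written in Python only for illustration (the app itself would do
this in JavaScript); the function names and the sample NCX are made up.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="n1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="text/ch1.xhtml"/>
      <navPoint id="n2" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="text/ch1.xhtml#s1"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""


def read_nav_points(parent, depth=0):
    """Collect the navPoint children of parent, depth-first."""
    entries = []
    for point in parent.findall("ncx:navPoint", NCX_NS):
        label = point.findtext("ncx:navLabel/ncx:text", default="",
                               namespaces=NCX_NS).strip()
        content = point.find("ncx:content", NCX_NS)
        src = content.get("src") if content is not None else None
        entries.append((depth, label, src))
        entries.extend(read_nav_points(point, depth + 1))
    return entries


def read_ncx(ncx_xml):
    nav_map = ET.fromstring(ncx_xml).find("ncx:navMap", NCX_NS)
    return read_nav_points(nav_map) if nav_map is not None else []


toc = read_ncx(SAMPLE_NCX)
```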

--
Henri Sivonen
[email protected]
http://hsivonen.iki.fi/
_______________________________________________
dev-webapi mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-webapi




_______________________________________________
dev-tech-network mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-network
