Hi Eric,

Many thanks for the helpful info, which looks quite promising.

I'm trying to get this to work on openSUSE 13.1 and am struggling with the
libraries you require. I managed to get poppler installed and am now doing
battle with fontforge, which I have downloaded from your fork, and compiled:

    http://fontforge.sourceforge.net/index.html#source

    fontforge_full-20120731-b.tar

and now get:

    $> fontforge --version
        Copyright (c) 2000-2012 by George Williams.
        Executable based on sources from 14:57 GMT 31-Jul-2012.
        Library based on sources from 14:57 GMT 31-Jul-2012.
        fontforge 20120731
        libfontforge 20120731

but from the pdf2htmlEX/ directory:

    $> cmake .
        -- checking for module 'libfontforge>=2.0.0'
        --   package 'libfontforge>=2.0.0' not found
        CMake Error at /usr/share/cmake/Modules/FindPkgConfig.cmake:279 (message):
        A required package was not found
        Call Stack (most recent call first):
        /usr/share/cmake/Modules/FindPkgConfig.cmake:333 (_pkg_check_modules_internal)
        CMakeLists.txt:75 (pkg_check_modules)

        -- Configuring incomplete, errors occurred!

So, not being a regular c-compiler-nerd, I'm a bit stuck. Any ideas welcome...

-- 
Ciao

Richard Foley

Supporting Naked Activities

http://www.naktiv.net

On Thu, Jun 05, 2014 at 11:30:18AM +0200, Eric Dodémont wrote:
> I have studied the PDF to ePub fixed layout conversion these last weeks and
> wrote down my findings in a little ebook (20 pages):
> 
> A Practical Guide to Convert a PDF File to an ePub Version 3 Fixed Layout
> File: With Free Open Source Tools.
> https://play.google.com/store/books/details?id=1pytAwAAQBAJ
> 
> This is the beginning of the book (the rest is mainly technical material on
> converting from PDF to HTML, then from HTML to ePub):
> 
> Chapter 1: Fixed Layout
> 
> Different file formats exist for fixed layout ebooks. Below is a list of the
> main ones:
> 
> - PDF (Portable Document Format) [.pdf]
> - DjVu (Déjà Vu) [.djvu]
> - ePub (electronic Publication) [.epub]
> - Apple iBooks (similar to ePub) [.ibooks]
> - Amazon Kindle (similar to ePub) [.kf8]
> 
> In this book, we will focus mainly on the conversion of a PDF file to a
> fixed layout ePub file. This has been possible since version 3 of the ePub
> format, which now includes a fixed layout mode in addition to the
> traditional flowing text mode.
> 
> This type of conversion can be very useful because page layout programs
> (e.g. Scribus) always export the final result as a PDF (optimized
> for paper or online publication).
> 
> The "ePub 3.0 Fixed Layout (FXL) Format Specifications" published by the
> International Digital Publishing Forum (IDPF) can be found here:
> 
> http://www.idpf.org/epub/fxl
> 
> A "Field Guide to Fixed Layout for E-Books" published by the Book Industry
> Study Group (BISG) is available for free here:
> 
> http://www.bisg.org/publications/field-guide-fixed-layout-e-books
> 
> The ePub version 3 format uses all the modern Web technologies like HTML5,
> CSS3, JS, SVG, XML, XHTML, WOFF, etc.
> 
> Important remarks:
> 
> 1) This book is only about fixed layout ePub. Fixed layout can be used if
> the book has a sophisticated layout with lots of images. Such fixed layout
> books are made with desktop publishing (DTP) programs like Scribus, Adobe
> InDesign, QuarkXPress, or Microsoft Publisher. For books with only text or
> with few images, a flowing text ePub is more suitable and easier to produce.
> 
> 2) Most PDF to ePub converters do not work for sophisticated layouts
> because they convert a fixed layout PDF into a flowing text ePub, which
> usually gives an ugly and unusable result unless the file is
> heavily adapted. They just extract the text and the images from the PDF
> and put them sequentially into a flowing text ePub with all the layout gone.
> 
> 3) Most ePub viewers do not (yet) support fixed layout. If you
> try to display a fixed layout ePub with such a viewer, the result will be
> ugly and unusable. Two good ePub viewers supporting fixed layout are
> Google Play Books (for tablets running Google Android or Apple iOS
> (iPad)) and Readium (for laptops or desktops running Microsoft
> Windows, Apple OS X (Mac), or GNU/Linux; it is a Google Chrome browser
> extension). Most of the time, small screens are not suitable for fixed
> layout books. Such books should be read on tablets, not on smartphones.
> 
> * Conversion Methods
> 
> There are three main methods to convert a PDF file to an ePub fixed layout
> file:
> 
> 1) Method 1: Bitmap image only + Hidden text
> 
> Each ePub page is a bitmap image (PNG8, possibly PNG24 or JPEG) of an exact
> replica of the PDF page. This bitmap image is the result of the rendering
> of the text (using vector fonts), bitmap images, and vector images. To
> maintain accessibility (select text, copy/paste text, search text, text to
> speech, etc.), an invisible text layer is added on top of the image. This
> is also the way used to convert a PDF file to a DjVu file. Some PDF files
> are also made like that, mainly when they are the results of scanning paper
> books (the text layer is made by OCR).
> 
> 2) Method 2: Image + Text
> 
> Probably the best method, but more sophisticated than the first one, is to
> add to each ePub page either a bitmap image (JPEG, possibly PNG) made of
> all the bitmap and vector images of the PDF page, or a combined bitmap and
> vector image (SVG). The text is not converted into a bitmap image or
> inserted in the SVG file, but added to the ePub page using XHTML5 and CSS3.
> The CSS uses: a) absolute positioning to put the text at exactly the same
> place as in the PDF page; b) styles and fonts to make the text look exactly
> the same as in the PDF page. These last two steps are challenging, because
> HTML5 cannot always do what the PDF format can; lots of free and commercial
> tools exist, but most of the time they cannot do this correctly when it
> comes to fixed layout.
> 
> 3) Method 3: SVG only
> 
> The bitmap images, the vector images, and the text are embedded in SVG
> files (one SVG per page). The text should be rendered as true text (with
> fonts), not just outlines of the glyphs (vector images). Also called: SVG
> in the spine (no XHTML).
> 
> In the rest of this book, I will focus only on the second method
> (image + text).
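
The second method (image + text) can be illustrated with a rough sketch. This is only an illustration under stated assumptions, not what any particular converter does internally: the function name render_page, the file name bg1.jpg, and the coordinates are all hypothetical, and a real conversion would take positions, sizes, and fonts from the PDF page. It builds one XHTML page with a background image and absolutely positioned text on top of it.

```python
from html import escape


def render_page(width, height, background, spans):
    """Return one XHTML page: a background image plus positioned text.

    spans is a list of (left_px, top_px, font_size_px, text) tuples.
    All values are hypothetical; a real tool would read them from the PDF.
    """
    parts = [
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        '<head>',
        # Fixed layout pages declare their size in a viewport meta element.
        f'<meta name="viewport" content="width={width}, height={height}"/>',
        '<title>Page</title>',
        '<style>',
        f'.page {{ position: relative; width: {width}px; height: {height}px; }}',
        '.bg { position: absolute; left: 0; top: 0; width: 100%; height: 100%; }',
        '.t { position: absolute; white-space: pre; }',
        '</style>',
        '</head>',
        '<body>',
        '<div class="page">',
        # All bitmap and vector graphics of the page, flattened into one image.
        f'<img class="bg" src="{escape(background, quote=True)}" alt=""/>',
    ]
    for left, top, size, text in spans:
        # The text stays real text, placed where it sat on the PDF page.
        parts.append(
            f'<span class="t" style="left:{left}px;top:{top}px;'
            f'font-size:{size}px">{escape(text)}</span>'
        )
    parts += ['</div>', '</body>', '</html>']
    return "\n".join(parts)


page = render_page(600, 800, "bg1.jpg", [(72, 90, 24, "Chapter 1: Fixed Layout")])
print(page)
```

Because the text is emitted as ordinary elements rather than painted into the image, it remains selectable and searchable. Matching the fonts and styles of the PDF exactly is the hard part and is not attempted here.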
> 
> * Conversion Tools
> 
> There are free open source and commercial tools to convert PDF to
> ePub3-fxl, but some have drawbacks. For example, one of these tools gives a
> very good visual result, but the text accessibility has a problem: no
> spaces are present. The tool puts words at the correct positions, but does
> not take care of the spaces between the words. When you copy/paste a phrase,
> all the spaces are gone. Or, if you search for a word, the word is not found
> (unless this word is in parentheses, for example). In effect, each phrase
> is one very long word.
> 
> The tool and the method I will describe below are free, and give a very good
> result for the visual aspect and for the text accessibility. The tool I
> will use is pdf2htmlEX, developed by Lu Wang (pseudonym: coolwanglu), a
> Chinese PhD student at the Department of Computer Science and Engineering
> of the Hong Kong University of Science and Technology. You can find it here:
> 
> http://coolwanglu.github.io/pdf2htmlEX
> 
> This tool, as its name tells us, converts PDF pages to HTML
> pages, and does not produce an ePub file. To get an ePub3-fxl file, I will
> show how to use the result produced by pdf2htmlEX to create the ePub3-fxl
> file. This mainly means: a) removing the HTML viewer that pdf2htmlEX produces
> and integrates in the result; b) creating all the files required by the ePub
> format and wrapping the result into a single file.
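
Step b) can be sketched roughly as follows. This is a minimal sketch, assuming the page files and the package document (OPF) have already been prepared. The function name build_epub, the OEBPS/ folder, and the file name content.opf are illustrative choices, not something prescribed by the book or by pdf2htmlEX. What the ePub container format itself does require is that the mimetype entry comes first in the archive and is stored uncompressed, and that META-INF/container.xml points at the package document.

```python
import zipfile

# Points the reading system at the package document (path is an assumption).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(path, opf_text, files):
    """Wrap prepared files into a single .epub archive.

    opf_text is the package document as a string.
    files maps names relative to OEBPS/ (hypothetical layout) to bytes.
    """
    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry must be first and must not be compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf_text,
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr("OEBPS/" + name, data,
                        compress_type=zipfile.ZIP_DEFLATED)
```

The fixed layout itself is declared in the package document, which is passed in here as opf_text and is not shown.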
> 
> Best regards,
> 
> Eric Dodémont
> 
> 
> On 5 June 2014 11:16, Peter Nermander <peter at nermander.se> wrote:
> 
> > > It doesn't fix my problem, but it helps understand why it's sufficiently
> > > complex that the tool is not there, yet. The original point still stands
> > > though, and this makes it clearer, (at least to me), why Scribus is the
> > > right place to export the PDF, which Scribus knows how to write. Therefore
> > > it would also know how to export the epub correctly as well. I think.
> > >
> > No, it's still not that easy. Seems I have to take an example.
> >
> > Imagine that on each page you have 3 pictures, each with a caption. The
> > caption is next to the picture (not above or below). The picture and
> > caption are separate frames.
> >
> > Now, the pictures alternate between being at the left side (with the
> > caption to the right) and at the right side (with the caption to the
> > left). When you export to epub you surely want all the captions to go
> > either above or below each picture (same for all pictures). But could you
> > describe the algorithm Scribus should use to decide in what order it shall
> > export the pictures and the captions?
> >
> > Going from top left to bottom right will not work well. Note also that
> > going from top left to bottom right can be done sideways first (most
> > relevant for this case) or down first (more relevant for a regular 2 column
> > layout).
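
The ordering problem in this example can be made concrete with a small sketch. The frame names and coordinates are hypothetical, and the two functions are only the naive geometric orderings mentioned above (sideways first and down first), not anything Scribus actually implements. The page has three pictures that alternate sides, each with its caption beside it.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    name: str
    x: float  # left edge of the frame
    y: float  # top edge of the frame


def row_first(frames):
    """Top to bottom, and left to right within each row (sideways first)."""
    return [f.name for f in sorted(frames, key=lambda f: (f.y, f.x))]


def column_first(frames):
    """Left to right, and top to bottom within each column (down first)."""
    return [f.name for f in sorted(frames, key=lambda f: (f.x, f.y))]


# Hypothetical page: pictures alternate sides, captions sit beside them.
page = [
    Frame("picture 1", 0, 0), Frame("caption 1", 100, 0),
    Frame("caption 2", 0, 100), Frame("picture 2", 100, 100),
    Frame("picture 3", 0, 200), Frame("caption 3", 100, 200),
]

print(row_first(page))
# ['picture 1', 'caption 1', 'caption 2', 'picture 2', 'picture 3', 'caption 3']
print(column_first(page))
# ['picture 1', 'caption 2', 'picture 3', 'caption 1', 'picture 2', 'caption 3']
```

Neither ordering keeps every caption consistently before or after its own picture: sideways first flips the order on every other row, and down first separates captions from their pictures entirely. Geometry alone does not say which caption belongs to which picture.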
> >
> > /Peter
> > ___
> > Scribus Mailing List: scribus at lists.scribus.net
> > Edit your options or unsubscribe:
> > http://lists.scribus.net/mailman/listinfo/scribus
> > See also:
> > http://wiki.scribus.net
> > http://forums.scribus.net
> >
