RE: Referencing multiple pages for index entries

Georg Datterl Wed, 19 Aug 2009 06:20:42 -0700

Hi Laurent,

> In real world use cases, it's acceptable to support index entries 
> only at the end of the numbering sequence or in another numbering 
> sequence, so let's do post-processing. There are plenty of issues 
> to solve but they are mostly related to `I/Os` and XSL and Novelang 
> design so I won't discuss them in this list.


I'm just thinking: if we are restricted to an index in a separate page-sequence
after the actual entries, wouldn't it be possible DURING layout (when creating
the KnuthSequence?) to look forward (or back), modify the index entries (which
should already know their page numbers), and then lay out only once?

> One question left, however. I wonder how to add hints to the FO document 
> so that it generates Area Tree or Intermediate Format output that I can 
> easily reparse, to locate the pages containing index entries and extract 
> the index keys and their lists of page numbers.


I don't know if that helps you, but I generate the area tree as a DOM Document 
and then read data from it. The blocks I'm interested in are marked with known 
ids.

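import java.util.HashSet;
import java.util.Set;

import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.apache.fop.apps.FOPException;
import org.apache.fop.apps.FOUserAgent;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;
import org.apache.fop.render.Renderer;
import org.apache.fop.render.xml.XMLRenderer;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

    // Returns the reference ids of page-number entries to remove.
    // getFopFactory(), getMultipassFactory(), pageKey and references
    // (maps each index key to the reference ids of its page-number entries)
    // are members of the enclosing class and are not shown here.
    private Set<String> multipass(StreamSource source) {
        Set<String> duplicates = new HashSet<String>();
        try {
            FopFactory fopFactory = getFopFactory();
            FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
            SAXTransformerFactory mpFactory = getMultipassFactory();
            Transformer transformer = mpFactory.newTransformer();
            TransformerHandler handler = mpFactory.newTransformerHandler();
            DOMResult domResult = new DOMResult();
            handler.setResult(domResult);

            // Lay out as if for PDF, so page breaks match the final output
            Renderer targetRenderer =
                    foUserAgent.getRendererFactory().createRenderer(
                            foUserAgent, MimeConstants.MIME_PDF);

            XMLRenderer renderer = new XMLRenderer();
            renderer.mimicRenderer(targetRenderer);
            renderer.setContentHandler(handler);
            renderer.setUserAgent(foUserAgent);

            // Write the area tree XML into the DOM instead of producing a PDF
            foUserAgent.setRendererOverride(renderer);

            Fop fop = fopFactory.newFop(foUserAgent);
            Result saxResult = new SAXResult(fop.getDefaultHandler());
            transformer.transform(source, saxResult);
            Document doc = (Document) domResult.getNode();

            // Kill all but the last page from the document, for performance
            // reasons. IMPORTANT, trust me!!
            Element areaTree = doc.getDocumentElement();
            while (!areaTree.getLastChild().equals(areaTree.getFirstChild())) {
                areaTree.removeChild(areaTree.getFirstChild());
            }

            // Read the data from the document and collect the entries to kill
            // (the Strings in the set are reference ids of page-number entries)
            XPathFactory factory = XPathFactory.newInstance();
            XPath xPath = factory.newXPath();
            NodeList nl = (NodeList) xPath.evaluate(
                    "//blo...@prod-id='" + pageKey + "']", doc,
                    XPathConstants.NODESET);
            if (nl.getLength() <= 0) {
                return duplicates;
            }
            Node root = nl.item(0);
            for (String key : references.keySet()) {
                Set<String> uniques = new HashSet<String>();
                Set<String> toCheck = references.get(key);
                // A single entry is always unique, so only check larger sets
                if (toCheck.size() > 1) {
                    for (String check : toCheck) {
                        String val = xPath.evaluate(
                                ".//te...@prod-id='" + check + "']/word/text()",
                                root, XPathConstants.STRING).toString();
                        // add() returns false if this page number was seen
                        if (!uniques.add(val)) {
                            duplicates.add(check);
                        }
                    }
                }
            }
            // With these ids I iterate over my structure and remove the
            // elements whose ids are in the set. Which probably won't help
            // you much.
            return duplicates;
        } catch (TransformerException e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        } catch (FOPException e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        } catch (SAXException e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
        return null;
    }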
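The duplicate check at the heart of the method above can be tried without a full FOP run. The sketch below is not Georg's code: it parses a tiny hypothetical XML string that only loosely imitates FOP's area tree (the element names block, text and word, the prod-id values "index", "a1" ... "b1" and the index keys are all assumptions for illustration), then reports every reference id whose page number repeats an earlier one for the same key.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IndexDedup {

    // Hypothetical, heavily simplified stand-in for area tree XML:
    // only the prod-id attributes and a text/word nesting are kept.
    static final String AREA_TREE =
            "<areaTree><block prod-id='index'>"
            + "<text prod-id='a1'><word>3</word></text>"
            + "<text prod-id='a2'><word>3</word></text>"
            + "<text prod-id='a3'><word>5</word></text>"
            + "<text prod-id='b1'><word>7</word></text>"
            + "</block></areaTree>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    // Returns the reference ids whose page number repeats an earlier
    // reference id of the same index key (hypothetical helper).
    static Set<String> findDuplicates(Document doc, String indexId,
            Map<String, List<String>> references) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        Set<String> duplicates = new LinkedHashSet<>();
        try {
            Node root = (Node) xPath.evaluate(
                    "//block[@prod-id='" + indexId + "']", doc,
                    XPathConstants.NODE);
            if (root == null) {
                return duplicates; // no index block found
            }
            for (List<String> refIds : references.values()) {
                Set<String> seenPages = new HashSet<>();
                for (String refId : refIds) {
                    String page = xPath.evaluate(
                            ".//text[@prod-id='" + refId + "']/word/text()", root);
                    // add() returns false if the page number was already seen
                    if (!seenPages.add(page)) {
                        duplicates.add(refId);
                    }
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath expression", e);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        Map<String, List<String>> references = new LinkedHashMap<>();
        references.put("apple", Arrays.asList("a1", "a2", "a3"));
        references.put("banana", Arrays.asList("b1"));
        System.out.println(findDuplicates(parse(AREA_TREE), "index", references));
        // prints [a2]
    }
}
```

Keeping the first occurrence and flagging later ones mirrors the HashSet trick in the method above; a LinkedHashSet is used here only so the output order is predictable.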


Best regards
 
Georg Datterl
 
-----Original Message-----
From: Laurent Caillette [mailto:[email protected]] 
Sent: Wednesday, 19 August 2009 14:56
To: [email protected]
Subject: RE: Referencing multiple pages for index entries


Thanks Georg, the "Index and Pagenumbers" discussion is of great interest.

To sum up:
- By design, FOP doesn't re-layout after page number citations (as shown by 
Andreas D.). So a FOP extension won't solve my case if I want nicely-formatted 
index entries.
- Complete support of `XSL 1.1` indexes means supporting growing and shrinking 
entries. When not at the very end of the page sequence, this implies multi-pass 
layout.
- By design, FOP doesn't support multi-pass layout.
- But FOP allows post-processing through Area Tree Format or Intermediate 
Format.

In real world use cases, it's acceptable to support index entries only at the 
end of the numbering sequence or in another numbering sequence, so let's do 
post-processing. There are plenty of issues to solve but they are mostly 
related to `I/Os` and XSL and Novelang design so I won't discuss them in this 
list.

One question left, however. I wonder how to add hints to the FO document so 
that it generates Area Tree or Intermediate Format output that I can easily 
reparse, to locate the pages containing index entries and extract the index 
keys and their lists of page numbers.

Thanks all,

c.



-----Original Message-----
From: Georg Datterl [mailto:[email protected]]
Sent: Wednesday, 19 August 2009 12:08
To: [email protected]
Subject: RE: Referencing multiple pages for index entries

Hi Laurent,

I had the same problem, except for the "5-7". I only had to remove multiple 
entries with identical page numbers. A search for the keyword "index" in the 
archives should turn up that thread ("Index and Pagenumbers").
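The "5-7" case presumably means going one step beyond removing duplicate page numbers: consecutive pages are collapsed into a range. A minimal sketch of that final formatting step, assuming the resolved page numbers are plain Arabic integers (roman-numeral folios would need their own comparison); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class PageRanges {

    // Formats page numbers as a sorted, de-duplicated list in which runs of
    // consecutive pages are collapsed into "first-last" ranges.
    static String format(Collection<Integer> pages) {
        // TreeSet sorts the numbers and drops duplicates in one step
        List<Integer> sorted = new ArrayList<>(new TreeSet<>(pages));
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < sorted.size()) {
            int start = sorted.get(i);
            int end = start;
            // extend the run while the next page follows directly
            while (i + 1 < sorted.size() && sorted.get(i + 1) == end + 1) {
                end = sorted.get(++i);
            }
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(start);
            if (end > start) {
                out.append('-').append(end);
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.asList(7, 3, 5, 6, 5, 10)));
        // prints 3, 5-7, 10
    }
}
```

Whether a run of only two pages should read "5-6" or "5, 6" is a style decision; this sketch always uses a range.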


