Hi Laurent,
> In real world use cases, it's acceptable to support index entries
> only at the end of the numbering sequence or in another numbering
> sequence, so let's do post-processing. There are plenty of issues
> to solve but they are mostly related to I/O, XSL and Novelang
> design so I won't discuss them in this list.
I'm just thinking: if we restrict ourselves to an index in a separate page-sequence
after the actual entries, wouldn't it be possible DURING layout (when creating
the KnuthSequence?) to look forward (or back), modify the index entries (the
referenced entries should already know their page numbers) and then lay out
only once?
> One question remains, however: how should I annotate the FO document so
> that the generated Area Tree or Intermediate Format is easy to reparse,
> in order to locate the pages containing index entries and extract the
> index keys and their lists of page numbers?
I don't know if that helps you, but I generate the area tree as a DOM Document
and then read data from it. The blocks I'm interested in are marked with known
ids.
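import java.util.HashSet;
import java.util.Set;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.apache.fop.apps.FOUserAgent;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;
import org.apache.fop.render.Renderer;
import org.apache.fop.render.xml.XMLRenderer;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// pageKey (String), references (Map<String, Set<String>>), getFopFactory()
// and getMultipassFactory() belong to the enclosing class, which is not shown.
// Returns the reference ids of duplicate page-number entries, or null on error.
private Set<String> multipass(StreamSource source) {
    try {
        FopFactory fopFactory = getFopFactory();
        FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
        SAXTransformerFactory mpFactory = getMultipassFactory();
        Transformer transformer = mpFactory.newTransformer();

        // Collect the area tree XML into a DOM instead of writing it out.
        TransformerHandler handler = mpFactory.newTransformerHandler();
        DOMResult domResult = new DOMResult();
        handler.setResult(domResult);

        // Let the XMLRenderer lay out the document as the PDF renderer would.
        Renderer targetRenderer = foUserAgent.getRendererFactory().createRenderer(
                foUserAgent, MimeConstants.MIME_PDF);
        XMLRenderer renderer = new XMLRenderer();
        renderer.mimicRenderer(targetRenderer);
        renderer.setContentHandler(handler);
        renderer.setUserAgent(foUserAgent);
        foUserAgent.setRendererOverride(renderer);

        Fop fop = fopFactory.newFop(foUserAgent);
        Result fopInput = new SAXResult(fop.getDefaultHandler());
        transformer.transform(source, fopInput);
        Document doc = (Document) domResult.getNode();

        // Killing all but the last page from the document, because of
        // performance reasons. IMPORTANT, trust me!!
        Element rootElement = doc.getDocumentElement();
        while (rootElement.getFirstChild() != rootElement.getLastChild()) {
            rootElement.removeChild(rootElement.getFirstChild());
        }

        // Read the data from the document and collect the entries to kill
        // (the strings in res are reference ids of page-number entries).
        Set<String> res = new HashSet<String>();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nl = (NodeList) xPath.evaluate(
                "//blo...@prod-id='" + pageKey + "']", doc, XPathConstants.NODESET);
        if (nl.getLength() <= 0) {
            return res;
        }
        Node root = nl.item(0);
        for (String key : references.keySet()) {
            Set<String> uniques = new HashSet<String>();
            Set<String> toCheck = references.get(key);
            if (toCheck.size() > 1) { // a single entry is always unique, so skip it
                for (String check : toCheck) {
                    String val = xPath.evaluate(
                            ".//te...@prod-id='" + check + "']/word/text()",
                            root, XPathConstants.STRING).toString();
                    if (uniques.contains(val)) {
                        res.add(check); // page number already listed for this key
                    } else {
                        uniques.add(val);
                    }
                }
            }
        }
        // With these ids I iterate over my structure and remove the elements
        // whose ids are in the set, which probably won't help you much.
        return res;
    } catch (TransformerException e) {
        e.printStackTrace();
    } catch (XPathExpressionException e) {
        e.printStackTrace();
    } catch (SAXException e) { // also covers FOPException, a SAXException subclass
        e.printStackTrace();
    }
    return null;
}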
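The duplicate check inside the loop above does not depend on FOP itself. A minimal, self-contained sketch of the same idea, assuming the page number of each reference id has already been read from the area tree into a map (the class `IndexDedup` and method `findDuplicates` are hypothetical names):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IndexDedup {

    /**
     * For each index key, returns the reference ids whose page number repeats
     * one already seen for that key; removing them leaves each page once.
     *
     * @param references index key -> reference ids, in document order
     * @param pageOf     reference id -> page number as read from the area tree
     */
    public static Set<String> findDuplicates(Map<String, List<String>> references,
                                             Map<String, String> pageOf) {
        Set<String> toRemove = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> entry : references.entrySet()) {
            List<String> ids = entry.getValue();
            if (ids.size() < 2) {
                continue; // a single reference is always unique
            }
            Set<String> seenPages = new HashSet<>();
            for (String id : ids) {
                // Set.add returns false when this page was already seen for the key
                if (!seenPages.add(pageOf.get(id))) {
                    toRemove.add(id);
                }
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new LinkedHashMap<>();
        refs.put("apple", Arrays.asList("r1", "r2", "r3"));
        refs.put("pear", Arrays.asList("r4"));
        Map<String, String> pages = new HashMap<>();
        pages.put("r1", "5");
        pages.put("r2", "5");
        pages.put("r3", "7");
        pages.put("r4", "5");
        System.out.println(findDuplicates(refs, pages)); // prints [r2]
    }
}
```

Using an ordered `List` per key keeps the first occurrence of each page deterministically; with a `HashSet`, as in the snippet above, which of two same-page references survives is arbitrary.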
Best regards
Georg Datterl
------ Contact ------
Georg Datterl
Geneon media solutions gmbh
Gutenstetter Straße 8a
90449 Nürnberg
HRB Nürnberg: 17193
Managing Director: Yong-Harry Steiert
Tel.: 0911/36 78 88 - 26
Fax: 0911/36 78 88 - 20
www.geneon.de
Other members of the Willmy MediaGroup:
IRS Integrated Realization Services GmbH: www.irs-nbg.de
Willmy PrintMedia GmbH: www.willmy.de
Willmy Consult & Content GmbH: www.willmycc.de
-----Original Message-----
From: Laurent Caillette [mailto:[email protected]]
Sent: Wednesday, 19 August 2009 14:56
To: [email protected]
Subject: RE: Referencing multiple pages for index entries
Thanks Georg, the "Index and Pagenumbers" discussion is of great interest.
To sum up:
- By design, FOP doesn't re-layout after resolving page-number citations (as
shown by Andreas D.), so a FOP extension won't solve my case if I want nicely
formatted index entries.
- Full support for XSL 1.1 indexes means supporting entries that grow and
shrink. Unless the index is at the very end of the page sequence, this implies
multi-pass layout.
- By design, FOP doesn't support multi-pass layout.
- But FOP allows post-processing through the Area Tree XML format or the
Intermediate Format.
In real world use cases, it's acceptable to support index entries only at the
end of the numbering sequence or in another numbering sequence, so let's do
post-processing. There are plenty of issues to solve but they are mostly
related to I/O, XSL and Novelang design, so I won't discuss them in this
list.
One question remains, however: how should I annotate the FO document so that
the generated Area Tree or Intermediate Format is easy to reparse, in order to
locate the pages containing index entries and extract the index keys and their
lists of page numbers?
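One possible way to make the output easy to reparse, sketched under stated assumptions: give every index-term anchor in the FO an id that follows a hypothetical convention `idx-<key>-<n>` (keys must then be valid id characters), render to the area tree XML, and query it with XPath. The sketch assumes that areas keep the FO id in a `prod-id` attribute (as the XPath in the snippet above suggests), that pages appear as `pageViewport` elements with a `formatted-nr` attribute, and that the XML is not namespaced; check these against the output of your FOP version. The class name and the sample XML are illustrative only.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IndexPageCollector {

    // Hypothetical id convention for index-term anchors in the FO: "idx-<key>-<n>".
    static final String PREFIX = "idx-";

    /** Maps each index key to the distinct page labels on which it occurs. */
    public static Map<String, Set<String>> collect(Document areaTree)
            throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Assumption: prod-id carries the FO id of the area that produced it.
        NodeList hits = (NodeList) xPath.evaluate(
                "//*[starts-with(@prod-id, '" + PREFIX + "')]",
                areaTree, XPathConstants.NODESET);
        Map<String, Set<String>> pagesByKey = new TreeMap<>();
        for (int i = 0; i < hits.getLength(); i++) {
            Element area = (Element) hits.item(i);
            String id = area.getAttribute("prod-id");
            // Strip the prefix and the trailing "-<n>" counter to get the key.
            String key = id.substring(PREFIX.length(), id.lastIndexOf('-'));
            // Assumption: the enclosing page is a pageViewport with formatted-nr.
            String page = xPath.evaluate("ancestor::pageViewport[1]/@formatted-nr", area);
            pagesByKey.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(page);
        }
        return pagesByKey;
    }

    /** Parses XML into a DOM that is not namespace-aware, matching the XPath above. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // A tiny hand-written stand-in for real area tree output.
    static final String SAMPLE =
            "<areaTree><pageSequence>"
            + "<pageViewport formatted-nr='1'><page>"
            + "<block prod-id='idx-apple-1'/></page></pageViewport>"
            + "<pageViewport formatted-nr='2'><page>"
            + "<block prod-id='idx-apple-2'/><block prod-id='idx-pear-1'/>"
            + "</page></pageViewport>"
            + "</pageSequence></areaTree>";

    public static void main(String[] args) throws Exception {
        // Prints each index key with the set of page labels found for it.
        System.out.println(collect(parse(SAMPLE)));
    }
}
```

Collecting page labels into a set per key already removes duplicates of the same page, so the result can be fed straight into whatever writes the index page sequence.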
Thanks all,
c.
-----Original Message-----
From: Georg Datterl [mailto:[email protected]]
Sent: Wednesday, 19 August 2009 12:08
To: [email protected]
Subject: AW: Referencing multiple pages for index entries
Hi Laurent,
I had the same problem, except for the "5-7" part: I only had to remove
duplicate entries with identical page numbers. Searching the archives for the
keyword "index" should turn up that thread ("Index and Pagenumbers").
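The "5-7" part of the original question, collapsing consecutive pages into a range, is a small step once page numbers have been extracted. A minimal sketch, assuming plain arabic page numbers (roman or prefixed labels would need their own handling); the class `PageRanges` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

public class PageRanges {

    /** Collapses page numbers into a compact list such as "3, 5-7, 10". */
    public static String format(Collection<Integer> pages) {
        // TreeSet drops duplicate pages and sorts the rest in ascending order.
        List<Integer> sorted = new ArrayList<>(new TreeSet<>(pages));
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < sorted.size()) {
            int start = sorted.get(i);
            int end = start;
            // Extend the run while the next page follows directly.
            while (i + 1 < sorted.size() && sorted.get(i + 1) == end + 1) {
                end = sorted.get(++i);
            }
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(start);
            if (end > start) {
                out.append('-').append(end);
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.asList(7, 5, 10, 6, 5, 3))); // prints 3, 5-7, 10
    }
}
```

Because the duplicates are dropped first, this also covers the simpler case of removing repeated entries with identical page numbers.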