Hello,

In the Calenco CMS we have implemented a method based on the Apache FOP Intermediate Format (IF), an XML description of each rendered page. It uses the following simple algorithm:
1. Generate the IF XML for the original document.
2. Generate the IF XML from the new sources.
3. Compare the two IF documents with an XSLT to extract whatever information we need.

We have been using this successfully in production to generate PDFs containing only the pages modified between two versions of a document.

HTH,

Camille Bégnis
Manager, NeoDoc
cami...@neodoc.fr
Tel: 04.42.52.24.20
http://www.neodoc.fr/
789, rue de la gare
F-13770 Venelles

On 01/11/2016 at 15:42, Bergfrid Skaara wrote:
> I'd like to know the overall difference between versions X and Y of a
> single PDF - automatically so we don't need to inspect and compare
> page by page manually (simplifying release process).
>
> I need to know all PDFs (and what content within them) that have
> changed as a result of commits X, Y, Z (simplifying review and release
> process).
>
> I'd like notifications if a commit targeted only at feature A ends up
> changing PDFs unrelated to feature A. This would typically indicate
> profiling errors or misplaced includes that need to be fixed.
>
> Inserting build numbers or some similar ID from the CI environment
> into the PDF metadata would also help.
>
> The challenge with comparing PDFs is all the noise you get from layout
> changes (white space) and info in headers and footers such as dates
> and version numbers.
>
> And if you go the convert-PDF-to-text-before-compare route, would it
> not be better to compare the intermediate FO files rather than waste
> time going through the entire publishing pipeline first?
>
> We are not looking to replace our current CI environment. Extending
> the current build logic is not a problem, but I'm not sure what the
> new logic should look like.
>
> Bergfrid Skaara Dias
>
> On Wed, Oct 26, 2016 at 6:48 PM, Stefan Seefeld <ste...@seefeld.name> wrote:
>
> > On 26.10.2016 11:04, Bergfrid Skaara wrote:
> > > Hi,
> > >
> > > We use Git to version control our modular DocBook XML code base. I'd
> > > like to enforce stricter change management than what simply inspecting
> > > the Git log manually offers. Specifically, I want to trace each
> > > modular DocBook XML file that has been changed up to the PDFs that will
> > > be changed as a result.
> > >
> > > Tracing the ancestor files through a sequence of xi:includes is
> > > trivial. My challenges are:
> > >
> > > 1. Profiling. I need to trace ancestor elements taking profiling into
> > > consideration.
> > > 2. Entities. We use entities extensively for both aliases and reused
> > > text. Is there a way to track effects of changed entities without
> > > starting with a brute force search of all DocBook XML files using that
> > > entity?
> > >
> > > Are there any tools, standalone or add-ons to oXygen, that support
> > > this or similar behavior, or am I better off writing my own script? In
> > > case of script, which option is better: XSLT or any scripting language
> > > facilitating text parsing?
> >
> > I'm not quite sure what you mean by "change management", what it is
> > that you want to enforce, or what exactly you want to trace.
> >
> > Generating a PDF from XML sources typically requires some build logic,
> > so I think the best you can do is use that very build logic and then
> > compare (or validate) the generated PDF (or any intermediate formats,
> > such as FO). That can easily be done in a CI environment (such as
> > Travis-CI), so you can fully automate it such that the same process is
> > executed for each push.
> >
> > Stefan
> >
> > --
> >
> > ...ich hab' noch einen Koffer in Berlin...
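
For readers who want to try the three-step comparison described at the top of this message, here is a minimal sketch in Python rather than XSLT. It is not the Calenco implementation. It assumes each page in the IF XML is an element whose local name is `page`, and it compares pages by position using only their text content, so changes that affect only coordinates within a page are ignored. The function names `page_fingerprints` and `changed_pages` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def page_fingerprints(if_xml):
    """Return one hash per page element, in document order.

    Assumption: pages are elements whose local name is 'page'.
    """
    root = ET.fromstring(if_xml)
    fingerprints = []
    for elem in root.iter():
        if local_name(elem.tag) != "page":
            continue
        # Hash only the whitespace-normalized text of the page, so
        # attribute-only changes (such as coordinates) are ignored.
        words = " ".join("".join(elem.itertext()).split())
        fingerprints.append(hashlib.sha256(words.encode("utf-8")).hexdigest())
    return fingerprints


def changed_pages(old_if_xml, new_if_xml):
    """Return 1-based numbers of pages whose text differs between versions."""
    old = page_fingerprints(old_if_xml)
    new = page_fingerprints(new_if_xml)
    changed = [i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b]
    # Pages that exist in only one of the two versions count as changed.
    shorter, longer = sorted((len(old), len(new)))
    changed.extend(range(shorter + 1, longer + 1))
    return changed
```

Because pages are compared by position, inserting a page early in the document marks every later page as changed. A sequence alignment over the fingerprints (for example with `difflib.SequenceMatcher`) would be one way to reduce that noise.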