I'm using Symphony 3 and LibreOffice 3.3.2. They both display the document with the same overall structure. That is, page X has the same footer, header, footnotes, comments and main text in both applications.
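
A rough illustration of why the file alone does not answer the per-page question: the sketch below (Python, standard library only) pulls headers, footers, footnotes and body paragraphs out of an .odt package as separate lists. The names `flat_text`, `paragraphs` and `extract_parts` are made up for this example, and it assumes a plain text document with `content.xml` and `styles.xml` inside the zip. Headers and footers come from the master pages in `styles.xml`, so nothing in the result says which page an item lands on; that is decided by the rendering application.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces used by the lookups below.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
P_TAG = "{%s}p" % NS["text"]
H_TAG = "{%s}h" % NS["text"]
# Inline elements whose text should not be mixed into the body text.
SKIP_TAGS = {"{%s}note" % NS["text"], "{%s}annotation" % NS["office"]}


def flat_text(elem, skip_inline=False):
    """Concatenate all text under elem, optionally skipping notes/comments."""
    parts = [elem.text or ""]
    for child in elem:
        if not (skip_inline and child.tag in SKIP_TAGS):
            parts.append(flat_text(child, skip_inline))
        parts.append(child.tail or "")
    return "".join(parts)


def paragraphs(elem):
    """Yield paragraphs and headings, descending into sections, lists, tables."""
    for child in elem:
        if child.tag in (P_TAG, H_TAG):
            yield child
        else:
            yield from paragraphs(child)


def extract_parts(odt_file):
    """Return document-level (not page-level) parts of an .odt file."""
    with zipfile.ZipFile(odt_file) as z:
        content = ET.fromstring(z.read("content.xml"))
        styles = ET.fromstring(z.read("styles.xml"))
    body = content.find("office:body/office:text", NS)
    return {
        # Headers/footers belong to master pages, not to individual pages.
        "headers": [flat_text(e).strip() for e in
                    styles.iterfind(".//style:master-page/style:header", NS)],
        "footers": [flat_text(e).strip() for e in
                    styles.iterfind(".//style:master-page/style:footer", NS)],
        "footnotes": [flat_text(e).strip() for e in
                      body.iterfind(".//text:note/text:note-body", NS)],
        "body": [flat_text(p, skip_inline=True).strip()
                 for p in paragraphs(body)],
    }
```

Calling `extract_parts("example.odt")` (file name hypothetical) returns a dict with `headers`, `footers`, `footnotes` and `body` lists for the whole document. Splitting those lists by page would still need the renderer.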
As you mention, I think my only chance for now is to try to understand the underlying logic these applications use to render the document as a series of pages.

On Tue, Sep 27, 2011 at 4:03 PM, Dennis E. Hamilton <[email protected]> wrote:
>
> I think the answer is you can't get there from here today, and it will be an
> unpredictable time before the answer would change.
>
>  - Dennis
>
> JUST FOR FUN, More questions:
>
> Where are you seeing what the pages are?
>
> That is, what are you looking at where you see what is page X, what is on
> page X, and what are those things that apply to it (headers, footers, notes,
> frames, tables, etc.)?  What do you have to say to go to page X directly and
> have it in view?
>
> It is important that the OpenDocument Format is not page oriented (in
> contrast with final forms like PDFs that are).  I think you understand that
> from the APIs.
>
> It is some ODF Consumer that puts together the presentation you are looking
> at.  There is no normative answer to those questions looking at the ODF
> format alone.  It is pretty much all determined by an ODF Consumer.  What
> Consumer are you using that you see the pages that you are interested in?
>
> For the time being, it appears that you need to rely on the programmability
> of that consumer, if any, to be able to derive page-relative actions, because
> you are interested in features of the rendered document, not the recorded
> format.
>
> Unless there is a simpler way of addressing a concrete case that could work
> well enough in the short term.  (Mining PDFs might be better, but there might
> not be enough structure left.  There are doubtless tools for working on PDFs
> that might address your problem.)
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Ram Kane
> Sent: Monday, September 26, 2011 06:56
> To: [email protected]
> Subject: Re: Is there a way to extract text on a page basis from odt ?
>
> Thanks all for the replies.
>
> > It seems best to revisit the problem statement and extract a
> > grounded case: What is the problem that needs to be solved;
> > what are the constraints on an acceptable solution.
> >
> > Ram, can you please say more about the problem you want to solve?
> > What would be the simplest-acceptable result?
>
> I need to extract content for a given page inside a doc. By content I
> mean header, footer, footnotes, comments, main text from body.
> I need to have the option of extracting each of these elements of the
> page separately (extracting header for page X, footer for page X, body
> text for page X) and not just getting all the content as a single
> string.
>
> I've uploaded a doc that I found on your svn to use as an example here
> -> http://goo.gl/OMIEw
>
> Using the example doc and assuming that I need to extract content for
> page 1, I'd need to extract:
>
>   _ header ("ODFDOM in a header")
>   _ footer ("ODFDOM in a footer")
>   _ footnotes for page ("ODFDOM in a footnote")
>   _ main text and all additional content in the page body (" ODFDOM
> in a title ODFDOM in a section header ODFDOM in paragraph1 ..."
>
