RE: Paginating Content

Piroumian Konstantin Thu, 06 Jun 2002 08:27:17 -0700

Thanks for a very good explanation and the RT. You've voiced some of my
doubts and given me some more to think about.


To summarize the RT: the paginator is rather flexible, but it's not very
well suited for documentation pagination. We have a few good options for
documentation pagination and they are not yet implemented. Right? 

The options are:
        - use only top-level sections for pagination - already implemented,
          but not very useful for docs
        - count sections/paragraphs - a slightly more advanced version of the
          first one
        - count chars in some intelligent way - the best way, but not
          implemented; it has some algorithmic issues and may require
          additional analysis at the word/sentence level, e.g. do not break
          a word at a page boundary, etc.

The last two options also require 're-well-forming' the resulting XML, which
can also be non-trivial.
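
To illustrate the 'do not break a word' part of the third option, here is a
minimal sketch in Python (not Cocoon code; it works on plain text and ignores
markup entirely, and the name paginate_words is made up for the example):

```python
def paginate_words(text, max_chars):
    """Split plain text into pages of at most max_chars characters,
    cutting only between words (assumption: words are whitespace-separated)."""
    pages, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_chars:
            # The next word would overflow the page: close it and start a new one.
            pages.append(current)
            current = word
        else:
            current = candidate
    if current:
        pages.append(current)
    return pages


print(paginate_words("do not break a word on pagination", 12))
# ['do not break', 'a word on', 'pagination']
```

A word longer than the limit ends up alone on an oversized page in this
sketch; a real implementation would have to decide what to do in that case.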

Another thing that seems a little limiting to me (or maybe I didn't read
carefully enough?) is that the pagination rules are static. I can imagine a
situation where we will need to set some pagination params dynamically, e.g.
the item count. Did I miss it, or is it not there?

Are there any solutions or ideas? We could use a Serializer - it's the only
component that can output non-well-formed XML - but in that case we would
end up with non-well-formed HTML.

So, my opinion on pagination is this: we need to count chars/words, break
the page and re-well-form the result. Maybe something like a reverse
Recorder (it can record SAX events and then fire their end events when
called) could be used to implement the 're-well-form' feature?
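
To make the reverse Recorder idea a bit more concrete, here is a minimal
sketch in Python (illustrative only; ReverseRecorder and its method names are
made up and are not an existing Cocoon component). It just remembers which
elements are still open, so that their end events can be fired in the right
order when the page is cut:

```python
class ReverseRecorder:
    """Tracks open elements so the matching end events can be replayed later."""

    def __init__(self):
        self._open = []  # stack of element names that were started but not ended

    def start_element(self, name):
        self._open.append(name)

    def end_element(self, name):
        if not self._open or self._open[-1] != name:
            raise ValueError("unexpected end of <%s>" % name)
        self._open.pop()

    def pending_end_events(self):
        """Names of the end events needed to re-well-form, innermost first."""
        return list(reversed(self._open))


recorder = ReverseRecorder()
# Events seen before a cut in the middle of <p>..<strong>..</strong>..<em>..
recorder.start_element("p")
recorder.start_element("strong")
recorder.end_element("strong")
recorder.start_element("em")
print(recorder.pending_end_events())
# ['em', 'p']
```

The same stack could presumably be used to re-open the elements at the top of
the next page, but that part is not shown here.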



--
Konstantin Piroumian 
[EMAIL PROTECTED]


> -----Original Message-----
> From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]] 
> Sent: Thursday, June 06, 2002 4:52 PM
> To: [EMAIL PROTECTED]; Apache Cocoon
> Subject: Paginating Content
> 
> 
> Konstantin Piroumian wrote:
> 
> > Hm... Does anybody have an idea on how to paginate the content?
> 
> Ok, damn it, I don't have time to mark this up, but since it's the
> content that is useful, here's a small tutorial for the Paginator.
> 
>                                    - 0 -
> 
> Paginator Transformer
> =====================
> 
> classname: org.apache.cocoon.transformation.pagination.Paginator
> location: scratchpad (available in both cocoon 2.1-dev and 2.0.3-dev)
> 
> Design idea
> -----------
> 
> The paginator is a 'FilterTransformer' on pagination steroids. It works
> by filtering SAX events out and counting pages.
> 
> The design isn't very efficient since it has to process the entire file
> to extract a single page. It works nicely with a few tens of pages, but I
> would seriously suggest *against* using it for books or very big
> documents.
> 
> The good news is that it's cacheable, so if the document doesn't change
> and the same page is requested, there is no need to reprocess the
> document.
> 
> Anyway, for static generation, all this doesn't really matter.
> 
> A simple example of use
> -----------------------
> 
> Suppose you have an XML file like this
> 
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>  </a>
> 
> and you want to paginate this having 3 <b> elements per page. In order
> to achieve this, you write a simple "pagesheet" (which contains the
> instructions for the filter, much like a stylesheet gives instructions
> to an xslt processor) like this:
> 
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules>
>   <count type="element" name="b" num="3"/>
>  </rules>
> </pagesheet>
> 
> then you connect the two with a sitemap snippet like this:
> 
>    <map:match pattern="page(*)">
>     <map:generate src="document.xml"/>
>     <map:transform type="paginate" src="pagesheets/images.xml">
>       <map:parameter name="page" value="{1}"/>
>     </map:transform>
>     <map:serialize type="xml"/>
>    </map:match>
> 
> and accessing the URI page(1) yields
> 
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="1" 
>      total="3"
>      current-uri="page(1)"
>      clean-uri="page"
>   />
>  </a>
> 
> which can be easily transformed into something more meaningful.
> 
> Note that the transformer processes all the pages to obtain the 'total'.
> There is no way around this.
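
A minimal sketch of the counting in the simple example above, in Python with
ElementTree instead of SAX (so it is not how the Paginator itself works, and
paginate_flat is a made-up name). It only handles counted elements that are
direct children of the root, and it leaves out the page:page element:

```python
import math
import xml.etree.ElementTree as ET


def paginate_flat(xml_text, name, per_page, page):
    """Keep only the counted children that belong to the requested page.

    Returns the filtered root element and the total number of pages.
    """
    root = ET.fromstring(xml_text)
    counted = [child for child in root if child.tag == name]
    # The whole document has to be seen before 'total' is known.
    total = math.ceil(len(counted) / per_page)
    first = (page - 1) * per_page
    seen = 0
    for child in list(root):
        if child.tag != name:
            continue
        if not (first <= seen < first + per_page):
            root.remove(child)
        seen += 1
    return root, total


doc = "<a>" + "<b/>" * 7 + "</a>"
root, total = paginate_flat(doc, "b", 3, 3)
print(len(root), total)
# 1 3
```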
> 
> Adding navigation
> -----------------
> 
> The problem with XSLT-based pagination is that the logic is very complex
> to define in XSLT and is rarely reusable across different pagination
> needs. This was the main reason for creating a custom component for
> this.
> 
> But since we have a full-blown pagesheet language, there are a few other
> things that we can make the Paginator do, most importantly navigation.
> 
> For example, this other pagesheet
> 
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules>
>   <count type="element" name="b" num="3"/>
>   <link type="unit" num="1"/>
>  </rules>
> </pagesheet>
> 
> indicates that the transformer must understand how the page was encoded
> in the given URI and provide a link to the pages +/- 1 position, if they
> are available.
> 
> So, using the same environment as before we get
> 
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="1" 
>      total="3"
>      current-uri="page(1)"
>      clean-uri="page">
>    <page:link page="2" type="next" uri="page(2)"/>
>   </page:page>
>  </a>
> 
> which indicates
> 
>  1) there is no page 0, so no link is created.
>  2) the link goes to page 2, the type is 'next' (useful for
> visualization) and the URI is page(2) (useful for linking without
> XSLT-specific logic).
> 
> NOTE: the URI is re-encoded using the same pattern; this paginator
> assumes that 'round brackets' are used to identify page numbering.
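
The round-bracket convention can be sketched like this in Python (the regular
expression and the function names are assumptions made for the example, with
the page number expected at the very end of the URI; this is not the
Paginator's own code):

```python
import re

# Assumption: the page number sits in round brackets at the end of the URI.
_PAGE_PATTERN = re.compile(r"^(?P<clean>.*)\((?P<page>\d+)\)$")


def decode_page_uri(uri):
    """Split 'page(2)' into its clean URI and page number: ('page', 2)."""
    match = _PAGE_PATTERN.match(uri)
    if match is None:
        raise ValueError("no page number found in %r" % uri)
    return match.group("clean"), int(match.group("page"))


def encode_page_uri(clean_uri, page):
    """Re-encode a page number using the same round-bracket pattern."""
    return "%s(%d)" % (clean_uri, page)


clean, page = decode_page_uri("page(2)")
print(clean, page, encode_page_uri(clean, page + 1))
# page 2 page(3)
```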
> 
> Now, without changing anything, requesting page(2) would yield
> 
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="2" 
>      total="3"
>      current-uri="page(2)"
>      clean-uri="page">
>    <page:link page="1" type="prev" uri="page(1)"/>
>    <page:link page="3" type="next" uri="page(3)"/>
>   </page:page>
>  </a>
> 
> while page(3) would yield:
> 
>  <a>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="3" 
>      total="3"
>      current-uri="page(3)"
>      clean-uri="page">
>    <page:link page="2" type="prev" uri="page(2)"/>
>   </page:page>
>  </a>
> 
> NOTE: here there is only one <b> because the original document doesn't
> contain enough elements to fill the page entirely. It's the remainder of
> the division (7 mod 3 = 1).
> 
> A real-life example
> -------------------
> 
> Here are a few pagesheets which are a little more complex:
> 
> Paginating the results from DirectoryGenerator:
> 
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules>
>   <count type="element" name="file"
>          namespace="http://apache.org/cocoon/directory/2.0" num="16"/>
>   <link type="unit" num="2"/>
>   <link type="range" value="5"/>
>  </rules>
> </pagesheet>
> 
> This says:
> 
>  1) paginate 16 files per page
>  2) provide me with links to +/- 1 and +/- 2 pages (when available)
>  3) provide me with links to +/- 5 (when available)
> 
> So, suppose we have a directory with 300 files and we request page 10,
> the generated page will be
> 
>  <dir:directory>
>   <dir:file ...>
> 
>   [other 15 dir:file]
> 
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="10"
>      total="19"
>      current-uri="dir(10)"
>      clean-uri="dir">
>    <page:range-link page="5" type="prev" uri="dir(5)"/>
>    <page:link page="8" type="prev" uri="dir(8)"/>
>    <page:link page="9" type="prev" uri="dir(9)"/>
>    <page:link page="11" type="next" uri="dir(11)"/>
>    <page:link page="12" type="next" uri="dir(12)"/>
>    <page:range-link page="15" type="next" uri="dir(15)"/>
>   </page:page>
>  </dir:directory>
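
The numbers in this example can be checked with a small sketch in Python
(total_pages and page_links are made-up names, and the ordering of the links
simply follows the output shown above):

```python
def total_pages(items, per_page):
    """Number of pages needed for the given item count (ceiling division)."""
    return -(-items // per_page)


def page_links(current, total, unit=0, range_=0):
    """List (kind, page, type) tuples for the unit and range links of a page.

    Links that would fall outside 1..total are simply not created.
    """
    links = []
    if range_ and current - range_ >= 1:
        links.append(("range-link", current - range_, "prev"))
    for step in range(unit, 0, -1):
        if current - step >= 1:
            links.append(("link", current - step, "prev"))
    for step in range(1, unit + 1):
        if current + step <= total:
            links.append(("link", current + step, "next"))
    if range_ and current + range_ <= total:
        links.append(("range-link", current + range_, "next"))
    return links


total = total_pages(300, 16)
print(total)
# 19
for link in page_links(10, total, unit=2, range_=5):
    print(link)
```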
> 
> Asymmetric pagination
> ---------------------
> 
> We also have the ability to indicate different rules for each page, so:
> 
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules page="1">
>   <count type="element" name="b" num="5"/>
>   <link type="unit" num="1"/>
>  </rules>
>  <rules>
>   <count type="element" name="b" num="10"/>
>   <link type="unit" num="2"/>
>  </rules>
> </pagesheet>
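
A sketch of how such per-page rules would distribute the counted elements,
assuming that a <rules page="N"> block overrides the default <rules> for that
page only (Python, with made-up names; not the Paginator's own code):

```python
def page_ranges(n_items, per_page_rules, default_per_page):
    """Return (start, end) item index ranges, one per page.

    per_page_rules maps a page number to its own item count; every other
    page uses default_per_page.
    """
    ranges, start, page = [], 0, 1
    while start < n_items:
        size = per_page_rules.get(page, default_per_page)
        if size <= 0:
            raise ValueError("page size must be positive")
        end = min(start + size, n_items)
        ranges.append((start, end))
        start, page = end, page + 1
    return ranges


# 5 elements on page 1, 10 on every other page, as in the pagesheet above.
print(page_ranges(27, {1: 5}, 10))
# [(0, 5), (5, 15), (15, 25), (25, 27)]
```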
> 
> Count types
> -----------
> 
> The paginator works by counting stuff. It's up to you to define what you
> want to use for counting and you do so with the attributes of the
> <count> element in the pagesheet.
> 
> This element supports 2 required attributes:
> 
>  num="" -> a number indicating how many times the thing to count must be
> present in this page.
> 
>  type="" -> the type of counting that the paginator must perform. Two
> types are defined, but only one is currently implemented.
> 
>     type="element" -> makes the paginator count the startElement() SAX
> events
> 
>     type="chars" -> (not currently implemented!) makes the paginator
> count the chars included in the page.
> 
> In case type="element" is used, two other attributes become useful:
> 
>  name="" -> the name of the element (without namespace prefix!)
> 
>  namespace="" -> the URI of the namespace (if not specified, the default
> NS is used)
> 
>                                       - o -
> 
> Ok, from now on some RT on the future of this transformer:
> 
> Using the paginator for docs
> ----------------------------
> 
> I originally wrote the paginator to paginate a directory listing and it
> works great for paginating by counting elements. For docs, it could be
> possible to paginate by counting sections or subsections, but this
> doesn't necessarily yield visually balanced pages (which is the reason
> for web pagination).
> 
> This is why I envisioned a way to count by chars, even if I didn't go as
> far as implementing it: while paginating by counting elements is ok
> (sounds trivial, but it's not! think of nesting!), paginating by
> counting chars is a real pain, due to the algorithms that must perform
> 'chunking'.
> 
> I mean, assume you have a document like this:
> 
>  <p>this is some <strong>text</strong> that happens 
>  to be <em>chunked</em></p>
>              ^
>              |
>                                                
> and suppose that counting the chars leads you to the chunking point
> indicated by the arrow above. Cutting the page there results in XML
> which is not well-formed. Simply 're-well-forming' the XML at that point
> truncates words. So, we must keep going and 're-well-form' the XML only
> when the end of the enclosing 'block-level' element is reached (p in
> this case). But this means that the pagesheet must contain at least the
> list of 'block-delimiting' elements (and the current Pagesheet parser
> and object model don't support this notion).
> 
> Result: pagination at the char-level is not trivial and requires a
> little bit of work on the transformer.
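
A rough sketch of the 'cut only at block-level boundaries' idea, in Python
with ElementTree (not SAX, and not the Paginator's code). BLOCK_ELEMENTS and
split_on_blocks are made-up names; the block list stands in for what the
pagesheet would have to declare, and only blocks that are direct children of
the root are handled:

```python
import xml.etree.ElementTree as ET

# Assumption: the list of block-delimiting elements would come from the pagesheet.
BLOCK_ELEMENTS = {"p"}


def split_on_blocks(xml_text, max_chars):
    """Group the root's children into pages, closing a page only after a
    block-level child once at least max_chars characters have been counted."""
    root = ET.fromstring(xml_text)
    pages, current, count = [], [], 0
    for child in root:
        current.append(child)
        # itertext() walks nested inline elements such as <strong> and <em>.
        count += len("".join(child.itertext()))
        if count >= max_chars and child.tag in BLOCK_ELEMENTS:
            pages.append(current)
            current, count = [], 0
    if current:
        pages.append(current)
    return pages


doc = (
    "<doc>"
    "<p>this is some <strong>text</strong> that happens to be <em>chunked</em></p>"
    "<p>second</p><p>third</p>"
    "</doc>"
)
print([len(page) for page in split_on_blocks(doc, 20)])
# [1, 2]
```

The pages come out unbalanced whenever a block is much longer than the limit,
which is the price of never cutting inside a block.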
> 
> Nesting behavior
> ----------------
> 
> If counting by chars is a pain, even counting elements is not easy.
> Assume you have this:
> 
>  <a>
>   <b>
>    <a>
>     <b>
>      <a>
>       <b/>
>      </a>
>     </b>
>    </a>
>   </b>
>  </a>
> 
> and you want to paginate using one <b> per page, what do the pages look
> like? Ok, I'll give you some space to think about it.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Ok, here is my solution (but I'm not sure it's the best):
> 
> page 1:
> 
>  <a>
>   <b>
>    <a>
>     <a/>
>    </a>
>   </b>
>  </a>
> 
> page 2:
> 
>  <a>
>   <a>
>    <b>
>     <a/>
>    </b>
>   </a>
>  </a>
> 
> page 3:
> 
>  <a>
>   <a>
>    <a>
>     <b/>
>    </a>
>   </a>
>  </a>
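
The three pages above can be reproduced with a small filter: number the
counted elements in document order, keep only the one that belongs to the
requested page, and drop the start/end of the others while keeping their
children. A sketch in Python with ElementTree rather than SAX (this is my
reading of the solution, not the actual Paginator code; the function names
are made up, one counted element per page is assumed, and so is a root that
is not itself a counted element):

```python
import xml.etree.ElementTree as ET


def _filter(elem, name, page, counter):
    """Return the elements that replace elem on the requested page."""
    keep = True
    if elem.tag == name:
        counter[0] += 1  # counted at 'startElement' time, i.e. in document order
        keep = counter[0] == page
    children = []
    for child in elem:
        children.extend(_filter(child, name, page, counter))
    if not keep:
        # Drop this element but splice its surviving children into the parent.
        return children
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.extend(children)
    return [copy]


def nested_page(xml_text, name, page):
    """Serialize the given page, counting one <name> element per page."""
    root = ET.fromstring(xml_text)
    return ET.tostring(_filter(root, name, page, [0])[0], encoding="unicode")


doc = "<a><b><a><b><a><b/></a></b></a></b></a>"
for page in (1, 2, 3):
    print(nested_page(doc, "b", page))
# <a><b><a><a /></a></b></a>
# <a><a><b><a /></b></a></a>
# <a><a><a><b /></a></a></a>
```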
> 
> I'm pretty sure the current code is buggy somewhere because for deep
> nesting like this one, it loses some SAX events and ends up making the
> SAX stream non-well-formed and choking the subsequent transformers
> which are sensitive to well-formedness (such as XSLT).
> 
> Note: the above might look like a mental exercise to many, but if you
> think about our Document DTD 1.1, you'll find nested <section> elements,
> and paginating those results in very similar problems. But I'm not sure
> if the solution adopted above is meaningful for real-world pagination.
> I'm open to suggestions on this.
> 
> Improving the concept
> ---------------------
> 
> One possible way to improve the concept is to count by XPath results,
> that is, you might want to count by 'sections included in sections'.
> 
> Also, another way to improve the system is providing booleans: you might
> want to count 'sections AND chapters' (probably, XPath helps here as
> well).
> 
> Ok, anyway, hope this helps and sorry for taking so long to write this.
> 
> -- 
> Stefano Mazzocchi      One must still have chaos in oneself to be
>                           able to give birth to a dancing star.
> <[EMAIL PROTECTED]>                             Friedrich Nietzsche
> --------------------------------------------------------------------
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]
> 
