[ http://issues.apache.org/jira/browse/FOR-677?page=all ]

David Crossley updated FOR-677:
-------------------------------

    Fix Version/s: 0.9
                       (was: 0.8-dev)

Moving this issue to the next release. As said above:
"We need to have Forrest "relativise" and "absolutise" the links, or make the 
linkmap intelligent enough to realise that "root/index.html" is the same as 
"/root/index.html"."

The latter happens in the Cocoon Linkrewriter Block.
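
As a rough illustration of the second option, here is a minimal Python sketch 
of a link queue that treats "root/index.html" and "/root/index.html" as the 
same entry. The names normalise_uri, push, seen and queue are hypothetical and 
are not part of Forrest or the Cocoon Linkrewriter Block; the sketch also 
assumes that every internal URI is relative to the site root.

```python
def normalise_uri(uri: str) -> str:
    """Return a key under which equivalent internal URIs compare equal."""
    # Assumption: absolute URLs, protocol-relative URLs and bare fragments
    # are left untouched; only internal site paths are normalised.
    if "://" in uri or uri.startswith("//") or uri.startswith("#"):
        return uri
    # Assumption: internal URIs are relative to the site root, so a
    # leading slash carries no extra meaning and can be dropped.
    return uri.lstrip("/")


seen = set()   # normalised URIs already gathered
queue = []     # URIs still waiting to be processed


def push(uri: str) -> bool:
    """Queue a URI unless an equivalent one was gathered before."""
    key = normalise_uri(uri)
    if key in seen:
        return False
    seen.add(key)
    queue.append(key)
    return True


push("root/index.html")
push("/root/index.html")   # recognised as a duplicate, not queued again
print(queue)               # ['root/index.html']
```

Whether dropping the leading slash is always safe is exactly the open 
question in this issue, so the sketch only shows the comparison step.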


> leading slash in gathered URIs causes double the number of links to be 
> processed
> --------------------------------------------------------------------------------
>
>                 Key: FOR-677
>                 URL: http://issues.apache.org/jira/browse/FOR-677
>             Project: Forrest
>          Issue Type: Bug
>          Components: Core operations
>    Affects Versions: 0.7, 0.8-dev
>            Reporter: David Crossley
>             Fix For: 0.9
>
>
> Doing 'forrest' starts at the virtual document called linkmap.html where the 
> Cocoon crawler gathers the initial set of links, then starts crawling and 
> generating pages. Any new links are pushed onto the linkmap. However, for 
> some sites, such as our own "seed-sample" and our "site-author", there is a 
> sudden jump in the number of URIs remaining to be processed.
> This is due to a URI with a leading slash (e.g. /samples/faq.html). When that 
> URI is processed, it gains a whole new set of links all with leading slashes, 
> and so the list of URIs is potentially doubled.
> This issue could be due to a user error, i.e. adding a link that deliberately 
> begins with a slash. Sometimes, that is unavoidable.
> However, we do have a sitemap transformer to "relativize" and "absolutize" 
> the links. Should it always trim the leading slash? Or are there cases where 
> that should not happen, so that we cannot generalise?
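
The doubling described above can be reproduced with a toy crawler. This is 
only a sketch under stated assumptions: SITE is a made-up three-page site, and 
resolve and crawl are hypothetical helpers, not the Cocoon crawler. Links are 
resolved against the URI of the page that contains them, so one 
slash-prefixed URI spawns slash-prefixed copies of every page it can reach.

```python
import posixpath
from collections import deque

# Hypothetical site: page path -> hrefs found on that page.
SITE = {
    "index.html": ["samples/index.html", "samples/faq.html"],
    "samples/index.html": ["faq.html", "../index.html"],
    # The single link with a leading slash:
    "samples/faq.html": ["index.html", "/samples/index.html"],
}


def resolve(base: str, href: str) -> str:
    """Resolve href against the URI of the page that contains it."""
    if href.startswith("/"):
        return href
    return posixpath.normpath(posixpath.join(posixpath.dirname(base), href))


def crawl(start: str, trim_leading_slash: bool) -> set:
    """Breadth-first crawl; return every distinct URI that was gathered."""
    seen = {start}
    pending = deque([start])
    while pending:
        uri = pending.popleft()
        # Both forms of a URI serve the same page content.
        for href in SITE[uri.lstrip("/")]:
            target = resolve(uri, href)
            if trim_leading_slash:
                target = target.lstrip("/")
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return seen


print(len(crawl("index.html", trim_leading_slash=False)))  # 6
print(len(crawl("index.html", trim_leading_slash=True)))   # 3
```

In this toy model the one link "/samples/index.html" causes all three pages 
to be gathered twice, and trimming the slash before the duplicate check 
removes the extra work.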
