[ 
https://issues.apache.org/jira/browse/DROIDS-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Frovarp updated DROIDS-144:
-----------------------------------

    Fix Version/s:     (was: 0.2.0)
                   0.3.0
    
> The AlreadyVisitedFilter should not ignore the parameters of the URI
> --------------------------------------------------------------------
>
>                 Key: DROIDS-144
>                 URL: https://issues.apache.org/jira/browse/DROIDS-144
>             Project: Droids
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.2.0
>            Reporter: Eugen Paraschiv
>             Fix For: 0.3.0
>
>         Attachments: DROIDS-144.patch
>
>
> This filter strips the parameters (the query string) from the URI and stores
> only the resulting URI as the key in its visited map. This severely limits the
> filter: many distinct URIs are rejected because the filter considers them
> already visited when in fact they have not been.
> An example - these are pages to be crawled: 
> http://www.domain.com/abc/?page=0&start=
> http://www.domain.com/abc/?page=1&start=
> Once the first one is analyzed, only the host and path are recorded:
> http://www.domain.com/abc/
> and so the second URI will be rejected as already visited, when in fact it's 
> a completely new page. 
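
The following is a minimal Java sketch of the behaviour described above. It compares a visited-set key that drops the query string with one that keeps it. The class and method names (`VisitedFilterSketch`, `accept`, `key`) are hypothetical. They are not the Droids API and not the contents of DROIDS-144.patch. The sketch assumes only that the filter records visited URIs as string keys in a set.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the Droids AlreadyVisitedFilter.
class VisitedFilterSketch {
    private final Set<String> visited = new HashSet<>();
    private final boolean keepQuery;

    VisitedFilterSketch(boolean keepQuery) {
        this.keepQuery = keepQuery;
    }

    // Builds the visited-set key from scheme, authority and path, plus the
    // raw query string when keepQuery is true. The fragment is always dropped.
    // Assumes an absolute, hierarchical URI such as http://host/path?query.
    static String key(URI uri, boolean keepQuery) {
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://")
          .append(uri.getRawAuthority())
          .append(uri.getRawPath());
        if (keepQuery && uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString();
    }

    // Returns true the first time a key is seen and false if it was already recorded.
    boolean accept(URI uri) {
        return visited.add(key(uri, keepQuery));
    }

    public static void main(String[] args) {
        URI first = URI.create("http://www.domain.com/abc/?page=0&start=");
        URI second = URI.create("http://www.domain.com/abc/?page=1&start=");

        VisitedFilterSketch stripping = new VisitedFilterSketch(false);
        System.out.println(stripping.accept(first));  // true
        System.out.println(stripping.accept(second)); // false: both keys are http://www.domain.com/abc/

        VisitedFilterSketch keeping = new VisitedFilterSketch(true);
        System.out.println(keeping.accept(first));  // true
        System.out.println(keeping.accept(second)); // true: the query strings differ
    }
}
```

Keeping the raw query string is the simplest choice. It treats `?a=1&b=2` and `?b=2&a=1` as different pages. A crawler that needs to merge those would have to normalize the parameter order first.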

