Hi,

> The URL validator plugin rejects this kind of URL because of the "..".
> I had a look at RFC 2396 and the W3C standards. There is no constraint
> on ".." except for "/../" and "/.." sequences.

Also, Unix systems accept file names containing two dots, e.g. "abc..xyz.txt".
urlfilter-validator should be relaxed to allow such path names.
But paths containing "/../", or ending in "/..", should
still be rejected.
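A minimal sketch of the relaxed rule, assuming the check is applied to the path part of the URL only. The class and method names (RelaxedPathCheck, isPathAcceptable) are hypothetical; this is not the actual urlfilter-validator code. A ".." is rejected only when it forms a whole path segment, i.e. "/../" anywhere or "/.." at the end:

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the relaxed path rule described above.
 * Not taken from the urlfilter-validator plugin.
 */
public class RelaxedPathCheck {

    // Matches "/.." followed by "/" (i.e. "/../") or by the end of the path.
    private static final Pattern DOT_DOT_SEGMENT = Pattern.compile("/\\.\\.(/|$)");

    /** Returns true if the path contains no ".." segment. */
    static boolean isPathAcceptable(String path) {
        return !DOT_DOT_SEGMENT.matcher(path).find();
    }

    public static void main(String[] args) {
        String[] paths = {
            "/example-example..-16067h.htm", // ".." inside a file name
            "/exa..mple.html",               // ".." inside a file name
            "/dir1/dir2/../example1.html",   // "/../" segment
            "/dir1/dir2/..",                 // "/.." in final position
            "/dir1/..hidden.html"            // file name starting with ".."
        };
        for (String p : paths) {
            System.out.println(p + " -> " + (isPathAcceptable(p) ? "accept" : "reject"));
        }
    }
}
```

Under this rule the first two and the last path are accepted, while "/dir1/dir2/../example1.html" and "/dir1/dir2/.." are rejected.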

Can you open a Jira issue to fix this?

> So URLs like http://localhost/dir1/dir2/../example1.html are handled
> automatically.

Of course, the URL is valid and the server resolves the path.
But the point is: if even the most trivial URL variants are accepted,
the fight against duplicates is lost before it begins.
In a Nutch crawl, such URLs do real harm!
urlfilter-validator also serves as a check that the configured URL
normalizers work properly.
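To illustrate the duplicate problem, here is a small sketch using java.net.URI.normalize(), which removes "." and ".." segments as described in RFC 2396. It is only a stand-in for Nutch's configured URL normalizers, which are set up separately; the class name DotSegmentDemo is hypothetical:

```java
import java.net.URI;

public class DotSegmentDemo {

    // Resolves "." and ".." path segments with java.net.URI (stand-in for a Nutch normalizer).
    static String normalize(String url) {
        return URI.create(url).normalize().toString();
    }

    public static void main(String[] args) {
        String variant = "http://localhost/dir1/dir2/../example1.html";
        String canonical = "http://localhost/dir1/example1.html";

        // As raw strings they differ, so a crawler keyed on the URL sees two pages...
        System.out.println(variant.equals(canonical));                       // false
        // ...but after normalization they collapse to the same URL.
        System.out.println(normalize(variant).equals(normalize(canonical))); // true
        // A ".." inside a segment is not a dot-segment and is left untouched.
        System.out.println(normalize("http://example.com/exa..mple.html"));  // http://example.com/exa..mple.html
    }
}
```

If a "/../" URL still reaches urlfilter-validator, the normalizer evidently did not resolve it, which is why rejecting it there is useful.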

Thanks,
Sebastian

On 04/04/2014 02:59 PM, Mustafa Sertac Turkel wrote:
> hi all,
> 
> I have a seedlist file. The file includes a url something like this:
> 
> http://www.example.com/example-example..-16067h.htm
> 
> The URL validator plugin rejects this kind of URL because of the "..". I had a
> look at RFC 2396 and the W3C standards. There is no constraint on ".." except
> for "/../" and "/.." sequences.
> 
> To test this, I prepared a local system with a folder hierarchy like
> this:
> 
> -dir1 
> ---example1.html
> ---dir2
> ------example2.html
> 
> I set up an Apache server and disabled the URL validator plugin. Then I added
> http://localhost/dir1/dir2/../example1.html to my seed list:
> 
> As expected, example1.html was fetched.
> 
> So URLs like http://localhost/dir1/dir2/../example1.html are handled
> automatically.
> 
> I think that since URLs like http://localhost/dir1/dir2/../example1.html are
> handled automatically, URLs like http://example.com/exa..mple.html should not
> be rejected.
> 
> Or should they? Is there any point that I missed? What do you think about this
> topic?
> 
> 
> Thank you.
> 
> Best Regards.   
> 
