Hi,

> Url validator plugin rejects this kind of URL because of "..".
> I had a look at RFC 2396 and the W3C standards. There is no constraint
> about ".." except these "/../" and "/.." kind of statements.
Also, Unix systems accept file names containing two dots, e.g. "abc..xyz.txt".
urlfilter-validator should be relaxed to allow such path names.
But paths containing "/../", or "/.." in final position, should still
be rejected. Can you open a Jira issue to fix this?

> So http://localhost/dir1/dir2/../example1.html kind of URLs are handled
> automatically.

Of course, the URL is valid and the server resolves the path.
But the point is: if even the most trivial URL variants are accepted,
the struggle against duplicates will be lost before it begins.
When operating Nutch, such URLs will do harm! And urlfilter-validator
checks whether the configured URL normalizers work appropriately.

Thanks,
Sebastian

On 04/04/2014 02:59 PM, Mustafa Sertac Turkel wrote:
> Hi all,
>
> I have a seedlist file. The file includes a URL something like this:
>
> http://www.example.com/example-example..-16067h.htm
>
> Url validator plugin rejects this kind of URL because of "..". I had a look
> at RFC 2396 and the W3C standards. There is no constraint about ".." except
> these "/../" and "/.." kind of statements.
>
> To try this, I prepared a local system with a folder hierarchy like
> this:
>
> -dir1
> ---example1.html
> ---dir2
> ------example2.html
>
> I set up an Apache server and disabled the url-validator plugin. Then I added
> http://localhost/dir1/dir2/../example1.html to my seedlist.
>
> As I expected, example1.html was fetched.
>
> So http://localhost/dir1/dir2/../example1.html kind of URLs are handled
> automatically.
>
> I think URLs like http://example.com/exa..mple.html should therefore not
> be rejected.
>
> Or should they? Is there any point that I missed? What do you think about
> this topic?
>
> Thank you.
>
> Best Regards.
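
For illustration only, the relaxed rule discussed in this thread (allow ".." inside a file name, but still reject a path segment that is exactly "..") could be sketched roughly as below. This is not the urlfilter-validator code; the function name has_dot_dot_segment is hypothetical, and Python is used only to keep the sketch short.

```python
from urllib.parse import urlsplit


def has_dot_dot_segment(url: str) -> bool:
    """Return True if any path segment of the URL is exactly '..'.

    Hypothetical helper: it covers both "/../" inside the path and
    "/.." in final position, but not ".." embedded in a longer
    segment such as "abc..xyz.txt".
    """
    path = urlsplit(url).path
    # Splitting on "/" yields the individual path segments, so a
    # membership test matches only whole ".." segments.
    return ".." in path.split("/")


# URLs from the thread: the first two would pass the relaxed check,
# the last two would still be rejected.
for candidate in (
    "http://www.example.com/example-example..-16067h.htm",
    "http://example.com/exa..mple.html",
    "http://localhost/dir1/dir2/../example1.html",
    "http://localhost/dir1/..",
):
    verdict = "reject" if has_dot_dot_segment(candidate) else "accept"
    print(verdict, candidate)
```

Comparing whole segments rather than searching the path for the substring ".." is what separates the two cases raised above.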