Hi all,

here is just one example of this.

When I wanted to mirror a WordPress-based site (http://media-mera.ru), I noticed that the process was taking far too long. The reason turned out to be the "reply to" links that are provided for every user comment:

http://media-mera.ru/articles/socially_useful?replytocom=6192#respond
http://media-mera.ru/articles/socially_useful?replytocom=6194#respond
http://media-mera.ru/articles/socially_useful?replytocom=6358#respond
etc.

These addresses are actually all the same page:

http://media-mera.ru/articles/socially_useful

So I added -R to the command line:

-R '*replytocom=*'

but nothing changed. After some googling I found this note about wget's behaviour:

-------
http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html#Types-of-Files
..
Note that these two options do not affect the downloading of html files (as determined by a ‘.htm’ or ‘.html’ filename prefix). This behavior may not be desirable for all users, and may be changed for future versions of Wget.
..
-------

Yes, I absolutely agree, it should be changed. Judging by wget's output, the total downloaded traffic exceeds the size of the resulting saved mirror by a factor of 10!

PS: wget has been running on this site for 30 minutes; httrack took only 1.5.

PPS: While writing this, I even found a dedicated WordPress plugin that is intended to reduce the traffic caused by "replytocom" links: http://wordpress.org/extend/plugins/replytocom-redirector/

-- 
with best regards
Dmitry Bolshakov
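
To illustrate why the "replytocom" links add nothing to a mirror, here is a minimal Python sketch that collapses the URLs quoted above into the single page they all point to. It is only an illustration of the idea and does not describe how wget or httrack work internally; the helper name canonical_url and its drop_params argument are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_url(url, drop_params=("replytocom",)):
    """Return url without its fragment and without the listed query parameters.

    Hypothetical helper for illustration; not part of wget or httrack.
    """
    parts = urlsplit(url)
    # Keep every query parameter except the ones we consider noise.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    # Passing "" as the last element drops the "#respond" fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


# The three comment links quoted in the message.
links = [
    "http://media-mera.ru/articles/socially_useful?replytocom=6192#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6194#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6358#respond",
]

# A set keeps only one entry per distinct canonical URL.
unique_pages = sorted({canonical_url(link) for link in links})
print(unique_pages)
```

All three links reduce to one entry, so a crawler that compared URLs in this form before fetching would request the article once instead of once per comment.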
