Hello Raj,
This site loads its content via JavaScript, so you need a protocol plugin
that supports it. HtmlUnit does not seem to work with this site, but
Selenium does. Please change the protocol plugin accordingly in your
plugin.includes configuration directive.
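As a rough sketch, the override in conf/nutch-site.xml could look something
like this. Only the protocol-selenium entry is the point here; the rest of
the plugin list is an assumption on my part, so keep whatever your current
plugin.includes contains and just swap the protocol plugin:

```xml
<!-- Sketch only: every plugin other than protocol-selenium is an assumed
     placeholder; reuse your existing list and replace only the protocol
     plugin (e.g. protocol-http or protocol-htmlunit). -->
<property>
  <name>plugin.includes</name>
  <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The Selenium plugin also needs a working browser and driver on the machine
that runs the fetcher, so check the selenium.* properties in
nutch-default.xml for your version.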
I tested this with our own parser, as I have no Nutch installation here at
the moment. Nutch does support Selenium, though, so it should work, even
though the Selenium version it uses is a bit outdated.
Regards,
Markus
On Sat, 17 Dec 2022 at 10:28, Raj Chidara wrote:
>
> Hi
> I am not able to crawl this site: https://www.ich.org/. Can anyone
> suggest a solution for this? This site does not have a robots.txt file.
> When I try to check robots.txt, the site is shown as under construction
> and returns response status 200. Could this be the reason for the issue?
>
> Thanks and Regards
>
> Raj Chidara