Re: Not able to crawl ich

2022-12-17 Thread Markus Jelsma
Hello Raj,

This site loads its content via JavaScript, so you need a protocol plugin
that supports it. HtmlUnit does not seem to work with this site, but
Selenium does. Please change the protocol plugin accordingly in your
plugin.includes configuration directive.
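
For illustration, a minimal sketch of what such a change might look like in
conf/nutch-site.xml. It assumes a Nutch 1.x install with a roughly stock
plugin list; only the protocol-selenium entry reflects the advice above. The
rest of the value is an assumption and should match whatever is already
configured locally.

```xml
<!-- Sketch only: swap the HTTP protocol plugin for protocol-selenium.
     Everything after the first entry is an assumed default-style list;
     keep your own existing entries instead. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The Selenium plugin also needs a browser driver available on the crawling
machine; the exact driver settings depend on the Nutch version in use.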

I tested it with our own parser, as I have no Nutch here at the moment. But
Nutch has support for Selenium, so it should work, even though the version is a
bit outdated.

Regards,
Markus

On Sat, 17 Dec 2022 at 10:28, Raj Chidara wrote:

>
> Hi,
>   I am not able to crawl this site: https://www.ich.org/. Can anyone
> suggest a solution? The site does not have a robots.txt file. When
> I request robots.txt, the site shows an under-construction page and
> returns response status 200. Could that be the reason for the issue?
>
>
>
> Thanks and Regards
>
> Raj Chidara


Not able to crawl ich

2022-12-17 Thread Raj Chidara

Hi,
  I am not able to crawl this site: https://www.ich.org/. Can anyone suggest a
solution? The site does not have a robots.txt file. When I request
robots.txt, the site shows an under-construction page and returns response
status 200. Could that be the reason for the issue?



Thanks and Regards

Raj Chidara