Hi,

I wrote a custom FetchSchedule:

    public class CustomDefaultFetchSchedule extends DefaultFetchSchedule

Then I tried to use the metadata field, but it is always null:

    ByteBuffer blang = page.getFromMetadata(new Utf8(Metadata.LANGUAGE));

For this reason I overrode the getFields method in my CustomDefaultFetchSchedule class:

    @Override
    public Set<WebPage.Field> getFields() {
      FIELDS.addAll(super.getFields());
      FIELDS.add(WebPage.Field.METADATA);
      return FIELDS;
    }

But nothing changed; the metadata field is still null.

When I add the metadata field in the GeneratorJob class instead, everything works and the metadata field is not empty:

    static {
      FIELDS.add(WebPage.Field.FETCH_TIME);
      FIELDS.add(WebPage.Field.SCORE);
      FIELDS.add(WebPage.Field.STATUS);
      FIELDS.add(WebPage.Field.METADATA);
    }

I don't want to change Nutch's own source. How can I use the METADATA field in a custom FetchSchedule class?

Nutch 2.1 / HBase

On Mon, Apr 1, 2013 at 4:57 PM, Canan GİRGİN <canankara...@gmail.com> wrote:
> Hi All,
>
> I want to crawl only sites whose language is XXX. I wrote a
> ParseFilter to detect the language of each site and store it in the
> metadata column. With this plugin I can prevent crawling of outlinks
> to sites that are not in language XXX, but I cannot prevent
> re-crawling of the main page. Is there any filter I can use? Is it
> possible with a FetchSchedule? (I need to use the metadata column
> data for filtering URLs.)
>
> Note: Content-Language or Accept-Language is not suitable for my case.
>
> Nutch 2.1 / HBase