I am using the following libraries:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.store.FSDirectory;


On Mon, Sep 7, 2009 at 1:13 PM, Katsuki FUJISAWA <katsuki.fujisawa...@gmail.com> wrote:
> Hi,
> I am new to Nutch.
> I am trying to crawl from a Java servlet program without using the
> bin/nutch command.
> An index created with Nutch 0.9 by calling the main method of the
> org.apache.nutch.crawl.Crawl class can be read from my program.
> However, an index created with Nutch 1.0 by calling the main method of the
> org.apache.nutch.crawl.Crawl class cannot be read from my program.
> I also checked with Luke whether each index can be read:
> Nutch 0.9 index
>   created by the bin/nutch command: readable
>   created by the main method of the Crawl class: readable
> Nutch 1.0 index
>   created by the bin/nutch command: readable
>   created by the main method of the Crawl class: unreadable
> Does anybody know why?
> Any information would be appreciated.
> My sample code is below.
> *************************************************************
> // Open the existing index (create = false, so nothing is erased)
> FSDirectory indexDir = FSDirectory.getDirectory( "C:\\nutch-1.0\\crawl\\index", false );
> IndexSearcher indexSearcher = new IndexSearcher( indexDir );
> List<DisplayBean> displayBeanList = new ArrayList<DisplayBean>();
> // Match every document in the index
> Hits hits = indexSearcher.search( new MatchAllDocsQuery());
> Iterator<Hit> i = hits.iterator();
> int cnt = 0;
> // Copy the url, title and tstamp fields of the first three hits
> while (i.hasNext()){
>        if(cnt > 2) break;
>        Hit hit = (Hit)i.next();
>        DisplayBean displayBean = new DisplayBean();
>        displayBean.setUrl(hit.get("url"));
>        displayBean.setTitle(hit.get("title"));
>        displayBean.setTstamp(hit.get("tstamp"));
>        displayBeanList.add(displayBean);
>        cnt++;
> }
> indexSearcher.close();
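When Luke cannot open an index, a quick first check is whether the index directory contains a Lucene commit file at all: Lucene 2.1 and later write `segments_N`, while older versions write a plain `segments` file. The sketch below only inspects file names using the Java standard library; it is not part of the Nutch or Lucene API, the class and method names (`IndexDirCheck`, `containsCommitFile`, `looksLikeLuceneIndex`) are hypothetical, and a positive result says nothing about whether the segment files themselves are intact.

```java
import java.io.File;

public class IndexDirCheck {

    /**
     * Returns true if the file names include a Lucene commit file:
     * "segments_N" (Lucene 2.1 and later) or "segments" (older versions).
     * "segments.gen" alone does not count, since it is only a helper file.
     */
    static boolean containsCommitFile(String[] names) {
        if (names == null) {
            return false; // directory missing or not readable
        }
        for (String name : names) {
            if (name.equals("segments") || name.startsWith("segments_")) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical helper: checks only the file names in the directory. */
    static boolean looksLikeLuceneIndex(File dir) {
        return containsCommitFile(dir.list()); // list() is null if dir is not a directory
    }

    public static void main(String[] args) {
        // Defaults to the path from the quoted example; pass another path as the first argument.
        File indexDir = new File(args.length > 0 ? args[0] : "C:\\nutch-1.0\\crawl\\index");
        System.out.println(indexDir + " looks like a Lucene index: "
                + looksLikeLuceneIndex(indexDir));
    }
}
```

Running this against both the index produced by bin/nutch and the one produced by calling Crawl's main method shows whether the latter is missing its commit file or is simply in a different location.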
