I am using the libraries below.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.store.FSDirectory;

Fujisawa

On Mon, Sep 7, 2009 at 1:13 PM, Katsuki FUJISAWA<katsuki.fujisawa...@gmail.com> wrote:
> Hi,
>
> I am new to Nutch.
> I am trying to run a crawl from a Java servlet program without using
> the bin/nutch command.
> With Nutch 0.9, the index files created by the main method of the
> org.apache.nutch.crawl.Crawl class can be read from my program.
> With Nutch 1.0, the index files created by the main method of the
> org.apache.nutch.crawl.Crawl class cannot be read from my program.
>
>
> Whether each index can be read with Luke is listed below.
>
> index file of nutch 0.9
>   by bin/nutch command: readable.
>   by main method of Crawl class: readable.
>
> index file of nutch 1.0
>   by bin/nutch command: readable.
>   by main method of Crawl class: unreadable.
>
>
> Does anybody know the reason why?
> Any information would be appreciated.
>
> My program code sample is below.
>
> *************************************************************
> FSDirectory indexDir = null;
>
> // Open the index directory written by the crawl (false = do not create it)
> indexDir = FSDirectory.getDirectory( "C:\\nutch-1.0\\crawl\\index", false );
> IndexSearcher indexSearcher = new IndexSearcher( indexDir );
>
> List<DisplayBean> displayBeanList = new ArrayList<DisplayBean>();
>
> // Match every document in the index
> Hits hits = indexSearcher.search( new MatchAllDocsQuery());
>
> Iterator<Hit> i = hits.iterator();
> int cnt = 0;
> while (i.hasNext()){
>     // Only the first three hits are needed
>     if(cnt > 2) break;
>
>     Hit hit = (Hit)i.next();
>     DisplayBean displayBean = new DisplayBean();
>     displayBean.setUrl(hit.get("url"));
>     displayBean.setTitle(hit.get("title"));
>     displayBean.setTstamp(hit.get("tstamp"));
>
>     displayBeanList.add(displayBean);
>
>     cnt++;
> }
>
> indexSearcher.close();
>
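
One way to narrow this down without involving Lucene is to look at what the Crawl run actually wrote into the index directory. The sketch below only lists the directory and checks for a file whose name starts with "segments", which, as far as I know, a Lucene 2.x index should contain (for example segments_N and segments.gen). The class name IndexDirCheck and the method hasSegmentsFile are made up for illustration, the default path is the one from the code sample above, and it assumes Java 7 or later for java.nio.file. It does not explain why the Nutch 1.0 index is unreadable; it only shows whether the expected files are there.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: inspects an index directory using only the JDK.
class IndexDirCheck {

    // Returns true if indexDir is a directory that contains at least one
    // regular file whose name starts with "segments".
    static boolean hasSegmentsFile(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return false;
        }
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(indexDir, "segments*")) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: same location as in the code sample above.
        Path indexDir = Paths.get(
            args.length > 0 ? args[0] : "C:\\nutch-1.0\\crawl\\index");

        if (!Files.isDirectory(indexDir)) {
            System.out.println("Not a directory: " + indexDir);
            return;
        }

        // List every entry so the two Nutch versions can be compared by eye.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                System.out.println(file.getFileName());
            }
        }

        System.out.println("segments file present: " + hasSegmentsFile(indexDir));
    }
}
```

Running it once against the index made by bin/nutch and once against the index made by the main method of the Crawl class should show whether the two directories differ in layout.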