[ https://issues.apache.org/jira/browse/NUTCH-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1076:
----------------------------------------

    Fix Version/s: 1.7

> Solrindex has no documents following bin/nutch solrindex when using protocol-file
> ---------------------------------------------------------------------------------
>
>                 Key: NUTCH-1076
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1076
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.3
>         Environment: Ubuntu Linux 10.04 server
> JDK 1.6
> Nutch 1.3
> Solr 3.1.0
>            Reporter: Seth Griffin
>            Assignee: Markus Jelsma
>              Labels: nutch, protocol-file, solrindex
>             Fix For: 1.7
>
>
> Note: When using protocol-http, I am able to update Solr effortlessly.
> To test this, I have a single PDF file that I am trying to index in my urls directory.
> I execute:
> bin/nutch crawl urls
> Output:
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl-20110805151045
> rootUrlDir = urls
> threads = 10
> depth = 5
> solrUrl=null
> Injector: starting at 2011-08-05 15:10:45
> Injector: crawlDb: crawl-20110805151045/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-08-05 15:10:48, elapsed: 00:00:02
> Generator: starting at 2011-08-05 15:10:48
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: crawl-20110805151045/segments/20110805151050
> Generator: finished at 2011-08-05 15:10:51, elapsed: 00:00:03
> Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> Fetcher: starting at 2011-08-05 15:10:51
> Fetcher: segment: crawl-20110805151045/segments/20110805151050
> Fetcher: threads: 10
> QueueFeeder finished: total 1 records + hit by time limit :0
> fetching file:///home/nutch/nutch-1.3/runtime/local/indexdir/Altec.pdf
> -finishing thread FetcherThread, activeThreads=9
> -finishing thread FetcherThread, activeThreads=8
> -finishing thread FetcherThread, activeThreads=7
> -finishing thread FetcherThread, activeThreads=6
> -finishing thread FetcherThread, activeThreads=5
> -finishing thread FetcherThread, activeThreads=4
> -finishing thread FetcherThread, activeThreads=3
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2011-08-05 15:10:53, elapsed: 00:00:02
> ParseSegment: starting at 2011-08-05 15:10:53
> ParseSegment: segment: crawl-20110805151045/segments/20110805151050
> ParseSegment: finished at 2011-08-05 15:10:56, elapsed: 00:00:03
> CrawlDb update: starting at 2011-08-05 15:10:56
> CrawlDb update: db: crawl-20110805151045/crawldb
> CrawlDb update: segments: [crawl-20110805151045/segments/20110805151050]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: true
> CrawlDb update: URL filtering: true
> CrawlDb update: Merging segment data into db.
> CrawlDb update: finished at 2011-08-05 15:10:57, elapsed: 00:00:01
> Generator: starting at 2011-08-05 15:10:57
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=1 - no more URLs to fetch.
> LinkDb: starting at 2011-08-05 15:10:58
> LinkDb: linkdb: crawl-20110805151045/linkdb
> LinkDb: URL normalize: true
> LinkDb: URL filter: true
> LinkDb: adding segment: file:/home/nutch/nutch-1.3/runtime/local/crawl-20110805151045/segments/20110805151050
> LinkDb: finished at 2011-08-05 15:10:59, elapsed: 00:00:01
> crawl finished: crawl-20110805151045
> Then, with a clean Solr index (stats output from stats.jsp below):
> searcherName : Searcher@14dd758 main
> caching : true
> numDocs : 0
> maxDoc : 0
> reader : SolrIndexReader{this=1ee148b,r=ReadOnlyDirectoryReader@1ee148b,refCnt=1,segments=0}
> readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/apache-solr-3.1.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@987197
> indexVersion : 1312575204101
> openedAt : Fri Aug 05 15:13:24 CDT 2011
> registeredAt : Fri Aug 05 15:13:24 CDT 2011
> warmupTime : 0
> I then execute:
> bin/nutch solrindex http://localhost:8983/solr/ crawl-20110805151045/crawldb/ crawl-20110805151045/linkdb/ crawl-20110805151045/segments/*
> bin/nutch output:
> SolrIndexer: starting at 2011-08-05 15:15:48
> SolrIndexer: finished at 2011-08-05 15:15:50, elapsed: 00:00:01
> Solr output:
> Aug 5, 2011 3:15:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@15f1f9c main
> Aug 5, 2011 3:15:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=1,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener sending requests to Searcher@15f1f9c main
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener done.
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: [] Registered new searcher Searcher@15f1f9c main
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing Searcher@14dd758 main
>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=1,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {commit=} 0 8
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=8
> Output from stats.jsp:
> stats:
> searcherName : Searcher@15f1f9c main
> caching : true
> numDocs : 0
> maxDoc : 0
> reader : SolrIndexReader{this=1ee148b,r=ReadOnlyDirectoryReader@1ee148b,refCnt=1,segments=0}
> readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/apache-solr-3.1.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@987197
> indexVersion : 1312575204101
> openedAt : Fri Aug 05 15:15:50 CDT 2011
> registeredAt : Fri Aug 05 15:15:50 CDT 2011
> warmupTime : 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira