Hi,

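This is most likely a URL filter issue: 66906 map input records with 0 map
output records suggests that every URL was rejected. Check every URL filter
you have enabled. There's also a test program for URL filtering. Try it out
on a few of your seed URLs.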

http://wiki.apache.org/nutch/CommandLineOptions
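
A rough sketch of how the filter test could be run over a seed list, in case
it helps. It assumes the checker class is org.apache.nutch.net.URLFilterChecker
with an -allCombined option, and that it reads URLs from stdin and echoes each
one prefixed with '+' (accepted) or '-' (rejected); please verify that against
your build. The function names and the seeds.txt file are made up for
illustration.

```shell
#!/bin/sh

# Count accepted and rejected URLs in checker-style output read from stdin.
# Assumption: accepted lines start with '+', rejected lines with '-'.
# Any other line (for example a banner) is ignored.
summarize_filter_output() {
    awk '
        /^\+/ { accepted++ }
        /^-/  { rejected++ }
        END   { printf "accepted=%d rejected=%d\n", accepted, rejected }
    '
}

# Feed a local seed file through all configured URL filters at once.
# Assumption: run from the Nutch runtime directory, as with bin/nutch inject.
# $1 is a local text file with one URL per line (hypothetical seeds.txt).
check_seeds() {
    bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined < "$1" \
        | summarize_filter_output
}
```

Calling `check_seeds seeds.txt` on a local copy of the seed list should then
print a single summary line. If rejected equals the number of seeds, the
filter configuration is the place to look.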

Cheers,

P.S. Moved to user@nutch as it's more appropriate there.

> I have problems running the injector in nutch-1.4 on hadoop; the same
> command with nutch-1.3 works fine. As you can see, the list of URLs is
> loaded from hdfs correctly (Map input records=66906), but there are no
> records in the map output. Could it be a problem with broken filtering?
> 
> ponto:(crawler)runtime/deploy>bin/nutch inject /czcrawl/db /czcrawl/seeds
> 11/10/13 17:56:25 INFO crawl.Injector: Injector: starting at 2011-10-13 17:56:25
> 11/10/13 17:56:25 INFO crawl.Injector: Injector: crawlDb: /czcrawl/db
> 11/10/13 17:56:25 INFO crawl.Injector: Injector: urlDir: /czcrawl/seeds
> 11/10/13 17:56:25 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
> 11/10/13 17:56:28 INFO mapred.FileInputFormat: Total input paths to process : 1
> 11/10/13 17:56:29 INFO mapred.JobClient: Running job: job_201110091645_0032
> 11/10/13 17:56:30 INFO mapred.JobClient:  map 0% reduce 0%
> 11/10/13 17:56:52 INFO mapred.JobClient:  map 50% reduce 0%
> 11/10/13 17:56:53 INFO mapred.JobClient:  map 100% reduce 0%
> 11/10/13 17:57:05 INFO mapred.JobClient:  map 100% reduce 100%
> 11/10/13 17:57:10 INFO mapred.JobClient: Job complete: job_201110091645_0032
> 11/10/13 17:57:10 INFO mapred.JobClient: Counters: 27
> 11/10/13 17:57:10 INFO mapred.JobClient:   Job Counters
> 11/10/13 17:57:10 INFO mapred.JobClient:     Launched reduce tasks=1
> 11/10/13 17:57:10 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=20455
> 11/10/13 17:57:10 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Rack-local map tasks=1
> 11/10/13 17:57:10 INFO mapred.JobClient:     Launched map tasks=2
> 11/10/13 17:57:10 INFO mapred.JobClient:     Data-local map tasks=1
> 11/10/13 17:57:10 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10356
> 11/10/13 17:57:10 INFO mapred.JobClient:   File Input Format Counters
> 11/10/13 17:57:10 INFO mapred.JobClient:     Bytes Read=1283144
> 11/10/13 17:57:10 INFO mapred.JobClient:   File Output Format Counters
> 11/10/13 17:57:10 INFO mapred.JobClient:     Bytes Written=86
> 11/10/13 17:57:10 INFO mapred.JobClient:   FileSystemCounters
> 11/10/13 17:57:10 INFO mapred.JobClient:     FILE_BYTES_READ=6
> 11/10/13 17:57:10 INFO mapred.JobClient:     HDFS_BYTES_READ=1283358
> 11/10/13 17:57:10 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=89486
> 11/10/13 17:57:10 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=86
> 11/10/13 17:57:10 INFO mapred.JobClient:   Map-Reduce Framework
> 11/10/13 17:57:10 INFO mapred.JobClient:     Map output materialized bytes=12
> 11/10/13 17:57:10 INFO mapred.JobClient:     Map input records=66906
> 11/10/13 17:57:10 INFO mapred.JobClient:     Reduce shuffle bytes=6
> 11/10/13 17:57:10 INFO mapred.JobClient:     Spilled Records=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Map output bytes=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Map input bytes=1280141
> 11/10/13 17:57:10 INFO mapred.JobClient:     Combine input records=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     SPLIT_RAW_BYTES=214
> 11/10/13 17:57:10 INFO mapred.JobClient:     Reduce input records=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Reduce input groups=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Combine output records=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Reduce output records=0
> 11/10/13 17:57:10 INFO mapred.JobClient:     Map output records=0
