Tulay Muezzinoglu created NUTCH-2440:
----------------------------------------

             Summary: DbResource does not accept crawlid
                 Key: NUTCH-2440
                 URL: https://issues.apache.org/jira/browse/NUTCH-2440
             Project: Nutch
          Issue Type: Bug
          Components: REST_api
    Affects Versions: 2.3, 2.4
            Reporter: Tulay Muezzinoglu
            Priority: Critical
             Fix For: 2.4


DbResource initializes DbReader instances with a null crawlId. This prevents 
querying the correct table/collection when a crawlId was set during fetch. 

For example, in MongoDB all data is stored in the "webpage" collection by default. 
If you set the crawlId to "tech" for a fetch, all data is stored in the 
"tech_webpage" collection instead. But a REST call to the /db endpoint cannot 
specify a crawlId, so it always queries the "webpage" collection.
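A minimal sketch of the naming behaviour described above, assuming the collection name is `<crawlId>_webpage` when a crawlId is set and plain `webpage` otherwise. `CollectionNames` and `resolve` are hypothetical names for illustration, not part of Nutch or Gora.

```java
/**
 * Hypothetical helper (not a Nutch class) illustrating how a crawlId
 * prefixes the default "webpage" collection name.
 */
public final class CollectionNames {
    // Default collection/table name used when no crawlId is set.
    static final String DEFAULT_SCHEMA = "webpage";

    private CollectionNames() {}

    /** Returns "webpage" for a null/empty crawlId, else "<crawlId>_webpage". */
    public static String resolve(String crawlId) {
        if (crawlId == null || crawlId.isEmpty()) {
            return DEFAULT_SCHEMA;
        }
        return crawlId + "_" + DEFAULT_SCHEMA;
    }

    public static void main(String[] args) {
        System.out.println(resolve("tech")); // tech_webpage: where the fetch wrote
        System.out.println(resolve(null));   // webpage: what a null crawlId resolves to
    }
}
```

Because DbResource passes null, the lookup always lands on the second case, regardless of which collection the fetch actually populated.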

One option is to change DbFilter to accept a crawlId; another is to include the 
crawlId in the resource path. I am open to suggestions and can then open a PR.
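A rough sketch of the first option, assuming the request filter gains a crawlId field that the resource forwards to the reader instead of null. `QueryFilter`, `Reader` and `readerFor` are hypothetical stand-ins, not the real DbFilter or DbReader, and the table naming repeats the assumed `<crawlId>_webpage` rule.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (no Nutch types): the request filter carries a
 * crawlId, and the resource passes it to the reader rather than null.
 */
public final class CrawlIdAwareDbResource {

    /** Stand-in for a DbFilter-like request body with an added crawlId field. */
    public static final class QueryFilter {
        private final String crawlId; // may be null when the client omits it

        public QueryFilter(String crawlId) {
            this.crawlId = crawlId;
        }

        public String getCrawlId() { return crawlId; }
    }

    /** Stand-in for a reader bound to the table of one crawlId. */
    public static final class Reader {
        private final String table;

        public Reader(String crawlId) {
            // Assumed naming rule: "<crawlId>_webpage", or "webpage" if unset.
            this.table = (crawlId == null || crawlId.isEmpty())
                    ? "webpage" : crawlId + "_webpage";
        }

        public String table() { return table; }
    }

    /** Builds a reader from the filter, forwarding its crawlId. */
    public static Reader readerFor(QueryFilter filter) {
        Objects.requireNonNull(filter, "filter");
        return new Reader(filter.getCrawlId());
    }

    public static void main(String[] args) {
        System.out.println(readerFor(new QueryFilter("tech")).table()); // tech_webpage
        System.out.println(readerFor(new QueryFilter(null)).table());   // webpage
    }
}
```

Omitting the crawlId keeps today's behaviour, so existing clients are unaffected. The path-based alternative (for example a hypothetical `/db/{crawlId}` route) would only change where the crawlId comes from; the reader construction stays the same.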

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
