CrawlDbMerger: wrong computation of last fetch time
---
Key: NUTCH-532
URL: https://issues.apache.org/jira/browse/NUTCH-532
Project: Nutch
Issue Type: Bug
Reporter: Emmanuel Joke
[
https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-532:
Attachment: NUTCH-532.patch
Patch provided.
CrawlDbMerger: wrong computation of last fetch time
[
https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-532:
Attachment: (was: NUTCH-532.patch)
CrawlDbMerger: wrong computation of last fetch time
[
https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-532:
Attachment: NUTCH-532.patch
CrawlDbMerger: wrong computation of last fetch time
LinkDbMerger: url normlaized is not updated in the key and inlinks list
---
Key: NUTCH-533
URL: https://issues.apache.org/jira/browse/NUTCH-533
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-533:
Attachment: NUTCH-533.patch
Patch provided
LinkDbMerger: url normlaized is not updated in the key
[
https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516357
]
Doğacan Güney commented on NUTCH-530:
-
Ehm, I am not sure about this... After this, we call updateDbScore twice,
[
https://issues.apache.org/jira/browse/NUTCH-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516358
]
Emmanuel Joke commented on NUTCH-526:
-
Could you please wait a few more days?
I would like to wait for a
[
https://issues.apache.org/jira/browse/NUTCH-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-531:
Attachment: NUTCH-531-draft.patch
I agree with you. IMHO, a simple change to getContentType should
[
https://issues.apache.org/jira/browse/NUTCH-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516362
]
Doğacan Güney commented on NUTCH-533:
-
Looks good to me.
+1
LinkDbMerger: url normlaized is not updated in the
[
https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516364
]
Doğacan Güney commented on NUTCH-532:
-
Does this calculation:
res.getFetchTime() -
[
https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516365
]
Doğacan Güney commented on NUTCH-514:
-
Since no one commented, I am assuming that no one wants to see 404 and
[
https://issues.apache.org/jira/browse/NUTCH-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516366
]
Doğacan Güney commented on NUTCH-528:
-
This is my personal nit, but the cli options look weird. Why not something
[
https://issues.apache.org/jira/browse/NUTCH-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516367
]
Doğacan Güney commented on NUTCH-529:
-
Could you also add a junit test case? (actually, since NodeWalker is used
[
https://issues.apache.org/jira/browse/NUTCH-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516408
]
Andrzej Bialecki commented on NUTCH-533:
-
+1. Please fix the typo (present also in the original file): empy
[
https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516428
]
Andrzej Bialecki commented on NUTCH-514:
-
+1 we're only humans with 24 hours in a day .. ;)
Actually, this
[
https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney closed NUTCH-514.
---
Resolved and committed.
Indexer should only index pages with fetch status SUCCESS
[
https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney resolved NUTCH-514.
-
Resolution: Fixed
Assignee: Doğacan Güney
Committed in rev. 561092.
Indexer should only
[
https://issues.apache.org/jira/browse/NUTCH-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-529:
Summary: NodeWalker.skipChildren doesn't work for more than 1 child. (was:
NodeWalker.skipChildren
[
https://issues.apache.org/jira/browse/NUTCH-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-533:
Summary: LinkDbMerger: url normalized is not updated in the key and inlinks
list (was:
[
https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516602
]
Emmanuel Joke commented on NUTCH-530:
-
I'm sure to follow your point regarding the outlinks number.
I don't
[
https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516613
]
Hudson commented on NUTCH-514:
--
Integrated in Nutch-Nightly #166 (See
[
https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516618
]
Emmanuel Joke commented on NUTCH-532:
-
res.getFetchTime() - Math.round(res.getFetchInterval() * 1000d); always