[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505448
]
Doğacan Güney commented on NUTCH-443:
-
Chris, did you get a chance to look at this? If you are busy, I can assign
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505501
]
Chris A. Mattmann commented on NUTCH-443:
-
Doğacan,
Whoops :) This one kind of fell off the radar screen.
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495696
]
Doğacan Güney commented on NUTCH-443:
-
I am not sure I follow you Andrzej. My patch already does a very similar
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495797
]
Andrzej Bialecki commented on NUTCH-443:
-
Indeed... I forgot that we need crawl_parse to collect new
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495357
]
Doğacan Güney commented on NUTCH-443:
-
Well... That's embarrassing. It seems I forgot to include the necessary
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476600
]
Andrzej Bialecki commented on NUTCH-443:
-
Almost there ... ParseResult seemed to tidy up this patch quite a
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476611
]
Doğacan Güney commented on NUTCH-443:
-
* you create the fake CrawlDatum-s in ParseOutputFormat, and then set
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476297
]
Andrzej Bialecki commented on NUTCH-443:
-
Overall the idea of this improvement looks very useful, but I'm -1
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476361
]
nutch.newbie commented on NUTCH-443:
Hi:
We were really counting on this patch making it to trunk as
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473383
]
Doğacan Güney commented on NUTCH-443:
-
Regarding the ObjectWritable: since in this case all data is composed of
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473114
]
Andrzej Bialecki commented on NUTCH-443:
-
The contract for ParseUtil.getFirstParseEntry() seems unclear -
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473129
]
Doğacan Güney commented on NUTCH-443:
-
Andrzej:
Thanks for taking the time to review this.
The contract for
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473141
]
Andrzej Bialecki commented on NUTCH-443:
-
Didn't know this, will change this too. (Why is Nutch not using
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473184
]
Doğacan Güney commented on NUTCH-443:
-
Andrzej:
Why does fetcher need to synchronize? Why does the order fetcher
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472669
]
nutch.newbie commented on NUTCH-443:
Chris:
I have been testing NUTCH-444 and NUTCH-443 lately. Renaud and Dogacan
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472733
]
Renaud Richardet commented on NUTCH-443:
Hi all,
Glad to see that this patch is moving forward :-)
I have
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472821
]
Doug Cutting commented on NUTCH-443:
this patch in some places removes the log guards
Most of the log guards
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471991
]
nutch.newbie commented on NUTCH-443:
Dogacan:
It works rather OK, but when I changed the parse-plugins.xml a bit
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471998
]
nutch.newbie commented on NUTCH-443:
Hi..
After swapping the parse-plugin.xml, i.e. the following way .. (and
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471620
]
Dogacan Güney commented on NUTCH-443:
-
This is pretty much the merge of our work (except parse-rss, it kept
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471703
]
nutch.newbie commented on NUTCH-443:
I tried the patch with about 100 RSS feeds. Some problems:
1. atom+xml
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471743
]
nutch.newbie commented on NUTCH-443:
After doing some quick research, it seems that feedparser doesn't do Atom 1.0. The
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471747
]
Gal Nitzan commented on NUTCH-443:
--
Actually, I have tested Rome after feedparser failed with OutOfMemory. Rome has
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471754
]
nutch.newbie commented on NUTCH-443:
Gal:
Thanks for the feedback and the test you have done. If Nutch is going
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471780
]
Chris A. Mattmann commented on NUTCH-443:
-
Nutch Newbie,
What exactly do you mean when you mention Apache
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471806
]
nutch.newbie commented on NUTCH-443:
Chris:
Frankly, my comments are regarding feedparser and I must say I am
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471857
]
Dogacan Güney commented on NUTCH-443:
-
nutch.newbie:
I fail to see what the problem is. If feedparser doesn't
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471878
]
Renaud Richardet commented on NUTCH-443:
Nutch Newbie, Gal, Chris
It's great that you discuss alternative
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471260
]
Dogacan Güney commented on NUTCH-443:
-
OK, this is the second attempt (sorry that I am sending patches in a