[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987602#comment-13987602 ]
Julien Nioche commented on NUTCH-1714:
--------------------------------------

bq. I do not know if you have tested the patch, but it fixes the problem with last update.

I did test it (hence my assertion that it did not work) but must have done something wrong, which is not surprising given that I had various patches on the code. I tried again from a clean copy of the repo and it does indeed solve the issue. Thanks.

bq. The reason for the readdb problem is that it tries to get all fields from webpage table, and it uses WebPage._ALL_FIELDS array to achieve this. However, this array also contains __gdirty field which is used to save dirty fields of the persistent class. This field is not stored in database. Thus, when db is queried with this field, no results will be returned.

Thanks for the explanation.

bq. In the patch I have removed __gdirty field directly from the fields sent to the query, since it is always at the first position of the _ALL_FIELDS array. This will fix the problem. However, I will also send a mail to dev@gora and discuss if we should remove this field from persistent class' _ALL_FIELDS array. Then, we can use WebPage._ALL_FIELDS directly in here.

Good idea. I will comment about the filtering on NUTCH-1674 and do more testing before I commit this patch.

Thanks for your work!

Julien

> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the details in this issue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
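
For readers following the thread, the workaround quoted in the comment (dropping the leading __gdirty entry before the fields are passed to the query) can be sketched roughly as below. This is only an illustration, not the code from the attached patch: the class QueryFields, the method withoutDirtyField and the sample field names are hypothetical, and ALL_FIELDS is a plain array standing in for WebPage._ALL_FIELDS.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Nutch or Gora.
final class QueryFields {

    // Stand-in for WebPage._ALL_FIELDS (assumption: the bookkeeping
    // field is always the first entry, as stated in the comment).
    static final String[] ALL_FIELDS = {"__gdirty", "baseUrl", "status", "fetchTime"};

    // Returns a copy of allFields without its first entry, so that only
    // fields actually stored in the database are sent to the query.
    static String[] withoutDirtyField(String[] allFields) {
        if (allFields.length == 0) {
            return new String[0]; // nothing to drop
        }
        return Arrays.copyOfRange(allFields, 1, allFields.length);
    }

    public static void main(String[] args) {
        // Prints the fields that would be requested from the store.
        System.out.println(Arrays.toString(withoutDirtyField(ALL_FIELDS)));
    }
}
```

The sketch relies purely on position, as the comment describes. If the field were removed from _ALL_FIELDS on the Gora side, as proposed for discussion on dev@gora, this step would no longer be needed.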