[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994098#comment-13994098 ]
Lewis John McGibbney edited comment on NUTCH-1714 at 5/10/14 1:19 AM:
----------------------------------------------------------------------

Hi [~alparslan.avci] and [~jnioche]: some comments

1. bq. About this problem, I think it is not about gora-hbase-0.4 and exists from the beginning of Gora project for HBase.
There is nothing 'bad' here; what is wrong will become clear if you look at the following code:
https://github.com/apache/gora/blob/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/query/HBaseScannerResult.java#L65

2. bq. The code has changed since the last patch and we are now getting :
[~jnioche], this is addressed in my new patch... must have been a trivial mistake/revert in [~alparslan.avci]'s patch :)

3. [~jnioche]
bq. when the parser fails.
This is due to status.getArgs() returning null. I've now hopefully fixed this in my new patch.

4. [~jnioche]
bq. WebTableReader should also remove the dirty field in processDumpJob
{code:title=WebTableReader.java|borderStyle=solid}
WebPage page = new WebPage();
ArrayList<String> queryFields = new ArrayList<String>();
for (int i = 1; i < WebPage._ALL_FIELDS.length; i++) {
  queryFields.add(page.getSchema().getFields().toString());
}
query.setFields((String[]) queryFields.toArray());
{code}
I am not particularly happy with this (and I am actively testing it, so I still have my own comments to pass on). If you can suggest a better way to remove the Field at position 0 in the array, then we can go with that. I also don't really like the cast within the call to query.setFields. WDYT?
[~jnioche], regarding your most recent _observations_, I will also add to these once I've seen my crawler(s) running for a bit longer over a number of different scenarios.

5. Finally (for now), I've skipped the failing tests in TestGoraStorage... this is due to problems with MemStore, which we are actively working on for the 0.5 release.
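On item 4, a minimal sketch of one way to drop the entry at position 0 without the cast. It assumes WebPage._ALL_FIELDS is a plain String[] (as its use of .length in the snippet suggests) and that the dirty field sits at index 0. The class name QueryFieldsSketch, both method names, and the sample field names are hypothetical stand-ins, not Nutch or Gora API. Note that {{(String[]) list.toArray()}} compiles but throws ClassCastException at runtime, because the no-argument toArray() returns Object[]; the typed overload avoids this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; stands in for the field handling in processDumpJob.
public class QueryFieldsSketch {

    // Copy everything except the entry at position 0 straight into a new array.
    public static String[] withoutFirst(String[] allFields) {
        if (allFields.length == 0) {
            return new String[0];
        }
        return Arrays.copyOfRange(allFields, 1, allFields.length);
    }

    // Same result via a list, using the typed toArray overload instead of a cast.
    public static String[] withoutFirstViaList(String[] allFields) {
        List<String> fields = new ArrayList<String>();
        for (int i = 1; i < allFields.length; i++) {
            fields.add(allFields[i]); // add the i-th name, not the whole collection
        }
        return fields.toArray(new String[fields.size()]);
    }

    public static void main(String[] args) {
        // Placeholder names; a real caller would pass WebPage._ALL_FIELDS (assumption).
        String[] all = {"dirtyField", "baseUrl", "status"};
        System.out.println(Arrays.toString(withoutFirst(all)));
        System.out.println(Arrays.toString(withoutFirstViaList(all)));
    }
}
```

If the assumption about _ALL_FIELDS holds, the call in the snippet could then become something like query.setFields(withoutFirst(WebPage._ALL_FIELDS)), with no intermediate list.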
Thanks for the comments; these are excellent. This is not particularly easy, as Gora 0.4 was a MAJOR release with many changes across all back ends. Persistence is something we need to get right, so I don't mind taking the time to do so.

> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the details in this issue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)