[ 
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994098#comment-13994098
 ] 

Lewis John McGibbney edited comment on NUTCH-1714 at 5/10/14 1:19 AM:
----------------------------------------------------------------------

Hi [~alparslan.avci] and [~jnioche], some comments:
1.
bq. About this problem, I think it is not about gora-hbase-0.4 and exists from 
the beginning of Gora project for HBase.
There is nothing 'bad' here; what is happening will become clear if you look at 
the following code:
https://github.com/apache/gora/blob/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/query/HBaseScannerResult.java#L65

2.
bq. The code has changed since the last patch and we are now getting : 
[~jnioche], this is addressed in my new patch. It must have been a trivial 
mistake or revert in [~alparslan.avci]'s patch :)

3. 
[~jnioche]
bq. when the parser fails. This is due to status.getArgs() returning null. 
I've now hopefully fixed this in my new patch. 
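For reference, the kind of guard this fix needs can be sketched as follows: treat a null argument list as empty instead of dereferencing it. This is a minimal, self-contained sketch; {{ParseStatusStub}} and {{formatArgs}} are hypothetical stand-ins, not Nutch's actual ParseStatus or ParseStatusUtils API, and the assumption is only that getArgs() can return null when the parser fails.

{code:title=ParseStatusArgs.java (hypothetical sketch)|borderStyle=solid}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ParseStatusArgs {
  // Hypothetical stand-in for a parse status whose args may be unset (null)
  // when the parser fails.
  static class ParseStatusStub {
    private final List<CharSequence> args;
    ParseStatusStub(List<CharSequence> args) { this.args = args; }
    List<CharSequence> getArgs() { return args; }
  }

  // Joins the status arguments, treating a null list as empty instead of
  // dereferencing it and throwing a NullPointerException.
  static String formatArgs(ParseStatusStub status) {
    List<CharSequence> args = status.getArgs();
    if (args == null) {
      args = Collections.emptyList();
    }
    return String.join(", ", args);
  }

  public static void main(String[] args) {
    System.out.println("[" + formatArgs(new ParseStatusStub(null)) + "]");
    System.out.println("[" + formatArgs(
        new ParseStatusStub(Arrays.<CharSequence>asList("a", "b"))) + "]");
  }
}
{code}

Running main prints "[]" for the failed-parse case and "[a, b]" when arguments are present.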

4. 
[~jnioche]
bq. WebTableReader should also remove the dirty field in processDumpJob 
{code:title=WebTableReader.java|borderStyle=solid}
    // Skip index 0 (the dirty field) and add the name of every other field.
    ArrayList<String> queryFields = new ArrayList<String>();
    for (int i = 1; i < WebPage._ALL_FIELDS.length; i++) {
      queryFields.add(WebPage._ALL_FIELDS[i]);
    }
    query.setFields((String[]) queryFields.toArray());
{code}
I am not particularly happy with this (I am still actively testing it, so I 
have my own comments to pass on). If you can suggest a better way to remove the 
field at position 0 in the array, we can go with that. I also don't really like 
the cast within the call to query.setFields. WDYT?
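One possible alternative, sketched under assumptions: copy the array from index 1 with Arrays.copyOfRange, or filter the dirty field by name and use the typed toArray(T[]) overload. Either avoids the cast, which matters because the no-argument toArray() returns an Object[], so casting it to String[] compiles but throws a ClassCastException at runtime. {{ALL_FIELDS}} below is a hypothetical, illustrative subset standing in for WebPage._ALL_FIELDS, and the dirty field's name ({{__g__dirty}}) and its position at index 0 are assumptions.

{code:title=DumpFields.java (hypothetical sketch)|borderStyle=solid}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DumpFields {
  // Hypothetical stand-in for WebPage._ALL_FIELDS: an illustrative subset,
  // with the dirty-tracking field assumed to be at index 0.
  static final String[] ALL_FIELDS = {"__g__dirty", "baseUrl", "status", "fetchTime"};

  // Option 1: drop position 0 by copying the rest of the array; no list, no cast.
  static String[] withoutFirst(String[] allFields) {
    return Arrays.copyOfRange(allFields, 1, allFields.length);
  }

  // Option 2: drop the dirty field by name, so the code does not depend on its position.
  static String[] withoutField(String[] allFields, String excluded) {
    List<String> fields = new ArrayList<>();
    for (String f : allFields) {
      if (!f.equals(excluded)) {
        fields.add(f);
      }
    }
    // The typed toArray overload returns a String[], so no cast is needed.
    return fields.toArray(new String[0]);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(withoutFirst(ALL_FIELDS)));
    System.out.println(Arrays.toString(withoutField(ALL_FIELDS, "__g__dirty")));
  }
}
{code}

In processDumpJob this would become something like query.setFields(withoutFirst(WebPage._ALL_FIELDS)). The name-based variant is a little longer but keeps working if the dirty field ever moves.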

[~jnioche], regarding your most recent _observations_, I will also add to these 
once I've seen my crawler(s) running for a bit longer across a number of 
different scenarios.

5.
Finally (for now), I've skipped the failing tests in TestGoraStorage. This is 
due to problems with MemStore, which we are actively working on for the 0.5 
release.

Thanks for the comments; they are excellent. This is not particularly easy, 
as Gora 0.4 was a MAJOR release with many changes across all back ends. 
Persistence is something we need to get right, so I don't mind taking the time 
to do so.




> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, 
> NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the 
> details in this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
