[jira] Commented: (NUTCH-634) Patch - Nutch - Hadoop 0.17.0

Michael Gottesman (JIRA) Thu, 12 Jun 2008 16:07:45 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604682#action_12604682 ]


Michael Gottesman commented on NUTCH-634:
-----------------------------------------

So actually, I remembered to make it have an ASF, but forgot to redo the diff 
=p. Sorry. But it looks like Lincoln's patch suffices. Also, here is a quick 
rundown on your comments.

1. I only put in FileOnlySequenceFileOutputFormat because it fixed the last bug 
I was hitting. I was a little annoyed at the time, so I just stuck it in. There 
is actually a native Hadoop way of doing this via a static class. I have seen 
it in the code before; I just don't remember exactly where.

2. About the code indenting: I was messing with my emacs trying to get it to 
do that, but I figured you were more interested in the code and I could deal 
with that later =p.

3. Generics: easy fix =).

4. As for why I did the shallow copy thing even with the metadata: at the time 
(I remember being distinctly very tired) it was not clear to me whether, since 
it is of type byte[], it would be treated as a primitive or as an object. Now, 
of course, I realize that I was really smoking something there... but that's 
beside the point =p.

5. I only added IndexDoc.copyConstructor() because I was not sure whether a 
deep clone would be needed.
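For anyone following along on points 4 and 5: in Java an array such as byte[] 
is an object, not a primitive, so a shallow copy of a byte[] field shares the 
same array between both instances. Here is a minimal standalone sketch of the 
difference, assuming a hypothetical DocRecord class (it is NOT Nutch's IndexDoc, 
and its fields are made up for illustration):

```java
import java.util.Arrays;

// Hypothetical stand-in for a document record; NOT Nutch's IndexDoc.
class DocRecord {
    final String url;   // String is immutable, so sharing the reference is safe
    float boost;        // primitive: always copied by value
    byte[] metadata;    // arrays are objects: assignment copies only the reference

    DocRecord(String url, float boost, byte[] metadata) {
        this.url = url;
        this.boost = boost;
        this.metadata = metadata;
    }

    // Shallow copy: the new instance shares the same metadata array.
    static DocRecord shallowCopy(DocRecord other) {
        return new DocRecord(other.url, other.boost, other.metadata);
    }

    // Copy constructor performing a deep copy of the mutable array.
    DocRecord(DocRecord other) {
        this(other.url, other.boost,
             other.metadata == null ? null : other.metadata.clone());
    }

    public static void main(String[] args) {
        DocRecord original = new DocRecord("http://example.com/", 1.0f, new byte[] {1, 2, 3});

        DocRecord shallow = shallowCopy(original);
        shallow.metadata[0] = 42;
        System.out.println(original.metadata[0]);          // 42: the array is shared

        original.metadata[0] = 1;
        DocRecord deep = new DocRecord(original);
        deep.metadata[0] = 42;
        System.out.println(original.metadata[0]);          // 1: the clone is independent
        System.out.println(Arrays.toString(deep.metadata)); // [42, 2, 3]
    }
}
```

Only the mutable array needs cloning; the String can be shared and the float 
is copied by value anyway.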

So, in sum, all of your suggestions should be easy changes =). I will 
re-download the trunk, redo the svn diff against it, and correct those points.

> Patch - Nutch - Hadoop 0.17.0
> -----------------------------
>
>                 Key: NUTCH-634
>                 URL: https://issues.apache.org/jira/browse/NUTCH-634
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Michael Gottesman
>            Assignee: Andrzej Bialecki 
>             Fix For: 0.9.0
>
>         Attachments: diff, hadoop-0.17.patch
>
>
> This is a patch so that Nutch can be used with Hadoop 0.17.0. The patch is 
> located at http://pastie.org/212001
> The patch compiles and passes all current Nutch unit tests.
> I have tested that the crawler side of Nutch (i.e. inject, generate, fetch, 
> parse, merge w/crawldb) definitely works, but have not tested the Lucene 
> indexing part. It might work, but it might not. 
> *NOTE* - the two main bugs that had to be overcome were not noticed by any of 
> the unit tests. The bugs only came up during actual testing. The bugs were:
> 1. Changes to the Hadoop Iterator
> 2. Addition of Serialization to MapReduce Framework

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
