[ https://issues.apache.org/jira/browse/NUTCH-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604682#action_12604682 ]
Michael Gottesman commented on NUTCH-634:
-----------------------------------------

So actually, I remembered to make it have an ASF, but forgot to redo the diff =p. Sorry. But it looks like Lincoln's patch suffices.

Also, here is a quick rundown on your comments.

1. I just put in FileOnlySequenceFileOutputFormat because it was the last bug I was getting. I was a little annoyed at the time so I just stuck it in. There is actually a native Hadoop way of doing this via a static class. I have seen it before in the code, I just don't remember exactly where.
2. About the code indenting: I was screwing with my emacs trying to get it to do that. But I figured you were more interested in the code and I could deal with that later =p.
3. Generics: easy fix =).
4. The reason I did the shallow-copy thing even with the metadata: since it is of type byte[], it was not clear to me at the time (I remember being distinctly very tired) whether it would be considered a native type or an object. Now of course, I realize that I was really smoking something there... but that's beside the point =p.
5. The IndexDoc.copyConstructor() was just put in because I was not sure if a deep clone would be needed or not.

So in sum, all of what you suggest should be easy changes =). I will redownload the trunk and do the svn from the trunk, and correct those points.

> Patch - Nutch - Hadoop 0.17.0
> -----------------------------
>
>                 Key: NUTCH-634
>                 URL: https://issues.apache.org/jira/browse/NUTCH-634
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Michael Gottesman
>            Assignee: Andrzej Bialecki
>             Fix For: 0.9.0
>
>         Attachments: diff, hadoop-0.17.patch
>
>
> This is a patch so that Nutch can be used with Hadoop 0.17.0. The patch is
> located at http://pastie.org/212001
> The patch compiles and passes all current Nutch unit tests.
> I have tested that the crawler side of Nutch (i.e. inject, generate, fetch,
> parse, merge w/crawldb) definitely works, but have not tested the Lucene
> indexing part. It might work, but it might not.
> *NOTE* - the two main bugs that had to be overcome were not noticed by any of
> the unit tests. The bugs only came up during actual testing. The bugs were:
> 1. Changes to the Hadoop Iterator
> 2. Addition of Serialization to MapReduce Framework

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
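
On point 4 of the comment (whether a byte[] field behaves like a native type or an object when copied): in Java, arrays are objects, so copying the reference shares the underlying bytes. The following is only a minimal sketch of that distinction. CopyDemo and Doc are hypothetical names made up for illustration; they are not classes from Nutch or from the patch.

```java
import java.util.Arrays;

public class CopyDemo {
    // Hypothetical stand-in for a document class that carries binary metadata.
    static final class Doc {
        final byte[] metadata;

        Doc(byte[] metadata) {
            this.metadata = metadata;
        }

        // Shallow copy: the new Doc points at the same array object.
        Doc shallowCopy() {
            return new Doc(metadata);
        }

        // Deep copy: the new Doc gets its own copy of the bytes.
        Doc deepCopy() {
            return new Doc(Arrays.copyOf(metadata, metadata.length));
        }
    }

    public static void main(String[] args) {
        Doc original = new Doc(new byte[] {1, 2, 3});
        Doc shallow = original.shallowCopy();
        Doc deep = original.deepCopy();

        // Mutate the original after both copies were taken.
        original.metadata[0] = 42;

        System.out.println(shallow.metadata[0]); // 42: shares the array with original
        System.out.println(deep.metadata[0]);    // 1: has its own array
    }
}
```

So a shallow copy is only safe for a byte[] field if nobody mutates the array afterwards; otherwise a deep copy along the lines of deepCopy() above is needed.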