Which patch are you referring to? The patch I just added *only* addressed the index/segments confusion and was created by executing 'svn diff' from the trunk root.
-lincoln
--
lincolnritter.com

On Thu, Jun 12, 2008 at 3:32 PM, Andrzej Bialecki (JIRA) <[EMAIL PROTECTED]> wrote:
>
>     [ https://issues.apache.org/jira/browse/NUTCH-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604667#action_12604667 ]
>
> Andrzej Bialecki commented on NUTCH-634:
> -----------------------------------------
>
> The attached diff is not a valid patch created with 'svn diff'. Please create
> a patch using 'svn diff', from the top of the source tree of Nutch trunk/.
>
> I'm not sure whether the FileOnlySequenceFileOutputFormat is the right answer
> to the problem of _logs directories ... I think the existence of these
> directories is caused by a setting in Hadoop configuration,
> hadoop.job.history.user.location, which defaults to the output directory
> (which sounds awfully strange to me to use this as a default!). Further
> investigation is needed before we mess up things on our side. ;)
>
> The code formatting on these two new files and in some other places doesn't
> conform to the Nutch formatting, which is basically the Sun style with 2
> space indents. Please note also that you use different curly brace placement
> than the Sun style advises.
>
> Generics on the CrawlDbReducer are too general, instead of
>
> bq. implements Reducer<WritableComparable,Writable,WritableComparable,Writable>
>
> it should be
>
> bq. implements Reducer<Text, CrawlDatum, Text, CrawlDatum>
>
> Similar tightening should be done in other places where you added generics.
>
> The CrawlDatum.shallowCopy() method is dangerous IMHO - newly created copies
> still contain references to the same metaData instance, which may be modified
> any time by the framework as you iterate through the input items. We should
> do a deep clone using WritableUtils.clone().
>
> IndexDoc.copyConstructor() should be replaced by a deep clone().
>
>
>> Patch - Nutch - Hadoop 0.17.0
>> -----------------------------
>>
>>                 Key: NUTCH-634
>>                 URL: https://issues.apache.org/jira/browse/NUTCH-634
>>             Project: Nutch
>>          Issue Type: Improvement
>>    Affects Versions: 0.9.0
>>            Reporter: Michael Gottesman
>>            Assignee: Andrzej Bialecki
>>             Fix For: 0.9.0
>>
>>         Attachments: diff, hadoop-0.17.patch
>>
>>
>> This is a patch so that Nutch can be used with Hadoop 0.17.0. The patch is
>> located at http://pastie.org/212001
>> The patch compiles and passes all current Nutch unit tests.
>> I have tested that the crawler side of Nutch (i.e. inject, generate, fetch,
>> parse, merge w/crawldb) definitely works, but have not tested the Lucene
>> indexing part. It might work, but it might not.
>> *NOTE* - the two main bugs that had to be overcome were not noticed by any
>> of the unit tests. The bugs only came up during actual testing. The bugs
>> were:
>> 1. Changes to the Hadoop Iterator
>> 2. Addition of Serialization to MapReduce Framework
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
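
For reference, the shallowCopy() concern raised in the quoted comment can be illustrated with a minimal, self-contained sketch. It does not use Hadoop or Nutch classes: Datum, shallowCopy(), deepCopy() and collect() below are hypothetical stand-ins, and the loop only assumes that the framework reuses one object for every input value, as the comment describes. In the actual code the deep copy would presumably go through WritableUtils.clone(), as suggested above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CopyDemo {

  /** Hypothetical record with mutable metadata (not Nutch's CrawlDatum). */
  static class Datum {
    Map<String, String> metaData = new HashMap<>();

    /** Copies the object but shares the metaData map with the original. */
    Datum shallowCopy() {
      Datum d = new Datum();
      d.metaData = this.metaData;
      return d;
    }

    /** Copies the object and its metaData map (String values are immutable). */
    Datum deepCopy() {
      Datum d = new Datum();
      d.metaData = new HashMap<>(this.metaData);
      return d;
    }
  }

  /**
   * Simulates a framework that reuses a single Datum instance for every
   * input value, overwriting its contents between iterations.
   */
  static List<String> collect(boolean deep) {
    Datum reused = new Datum();
    List<Datum> kept = new ArrayList<>();
    for (String value : new String[] {"a", "b", "c"}) {
      reused.metaData.put("key", value); // same instance is refilled each time
      kept.add(deep ? reused.deepCopy() : reused.shallowCopy());
    }
    List<String> seen = new ArrayList<>();
    for (Datum d : kept) {
      seen.add(d.metaData.get("key"));
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println("shallow: " + collect(false)); // [c, c, c]
    System.out.println("deep:    " + collect(true));  // [a, b, c]
  }
}
```

With shallow copies, every kept item ends up showing the last value written, because all of them point at the same map. The deep copies keep the value that was current when each copy was taken.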