[jira] Updated: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployment
[ http://issues.apache.org/jira/browse/NUTCH-246?page=all ]

Stefan Groschupf updated NUTCH-246:
---
Attachment: injectWithCurTimeMapper.patch

setFetchTime moved to Mapper.

> segment size is never as big as topN or crawlDB size in a distributed deployment
> -
>
> Key: NUTCH-246
> URL: http://issues.apache.org/jira/browse/NUTCH-246
> Project: Nutch
> Type: Bug
> Versions: 0.8-dev
> Reporter: Stefan Groschupf
> Priority: Minor
> Fix For: 0.8-dev
> Attachments: injectWithCurTime.patch, injectWithCurTimeMapper.patch
>
> I didn't reopen NUTCH-136 since this may be related to the Hadoop split.
> I tested this on two different deployments (10 tasktrackers + 1 jobtracker, and 9 tasktrackers + 1 jobtracker).
> Defining the number of map and reduce tasks in a mapred-default.xml does not solve the problem (the file is in nutch/conf on all boxes).
> We verified that it is not a problem of the maximum URLs per host and also not a problem of the URL filter.
> It looks like the first job of the Generator (Selector) already gets too few entries to process.
> Maybe this is somehow related to split generation or to configuration inside the distributed jobtracker, since it runs in a different JVM than the jobclient.
> However, we were not able to find the source of this problem.
> I think this should be fixed before publishing Nutch 0.8.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [ot] binary subversion diffs
Subversion basically uses plain diff, so I believe what you ask for isn't possible. But if somebody knows otherwise, I'd also appreciate a note.

D.

Stefan Groschupf wrote:
> Hi,
> does anybody know how to do svn diffs that contain binary content, like jars or images?
> I was not able to find any useful information in the documentation or on the internet.
> Thanks for any hints.
> Stefan
>
> -
> blog: http://www.find23.org
> company: http://www.media-style.com
[ot] binary subversion diffs
Hi,
does anybody know how to do svn diffs that contain binary content, like jars or images?
I was not able to find any useful information in the documentation or on the internet.
Thanks for any hints.
Stefan

-
blog: http://www.find23.org
company: http://www.media-style.com
Re: 0.8 release?
It seems to be identical, Piotr, thanks.

D.

Piotr Kosiorowski wrote:
> I had problems with DOS/Unix newlines and some (still unsolved) environment settings on my Linux box - I will try to solve it. Anyway, I was able to apply the patch on Cygwin. Could you please have a look at it, so we can be sure I have not applied it wrongly (I think it is correct, but I did it so many times that I want to cross-check).
> Regards
> Piotr
>
> Dawid Weiss wrote:
>> What kind of problems? If you need something, let me know.
>> D.
>>
>> Piotr Kosiorowski wrote:
>>> I got some problems while applying Dawid's clustering patch (my Linux environment does not seem to be set up correctly) - but I switched to Cygwin and it looks OK. I will try to commit it today/tomorrow.
>>> Regards
>>> Piotr
>>>
>>> On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote:
>>>> Hi Guys,
>>>> Any progress on the 0.8 release? Was there any resolution about which JIRA issues to complete before the 0.8 release? We had a bit of conversation there and some ideas, but no definitive answer...
>>>> Thanks for your help, and sorry to pester ;)
>>>> Cheers,
>>>> Chris
>>>> __
>>>> Chris A. Mattmann
>>>> [EMAIL PROTECTED]
>>>> Staff Member
>>>> Modeling and Data Management Systems Section (387)
>>>> Data Management Systems and Technologies Group
>>>> _
>>>> Jet Propulsion Laboratory, Pasadena, CA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> ___
>>>> Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.
Re: Content-Type inconsistency?
I would like to come back to this issue.

The Content object holds two content-types:
1. The raw content-type from the protocol layer (the HTTP header in the case of HTTP), stored in the Content's metadata.
2. The guessed content-type, stored in a private content-type field.

When a ParseData object is created, it takes only the Content's metadata. So the ParseData can only access the raw content-type and not the guessed one.

What I suggest is one of the following:
1. Add a content-type parameter to the ParseData constructors (so that Parsers can pass the guessed content-type to ParseData).
2. Have the Content object store the guessed content-type in its metadata, in a special attribute named for instance GUESSED_CONTENT_TYPE, so that the ParseData can access it.

I think 1. is really the cleanest way to implement this, but a lot of code is impacted => all the parsers.
Solution 2. has no impact on the APIs, so the code changes are very small.

Suggestions? Comments?

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
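To make solution 2 a bit more concrete, here is a minimal sketch. It does not use the real Nutch classes: Content and ParseData below are simplified stand-ins written for this illustration, the metadata is modelled as a plain Map, and the getContentType() fallback logic is an assumption of mine rather than anything in the current code base. Only the GUESSED_CONTENT_TYPE attribute name comes from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

public class GuessedContentTypeSketch {

    // Attribute name taken from the proposal; the raw key name is an assumption.
    static final String GUESSED_CONTENT_TYPE = "GUESSED_CONTENT_TYPE";
    static final String CONTENT_TYPE = "Content-Type";

    /** Simplified stand-in for Content (hypothetical, not the Nutch class). */
    static class Content {
        private final Map<String, String> metadata = new HashMap<>();

        Content(String rawContentType, String guessedContentType) {
            // Raw type as reported by the protocol layer.
            metadata.put(CONTENT_TYPE, rawContentType);
            // Solution 2: also expose the guessed type through the metadata.
            metadata.put(GUESSED_CONTENT_TYPE, guessedContentType);
        }

        Map<String, String> getMetadata() {
            return metadata;
        }
    }

    /** Simplified stand-in for ParseData; it only ever sees the metadata. */
    static class ParseData {
        private final Map<String, String> contentMeta;

        ParseData(Map<String, String> contentMeta) {
            this.contentMeta = contentMeta;
        }

        /** Prefers the guessed type and falls back to the raw one. */
        String getContentType() {
            String guessed = contentMeta.get(GUESSED_CONTENT_TYPE);
            return guessed != null ? guessed : contentMeta.get(CONTENT_TYPE);
        }
    }

    public static void main(String[] args) {
        // Server claims text/plain, but type guessing decided it is a PDF.
        Content content = new Content("text/plain", "application/pdf");
        ParseData parseData = new ParseData(content.getMetadata());
        System.out.println(parseData.getContentType());
    }
}
```

The constructor signatures stay untouched in this variant, which is why no parser would need to change; the cost is one more "magic" key in the metadata.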
Re: 0.8 release?
I had problems with DOS/Unix newlines and some (still unsolved) environment settings on my Linux box - I will try to solve it. Anyway, I was able to apply the patch on Cygwin. Could you please have a look at it, so we can be sure I have not applied it wrongly (I think it is correct, but I did it so many times that I want to cross-check).

Regards
Piotr

Dawid Weiss wrote:
> What kind of problems? If you need something, let me know.
> D.
>
> Piotr Kosiorowski wrote:
>> I got some problems while applying Dawid's clustering patch (my Linux environment does not seem to be set up correctly) - but I switched to Cygwin and it looks OK. I will try to commit it today/tomorrow.
>> Regards
>> Piotr
>>
>> On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote:
>>> Hi Guys,
>>> Any progress on the 0.8 release? Was there any resolution about which JIRA issues to complete before the 0.8 release? We had a bit of conversation there and some ideas, but no definitive answer...
>>> Thanks for your help, and sorry to pester ;)
>>> Cheers,
>>> Chris
[jira] Commented: (NUTCH-245) DTD for plugin.xml configuration files
[ http://issues.apache.org/jira/browse/NUTCH-245?page=comments#action_12374339 ]

Jerome Charron commented on NUTCH-245:
--
I would prefer to change the "ugly" parts of the DTD now (before a future 1.0) and suggest to change it to something like (and to change the plugin.xml and Plugin Manifest Reader too):

> DTD for plugin.xml configuration files
> --
>
> Key: NUTCH-245
> URL: http://issues.apache.org/jira/browse/NUTCH-245
> Project: Nutch
> Type: New Feature
> Components: fetcher, indexer, ndfs, searcher, web gui
> Versions: 0.7.2, 0.7.1, 0.7, 0.6, 0.8-dev
> Environment: Power PC Dual Processor 2.0 GHz, Mac OS X 10.4, although improvement is independent of environment
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Priority: Minor
> Attachments: NUTCH-245.Mattmann.patch.txt
>
> Currently, the plugin.xml file does not have a DTD or XML Schema associated with it, and most people just go look at an existing plugin's plugin.xml file to determine what the allowable elements are, etc.
> There should be an explicit plugin DTD file that describes the plugin.xml file. I'll look at the code and attach a plugin.dtd file for the Nutch conf directory later today.
> This way, people can use the DTD file to automatically (using tools such as XMLSpy) generate plugin.xml files that can then be validated.
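As a rough illustration of the validation use case mentioned in the issue description, the sketch below validates a plugin descriptor against a DTD using the standard JAXP parser. The DTD here is a tiny made-up one, inlined as an internal subset for the sake of a self-contained example; its elements and attributes are assumptions and are neither the attached plugin.dtd nor the changed version suggested in the comment. The class and method names are hypothetical too.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class PluginDtdValidationSketch {

    // Hypothetical, heavily simplified DTD (not the one attached to NUTCH-245).
    static final String DOCTYPE =
          "<!DOCTYPE plugin [\n"
        + "  <!ELEMENT plugin (extension*)>\n"
        + "  <!ATTLIST plugin id CDATA #REQUIRED name CDATA #IMPLIED>\n"
        + "  <!ELEMENT extension EMPTY>\n"
        + "  <!ATTLIST extension point CDATA #REQUIRED>\n"
        + "]>\n";

    /** Returns true if the document body is valid against DOCTYPE. */
    static boolean isValid(String xmlBody) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();

        final boolean[] valid = {true};
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                // warnings do not affect validity
            }
            public void error(SAXParseException e) {
                valid[0] = false; // validity errors are reported here
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed: give up
            }
        });

        builder.parse(new InputSource(new StringReader(DOCTYPE + xmlBody)));
        return valid[0];
    }

    public static void main(String[] args) throws Exception {
        String good = "<plugin id=\"demo\"><extension point=\"some.point\"/></plugin>";
        String bad = "<plugin><extension point=\"some.point\"/></plugin>"; // no id
        System.out.println("good descriptor valid: " + isValid(good));
        System.out.println("bad descriptor valid: " + isValid(bad));
    }
}
```

With a real plugin.dtd in the conf directory, the plugin.xml files would presumably reference it through a DOCTYPE declaration with a system identifier instead of an internal subset; the validation call itself would stay the same.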
Re: 0.8 release?
What kind of problems? If you need something, let me know.

D.

Piotr Kosiorowski wrote:
> I got some problems while applying Dawid's clustering patch (my Linux environment does not seem to be set up correctly) - but I switched to Cygwin and it looks OK. I will try to commit it today/tomorrow.
> Regards
> Piotr
>
> On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote:
>> Hi Guys,
>> Any progress on the 0.8 release? Was there any resolution about which JIRA issues to complete before the 0.8 release? We had a bit of conversation there and some ideas, but no definitive answer...
>> Thanks for your help, and sorry to pester ;)
>> Cheers,
>> Chris
hadoop
Why was ndfs renamed to hadoop? Why is the subproject with mapred named hadoop? ;-)