Re: [Fwd: Crawler submits forms?]
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> Please also don't forget that the trunk/ will soon be invaded by the code from mapred, I guess some time around the middle of January (Doug?)
>
> Thinking about this more, perhaps we should do it sooner. There's already a branch for 0.7.x releases, so what point is there in not merging mapred to trunk now? We'd have fewer branches to maintain, and start getting nightly builds of mapred. Folks who require 0.7.x compatibility can continue to use (and patch) the 0.7.x branch.
>
> Objections?
>
> Doug

+1. I think this is a good time to merge, as mapred is fully usable.

--
Sami Siren
Re: [Fwd: Crawler submits forms?]
Andrzej Bialecki wrote:
> Yes, we just need to make sure that all important bits from trunk are on the 0.7 branch, before we start.

I will sync mapred with the trunk prior to the merge, so we should still be able to get anything we need after mapred is merged back to trunk.

BTW, we're pretty closely following the recommendations in:

http://svnbook.red-bean.com/en/1.1/ch04s04.html#svn-ch-4-sect-4.4

The mapred branch is a 'feature' branch. At the end of this section they describe how to merge a feature branch back into the trunk.

Doug
Re: [Fwd: Crawler submits forms?]
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> I agree. I just thought that we would prepare the release based on the code in trunk/, and in that case we would like to wait with the merge before we do the release.
>
> My definition of trunk is that it should be where the majority of development happens. It is what we should build nightly, etc. Major versions should be branched from trunk, and point releases created as tags from the version branches. A development branch (e.g., mapred) should be used when a few developers need to make radical changes and do not want to disrupt other developers.
>
> So if most developers are now comfortable working on mapred, then we no longer need to keep it in a branch. And we already have a version branch for 0.7, so we don't need to reserve trunk for that.
>
> Does this analysis sound right?

Yes, we just need to make sure that all important bits from trunk are on the 0.7 branch, before we start.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: [Fwd: Crawler submits forms?]
Andrzej Bialecki wrote:
> I agree. I just thought that we would prepare the release based on the code in trunk/, and in that case we would like to wait with the merge before we do the release.

My definition of trunk is that it should be where the majority of development happens. It is what we should build nightly, etc. Major versions should be branched from trunk, and point releases created as tags from the version branches. A development branch (e.g., mapred) should be used when a few developers need to make radical changes and do not want to disrupt other developers.

So if most developers are now comfortable working on mapred, then we no longer need to keep it in a branch. And we already have a version branch for 0.7, so we don't need to reserve trunk for that.

Does this analysis sound right?

Doug
Re: [Fwd: Crawler submits forms?]
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> Please also don't forget that the trunk/ will soon be invaded by the code from mapred, I guess some time around the middle of January (Doug?)
>
> Thinking about this more, perhaps we should do it sooner. There's already a branch for 0.7.x releases, so what point is there in not merging mapred to trunk now? We'd have fewer branches to maintain, and start getting nightly builds of mapred. Folks who require 0.7.x compatibility can continue to use (and patch) the 0.7.x branch.
>
> Objections?
>
> Doug

I agree. I just thought that we would prepare the release based on the code in trunk/, and in that case we would want to postpone the merge until after the release.

--
Best regards,
Andrzej Bialecki <><
Re: [Fwd: Crawler submits forms?]
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> Please also don't forget that the trunk/ will soon be invaded by the code from mapred, I guess some time around the middle of January (Doug?)
>
> Thinking about this more, perhaps we should do it sooner. There's already a branch for 0.7.x releases, so what point is there in not merging mapred to trunk now? We'd have fewer branches to maintain, and start getting nightly builds of mapred. Folks who require 0.7.x compatibility can continue to use (and patch) the 0.7.x branch.
>
> Objections?
>
> Doug

+1. Looking at the questions on the mailing lists, I do not think many people use trunk now.

Piotr
Re: [Fwd: Crawler submits forms?]
Andrzej Bialecki wrote:
> Please also don't forget that the trunk/ will soon be invaded by the code from mapred, I guess some time around the middle of January (Doug?)

Thinking about this more, perhaps we should do it sooner. There's already a branch for 0.7.x releases, so what point is there in not merging mapred to trunk now? We'd have fewer branches to maintain, and start getting nightly builds of mapred. Folks who require 0.7.x compatibility can continue to use (and patch) the 0.7.x branch.

Objections?

Doug
Re: [Fwd: Crawler submits forms?]
+1. I wanted to suggest exactly this approach, but we should keep in mind not to introduce new features without a serious reason (especially not backward-incompatible ones).

Piotr

On 12/14/05, Jérôme Charron <[EMAIL PROTECTED]> wrote:
> > What people think if we collect a list of issues and make a voting
> > iteration?
>
> +1
Re: [Fwd: Crawler submits forms?]
> What people think if we collect a list of issues and make a voting
> iteration?

+1
Re: [Fwd: Crawler submits forms?]
>> http://issues.apache.org/jira/browse/NUTCH-125
> On its way ... ;-) I'll add it during this week.

There are some more very small issues, and there are also some patches from the community. What do people think if we collect a list of issues and hold a round of voting?

Stefan
Re: [Fwd: Crawler submits forms?]
Zaheed Haque wrote:
> what about the following:
> http://issues.apache.org/jira/browse/NUTCH-125

On its way ... ;-) I'll add it during this week.

--
Best regards,
Andrzej Bialecki <><
Re: [Fwd: Crawler submits forms?]
what about the following:
http://issues.apache.org/jira/browse/NUTCH-125

Cheers

On 12/13/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Jérôme Charron wrote:
> >+1 for a 0.7.2 release.
>
> +1.
>
> Things are going well on the mapred branch, all basic tools are almost
> in place, so after this release we will probably start merging... so,
> this looks like the last release of the 0.7.x line (from the code in
> trunk/ - I'm sure there will be maintenance releases afterwards).
>
> >I think we can wait for the enhancement proposed by Chris today: Adding an
> >alias in parse-plugin.xml file and use a content-type/extension-id mapping
> >instead of content-type/plugin-id.
>
> IMHO, this needs to be really well tested before going into a release
> ... possibilities for confusion are great.
>
> >For further improvements, the new mime-type repository based on freedesktop
> >mime-type will be needed.
> >I cannot reasonably include this in 0.7.2, but I think it will be in trunk
> >by the end of the year.
>
> Please also don't forget that the trunk/ will soon be invaded by the
> code from mapred, I guess some time around the middle of January (Doug?) ...
>
> --
> Best regards,
> Andrzej Bialecki <><

--
Best Regards
Zaheed Haque
Phone : +46 735 06
E.mail: [EMAIL PROTECTED]
Re: [Fwd: Crawler submits forms?]
Jérôme Charron wrote:
> +1 for a 0.7.2 release.

+1.

Things are going well on the mapred branch; all basic tools are almost in place, so after this release we will probably start merging. So this looks like the last release of the 0.7.x line from the code in trunk/ (I'm sure there will be maintenance releases afterwards).

> I think we can wait for the enhancement proposed by Chris today: Adding an alias in parse-plugin.xml file and use a content-type/extension-id mapping instead of content-type/plugin-id.

IMHO, this needs to be really well tested before going into a release ... possibilities for confusion are great.

> For further improvements, the new mime-type repository based on freedesktop mime-type will be needed. I cannot reasonably include this in 0.7.2, but I think it will be in trunk by the end of the year.

Please also don't forget that the trunk/ will soon be invaded by the code from mapred, I guess some time around the middle of January (Doug?) ...

--
Best regards,
Andrzej Bialecki <><
Re: [Fwd: Crawler submits forms?]
+1 for a 0.7.2 release.

Here are the issues/revisions I can merge to the 0.7 branch. These changes mainly concern the parser-factory changes (NUTCH-88):

http://issues.apache.org/jira/browse/NUTCH-112
http://issues.apache.org/jira/browse/NUTCH-135
http://svn.apache.org/viewcvs.cgi?rev=356532&view=rev
http://svn.apache.org/viewcvs.cgi?rev=355809&view=rev
http://svn.apache.org/viewcvs.cgi?rev=354398&view=rev
http://svn.apache.org/viewcvs.cgi?rev=326889&view=rev
http://svn.apache.org/viewcvs.cgi?rev=321250&view=rev
http://svn.apache.org/viewcvs.cgi?rev=321231&view=rev
http://svn.apache.org/viewcvs.cgi?rev=306808&view=rev
http://svn.apache.org/viewcvs.cgi?rev=293370&view=rev
http://svn.apache.org/viewcvs.cgi?rev=292865&view=rev
http://svn.apache.org/viewcvs.cgi?rev=292035&view=rev

Piotr, what about the Italian translation? 0.7.2 could be a good candidate for committing it, no?

>> This has been fixed in the mapred branch, but that patch is not in
>> 0.7.1. This alone might be a reason to make a 0.7.2 release.

http://svn.apache.org/viewcvs.cgi?view=rev&rev=348533

> I would be happy to see some more parser selection problems fixed but
> looks like Jerome is working hard also to get stuff fixed, may we can
> wait until that.

I think we can wait for the enhancement proposed by Chris today: adding an alias in the parse-plugin.xml file and using a content-type/extension-id mapping instead of content-type/plugin-id.

For further improvements, the new mime-type repository based on freedesktop mime-type will be needed. I cannot reasonably include this in 0.7.2, but I think it will be in trunk by the end of the year.

What reasonable target date can we plan for a 0.7.2?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
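The alias proposal discussed in this thread (a content type maps to an alias, and the alias names one parser extension id, instead of the content type naming a whole plugin id) can be modeled in a few lines of Java. This is a minimal sketch under stated assumptions, not Nutch's parser factory: the class `ParserAliasRegistry`, its methods, and the `org.example.parse.*` extension ids are all hypothetical, and reading the mapping from parse-plugin.xml is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a content-type -> alias -> extension-id mapping.
 *  Not Nutch code; names are illustrative only. */
public class ParserAliasRegistry {
    // content type -> aliases, in preference order
    private final Map<String, List<String>> aliasesByContentType = new HashMap<>();
    // alias -> fully qualified parser extension id
    private final Map<String, String> extensionIdByAlias = new HashMap<>();

    public void defineAlias(String alias, String extensionId) {
        extensionIdByAlias.put(alias, extensionId);
    }

    public void mapContentType(String contentType, String... aliases) {
        List<String> list = aliasesByContentType.computeIfAbsent(contentType, k -> new ArrayList<>());
        Collections.addAll(list, aliases);
    }

    /** Returns the extension ids for a content type, in preference order.
     *  An alias with no definition throws instead of being silently skipped. */
    public List<String> resolve(String contentType) {
        List<String> result = new ArrayList<>();
        for (String alias : aliasesByContentType.getOrDefault(contentType, Collections.emptyList())) {
            String id = extensionIdByAlias.get(alias);
            if (id == null) {
                throw new IllegalStateException("Undefined parser alias: " + alias);
            }
            result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        ParserAliasRegistry registry = new ParserAliasRegistry();
        registry.defineAlias("html", "org.example.parse.HtmlParser");
        registry.defineAlias("text", "org.example.parse.TextParser");
        registry.mapContentType("text/html", "html", "text");
        System.out.println(registry.resolve("text/html"));
        // prints [org.example.parse.HtmlParser, org.example.parse.TextParser]
        System.out.println(registry.resolve("application/pdf")); // prints []
    }
}
```

Failing loudly on an undefined alias, rather than dropping it, is one possible way to reduce the "possibilities for confusion" raised about this change; whether the real implementation behaves this way is not stated in the thread.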
Re: [Fwd: Crawler submits forms?]
If we are going to make a 0.7.2 release, I would like to commit a patch for http://issues.apache.org/jira/browse/NUTCH-112 and probably for some build problems people are reporting (missing src folder in the nutch-extension plugin). I will look at them in the next few days.

Regards
Piotr

Stefan Groschupf wrote:
>> This has been fixed in the mapred branch, but that patch is not in 0.7.1. This alone might be a reason to make a 0.7.2 release.
>
> May we can get fixed some more parser selection related issue until next days also and get this into a 0.7.2 release. I would be happy to see some more parser selection problems fixed but looks like Jerome is working hard also to get stuff fixed, may we can wait until that.
>
> Stefan
Re: [Fwd: Crawler submits forms?]
> This has been fixed in the mapred branch, but that patch is not in 0.7.1. This alone might be a reason to make a 0.7.2 release.

Maybe we can also get some more parser-selection issues fixed in the next few days and get them into a 0.7.2 release. I would be happy to see more parser selection problems fixed, but it looks like Jerome is also working hard on them, so maybe we can wait for that.

Stefan
[Fwd: Crawler submits forms?]
FYI

This has been fixed in the mapred branch, but that patch is not in 0.7.1. This alone might be a reason to make a 0.7.2 release.

Doug

Original Message
Subject: Crawler submits forms?
Date: Tue, 13 Dec 2005 16:57:34 -
From: Andy Read <[EMAIL PROTECTED]>
Reply-To: nutch-agent@lucene.apache.org
Organization: Azurite Systems Ltd.
To:

Hi,

I'm using nutch to create a site search facility for a couple of sites. I upgraded from 0.6 to 0.7.1 a few days ago and have just noticed that blank users are being registered on my site at the exact times the cron job runs the crawl tool to re-index the site. This means that the crawler is now submitting a post request from the registration form!

Is this a new 'feature' of 0.7 or 0.7.1? I can't find any mention in changes.txt and I can't find any config option referring to it. Surely the crawler should never submit form input?

Any help appreciated.

Thanks,
Andy Read
www.azurite.co.uk
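One plausible reading of this report (an assumption; the thread does not say what the mapred fix changed) is that the HTML parser collected `<form action="...">` URLs as ordinary outlinks, so a later fetch cycle requested the registration handler directly. The Java sketch below shows the general idea of skipping form action targets during link extraction. It uses the JDK's Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`) as a stand-in for Nutch's own HTML parser; the class `OutlinkExtractor` and the `followFormActions` switch are hypothetical, not Nutch APIs or configuration options.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical outlink extractor; not Nutch code. */
public class OutlinkExtractor {

    /** Collects link targets from an HTML page. Anchor hrefs are always kept;
     *  form action URLs are kept only when followFormActions is true. */
    public static List<String> extract(String html, boolean followFormActions) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                } else if (tag == HTML.Tag.FORM && followFormActions) {
                    // Treating a form's action as an outlink means the crawler
                    // may later fetch the form's submission URL.
                    Object action = attrs.getAttribute(HTML.Attribute.ACTION);
                    if (action != null) {
                        links.add(action.toString());
                    }
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>"
                + "<a href=\"/about.html\">About</a>"
                + "<form action=\"/register\" method=\"post\"><input name=\"user\"></form>"
                + "</body></html>";
        System.out.println(extract(page, false)); // [/about.html]
        System.out.println(extract(page, true));  // [/about.html, /register]
    }
}
```

Defaulting to `false` for such a switch keeps a crawler from touching form handlers at all; sites that change state on a plain request to a form URL would otherwise be affected regardless of the HTTP method used.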