Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Sami Siren

Doug Cutting wrote:

Andrzej Bialecki wrote:

Please also don't forget that the trunk/ will soon be invaded by the 
code from mapred, I guess some time around the middle of January (Doug?) 



Thinking about this more, perhaps we should do it sooner.  There's 
already a branch for 0.7.x releases, so what point is there in not 
merging mapred to trunk now?  We'd have fewer branches to maintain, and 
start getting nightly builds of mapred.  Folks who require 0.7.x 
compatibility can continue to use (and patch) the 0.7.x branch.  
Objections?


Doug


+1. I think this is a good time to merge, now that mapred is fully usable.

--
 Sami Siren




Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Doug Cutting

Andrzej Bialecki wrote:
Yes, we just need to make sure that all important bits from trunk are on 
the 0.7 branch, before we start.


I will sync mapred with the trunk prior to the merge, so we should still 
be able to get anything we need after mapred is merged back to trunk.


BTW, we're pretty closely following the recommendations in:

http://svnbook.red-bean.com/en/1.1/ch04s04.html#svn-ch-4-sect-4.4

The mapred branch is a 'feature' branch.  At the end of this section 
they describe how to merge a feature branch back into the trunk.


Doug
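
For readers unfamiliar with the procedure Doug cites, here is a minimal sketch of the feature-branch merge from that section of the Subversion book, written in Python only to assemble the command lines. It assumes the branch has already been synced with trunk (as Doug says he will do), so that at the chosen revision trunk and branch differ only by the branch's own changes. The function name, the example URLs, and the revision number are hypothetical, and nothing is executed.

```python
def feature_branch_merge_commands(trunk_url, branch_url, revision, trunk_wc="."):
    """Return svn command lines for merging a synced feature branch into trunk.

    Assumption: the branch already contains every trunk change up to
    `revision`. The commands are only built here, never run.
    """
    rev = str(revision)
    return [
        # Bring the trunk working copy to the revision being compared.
        ["svn", "update", "-r", rev, trunk_wc],
        # Apply the difference between trunk and branch at that revision
        # to the trunk working copy.
        ["svn", "merge", f"{trunk_url}@{rev}", f"{branch_url}@{rev}", trunk_wc],
    ]


if __name__ == "__main__":
    # Hypothetical repository layout, for illustration only.
    cmds = feature_branch_merge_commands(
        "http://svn.example.com/repos/proj/trunk",
        "http://svn.example.com/repos/proj/branches/mapred",
        1910,
    )
    for cmd in cmds:
        print(" ".join(cmd))
```

After the merge applies cleanly, the result would be reviewed and committed from the trunk working copy as usual.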


Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Andrzej Bialecki

Doug Cutting wrote:


Andrzej Bialecki wrote:

I agree. I just thought that we would prepare the release based on the 
code in trunk/, and in that case we would want to hold off on the 
merge until after the release.



My definition of trunk is that it should be where the majority of 
development happens.  It is what we should build nightly, etc.


Major versions should be branched from trunk, and point releases 
created as tags from the version branches.


A development branch (e.g., mapred) should be used when a few 
developers need to make radical changes and do not want to disrupt 
other developers.


So if most developers are now comfortable working on mapred, then we 
no longer need to keep it in a branch.  And we already have a version 
branch for 0.7, so we don't need to reserve trunk for that.


Does this analysis sound right?



Yes, we just need to make sure that all important bits from trunk are on 
the 0.7 branch, before we start.


--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Doug Cutting

Andrzej Bialecki wrote:
I agree. I just thought that we would prepare the release based on the 
code in trunk/, and in that case we would want to hold off on the merge 
until after the release.


My definition of trunk is that it should be where the majority of 
development happens.  It is what we should build nightly, etc.


Major versions should be branched from trunk, and point releases created 
as tags from the version branches.


A development branch (e.g., mapred) should be used when a few developers 
need to make radical changes and do not want to disrupt other developers.


So if most developers are now comfortable working on mapred, then we no 
longer need to keep it in a branch.  And we already have a version 
branch for 0.7, so we don't need to reserve trunk for that.


Does this analysis sound right?

Doug


Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Andrzej Bialecki

Doug Cutting wrote:


Andrzej Bialecki wrote:

Please also don't forget that the trunk/ will soon be invaded by the 
code from mapred, I guess some time around the middle of January (Doug?) 



Thinking about this more, perhaps we should do it sooner.  There's 
already a branch for 0.7.x releases, so what point is there in not 
merging mapred to trunk now?  We'd have fewer branches to maintain, 
and start getting nightly builds of mapred.  Folks who require 0.7.x 
compatibility can continue to use (and patch) the 0.7.x branch.  
Objections?


Doug



I agree. I just thought that we would prepare the release based on the 
code in trunk/, and in that case we would want to hold off on the merge 
until after the release.


--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Piotr Kosiorowski

Doug Cutting wrote:

Andrzej Bialecki wrote:

Please also don't forget that the trunk/ will soon be invaded by the 
code from mapred, I guess some time around the middle of January (Doug?) 



Thinking about this more, perhaps we should do it sooner.  There's 
already a branch for 0.7.x releases, so what point is there in not 
merging mapred to trunk now?  We'd have fewer branches to maintain, and 
start getting nightly builds of mapred.  Folks who require 0.7.x 
compatibility can continue to use (and patch) the 0.7.x branch.  
Objections?


Doug

+1. Judging by the questions on the mailing lists, I do not think many 
people use trunk now.


Piotr


Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Doug Cutting

Andrzej Bialecki wrote:
Please also don't forget that the trunk/ will soon be invaded by the 
code from mapred, I guess some time around the middle of January (Doug?) 


Thinking about this more, perhaps we should do it sooner.  There's 
already a branch for 0.7.x releases, so what point is there in not 
merging mapred to trunk now?  We'd have fewer branches to maintain, and 
start getting nightly builds of mapred.  Folks who require 0.7.x 
compatibility can continue to use (and patch) the 0.7.x branch.  Objections?


Doug


Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Piotr Kosiorowski
+1 - I wanted to suggest exactly this approach - but we should keep in
mind not to introduce new features without a serious reason (especially
ones that are not backward compatible).
Piotr

On 12/14/05, Jérôme Charron <[EMAIL PROTECTED]> wrote:
>
> > What do people think about collecting a list of issues and holding a
> > voting iteration?
>
> +1
>
>


Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Jérôme Charron
> What do people think about collecting a list of issues and holding a
> voting iteration?

+1


Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Stefan Groschupf


http://issues.apache.org/jira/browse/NUTCH-125



On its way ... ;-) I'll add it during this week.


There are some more very small issues, and there are also some patches
from the community.
What do people think about collecting a list of issues and holding a
voting iteration?


Stefan 
 


Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Andrzej Bialecki

Zaheed Haque wrote:


what about the following:

http://issues.apache.org/jira/browse/NUTCH-125
 



On its way ... ;-) I'll add it during this week.

--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Zaheed Haque
what about the following:

http://issues.apache.org/jira/browse/NUTCH-125

Cheers

On 12/13/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Jérôme Charron wrote:
>
> >+1 for a 0.7.2 release.
> >
> >
>
> +1.
>
> Things are going well on the mapred branch, all basic tools are almost
> in place, so after this release we will probably start merging... so,
> this looks like the last release of the 0.7.x line (from the code in
> trunk/ - I'm sure there will be maintenance releases afterwards).
>
> >I think we can wait for the enhancement proposed by Chris today: Adding an
> >alias in parse-plugin.xml file and use a content-type/extension-id mapping
> >instead of content-type/plugin-id.
> >
> >
>
> IMHO, this needs to be really well tested before going into a release
> ... possibilities for confusion are great.
>
> >For further improvements, the new mime-type repository based on freedesktop
> >mime-type will be needed.
> >I cannot reasonably include this in 0.7.2, but I think it will be in trunk
> >by the end of the year.
> >
> >
> >
>
> Please also don't forget that the trunk/ will soon be invaded by the
> code from mapred, I guess some time around the middle of January (Doug?) ...
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>


--
Best Regards
Zaheed Haque
Phone : +46 735 06
E.mail: [EMAIL PROTECTED]


Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Andrzej Bialecki

Jérôme Charron wrote:


+1 for a 0.7.2 release.
 



+1.

Things are going well on the mapred branch, all basic tools are almost 
in place, so after this release we will probably start merging... so, 
this looks like the last release of the 0.7.x line (from the code in 
trunk/ - I'm sure there will be maintenance releases afterwards).



I think we can wait for the enhancement proposed by Chris today: Adding an
alias in parse-plugin.xml file and use a content-type/extension-id mapping
instead of content-type/plugin-id.
 



IMHO, this needs to be really well tested before going into a release 
... possibilities for confusion are great.



For further improvements, the new mime-type repository based on freedesktop
mime-type will be needed.
I cannot reasonably include this in 0.7.2, but I think it will be in trunk
by the end of the year.

 



Please also don't forget that the trunk/ will soon be invaded by the 
code from mapred, I guess some time around the middle of January (Doug?) ...


--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Jérôme Charron
+1 for a 0.7.2 release.
Here are the issues/revisions I can merge to the 0.7 branch.
These changes mainly concern the parser-factory changes (NUTCH-88):

http://issues.apache.org/jira/browse/NUTCH-112
http://issues.apache.org/jira/browse/NUTCH-135
http://svn.apache.org/viewcvs.cgi?rev=356532&view=rev
http://svn.apache.org/viewcvs.cgi?rev=355809&view=rev
http://svn.apache.org/viewcvs.cgi?rev=354398&view=rev
http://svn.apache.org/viewcvs.cgi?rev=326889&view=rev
http://svn.apache.org/viewcvs.cgi?rev=321250&view=rev
http://svn.apache.org/viewcvs.cgi?rev=321231&view=rev
http://svn.apache.org/viewcvs.cgi?rev=306808&view=rev
http://svn.apache.org/viewcvs.cgi?rev=293370&view=rev
http://svn.apache.org/viewcvs.cgi?rev=292865&view=rev
http://svn.apache.org/viewcvs.cgi?rev=292035&view=rev

Piotr, what about the Italian translation?
0.7.2 could be a good candidate for committing it, no?

> > This has been fixed in the mapred branch, but that patch is not in
> > 0.7.1.  This alone might be a reason to make a 0.7.2 release.

http://svn.apache.org/viewcvs.cgi?view=rev&rev=348533

> I would be happy to see some more parser selection problems fixed, but
> it looks like Jerome is also working hard to get things fixed, so maybe
> we can wait for that.

I think we can wait for the enhancement proposed by Chris today: Adding an
alias in parse-plugin.xml file and use a content-type/extension-id mapping
instead of content-type/plugin-id.
For further improvements, the new mime-type repository based on freedesktop
mime-type will be needed.
I cannot reasonably include this in 0.7.2, but I think it will be in trunk
by the end of the year.

What reasonable target date can we plan for 0.7.2?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Piotr Kosiorowski

If we are going to make a 0.7.2 release, I would like to commit
a patch for http://issues.apache.org/jira/browse/NUTCH-112
and probably one for some build problems people are reporting (missing 
src folder in the nutch-extension plugin).

I will look at them in the next few days.
Regards
Piotr
Stefan Groschupf wrote:
This has been fixed in the mapred branch, but that patch is not in  
0.7.1.  This alone might be a reason to make a 0.7.2 release.



Maybe we can also get some more parser-selection issues fixed in the  
next few days and get them into a 0.7.2 release.
I would be happy to see some more parser selection problems fixed, but  
it looks like Jerome is also working hard to get things fixed, so maybe 
we can wait for that.


Stefan




Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Stefan Groschupf
This has been fixed in the mapred branch, but that patch is not in  
0.7.1.  This alone might be a reason to make a 0.7.2 release.


Maybe we can also get some more parser-selection issues fixed in the  
next few days and get them into a 0.7.2 release.
I would be happy to see some more parser selection problems fixed, but  
it looks like Jerome is also working hard to get things fixed, so maybe  
we can wait for that.


Stefan 


[Fwd: Crawler submits forms?]

2005-12-13 Thread Doug Cutting

FYI

This has been fixed in the mapred branch, but that patch is not in 
0.7.1.  This alone might be a reason to make a 0.7.2 release.


Doug

-------- Original Message --------
Subject: Crawler submits forms?
Date: Tue, 13 Dec 2005 16:57:34 -
From: Andy Read <[EMAIL PROTECTED]>
Reply-To: nutch-agent@lucene.apache.org
Organization: Azurite Systems Ltd.
To: 

Hi,

I'm using nutch to create a site search facility for a couple of sites.

I upgraded from 0.6 to 0.7.1 a few days ago and have just noticed that blank
users are being registered on my site at the exact times the cron job runs
the crawl tool to re-index the site.  This means that the crawler is now
submitting a POST request from the registration form!  Is this a new
'feature' of 0.7 or 0.7.1?  I can't find any mention in changes.txt and I
can't find any config option referring to it.  Surely the crawler should
never submit form input?

Any help appreciated.

Thanks,

Andy Read

www.azurite.co.uk
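
One way a crawler can end up triggering a form handler, as Andy describes, is by treating a form's action attribute as an ordinary outlink and fetching it on a later cycle. The sketch below illustrates that idea only and is not Nutch's actual code: a small Python link extractor that ignores form actions unless explicitly told to collect them. The class name, the function name, and the use_form_action flag are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OutlinkExtractor(HTMLParser):
    """Collects outlinks from <a href>, and from <form action> only on request."""

    def __init__(self, base_url, use_form_action=False):
        super().__init__()
        self.base_url = base_url
        self.use_form_action = use_form_action  # hypothetical switch, off by default
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.outlinks.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form" and self.use_form_action and attrs.get("action"):
            # Fetching this URL may invoke the form handler with empty input.
            self.outlinks.append(urljoin(self.base_url, attrs["action"]))


def extract_outlinks(html, base_url, use_form_action=False):
    """Return absolute outlink URLs found in the given HTML text."""
    parser = OutlinkExtractor(base_url, use_form_action)
    parser.feed(html)
    parser.close()
    return parser.outlinks


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a>'
        '<form action="/register" method="post"><input name="user"></form>'
    )
    print(extract_outlinks(page, "http://example.com/"))
    print(extract_outlinks(page, "http://example.com/", use_form_action=True))
```

With the flag left off, only the /about link is collected, so the registration URL is never queued for fetching.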