Re: Fwd: Maven configuration

2017-10-31 Thread Raffaele Palmieri
Hi Sebastian, you're right. Initially I forked the repo, but there have
been many updates since then, so I preferred to start from scratch. For me
it is important to know whether this could be a feasible alternative to the
current configuration, and whether it could be well accepted by the community.
In the next days I'll try to restore the history and will let you know.
Best regards,
RP.

2017-10-31 18:23 GMT+01:00 Sebastian Nagel :

> Hi Raffaele,
>
> thanks for your work. If I find the time, I'll have a look at it.
>
> One question: the repository does not contain any of the existing commits,
> the entire history is lost. That makes it impossible to track the origin
> of the code.
> Is it possible for you to do the work starting from a fresh checkout
> of https://github.com/apache/nutch/ (master branch)?
>
> Best,
> Sebastian
>
>
> On 10/27/2017 04:02 PM, Raffaele Palmieri wrote:
> > Dear developers,
> > as a follow-up to my proposal, I'm sharing the experimental project with a
> > mavenized Nutch: https://github.com/zirafel/apache_nutch
> > Comments and feedback are welcome. For the deployment process, the Maven
> > Assembly Plugin could be used.
> > Do you think it is feasible?
> > Best,
> > RP.
> >
> >
> > -- Forwarded message --
> > From: *Raffaele Palmieri*
> > Date: 2017-09-21 18:33 GMT+02:00
> > Subject: Maven configuration
> > To: dev@nutch.apache.org 
> >
> >
> > Hi devs,
> > I've seen that the Maven configuration topic has already been discussed,
> > and there is also an issue on Jira
> > (https://issues.apache.org/jira/browse/NUTCH-1371).
> > Our need is to develop/integrate some plugins for Nutch, and I realize
> > that the current Ant/Ivy configuration is quite difficult to use for
> > development and unit testing.
> > Are there any updates on migrating to Maven?
> > Our proposal is:
> > pom.xml (aggregator)
> > - main_module (currently src)
> > - nutch_plugins (pom plugin aggregator)
> >   - creativecommons (currently /src/plugin/creativecommons)
> >   - index-anchor
> >   - ...
> > It's true that this is a problem for those who run the current
> > configuration, but Maven could be a better solution for them too.
> > Best,
> > RP.
> >
>
>


Re: Fwd: Maven configuration

2017-10-31 Thread Sebastian Nagel
Hi Raffaele,

thanks for your work. If I find the time, I'll have a look at it.

One question: the repository does not contain any of the existing commits,
the entire history is lost. That makes it impossible to track the origin of the 
code.
Is it possible for you to do the work starting from a fresh checkout
of https://github.com/apache/nutch/ (master branch)?

Best,
Sebastian


On 10/27/2017 04:02 PM, Raffaele Palmieri wrote:
> Dear developers,
> as follow-up of proposal, I share the experimetal project with mavenized
> Nutch: https://github.com/zirafel/apache_nutch
> Comments and feedbacks are well accepted. For the deployment process, 
> assembly plugin could be used.
> Do you think it is feasible?
> Best,
> RP.
> 
> 
> -- Forwarded message --
> From: *Raffaele Palmieri*  >
> Date: 2017-09-21 18:33 GMT+02:00
> Subject: Maven configuration
> To: dev@nutch.apache.org 
> 
> 
> Hi devs,
> I've seen that maven configuration argument had been discussed and there is 
> also an issue on Jira
> (https://issues.apache.org/jira/browse/NUTCH-1371 
> ).
> Our need is developing/integrating some plugins for Nutch and I realize that 
> using current
> configuration Ant/Ivy is quite difficult to use for the development and unit 
> testing.
> Is there any updates to migrate to Maven?
> Our proposal is:
> pom.xml (aggregator)
> - main_module (actually src)
> - nutch_plugins (pom plugin aggregator)
> - creativecommons (actually /src/plugin/creativecommons)
> - index-anchor
>                 - ...
> It's true that's a problem for who runs current configuration, but Maven 
> could be a better solution
> also for them,
> Best,
> RP.
> 



Crawler-Commons 0.9 released

2017-10-31 Thread Julien Nioche
Happy Halloween!

We are glad to announce the 0.9 release of Crawler-Commons. See the
CHANGES.txt file included with the release for a full list of details.
Among the main changes is the removal of the DOM-based sitemap parser, since
the SAX equivalent introduced in the previous version has better performance
and is also more robust.

You might need to change your code to replace SiteMapParserSAX with
SiteMapParser. The parser is now namespace-aware and, by default, does not
require the namespace to be the one recommended in the specification
(http://www.sitemaps.org/schemas/sitemap/0.9), as variants can be found in
the wild. You can change this behaviour with the method
*setStrictNamespace(boolean)*.
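To make the difference between strict and lenient namespace handling concrete, here is a minimal JDK-only sketch. It does not use or reproduce Crawler-Commons code: the class SitemapNamespaceSketch and its extractLocs method are hypothetical, and the https:// namespace in main is an assumed example of a variant. It reads <loc> entries with a namespace-aware SAX parser and, in strict mode, accepts only elements in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical JDK-only sketch: collects <loc> values from a sitemap with a
 * namespace-aware SAX parser. This is NOT Crawler-Commons code.
 */
public class SitemapNamespaceSketch {

    // Namespace recommended by the sitemaps.org specification.
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /** Returns the <loc> values; in strict mode only <loc> elements in SITEMAP_NS count. */
    public static List<String> extractLocs(String xml, boolean strictNamespace) {
        List<String> locs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside an accepted <loc>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Lenient mode ignores the namespace URI; strict mode requires an exact match.
                boolean namespaceOk = !strictNamespace || SITEMAP_NS.equals(uri);
                if ("loc".equals(localName) && namespaceOk) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver element text in several chunks, so accumulate it.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("loc".equals(localName) && text != null) {
                    locs.add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to get namespace URIs and local names
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse sitemap", e);
        }
        return locs;
    }

    public static void main(String[] args) {
        // Assumed example of a namespace variant: https instead of http.
        String variant = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"https://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>https://example.com/page</loc></url>"
                + "</urlset>";
        System.out.println("lenient: " + extractLocs(variant, false));
        System.out.println("strict:  " + extractLocs(variant, true));
    }
}
```

Run as-is, main prints "lenient: [https://example.com/page]" followed by "strict:  []". The namespace check sits in startElement because that is where a streaming SAX parser first sees each element's namespace URI, so rejected elements are never buffered.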

As usual, version 0.9 contains numerous improvements and bug fixes, and
all users are invited to upgrade to it.
Thanks to all committers, contributors and users.

Julien