Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-06 Thread Julien Nioche
y it's not > working for me, > > it would be nice for it to be working for me :) > > > > That being said if it's working for others and there are at least 3 +1s > and more > > +1s than my lone -1 then Lewis can surely decide to move forward. > > > >

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-07 Thread Julien Nioche
Lewis, Looks like you've released 2.0. If so can you make an announcement to the mailing list + update the website. It's not really something that should go unnoticed. I know about the press release but surely it does not mean that NOTHING should be said about the release then. I see a 1.5 on a m

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-07 Thread Julien Nioche
roadband in Paris is nothing short of utterly abysmal to > say the very best. Please see my comments below > > On Sat, Jul 7, 2012 at 9:58 PM, Julien Nioche > wrote: > > Looks like you've released 2.0. If so can you make an announcement to the > > mailing list + upda

Re: [VOTE] Apache Nutch 1.5.1 RC#3

2012-07-08 Thread Julien Nioche
+1 all looks fine for both the bin and src releases Thanks Lewis On 7 July 2012 22:07, Lewis John Mcgibbney wrote: > *PING* > > Hi Everyone, > > I know there have been a good few threads going around with a power of > release candidates but I wonder if it is possible to get some feedback > on the

[DONE] Renamed branch nutchgora into 2.x

2012-07-10 Thread Julien Nioche
On 9 July 2012 20:12, Sebastian Nagel wrote: > +1 (it's just a name, mainly in svn and jira) > > Sebastian > > On 07/09/2012 12:37 PM, Julien Nioche wrote: > > Guys, > > > > Now that we've released 2.0, wouldn't it be better to rename the > &

Re: [ANNOUNCEMENT] Apache Nutch v1.5.1 Released

2012-07-10 Thread Julien Nioche
Great Job Lewis! Thanks a lot On 10 July 2012 15:40, lewis john mcgibbney wrote: > Good Afternoon Everyone, > > The Apache Nutch PMC are very pleased to announce the release of > Apache Nutch v1.5.1. This release is a maintenance release of the > popular mainstream > 1.5.X series of the Apache N

Re: Build failed in Jenkins: Nutch-nutchgora #307

2012-07-13 Thread Julien Nioche
Lewis, Any chance you could have a quick look at this and fix the reference to the SVN repo? It is now https://svn.apache.org/repos/asf/nutch/branches/2.x/ Thanks On 13 July 2012 05:15, Apache Jenkins Server wrote: > See > > -

Re: duplicate jar files by plugin dependencies

2012-08-10 Thread Julien Nioche
+1 to using the maven-dependency-plugin within our ANT script. I think I had put a preliminary version for 1.x in JIRA but we'd need to extend the mechanism to the plugins as well. On 10 August 2012 10:37, Lewis John Mcgibbney wrote: > Hi Seb, > > On Thu, Aug 9, 2012 at 10:38 PM, Sebastian Nagel

Re: [VOTE] Apache Nutch 2.1 Release Candidate Available

2012-10-01 Thread Julien Nioche
Shouldn't the dependency for gora-sql point to v 0.2.1? On 21 September 2012 16:07, Lewis John Mcgibbney wrote: > Hi Everyone, > > A candidate for Apache Nutch 2.1 is available at: > > http://people.apache.org/~lewismc/apache-nutch-2.1 > > The release candidate is a src.zip and src.tar.gz ONLY >

Re: [VOTE] Apache Nutch 2.1 Release Candidate Available

2012-10-01 Thread Julien Nioche
> > Lewis > > > On Mon, Oct 1, 2012 at 1:36 PM, Julien Nioche < > lists.digitalpeb...@gmail.com> wrote: > >> Shouldn't the dependency for gora-sql point to v 0.2.1? >> >> >> On 21 September 2012 16:07, Lewis John Mcgibbney < >> lewi

Re: [VOTE] Apache Nutch 2.1 Release Candidate Available

2012-10-01 Thread Julien Nioche
> On Mon, Oct 1, 2012 at 2:18 PM, Julien Nioche < > lists.digitalpeb...@gmail.com> wrote: > >> >> The sources look good otherwise, it compiles fine but on my machine the >> tests fail on TestGoraStorage with >> >> *[Server@1a467d4]: [Thread[HSQLDB Server @

Re: [ANNOUNCE] Apache Nutch 2.1 Released

2012-10-05 Thread Julien Nioche
Thanks Lewis and well done everyone! Enjoy your week end Julien On 5 October 2012 16:12, lewis john mcgibbney wrote: > Good Afternoon Everyone, > > The Apache Nutch PMC are very pleased to announce the release of > Apache Nutch v2.1. This release continues to provide Nutch users with > a simpli

Re: NUTCH-1370

2012-10-29 Thread Julien Nioche
Hi Lewis see comments below > > So I thought I'd take this one on tonight and see if I can resolve. > Basically, my high level question is as follows... > Is each line of a text file (seed file) which we attempt to inject > into the webdb considered as an individual map task? > no - each file in

Re: NUTCH-1370

2012-10-30 Thread Julien Nioche
> > On Mon, Oct 29, 2012 at 4:52 PM, Julien Nioche < > lists.digitalpeb...@gmail.com> wrote: > >> Hi Lewis >> >> see comments below >> >>> >>> So I thought I'd take this one on tonight and see if I can resolve. >>> Basically,

Re: [ANNOUNCE] Apache Ivy 2.3.0-RC2 released

2012-11-16 Thread Julien Nioche
IIRC the ant script should download the ivy jars if it is not installed. The reason why we ship the dep is for publishing the artefacts and override any existing version to make sure we use the right one. Bit of a detail IMHO Julien On 16 November 2012 02:49, Lewis John Mcgibbney wrote: > Hi All

Re: [VOTE] Apache Nutch 1.6 Release Candidate

2012-11-28 Thread Julien Nioche
Hi Lewis, A few comments / questions below : - CHANGES.txt contains dates in both MM/DD/ and DD/MM/ formats. Shall we write the month in text form e.g. 7th July 2012 from now on? - Don't we need to have signatures as part of the RC? Looks good otherwise! Thanks! Julien On 23
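A minimal sketch of the text-form date suggested above for CHANGES.txt, which removes the MM/DD versus DD/MM ambiguity. The helper names `ordinal` and `changes_date` are hypothetical and are not part of Nutch or its build.

```python
from datetime import date

# English month names are spelled out explicitly so the output does not
# depend on the locale of the machine running the script.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def ordinal(day: int) -> str:
    """Return the day of the month with its English ordinal suffix."""
    if 11 <= day % 100 <= 13:  # 11th, 12th, 13th are irregular
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def changes_date(d: date) -> str:
    """Format a date in the unambiguous '7th July 2012' style."""
    return f"{ordinal(d.day)} {MONTHS[d.month - 1]} {d.year}"


print(changes_date(date(2012, 7, 7)))
```

Writing the month as a word means an entry can be read the same way whether the reader expects day-first or month-first dates.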

Re: Strategy for Assigning Issues by Version

2012-11-29 Thread Julien Nioche
Good idea! I suspect that most of them will be dating from a looong time ago and it won't be such a straightforward task to apply them, however this would be a good way of sorting them > Additionally, may I suggest (and please shoot me down here if I sound > cheeky) that we make it a priority in

Re: [ANNOUNCE] Apache Nutch 1.6 Released

2012-12-08 Thread Julien Nioche
Great stuff! Thanks Lewis On 8 December 2012 21:50, Lewis John Mcgibbney wrote: > Hi All, > > The Apache Nutch PMC are extremely pleased to announce the release of > Apache Nutch v1.6. This release includes over 20 bug fixes, the same > in improvements, as well as new functionalities including a

Re: Additional patch for NUTCH-1087

2012-12-17 Thread Julien Nioche
Hi Tristan Can you open a new issue instead, mark it as an improvement and link to the original issue? BTW your patch name seems to contain '-2.1' but is for trunk which is misleading Thanks Julien On 14 December 2012 18:46, Tristan Buckner wrote: > I added a patch to the already closed 1087 (

Re: Additional patch for NUTCH-1087

2012-12-17 Thread Julien Nioche
thought each patch was just incrementing the last number. > What should I call it? > > Tristan > > > On Dec 17, 2012, at 6:15 AM, "Julien Nioche" < > lists.digitalpeb...@gmail.com> wrote: > > Hi Tristan > > Can you open a new issue instead, mark it as a

Re: Additional patch for NUTCH-1087

2012-12-19 Thread Julien Nioche
ully, it seems that the 2.x branch diverges from what's in > trunk. Should I do two patches, one for trunk and one for 2.x? > > Tristan > > > On Dec 17, 2012, at 9:37 AM, Julien Nioche > wrote: > > Something like NUTCH-$JIRA_ISSUE_NUMER_NutchversionNumber_revision.p

[CALL FOR TESTING] NUTCH-1047 Pluggable indexing backends

2013-01-18 Thread Julien Nioche
Hi guys, I've just attached a patch to https://issues.apache.org/jira/browse/NUTCH-1047 which contains what should be the first working version for it. I would be very grateful if you could spend a bit of time trying it and see if you come up with any problems. Basically the idea here is to have

Re: [CALL FOR TESTING] NUTCH-1047 Pluggable indexing backends

2013-01-23 Thread Julien Nioche
Hi guys, So? All good and working perfectly? Shall we commit it? J. On 18 January 2013 16:15, Julien Nioche wrote: > Hi guys, > > I've just attached a patch to > https://issues.apache.org/jira/browse/NUTCH-1047 which contains what > should be the first working version for

Re: Addition to Pluggable Backends

2013-01-26 Thread Julien Nioche
Hi Lewis It would need some rewriting to adapt it to the end point implemented by the plugins but it would be a good starting point and worth looking at. It could also be something maintained externally and would not necessarily have to be part of the Nutch code. Basically we'd maintain the plugin

Re: [ANNOUNCEMENT] Welcome Kiran Chitturi as Apache Nutch PMC and Committer

2013-03-10 Thread Julien Nioche
Great to hear about your use of Nutch at your library and welcome on board Kiran! Julien On 10 March 2013 01:27, kiran chitturi wrote: > Thanks a lot guys for inviting me and for the wishes. > > I am a graduate student in Virginia Tech University doing my Masters in > Computer Science. I have b

Re: [ANNOUNCEMENT] Welcome Kiran Chitturi as Apache Nutch PMC and Committer

2013-03-11 Thread Julien Nioche
Hi Kiran, Your account has been created and added to the Nutch group so you should be able to commit. Your first task is to add yourself to the list of committers on the Nutch website. The instructions on how to do this should be somewhere on the Wiki. Thanks Julien On 10 March 2013 01:27, kira

Re: [WELCOME] Feng Lu as Apache Nutch PMC and Committer

2013-03-18 Thread Julien Nioche
Hi Feng, Congratulations on becoming a committer and welcome! [...] > A problem has been troubling me a long time is that what is the target of > nutch 1.x, Does nutch 1.x is just a transitional version of Nutch 2.x, or > they can coexist because Nutch 1.x has a different data processing metho

Re: [Nutch Wiki] Trivial Update of "PGOSimone" by PGOSimone

2013-03-25 Thread Julien Nioche
I thought we had to have a login / password to modify the Wiki. If so how come we got so much spam lately? Julien On 25 March 2013 04:26, Apache Wiki wrote: > Dear Wiki user, > > You have subscribed to a wiki page or wiki category on "Nutch Wiki" for > change notification. > > The "PGOSimone" p

Re: [Nutch Wiki] Trivial Update of "PGOSimone" by PGOSimone

2013-03-26 Thread Julien Nioche
+ 1 to remove subhankarray. We can always add him/her back if genuine On 25 March 2013 20:30, Lewis John Mcgibbney wrote: > Hi, > > > On Mon, Mar 25, 2013 at 6:05 AM, wrote: > >> >> Hey Julien, >> >> I heard on #asfinfra that any of our MoinMoin wikis have been attacked >> recently by SPAM. >>

Re: Wiki locked down, spam pages deleted

2013-04-04 Thread Julien Nioche
Thanks for taking the time to do it Kiran! On 4 April 2013 01:16, kiran chitturi wrote: > Hi! > > The wiki is locked down and there will be no more spam pages. > > With the help of others, I have deleted all of the spam pages. If anyone > sees anything, please report to us. > > Very few people

Re: Nutch/Solr expert

2013-04-13 Thread Julien Nioche
Hi Andrea See http://wiki.apache.org/nutch/Support for a list of people who can help. Am based in the UK but feel free to get in touch at nutch@digitalpebble.com if that's still of interest Julien On 13 April 2013 10:03, Andrea Lanzoni wrote: > Hi Lewis, I disturb you for I thought you may hel

Re: Partial Updates in Solr 4.1

2013-04-26 Thread Julien Nioche
Hi Tomas Nice to hear about punkspider and great that you are using Nutch. Can you please open a JIRA issue and attach a patch for this? https://wiki.apache.org/nutch/HowToContribute Thanks Julien On 26 April 2013 02:56, Jay Springbernate wrote: > Hey Nutchers! Hope you all are doing fine.

Re: Request for Backup Mentor(s) for GSoCq

2013-05-17 Thread Julien Nioche
Hi Lewis I am happy to be a backup mentor for this. Cheers Julien On 16 May 2013 18:19, Lewis John Mcgibbney wrote: > Hi All, > This year for the Nutch + Giraph LinkRank delegation and integration > proposal, we have two proposals currently sitting. > As a requirement, each proposal needs a b

Re: Generic LinkRank plugin for Nutch

2013-05-30 Thread Julien Nioche
Hi Ahmet, You don't need to use the ScoringFilters at all. The nutch.scoring.webgraph package can be taken as an example of how to do it. It works fine as far as I know but what we wanted with the Giraph-based replacement was to have less code to maintain and also have something we could use in 2.x

Re: [VOTE] Apache Nutch 2.2 Release Candidate

2013-06-03 Thread Julien Nioche
Hi, The keys live at http://www.apache.org/dist/nutch/KEYS ( http://nutch.apache.org/dist/KEYS gives a 404) Aren't the .asc files supposed to be there as well? ( http://www.apache.org/dev/release-signing.html#basic-facts). Am getting : *md5sum -c apache-nutch-2.2-src.zip.md5 * md5sum: apache-nut
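A minimal sketch of the checksum check discussed above, for cases where `md5sum -c` does not accept the layout of the published `.md5` file. It assumes the file contains the digest as one contiguous run of 32 hexadecimal characters; the function names are hypothetical. It does not replace verifying the `.asc` signature against the KEYS file.

```python
import hashlib
import re
from pathlib import Path

# Assumption: the .md5 file holds the digest as 32 contiguous hex characters,
# possibly surrounded by a file name or other text.
HEX_MD5 = re.compile(r"\b[0-9a-fA-F]{32}\b")


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_text):
    """Extract the first MD5-looking token from the text of a .md5 file."""
    match = HEX_MD5.search(md5_text)
    return match.group(0).lower() if match else None


def verify(artifact_path, md5_path):
    """Return True if the artifact's digest matches the one in the .md5 file."""
    expected = expected_md5(Path(md5_path).read_text())
    return expected is not None and md5_of_file(artifact_path) == expected
```

Calling `verify("apache-nutch-2.2-src.zip", "apache-nutch-2.2-src.zip.md5")` from the download directory would then report whether the two agree, independently of how the checksum line is laid out.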

Job opening in Bristol, UK

2013-06-06 Thread Julien Nioche
[Apologies for cross posting] We are looking for a candidate with the following skills and expertise : * experience in web crawling, ideally with Apache Nutch * Storm, Hadoop and related technologies * strong Java skills * interest in text processing, NLP and ML * good social

Re: [RESULT] WAS: Re: [VOTE] Apache Nutch 2.2 Release Candidate

2013-06-08 Thread Julien Nioche
ay that subject to VOTE'ing, I am closing the VOTE. > The following VOTE'es have been tallied > > [7] +1, let's get it released!!! > Tejas Patil > Renato Marroquín Mogrovejo (non-binding) > Kiran Chitturi > Feng Lu > Julien Nioche > Sebastian Nagel > Lewi

Re: [DISCUSS] Nutch 1.7 ready for release?

2013-06-10 Thread Julien Nioche
+1 to release now but it would have been nice to do https://issues.apache.org/jira/browse/NUTCH-1527 as part of the same release. The main change introduced in this version is the pluggable indexer and having a first working version for ES would be a good illustration of how useful this feature is.

Re: [DISCUSS] Nutch 1.7 ready for release?

2013-06-10 Thread Julien Nioche
Have added the upgrade to Tika 1.3 to v1.7 https://issues.apache.org/jira/browse/NUTCH-1522. It should be quite straightforward to include and would be a shame not to do it for this release. Thoughts? On 10 June 2013 08:48, Julien Nioche wrote: > +1 to release now but it would have been n

Re: [DISCUSS] Nutch 1.7 ready for release?

2013-06-10 Thread Julien Nioche
Have just committed NUTCH-1522 for both 2-x and trunk On 10 June 2013 12:07, Julien Nioche wrote: > Have added the upgrade to Tika 1.3 to v1.7 > https://issues.apache.org/jira/browse/NUTCH-1522. It should be quite > straightforward to include and would be a shame not to do it

Re: Nutch Site

2013-06-18 Thread Julien Nioche
Hi Lewis, Brilliant! Thanks a lot Julien On 18 June 2013 05:32, Lewis John Mcgibbney wrote: > Hi All, > @Julien, > A while ago you mentioned about changing the Nutch site to be more direct > towards Downloads. I agreed with this but as I didn't deal with it then and > there, it got put to the

No 2.2 branch in svn?

2013-06-21 Thread Julien Nioche
Hi, I can't see a snapshot branch for 2.2 on http://svn.apache.org/viewvc/nutch/branches/ Isn't that in a script or do we roll it by hand? J. -- * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com http://twitter.com/digitalpebble

Re: [VOTE] Apache Nutch 1.7 Release Candidate

2013-06-21 Thread Julien Nioche
Hi Lewis The release notes [ https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12323281] list issues marked as won't fix which is probably not a great idea. For instance it lists *- Port nutch-mongodb-indexer to Nutch* which is a won't fix but people could get the impr

Re: [VOTE] Apache Nutch 1.7 Release Candidate

2013-06-21 Thread Julien Nioche
> Thank you v much. > Lewis > > > On Fri, Jun 21, 2013 at 1:47 AM, Julien Nioche < > lists.digitalpeb...@gmail.com> wrote: > > > Hi Lewis > > > > The release notes [ > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=1

Re: [VOTE] Apache Nutch 1.7 Release Candidate

2013-06-21 Thread Julien Nioche
nd the various issues and do this now. I am on mobile > and Jira does not render well in my browser. > Can you please see to this? > Thank you > > > On Fri, Jun 21, 2013 at 12:54 PM, Julien Nioche < > lists.digitalpeb...@gmail.com> wrote: > >> Hi Lewis >> &

Re: [ANNOUNCE] Apache Nutch v1.7 Released

2013-06-27 Thread Julien Nioche
Thanks Lewis for taking care of the release. Great stuff! Julien On 27 June 2013 00:38, Lewis John Mcgibbney wrote: > N.B. Previous message doesn't seem to have been mod'd through under my @ > apache.org address so resending ;) > It has however been distributed to annou...@apache.org already >

Re: [ANNOUNCE] Apache Nutch v2.2.1 Released

2013-07-02 Thread Julien Nioche
Great stuff! Thanks Lewis On 2 July 2013 17:32, Lewis John Mcgibbney wrote: > Good Afternoon Everyone, > > The Apache Nutch PMC are very pleased to announce the immediate release of > Apache Nutch v2.2.1, we advise all current users and developers of the 2.X > series to upgrade to this release A

Re: Feed Plugin Crawl Links

2013-08-08 Thread Julien Nioche
Hi Rich, What you need to do is doable with a bit of hacking : My requirement is to have a single index entry for each feed item. My > choices are: > > 1) Use the "parse-tika" plugin, which crawls the links that each RSS item > contains, but collects none of the RSS item metadata (author, publi

Re: [jira] [Updated] (NUTCH-1622) Create Outlinks with metadata

2013-08-09 Thread Julien Nioche
e Outlinks with metadata > > > [ > https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] > > Julien Nioche updated NUTCH-1622: > - > > Attachment: NUTCH-1622.patch > &

Re: 2.x vs. 1.x speed

2013-09-16 Thread Julien Nioche
e continue to work on this. > > > On Wednesday, August 7, 2013, Julien Nioche > > wrote: > > Hi Otis > > > > Definitely *not *the fetching speed. Actually everything but *not* the > > fetching speed. The fetcher is pretty much the same as 1.x and anyway the &g

Nutch talk at Lucene/SOLR Revolution EU 2013

2013-09-25 Thread Julien Nioche
Hi, I will be giving a talk on Nutch at Lucene/SOLR Revolution in Dublin (4/7 Nov). There should be quite a few interesting presentations as you can see on http://lucenerevolution.org/sessions as well as the training sessions ( http://lucenerevolution.org/training). Ping me on twitter if you wi

[ANNOUNCEMENT] 0.3 release of crawler-commons

2013-10-11 Thread Julien Nioche
Hi, Just to let you know that we have just released version 0.3 of crawler-commons. Crawler-commons is a set of reusable Java components that implement functionality common to any web crawler. These components benefit from collaboration among various existing web crawler projects, and reduce du

Alternative to Forrest for Nutch website

2013-10-22 Thread Julien Nioche
Hi guys I am about to modify the list of committers on the website and realised that I had forgotten how frustrating it is to have to install Forrest etc... Any idea of what is used by other Apache projects for their websites? Thanks Julien -- * *Open Source Solutions for Text Engineering h

Re: Alternative to Forrest for Nutch website

2013-10-22 Thread Julien Nioche
be for the CMS, but I don't have time to set it > > up (luckily infra can help and we can reuse a lot of > > the skins). > > > > Cheers, > > Chris > > > > -Original Message- > > From: Julien Nioche > > Reply-To: "dev@nutch.apac

Re: Alternative to Forrest for Nutch website

2013-10-25 Thread Julien Nioche
site >> 24645 by: Julien Nioche >> 24646 by: Chris Mattmann >> 24647 by: Markus Jelsma >> 24648 by: Julien Nioche >> >> >> I am about to modify the list of committers on the website and realised >> that I had forgotten h

Lucene SOLR Revolution Dublin

2013-10-29 Thread Julien Nioche
Hi guys Anyone going to http://www.lucenerevolution.org/ next week? I'll be giving a talk on Nutch, ping me on twitter or email if you want to meet up and have a chat. Julien -- * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com http:/

All in one Crawl class

2013-11-14 Thread Julien Nioche
See https://issues.apache.org/jira/browse/NUTCH-1621 It has now been removed from both trunk and 2.x. I will update the Wiki pages accordingly over the next couple of days to reflect this change. As of the next releases of Nutch the crawl script will have to be used instead. It works just as well

Re: [DISCUSS] Release Trunk

2013-11-28 Thread Julien Nioche
Hi Lewis We've done quite a few things in 1.x since the previous release (e.g. generic deduplication, removing indexer.solr package, etc...) and the next 2.x release will be after the changes to GORA have been made, tested and used on the Nutch side so that could be quite a while. I am neutral a

Re: Nutch with YARN (aka Hadoop 2.0)

2013-12-09 Thread Julien Nioche
I don't think Nutch has been fully ported to the new mapreduce API which is a prerequisite for running it on Hadoop 2. I can't think of a reason why the performance would be any different with Yarn. Julien On 9 December 2013 06:42, Tejas Patil wrote: > Has anyone tried out running Nutch o

Re: Nightly builds

2014-01-08 Thread Julien Nioche
Great stuff, thanks Lewis On 8 January 2014 12:00, Lewis John Mcgibbney wrote: > Hi Folks, > > On Wed, Jan 8, 2014 at 4:06 AM, wrote: > >> I'm working on getting the Jenkins job configuration stable again. >> Something seems to have been reset or in not correct. >> I'll update here once we are

Re: Renovating "Nutch Hadoop Tutorial" wiki page

2014-01-21 Thread Julien Nioche
Hi The whole thing has been replaced with http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial which does exactly what you described. +1 to remove the old nutchhadooptutorial page J. On 21 January 2014 17:44, Tejas Patil wrote:

Re: Renovating "Nutch Hadoop Tutorial" wiki page

2014-01-22 Thread Julien Nioche
/nutch/Nutch2Cassandra >> [3] : http://wiki.apache.org/nutch/NutchTutorial >> >> Thanks, >> Tejas >> >> >> On Wed, Jan 22, 2014 at 2:47 AM, d_k wrote: >> >>> Actually what I would like to see is a Nutch 2.x tutorial at the same >>&

Nutch meetup / hackathon at BerlinBuzzwords next May?

2014-01-24 Thread Julien Nioche
Hi guys, I'll certainly be at BerlinBuzzwords and have submitted a talk on Nutch. What about having a Nutch meetup / hackathon / workshop? http://berlinbuzzwords.de/news/hackathons-workshops-berlin-buzzwords Julien -- Open Source Solutions for Text Engineering http://digitalpebble.blogspot.

Re: [DISCUSS] Release Trunk

2014-02-12 Thread Julien Nioche
and probably a few others but they could also be done later. > At least, these should be done before releasing: > NUTCH-1646 IndexerMapReduce to consider DB status > NUTCH-1413 Record response time > > Sebastian > > On 11/28/2013 05:49 PM, Julien Nioche wrote: > > Hi Lewis

Common Crawl's Move to Apache Nutch

2014-02-21 Thread Julien Nioche
Hi, Just in case you missed it, here is a blog post from Jordan Mendelson on how they moved to Nutch : http://commoncrawl.org/common-crawl-move-to-nutch/ Julien -- Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com http://twitter.com/dig

Re: [DISCUSS] Release 1.8?

2014-03-11 Thread Julien Nioche
+1 Thanks for your work on these issues guys! Julien On 11 March 2014 18:24, Markus Jelsma wrote: > Yes! Agreed! > > > > > Sebastian Nagel schreef: > > Hi everyone, > > NUTCH-1113 and NUTCH-1706 are fixed, > broken HostDb (NUTCH-1325) has been removed for now from trunk. > No open issues mark

Re: Who is moderating Nutch lists?

2014-03-13 Thread Julien Nioche
I don't think these lists are moderated. Don't think they should be either J On Thursday, 13 March 2014, Markus Jelsma wrote: > Well, thats not me, perhaps Chris? > > -Original message- > From: Lewis John Mcgibbney> > Sent: Wednesday 12th March 2014 15:56 > To: dev@nutch.apache.org > S

Re: [VOTE] Release Apache Nutch 1.8RC#2

2014-03-16 Thread Julien Nioche
+1 from me. Thanks everyone On Sunday, 16 March 2014, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > +1 from me! > > SIGS pass, CHECKSUMS pass: > > [chipotle:~/tmp/apache-nutch-1.8] mattmann% $HOME/bin/stage_apache_rc > apache-nutch 1.8-bin https://dist.apache.org/repos/dist/d

Re: [WELCOME] Nutch PMC Welcomes Talat Uyarer to PMC and Committer

2014-04-02 Thread Julien Nioche
Congratulations Talat and welcome on board! Julien On 2 April 2014 07:56, Talat Uyarer wrote: > Hi All, > > I am very excited now. :) Thanks a lot to everyone for inviting me. > I'm a software engineer and crawler team leader of my company in > Istanbul. I have been using Apache Nutch 2.X for

Re: Pushing content to Solr from Nutch

2014-04-10 Thread Julien Nioche
Hi Xavier Your config file looks a bit outdated. Here are the values set by default (see http://svn.apache.org/repos/asf/nutch/trunk/conf/nutch-default.xml) plugin.includes protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|*indexer-solr*|scoring-opic|urlnormalizer-(pass|r
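A small sketch of how a `plugin.includes` value like the one quoted above selects plugins. It assumes the value is treated as a regular expression that must match a plugin's whole id. The pattern below is a shortened, hypothetical subset built only from the complete alternatives visible in the snippet; the truncated tail is deliberately left out, and `is_included` is an illustrative helper, not a Nutch API.

```python
import re

# Hypothetical subset of a plugin.includes value (assumption: the real
# default is longer; only the fully visible alternatives are used here).
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|"
    "index-(basic|anchor)|indexer-solr|scoring-opic"
)


def is_included(plugin_id, includes=PLUGIN_INCLUDES):
    """Return True if the whole plugin id matches the includes pattern."""
    return re.fullmatch(includes, plugin_id) is not None


for plugin in ("indexer-solr", "parse-tika", "parse-pdf"):
    print(plugin, is_included(plugin))
```

Under that assumption, a configuration that predates the pluggable indexing backends and lacks the `indexer-solr` alternative would leave the Solr indexer disabled, which fits the advice to compare an old config against nutch-default.xml.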

Re: [DISCUSS] Roadmap for 2.3 Release

2014-04-30 Thread Julien Nioche
I'd exclude NUTCH-1741 for now and focus on the core updates (GORA, filters, etc...). See comments on NUTCH-1714 On 1 May 2014 07:27, Lewis John Mcgibbney wrote: > Hi Alparslan & Folks, > > OK so you can see the road map's here > > *http://s.apa

Re: [DISCUSS] Roadmap for 2.3 Release

2014-05-01 Thread Julien Nioche
lien On 1 May 2014 08:40, Talat Uyarer wrote: > I aggree with you Julien. Today Lewis change some issues's fix version > 2.3 to 2.4. Most of my issues :) May I ask, If I update these issues, can > I change fix version to 2.3 ? I need them. > > Thanks > Talat > &g

Re: [DISCUSS] Roadmap for 2.3 Release

2014-05-01 Thread Julien Nioche
is to leverage the brand new GORA filtering so that we get only the entries marked for a given job - see discussion on NUTCH-1714 . This should make Nutch 2.x a lot faster. We haven't released 2.x for some time and loads of interesting stuff has been done to it. It will be an exciting release

Re: Post process Nutch data

2014-05-05 Thread Julien Nioche
Hi As mentioned earlier in a different discussion on this list behemoth would be the right tool for this Julien On Monday, 5 May 2014, Srikanth Shankara Rao wrote: > > Hi All, > > I have crawled Nutch data using 1.8. Data is in HDFS. I would like to > post-process this data before indexing int

Re: Creating Windows bash files for nutch

2014-05-18 Thread Julien Nioche
Hi > Currently nutch isn't very friendly to windows users as it requires cygwin > to run and there are a lot of issues with Hadoop 1.x branch, which nutch > bundles with it, due to the "set tmp permission" issue. > > What do you think about doing two things: > 1. Move to Hadoop 2.4 to support win

Re: Creating Windows bash files for nutch

2014-05-18 Thread Julien Nioche
atch/cmd scripts for windows that don't require Cygwin. > > I was thinking of writing those scripts but wanted to check if people > think it's a good idea. > > > On Sunday, May 18, 2014, Julien Nioche > wrote: > >> Hi >> >> >>> Currently nutch

Nutch survey

2014-05-21 Thread Julien Nioche
Hi everyone! I had written a survey about Nutch and its uses and would be very grateful if you could take a couple of minutes to contribute : https://docs.google.com/forms/d/15Jg7dGoU2I1aHur3g5ia9qshCMES8hB1OLpf5q6sGXg/viewform This should help getting a clearer picture of the wider Nutch commun

Re: Nutch survey

2014-05-22 Thread Julien Nioche
those of you who haven't done the survey yet, please do take part. It will definitely help getting a better picture of who we are / what we do as a community. The survey will be online for a few weeks. Thanks Julien On 21 May 2014 16:07, Julien Nioche wrote: > Hi everyone! > >

Re: Nutch survey

2014-05-27 Thread Julien Nioche
nd will be very useful for getting a clearer picture of who we are as a community, what we like or not with Nutch etc... Survey => https://t.co/Xod5Z3Mm5E Please RT : https://twitter.com/digitalpebble/status/469130285284466688 Thanks Julien On 22 May 2014 08:10, Julien Nioche wrote: > Hi g

ApacheCon CFP closes June 25

2014-06-10 Thread Julien Nioche
Dear Nutch enthusiast, As you may be aware, ApacheCon will be held this year in Budapest, on November 17-23. (See http://apachecon.eu for more info.) The Call For Papers for that conference is still open, but will be closing soon. We need your talk proposals, to represent Nutch at ApacheCon. We ne

Travel assistance for ApacheCon EU, Budapest November 17-21 2014

2014-06-11 Thread Julien Nioche
The Travel Assistance Committee (TAC) is happy to announce that we now accept applications for ApacheCon Europe 2014, 17-21 November in Budapest, Hungary. Applications are welcome from individuals within the Apache community at-large, users, developers, educators, students, Committers, and Members,

Re: nutch elpais.com

2014-06-16 Thread Julien Nioche
Salut Yann, Not really answering your question but where did you get this config from? Some of its elements have been long deprecated (query-*, response-*, summary-*) Julien On 15 June 2014 10:20, Yann Levreau wrote: > hi everyone ! > > I'm sorry to disturb you but i need some assistance for

Version of Java in Jenkins

2014-06-17 Thread Julien Nioche
Lewis, https://issues.apache.org/jira/browse/NUTCH-1590 requires Java 1.7 for building the Javadoc. Does something need changing in Jenkins? BTW is there a WIKI page somewhere on how to configure Jenkins? Thanks Julien -- Open Source Solutions for Text Engineering http://digitalpebble.blogsp

Re: Nutch Extension for realtime processing

2014-06-18 Thread Julien Nioche
Hi Jake Great to hear about your ideas. Sounds like what you are proposing would be only "near" realtime as much would depend on the generation which is a batch step. How / when would the update step be called? Would this be a fetcher only i.e. does not recursively discover links. If so why not go

Re: Nutch Extension for realtime processing

2014-06-19 Thread Julien Nioche
nuously would be an enormous undertaking, requiring a major overhaul > of Nutch and a migration from MR. But creating a plugin-based hook to the > Fetcher seems to be relatively trivial. > > The storm-crawler project looks neat! We’ve contemplated building > something similar that wo

Nearing a 1.9 release?

2014-06-29 Thread Julien Nioche
Hi guys, We've done loads of good work on the trunk since the last release, in particular : - NUTCH-1736 - NUTCH-1647 - NUTCH-1793 whi

Re: Nearing a 1.9 release?

2014-07-07 Thread Julien Nioche
ution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC> and change their fix version back to 1.9 if you think they should be included in the next release. Thanks Julien On 29 June 2014 10:20, Julien Nioche wrote: > Hi guys, > > We've done loads of good work on the trunk s

[VOTE] Remove pom.xml from source

2014-07-15 Thread Julien Nioche
Hi, One of the frequent issues on the mailing list / JIRA is that users can be led to think that Nutch is built with Maven as they can see what looks like a perfectly valid pom.xml at the root of the project. It becomes clearer when reading the WIKI or FAQ that ANT should be used instead but it is

Re: [VOTE] Remove pom.xml from source

2014-07-15 Thread Julien Nioche
of IDEs works very good with maven. > > Talat > > > 2014-07-15 13:36 GMT+03:00 Julien Nioche : > > Hi, >> >> One of the frequent issues on the mailing list / JIRA is that users can >> be led to think that Nutch is built with Maven as they can see what looks >&

Re: [DISCUSS] [VOTE] Remove pom.xml from source

2014-07-15 Thread Julien Nioche
nerates the dependencies, and not e.g., the developer > list, etc. So, we need the pom.xml as the template that has that stuff, until someone cooks up a XSL combining solution with that original template > and then what ant deploy spits out, no? > > Cheers, > Chris > > > &

Re: Problems running some ant targets on recent trunk

2014-07-17 Thread Julien Nioche
Hi This is probably due to some of the recent changes I made e.g. https://issues.apache.org/jira/browse/NUTCH-1804 I'll have a look at this. Thanks Julien On 16 July 2014 23:10, Sebastian Nagel wrote: > Hi, > > I have some problems running ant targets on recent trunk: > > % ant runtime > fa

Re: Problems running some ant targets on recent trunk

2014-07-17 Thread Julien Nioche
> > In this case it is the target "compile-test" of lib-regex-filter which > fails. > Should it be really called for target "runtime"? > > > dir="../lib-regex-filter"/> > This is the source of the problem indeed, the second line should not be there : the test classes are not requi

Re: Problems running some ant targets on recent trunk

2014-07-21 Thread Julien Nioche
s for reporting it. On 17 July 2014 10:18, Julien Nioche wrote: > In this case it is the target "compile-test" of lib-regex-filter which >> fails. >> Should it be really called for target "runtime"? >> >> >> > dir="../lib-regex-f

Re: Push Nutch 1.9

2014-07-30 Thread Julien Nioche
Hi Lewis https://issues.apache.org/jira/browse/NUTCH-1755 is more at a discussion stage and can be done later. I have moved it to 1.10 I've just committed https://issues.apache.org/jira/browse/NUTCH-1561 - there are no more issues flagged for 1.9. +1 for a RC. This will be a terrific release wit

Re: Push Nutch 1.9

2014-08-07 Thread Julien Nioche
Lewis, Any chance you'd have time to spin a RC? Thanks Julien On 30 July 2014 21:14, Sebastian Nagel wrote: > +1 > > sebastian > > > 2014-07-30 10:56 GMT+02:00 Julien Nioche <mailto:lists.digitalpeb...@gmail.com>>: > > Hi Lewis > > http

Re: [VOTE] Apache Nutch 1.9 Release Candidate #1

2014-08-13 Thread Julien Nioche
Hi, +1 to release. Compilation and tests run fine. Signatures look good. Thanks Lewis! Julien On 13 August 2014 06:32, Lewis John Mcgibbney wrote: > VOTE'ing will be open for 'at-least' 72 hours to allow people enough time > to cast their VOTE's. > Thanks > Lewis > > > On Tue, Aug 12, 2014 a

Re: Incorrect download links for Nutch-1.9

2014-08-28 Thread Julien Nioche
Thanks for reporting this Jake, I'll fix this tomorrow (unless a fellow committer beats me to it) Julien On 27 August 2014 17:37, Jake Dodd wrote: > Hi all, > > I noticed that following the download links for Nutch 1.9 (from > http://nutch.apache.org/downloads.html) takes users to a series of

Re: Title of the page Version Control

2014-08-28 Thread Julien Nioche
Thanks for reporting this Alfonso, I'll fix this tomorrow (unless a fellow committer beats me to it) Julien On 28 August 2014 10:13, Alfonso Nishikawa wrote: > Greetings, > > I found that the page https://nutch.apache.org/version_control.html > states in it's title: "Apache Nutch™ - Gora Vers

Re: Incorrect download links for Nutch-1.9

2014-08-29 Thread Julien Nioche
Thanks Lewis On 28 August 2014 22:41, Lewis John Mcgibbney wrote: > Hi Jake, > Thank you so much for reporting. > Fixed. > Thank you, have a great day. > Lewis > > > On Wed, Aug 27, 2014 at 9:37 AM, wrote: > >> >> Hi all, >> >> I noticed that following the download links for Nutch 1.9 (from >>

Re: Title of the page Version Control

2014-08-29 Thread Julien Nioche
Done! Thanks again On 28 August 2014 16:41, Julien Nioche wrote: > Thanks for reporting this Alfonso, I'll fix this tomorrow (unless a fellow > committer beats me to it) > > Julien > > > > On 28 August 2014 10:13, Alfonso Nishikawa > wrote: > >> Gre
