Re: [VOTE] Apache Nutch 2.2 Release Candidate

2013-06-03 Thread kiran chitturi
+1, this is an important release for the 2.x series.

Tejas, I also received it two days late.

On Fri, May 31, 2013 at 7:17 PM, lewis john mcgibbney wrote:

> Good Friday Everyone,
>
> Glad to get to a stage where we can VOTE on the release of the Apache
> Nutch 2.2 artifacts.
>
> We solved a stack of issues:
> http://s.apache.org/LPB
>
> SVN source tag:
> http://svn.apache.org/repos/asf/nutch/tags/release-2.2/
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapachenutch-044/
>
> Release artifacts:
> http://people.apache.org/~lewismc/nutch/nutch2.2/
>
> PGP release keys (signed using 4E21557F):
> http://nutch.apache.org/dist/KEYS
>
> The vote will be open for at least 72 hours; however, given this weather,
> I suppose we can all be forgiven if it is not done over the weekend :0)
>
> I would like to say a huge thanks to all contributors and committers from far
> and wide who helped with this release. Getting here is yet another
> milestone for us.
>
> Have a great weekend
>
> Lewis
>
> [ ] +1, let's get it released!!!
> [ ] +/-0, fine, but consider fixing a few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> p.s. here's my +1
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: Nutch wiki down ?

2013-05-01 Thread kiran chitturi
Hi Tejas,

I think it has been down for a while. I just checked, and the Solr wiki is
also down [1].


[1] - https://wiki.apache.org/solr


On Thu, May 2, 2013 at 1:08 AM, Tejas Patil wrote:

> Hi all,
>
> I have done a few JIRA check-ins this week and wanted to verify whether they
> need any wiki updates. For the past 2-3 days, I have not been able to access
> the Nutch wiki pages [1]. It says:
>
> *wiki.apache.org is undergoing maintenance.*
> *We should be back online at  UTC*
> *Infrabot on twitter contains more information.*
> *Thanks, ASF Infrastructure Team*
>
> The Twitter page [0] mentioned in that note does not hint at any specific
> update/activity wrt Nutch. Moreover, I cannot figure out whether there
> was any general downtime affecting several projects as part of maintenance.
>
> Does anybody know about this?
>
> [0] : https://twitter.com/infrabot
> [1] : http://wiki.apache.org/nutch/
>
> Thanks,
> Tejas
>



-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-03 Thread kiran chitturi
Hurray :)

Our wiki is locked down. Donald has helped make the process much quicker,
and he is also helping to delete the spam pages.

If anyone wants to edit the wiki and is not able to do so, please contact
any one of us and we will add you to the wiki list.



On Wed, Apr 3, 2013 at 6:37 PM, kiran chitturi wrote:

> Thank you Ken.
>
> I am talking to Gavin McDonald right now, and we will have moin wiki
> access soon.
>
> No more spam mails :)
>
>
> On Wed, Apr 3, 2013 at 5:06 PM, Ken Krugler 
> wrote:
>
>> Hi Kiran,
>>
>> I was just chatting w/Steve Rowe, who handled this for the Solr project.
>> He said:
>>
>> It took less than a day, but I went on #asfinfra IRC channel and asked
>> some questions about the process, which may have gotten Gavin McDonald to
>> move on it sooner.
>>
>>
>> Since we're still getting slammed with spam, it might be worthwhile to do
>> the same.
>>
>> Thanks,
>>
>> -- Ken
>>
>>
>> On Apr 1, 2013, at 12:30pm, kiran chitturi wrote:
>>
>> I have posted the information on the JIRA issue page [0]. Let's hope the
>> issue will be taken care of soon.
>>
>>
>> [0] - https://issues.apache.org/jira/browse/INFRA-6081
>>
>>
>> On Mon, Apr 1, 2013 at 3:27 PM, Lewis John Mcgibbney <
>> lewis.mcgibb...@gmail.com> wrote:
>>
>>> Hi Kiran,
>>>
>>>
>>> On Mon, Apr 1, 2013 at 6:53 AM, wrote:
>>>
>>>> Re: Important : Bunch of Spam Created under Nutch Wiki!!
>>>> 22926 by: kiran chitturi
>>>>
>>>>
>>>> Hi guys,
>>>>
>>>> Do you know what the destination for commit mails is? Can I give
>>>> 'dev@nutch.apache.org'?
>>>>
>>>
>>> No, we should send commit emails to the static archive here:
>>> http://mail-archives.apache.org/mod_mbox/nutch-commits/
>>>
>>>
>>> Thanks for sorting this out Kiran, we are truly getting hounded with
>>> spam just now.
>>> Best
>>> Lewis
>>>
>>
>>
>>
>> --
>> Kiran Chitturi
>>
>> <http://www.linkedin.com/in/kiranchitturi>
>>
>>
>>
>>--
>>  Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>>
>>
>>
>>
>>
>>
>
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-03 Thread kiran chitturi
Thank you Ken.

I am talking to Gavin McDonald right now, and we will have moin wiki access
soon.

No more spam mails :)


On Wed, Apr 3, 2013 at 5:06 PM, Ken Krugler wrote:

> Hi Kiran,
>
> I was just chatting w/Steve Rowe, who handled this for the Solr project.
> He said:
>
> It took less than a day, but I went on #asfinfra IRC channel and asked
> some questions about the process, which may have gotten Gavin McDonald to
> move on it sooner.
>
>
> Since we're still getting slammed with spam, it might be worthwhile to do
> the same.
>
> Thanks,
>
> -- Ken
>
>
> On Apr 1, 2013, at 12:30pm, kiran chitturi wrote:
>
> I have posted the information on the JIRA issue page [0]. Let's hope the
> issue will be taken care of soon.
>
>
> [0] - https://issues.apache.org/jira/browse/INFRA-6081
>
>
> On Mon, Apr 1, 2013 at 3:27 PM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi Kiran,
>>
>>
>> On Mon, Apr 1, 2013 at 6:53 AM,  wrote:
>>
>>> Re: Important : Bunch of Spam Created under Nutch Wiki!!
>>> 22926 by: kiran chitturi
>>>
>>>
>>> Hi guys,
>>>
>>> Do you know what the destination for commit mails is? Can I give
>>> 'dev@nutch.apache.org'?
>>>
>>
>> No, we should send commit emails to the static archive here:
>> http://mail-archives.apache.org/mod_mbox/nutch-commits/
>>
>>
>> Thanks for sorting this out Kiran, we are truly getting hounded with spam
>> just now.
>> Best
>> Lewis
>>
>
>
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>
>
>
>
>--
>  Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-01 Thread kiran chitturi
I have posted the information on the JIRA issue page [0]. Let's hope the
issue will be taken care of soon.


[0] - https://issues.apache.org/jira/browse/INFRA-6081


On Mon, Apr 1, 2013 at 3:27 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Kiran,
>
>
> On Mon, Apr 1, 2013 at 6:53 AM,  wrote:
>
>> Re: Important : Bunch of Spam Created under Nutch Wiki!!
>> 22926 by: kiran chitturi
>>
>>
>> Hi guys,
>>
>> Do you know what the destination for commit mails is? Can I give
>> 'dev@nutch.apache.org'?
>>
>
> No, we should send commit emails to the static archive here:
> http://mail-archives.apache.org/mod_mbox/nutch-commits/
>
>
> Thanks for sorting this out Kiran, we are truly getting hounded with spam
> just now.
> Best
> Lewis
>



-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-01 Thread kiran chitturi
Hi guys,

Do you know what the destination for commit mails is? Can I give
'dev@nutch.apache.org'?

So far, I am planning to provide the information below for setting up the
moin wiki [1]:

Wiki Name: Nutch
Usernames: LewisJohnMcgibbney, kiranchitturi, SebastianNagel, JulienNioche
Destination for Commit mails: dev@nutch.apache.org

Please let me know if any of the information is incorrect or needs
modification.

[1] -
http://wiki.apache.org/general/OurWikiFarm#per_wiki_access_control_-_tighten_your_wiki_just_a_little.2C_benefit_just_a_lot


On Sat, Mar 30, 2013 at 4:29 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Kiran,
>
> I think here:
>
> http://wiki.apache.org/general/OurWikiFarm#per_wiki_access_control_-_tighten_your_wiki_just_a_little.2C_benefit_just_a_lot
>
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++
>
>
>
>
>
>
> -Original Message-
> From: kiran chitturi 
> Reply-To: "dev@nutch.apache.org" 
> Date: Saturday, March 30, 2013 12:55 PM
> To: "dev@nutch.apache.org" 
> Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
>
> >Does anyone know what details we need to provide for the new wiki
> >controls?
> >
> >
> >
> >I have posted a JIRA [0] to control our spam, but the infrabot is asking
> >for more information [1].
> >
> >[0] - https://issues.apache.org/jira/browse/INFRA-6081
> >[1] -  http://www.apache.org/dev/infra-contact#what-we-need-to-know
> >
> >
> >
> >On Thu, Mar 28, 2013 at 3:18 PM, Mattmann, Chris A (388J)
> > wrote:
> >
> >Hi Kiran,
> >
> >Yes, my recommendation:
> >
> >1. Jump into #asfinfra on freenode, find Joe, or Gavin or Daniel,
> >ask for help. If you don't have IRC, email
> >infrastruct...@apache.org <mailto:infrastruct...@apache.org>
> >and/or file a
> >https://issues.apache.org/jira/browse/INFRA
> ><https://issues.apache.org/jira/browse/INFRA> ticket
> >
> >2. Request that they enable ContributorsGroup-only ACLs ASAP
> >
> >I know that many Apache wikis (MoinMoin) are being attacked...
> >
> >Cheers,
> >Chris
> >
> >
> >++
> >Chris Mattmann, Ph.D.
> >Senior Computer Scientist
> >NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >Office: 171-266B, Mailstop: 171-246
> >Email: chris.a.mattm...@nasa.gov
> >WWW:  http://sunset.usc.edu/~mattmann/
> >++
> >Adjunct Assistant Professor, Computer Science Department
> >University of Southern California, Los Angeles, CA 90089 USA
> >++
> >
> >
> >
> >
> >-Original Message-
> >From: kiran chitturi 
> >Reply-To: "dev@nutch.apache.org" 
> >Date: Thursday, March 28, 2013 12:15 PM
> >To: "dev@nutch.apache.org" 
> >Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!
> >
> >>Thanks to Ken (check message below) for reporting our insecure wiki. I
> >>have checked it, and anyone can create a fake account and edit any of our
> >>wiki pages or create new ones.
> >>
> >>
> >>When I first registered on the wiki, all the pages were immutable and
> >>Lewis had to add me to the Contributors group to make changes to the wiki.
> >>
> >>
> >>Probably the setting has been hacked, and that is why we are
> >>facing a lot of spam.
> >>
> >>
> >>Can we contact infra@apache and ask them to lock down the wiki as
> >>the other groups did?
> >>
> >>
> >>
> >>
> >>-- Forwarded message --
> >>From: Ken Krugler 
> >>Date: Thu, Mar 28, 2013 at 1:35 PM
> >>Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
> >>To: dev@nutch.apache.org
> >>
> >>
> >>Hi Kiran,
> >>
> >>On Mar

Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh crawl with One Seed.

2013-03-31 Thread kiran chitturi
Hi Binoy,

Thanks for reporting the issue and debugging it.

Did you try using the individual commands or the crawl script instead of the
crawl command?

You can try running Nutch with remote debugging [1]. This lets you run
commands from the shell and debug them in Eclipse.

[1]
http://wiki.apache.org/nutch/RunNutchInEclipse#Remote_Debugging_in_Eclipse


On Sun, Mar 31, 2013 at 11:25 PM, Binoy d  wrote:

> Hi,
>
> I have Nutch 2.x set up with MySQL and am seeing a peculiar null pointer
> exception when crawling with sample seeds from DMOZ. I decided to do a fresh
> crawl with only one URL as the seed and an empty webpage table.
> I am running *org.apache.nutch.crawl.Crawler* from Eclipse with args *urls
> -dir /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/
> -depth 1 -topN 1*
>
> The apache-url seed file has only one entry ("http://nutch.apache.org/").
>
>
> I see the following NullPointerException. Logs:
> http://pastebin.com/CaqJpPkn
>
> With a little debugging in Eclipse, I see
>
> conf.set(GeneratorJob.BATCH_ID, batchId);
>
> in the createIndexJob method of IndexerJob.java is the root cause.
>
> Wrapping it in *if (batchId != null)* seems to solve the issue.
>
> I wanted to know if this is a valid patch. From grepping, it seems no one
> else reads GeneratorJob.BATCH_ID except IndexerJob.
>
> I always see batchId passed as null to createIndexJob for clean
> crawls (empty table). Which scenario causes it to be non-null? And what is
> the significance of the generator job's batchId for the indexing job?
>
> It seems a trivial issue, so I did not create a JIRA. I have attached
> the small patch and would be glad if someone could take a look.
>
> Regards,
> Binoy
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>
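The null guard Binoy proposes in the quoted report can be sketched as a small, self-contained Java example. This is an illustration under stated assumptions, not the actual IndexerJob code: it uses java.util.Properties as a stand-in for the Hadoop configuration object, assuming the NPE comes from the underlying property table rejecting null values (which Properties does). The class BatchIdGuard, the method setBatchId and the key "example.batch.id" are hypothetical names, not Nutch's.

```java
import java.util.Properties;

// Minimal sketch of the proposed null guard. java.util.Properties stands in for
// the Hadoop configuration object; BatchIdGuard, setBatchId and the key below
// are hypothetical names, not Nutch code.
class BatchIdGuard {
    // Hypothetical placeholder for the GeneratorJob.BATCH_ID key (not its real value).
    static final String BATCH_ID_KEY = "example.batch.id";

    // Only record the batch id when one was supplied; a null id leaves the key unset.
    static void setBatchId(Properties conf, String batchId) {
        if (batchId != null) {
            conf.setProperty(BATCH_ID_KEY, batchId);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Unguarded call: Properties rejects null values with a NullPointerException.
        try {
            conf.setProperty(BATCH_ID_KEY, null);
            System.out.println("unguarded: no exception");
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }

        // Guarded call with a null id (the clean-crawl case): nothing is stored.
        setBatchId(conf, null);
        System.out.println("guarded, null id: " + conf.getProperty(BATCH_ID_KEY));

        // Guarded call with a real id: the value is stored as before.
        setBatchId(conf, "example-batch-1");
        System.out.println("guarded, real id: " + conf.getProperty(BATCH_ID_KEY));
    }
}
```

Running main prints the NullPointerException line for the unguarded call, then null and example-batch-1 for the two guarded calls. The trade-off of the guard is that for clean crawls the key is simply absent, so whatever later reads it has to handle a missing batch id sensibly; whether IndexerJob does is part of what Binoy asks.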


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-30 Thread kiran chitturi
Does anyone know what details we need to provide for the new wiki controls?

I have posted a JIRA [0] to control our spam, but the infrabot is asking for
more information [1].

[0] - https://issues.apache.org/jira/browse/INFRA-6081
[1] -  http://www.apache.org/dev/infra-contact#what-we-need-to-know


On Thu, Mar 28, 2013 at 3:18 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Kiran,
>
> Yes, my recommendation:
>
> 1. Jump into #asfinfra on freenode, find Joe, or Gavin or Daniel,
> ask for help. If you don't have IRC, email infrastruct...@apache.org
> and/or file a https://issues.apache.org/jira/browse/INFRA ticket
>
> 2. Request that they enable ContributorsGroup-only ACLs ASAP
>
> I know that many Apache wikis (MoinMoin) are being attacked...
>
> Cheers,
> Chris
>
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
> -Original Message-
> From: kiran chitturi 
> Reply-To: "dev@nutch.apache.org" 
> Date: Thursday, March 28, 2013 12:15 PM
> To: "dev@nutch.apache.org" 
> Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!
>
> >Thanks to Ken (check message below) for reporting our insecure wiki. I
> >have checked it, and anyone can create a fake account and edit any of our
> >wiki pages or create new ones.
> >
> >
> >When I first registered on the wiki, all the pages were immutable and
> >Lewis had to add me to the Contributors group to make changes to the wiki.
> >
> >
> >Probably the setting has been hacked, and that is why we are
> >facing a lot of spam.
> >
> >
> >Can we contact infra@apache and ask them to lock down the wiki as
> >the other groups did?
> >
> >
> >
> >
> >-- Forwarded message --
> >From: Ken Krugler 
> >Date: Thu, Mar 28, 2013 at 1:35 PM
> >Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
> >To: dev@nutch.apache.org
> >
> >
> >Hi Kiran,
> >
> >On Mar 28, 2013, at 2:03am, kiran chitturi wrote:
> >
> >
> >Thank you Ken for the information. I think access is already
> >restricted to Contributors only. Could someone please confirm whether
> >that is the case?
> >
> >
> >
> >
> >
> >It's not, as far as I know. I just created a fake account, logged in with
> >it, and edited the front page.
> >
> >
> >If anyone needs to edit the wiki, they would need to ask someone for
> >access to the wiki pages.
> >
> >
> >Do you know if Solr still got hit by spam after locking down the wiki?
> >
> >
> >
> >
> >
> >
> >I think that change helped cut down most of the spam, but I don't monitor
> >the Solr list that closely, sorry.
> >
> >
> >-- Ken
> >
> >
> >
> >
> >
> >
> >On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler
> > wrote:
> >
> >
> >
> >On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
> >
> >
> >Thank you Binoy for reporting.
> >
> >
> >We have been monitoring the pages and deleting them when we have time, but
> >more keep coming up. Today, I saw a spam edit on the home
> >page of the Nutch wiki. It inserted spam links under Tutorials.
> >
> >
> >We need to find a permanent solution to this. I wonder if any other
> >list-servs are facing the same issue.
> >
> >
> >
> >
> >
> >
> >Yes - Solr recently had to lock down editing on their wiki:
> >
> >
> >
> >The wiki at http://wiki.apache.org/solr/ has come under attack by
> >spammers more frequently of late, so the PMC has decided to lock it down
> > in an attempt to reduce the work involved in tracking and removing spam.
> >
> >From now on, only people who appear on
> >http://wiki.apache.org/solr/ContributorsGroup will be able to
> >create/modify/delete wiki pages.
> >
> >Please request either on the solr-u...@lucene.apache.org or on
> >d...@lucene.apache.org to have your wiki username added to the
> >Cont

Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread kiran chitturi
Thank you Chris. I have posted the message on freenode and filed a JIRA:
https://issues.apache.org/jira/browse/INFRA-6081


On Thu, Mar 28, 2013 at 3:18 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Kiran,
>
> Yes, my recommendation:
>
> 1. Jump into #asfinfra on freenode, find Joe, or Gavin or Daniel,
> ask for help. If you don't have IRC, email infrastruct...@apache.org
> and/or file a https://issues.apache.org/jira/browse/INFRA ticket
>
> 2. Request that they enable ContributorsGroup-only ACLs ASAP
>
> I know that many Apache wikis (MoinMoin) are being attacked...
>
> Cheers,
> Chris
>
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
> -Original Message-
> From: kiran chitturi 
> Reply-To: "dev@nutch.apache.org" 
> Date: Thursday, March 28, 2013 12:15 PM
> To: "dev@nutch.apache.org" 
> Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!
>
> >Thanks to Ken (check message below) for reporting our insecure wiki. I
> >have checked it, and anyone can create a fake account and edit any of our
> >wiki pages or create new ones.
> >
> >
> >When I first registered on the wiki, all the pages were immutable and
> >Lewis had to add me to the Contributors group to make changes to the wiki.
> >
> >
> >Probably the setting has been hacked, and that is why we are
> >facing a lot of spam.
> >
> >
> >Can we contact infra@apache and ask them to lock down the wiki as
> >the other groups did?
> >
> >
> >
> >
> >-- Forwarded message --
> >From: Ken Krugler 
> >Date: Thu, Mar 28, 2013 at 1:35 PM
> >Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
> >To: dev@nutch.apache.org
> >
> >
> >Hi Kiran,
> >
> >On Mar 28, 2013, at 2:03am, kiran chitturi wrote:
> >
> >
> >Thank you Ken for the information. I think access is already
> >restricted to Contributors only. Could someone please confirm whether
> >that is the case?
> >
> >
> >
> >
> >
> >It's not, as far as I know. I just created a fake account, logged in with
> >it, and edited the front page.
> >
> >
> >If anyone needs to edit the wiki, they would need to ask someone for
> >access to the wiki pages.
> >
> >
> >Do you know if Solr still got hit by spam after locking down the wiki?
> >
> >
> >
> >
> >
> >
> >I think that change helped cut down most of the spam, but I don't monitor
> >the Solr list that closely, sorry.
> >
> >
> >-- Ken
> >
> >
> >
> >
> >
> >
> >On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler
> > wrote:
> >
> >
> >
> >On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
> >
> >
> >Thank you Binoy for reporting.
> >
> >
> >We have been monitoring the pages and deleting them when we have time, but
> >more keep coming up. Today, I saw a spam edit on the home
> >page of the Nutch wiki. It inserted spam links under Tutorials.
> >
> >
> >We need to find a permanent solution to this. I wonder if any other
> >list-servs are facing the same issue.
> >
> >
> >
> >
> >
> >
> >Yes - Solr recently had to lock down editing on their wiki:
> >
> >
> >
> >The wiki at http://wiki.apache.org/solr/ has come under attack by
> >spammers more frequently of late, so the PMC has decided to lock it down
> > in an attempt to reduce the work involved in tracking and removing spam.
> >
> >From now on, only people who appear on
> >http://wiki.apache.org/solr/ContributorsGroup will be able to
> >create/modify/delete wiki pages.
> >
> >Please request either on the solr-u...@lucene.apache.org or on
> >d...@lucene.apache.org to have your wiki username added to the
> >ContributorsGroup
> > page - this is a one-time step.
> >
> >
> >
> >
> >So I think you need to make a request to Infra to lock down the wiki,
&

Fwd: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread kiran chitturi
Thanks to Ken (check message below) for reporting our insecure wiki. I have
checked it, and anyone can create a fake account and edit any of our wiki
pages or create new ones.

When I first registered on the wiki, all the pages were immutable and Lewis
had to add me to the Contributors group to make changes to the wiki.

Probably the setting has been hacked, and that is why we are
facing a lot of spam.

Can we contact infra@apache and ask them to lock down the wiki as
the other groups did?


-- Forwarded message --
From: Ken Krugler 
Date: Thu, Mar 28, 2013 at 1:35 PM
Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
To: dev@nutch.apache.org


Hi Kiran,

On Mar 28, 2013, at 2:03am, kiran chitturi wrote:

Thank you Ken for the information. I think access is already restricted
to Contributors only. Could someone please confirm whether that is the case?


It's not, as far as I know. I just created a fake account, logged in with
it, and edited the front page.

If anyone needs to edit the wiki, they would need to ask someone for access
to the wiki pages.

Do you know if Solr still got hit by spam after locking down the wiki?


I think that change helped cut down most of the spam, but I don't monitor
the Solr list that closely, sorry.

-- Ken



On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler wrote:

>
> On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
>
> Thank you Binoy for reporting.
>
> We have been monitoring the pages and deleting them when we have time, but
> more keep coming up. Today, I saw a spam edit on the home
> page of the Nutch wiki. It inserted spam links under Tutorials.
>
> We need to find a permanent solution to this. I wonder if any other
> list-servs are facing the same issue.
>
>
> Yes - Solr recently had to lock down editing on their wiki:
>
> The wiki at http://wiki.apache.org/solr/ has come under attack by
> spammers more frequently of late, so the PMC has decided to lock it down in
> an attempt to reduce the work involved in tracking and removing spam.
>
> From now on, only people who appear on
> http://wiki.apache.org/solr/ContributorsGroup will be able to
> create/modify/delete wiki pages.
>
> Please request either on the solr-u...@lucene.apache.org or on
> d...@lucene.apache.org to have your wiki username added to the
> ContributorsGroup page - this is a one-time step.
>
>
> So I think you need to make a request to Infra to lock down the wiki, then
> add people (generally in response to explicit requests) to the
> ContributorsGroup page.
>
> -- Ken
>
>
>
>
> On Thu, Mar 28, 2013 at 12:49 AM, Binoy d  wrote:
>
>> I am quite surprised by the notifications I am getting for new
>> pages on the Nutch Wiki.
>> Example:
>> http://wiki.apache.org/nutch/KarlPuent
>>
>> I see at least 25-35 emails with such notifications.
>>
>> All of the links I got are rooted under http://wiki.apache.org/nutch/
>>
>>
>> Is someone looking into this? If needed, I can gladly forward the emails to
>> the person cleaning it up, as I am not sure if everyone has access to
>> delete the pages.
>>
>> Regards,
>> b
>>
>> -- Forwarded message --
>> From: Apache Wiki 
>> Date: Wed, Mar 27, 2013 at 9:32 PM
>> Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
>> To: Apache Wiki 
>>
>>
>> Dear Wiki user,
>>
>> You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
>> change notification.
>>
>> The "EdwinaBro" page has been changed by EdwinaBro:
>> http://wiki.apache.org/nutch/EdwinaBro
>>
>> New page:
>> I am 24 years old and my name is Edwina Brownlee. I life in Corjolens
>> (Switzerland).<>
>> <>
>> <>
>> Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue]]
>>
>>
>
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>
>
>
>
>--
>  Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>



   --
 Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr








-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread kiran chitturi
Thank you Ken for the information. I think access is already restricted
to Contributors only. Could someone please confirm whether that is the case?

If anyone needs to edit the wiki, they would need to ask someone for access
to the wiki pages.

Do you know if Solr still got hit by spam after locking down the wiki?

Thanks,
Th


On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler wrote:

>
> On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
>
> Thank you Binoy for reporting.
>
> We have been monitoring the pages and deleting them when we have time, but
> more keep coming up. Today, I saw a spam edit on the home
> page of the Nutch wiki. It inserted spam links under Tutorials.
>
> We need to find a permanent solution to this. I wonder if any other
> list-servs are facing the same issue.
>
>
> Yes - Solr recently had to lock down editing on their wiki:
>
> The wiki at http://wiki.apache.org/solr/ has come under attack by
> spammers more frequently of late, so the PMC has decided to lock it down in
> an attempt to reduce the work involved in tracking and removing spam.
>
> From now on, only people who appear on
> http://wiki.apache.org/solr/ContributorsGroup will be able to
> create/modify/delete wiki pages.
>
> Please request either on the solr-u...@lucene.apache.org or on
> d...@lucene.apache.org to have your wiki username added to the
> ContributorsGroup page - this is a one-time step.
>
>
> So I think you need to make a request to Infra to lock down the wiki, then
> add people (generally in response to explicit requests) to the
> ContributorsGroup page.
>
> -- Ken
>
>
>
>
> On Thu, Mar 28, 2013 at 12:49 AM, Binoy d  wrote:
>
>> I am quite surprised by the notifications I am getting for new
>> pages on the Nutch Wiki.
>> Example:
>> http://wiki.apache.org/nutch/KarlPuent
>>
>> I see at least 25-35 emails with such notifications.
>>
>> All of the links I got are rooted under http://wiki.apache.org/nutch/
>>
>>
>> Is someone looking into this? If needed, I can gladly forward the emails to
>> the person cleaning it up, as I am not sure if everyone has access to
>> delete the pages.
>>
>> Regards,
>> b
>>
>> -- Forwarded message --
>> From: Apache Wiki 
>> Date: Wed, Mar 27, 2013 at 9:32 PM
>> Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
>> To: Apache Wiki 
>>
>>
>> Dear Wiki user,
>>
>> You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
>> change notification.
>>
>> The "EdwinaBro" page has been changed by EdwinaBro:
>> http://wiki.apache.org/nutch/EdwinaBro
>>
>> New page:
>> I am 24 years old and my name is Edwina Brownlee. I life in Corjolens
>> (Switzerland).<>
>> <>
>> <>
>> Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue]]
>>
>>
>
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>
>
>
>
>--
>  Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-27 Thread kiran chitturi
Thank you Binoy for reporting.

We have been monitoring the pages and deleting them when we have time, but
more keep coming up. Today, I saw a spam edit on the home
page of the Nutch wiki. It inserted spam links under Tutorials.

We need to find a permanent solution to this. I wonder if any other
list-servs are facing the same issue.


On Thu, Mar 28, 2013 at 12:49 AM, Binoy d  wrote:

> I am quite surprised by the notifications I am getting for new pages
> on the Nutch Wiki.
> Example:
> http://wiki.apache.org/nutch/KarlPuent
>
> I see at least 25-35 emails with such notifications.
>
> All of the links I got are rooted under http://wiki.apache.org/nutch/
>
>
> Is someone looking into this? If needed, I can gladly forward the emails to
> the person cleaning it up, as I am not sure if everyone has access to
> delete the pages.
>
> Regards,
> b
>
> -- Forwarded message --
> From: Apache Wiki 
> Date: Wed, Mar 27, 2013 at 9:32 PM
> Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
> To: Apache Wiki 
>
>
> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
> change notification.
>
> The "EdwinaBro" page has been changed by EdwinaBro:
> http://wiki.apache.org/nutch/EdwinaBro
>
> New page:
> I am 24 years old and my name is Edwina Brownlee. I life in Corjolens
> (Switzerland).<>
> <>
> <>
> Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue]]
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: [Nutch Wiki] Trivial Update of "PGOSimone" by PGOSimone

2013-03-25 Thread kiran chitturi
I have a feeling someone's account got compromised or spammers found a new
way. I am not sure how they are getting in.


On Mon, Mar 25, 2013 at 4:55 AM, Julien Nioche <
lists.digitalpeb...@gmail.com> wrote:

> I thought we needed a login/password to modify the Wiki. If so, how
> come we have had so much spam lately?
>
> Julien
>
>
> On 25 March 2013 04:26, Apache Wiki  wrote:
>
>> Dear Wiki user,
>>
>> You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
>> change notification.
>>
>> The "PGOSimone" page has been changed by PGOSimone:
>> http://wiki.apache.org/nutch/PGOSimone
>>
>> New page:
>> Pleased to meet up with you! My title is Audria Pumphrey.<>
>> One particular of the incredibly finest factors in the earth for me is
>> doing aerobics but I haven't manufactured a dime with it. Illinois is the
>> place I have often been residing but I will have to transfer in a yr or
>> two. My working day career is a postal assistance employee but shortly I am
>> going to be on my own.<>
>> <>
>> Here is my homepage ... [[
>> http://Velocar.dirkhennig.de/index.php?title=Pattaya_Hotels_-_Family_Friendly_Choices|visitthe
>>  following page]]
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>



-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-23 Thread kiran chitturi
Thank you, Chris, for your interest.

I would love to share my thesis and the work, but I am still in the
experimenting stage. I will share them with you soon, once I have a decent UI
running with some functionality.

Regards,
Kiran.


On Sat, Mar 23, 2013 at 2:33 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

>  That is so awesome Kiran.
>
>  Great job and I would love a link to your thesis (or even seeing the
> work in progress)
> if you are willing to share and have the time.
>
>  Good plane reading material for me and congrats again. Looking forward
> to working
> with you.
>
>  Cheers,
> Chris
>
>
>   From: kiran chitturi 
> Reply-To: "dev@nutch.apache.org" 
> Date: Saturday, March 23, 2013 9:54 AM
>
> To: "dev@nutch.apache.org" 
> Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp
>
>   Thanks Chris!
>
>  I am planning to graduate with a Master's degree in Computer Science from
> Virginia Tech University, and my advisor is Dr. Fox.
>
>  My thesis work mostly relates to building a search engine for the 10 TB of
> crisis event data that we have collected over the last three years. The data
> is collected using the Internet Archive crawler (Archive-It), and I am
> indexing it using LucidWorks Big Data software. The process also involves
> finding more metadata and clustering. All of this work is related to the
> 'Crisis, Tragedy and Recovery Network Project (CTRnet)' (www.ctrnet.net).
>
>  My thesis, library work and Nutch are all closely related. It has been a
> great learning experience so far :)
>
>
>
>
> On Sat, Mar 23, 2013 at 12:23 PM, Mattmann, Chris A (388J) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
>>  Hi Kiran,
>>
>>  Awesome that works fine for me! Happy to have you contribute, and
>> whether you are a formal mentor or not,
>> if we get a GSoC 2013 student for this you can help me, Lewis, (and
>> others) shepherd it in!
>>
>>  Thanks man and congrats on graduating soon! Where are you graduating
>> from and in what subject?
>>
>>  Cheers,
>> Chris
>>
>>   From: kiran chitturi 
>> Reply-To: "dev@nutch.apache.org" 
>>  Date: Saturday, March 23, 2013 8:51 AM
>>
>> To: "dev@nutch.apache.org" 
>> Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp
>>
>>   I am very much interested in the Apache Wicket project, but I won't be
>> able to take part as a student since I am finishing my degree and looking
>> for full-time jobs. I discussed this with Lewis previously, and it wouldn't
>> be ideal for me to be a GSoC 2013 student, as I can't devote full-time work
>> to it.
>>
>>  However, I will be very happy to work on this in my free time. This is
>> something I have been interested in for a long time, and I will try to
>> contribute in any way possible.
>>
>>  Thank you,
>> Kiran.
>>
>>
>>
>>
>>
>>
>> On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J) <
>> chris.a.mattm...@jpl.nasa.gov> wrote:
>>
>>>  Hi Kiran,
>>>
>>>  Great, yes the REST services need work for sure. They haven't been
>>> worked on in a while.
>>>
>>>  I'm privy to Apache CXF, but I haven't done anything with it, and
>>> Andrzej did an awesome job
>>> using Restlet, so we've got Reslet for now.
>>>
>>>  If you are interested in documenting the services, then awesome! Do
>>> you want to be a GSoC 2013 student,
>>> and are you interested in this project?
>>>
>>>  Cheers,
>>> Chris
>>>
>>>
>>>   From: kiran chitturi 
>>> Reply-To: "dev@nutch.apache.org" 
>>> Date: Friday, March 22, 2013 9:19 PM
>>> To: "dev@nutch.apache.org" 
>>> Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp
>>>
>>>   Hi Chris,
>>>
>>>  I was just thinking about that this evening. To start with, I want to
>>> write good documentation for the Nutch REST API.
>>>
>>>  What is the status of the REST API? Does it need any fixes or working
>>> examples?
>>>
>>>  Hopefully I can make a start soon, and it will be helpful.
>>>
>>>  Thanks for opening up the issue.
>>>
>>>  Regards,
>>> Kiran.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) <
>>> chris.a.mattm...@jpl.nasa.gov> wrote:
>>>
>>>> Hey Guys,
>>>>
>>>> I posted:
>>>>
>>>> https://issues.apache.org/jira/browse/NUTCH-841
>>>>
>>>>
>>>> As a potential GSOC 2013 summer project. I'm willing to mentor it, since
>>>> I love Wicket, and I'm willing to maintain the result as a Nutch
>>>> committer.
>>>>
>>>> If NUTCH-841 doesn't get selected, I'll start implementing it this
>>>> summer if no one beats me to it.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>
>>>
>>>  --
>>> Kiran Chitturi
>>>
>>>   <http://www.linkedin.com/in/kiranchitturi>
>>>
>>>
>>>
>>
>>
>>  --
>> Kiran Chitturi
>>
>>   <http://www.linkedin.com/in/kiranchitturi>
>>
>>
>>
>
>
>  --
> Kiran Chitturi
>
>   <http://www.linkedin.com/in/kiranchitturi>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-23 Thread kiran chitturi
Thanks Chris!

I am planning to graduate with a Master's degree in Computer Science from
Virginia Tech University, and my advisor is Dr. Fox.

My thesis work mostly relates to building a search engine for the 10 TB of
crisis event data that we have collected over the last three years. The data
is collected using the Internet Archive crawler (Archive-It), and I am
indexing it using LucidWorks Big Data software. The process also involves
finding more metadata and clustering. All of this work is related to the
'Crisis, Tragedy and Recovery Network Project (CTRnet)' (www.ctrnet.net).

My thesis, library work and Nutch are all closely related. It has been a
great learning experience so far :)




On Sat, Mar 23, 2013 at 12:23 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

>  Hi Kiran,
>
>  Awesome that works fine for me! Happy to have you contribute, and
> whether you are a formal mentor or not,
> if we get a GSoC 2013 student for this you can help me, Lewis, (and
> others) shepherd it in!
>
>  Thanks man and congrats on graduating soon! Where are you graduating
> from and in what subject?
>
>  Cheers,
> Chris
>
>   From: kiran chitturi 
> Reply-To: "dev@nutch.apache.org" 
> Date: Saturday, March 23, 2013 8:51 AM
>
> To: "dev@nutch.apache.org" 
> Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp
>
>   I am very much interested in the Apache Wicket project, but I won't be
> able to take part as a student since I am finishing my degree and looking
> for full-time jobs. I discussed this with Lewis previously, and it wouldn't
> be ideal for me to be a GSoC 2013 student, as I can't devote full-time work
> to it.
>
>  However, I will be very happy to work on this in my free time. This is
> something I have been interested in for a long time, and I will try to
> contribute in any way possible.
>
>  Thank you,
> Kiran.
>
>
>
>
>
>
> On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
>>  Hi Kiran,
>>
>>  Great, yes the REST services need work for sure. They haven't been
>> worked on in a while.
>>
>>  I'm privy to Apache CXF, but I haven't done anything with it, and
>> Andrzej did an awesome job
>> using Restlet, so we've got Restlet for now.
>>
>>  If you are interested in documenting the services, then awesome! Do you
>> want to be a GSoC 2013 student,
>> and are you interested in this project?
>>
>>  Cheers,
>> Chris
>>
>>
>>   From: kiran chitturi 
>> Reply-To: "dev@nutch.apache.org" 
>> Date: Friday, March 22, 2013 9:19 PM
>> To: "dev@nutch.apache.org" 
>> Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp
>>
>>   Hi Chris,
>>
>>  I was just thinking about that this evening. To start with, I want to
>> write good documentation for the Nutch REST API.
>>
>>  What is the status of the REST API? Does it need any fixes or working
>> examples?
>>
>>  Hopefully I can make a start soon, and it will be helpful.
>>
>>  Thanks for opening up the issue.
>>
>>  Regards,
>> Kiran.
>>
>>
>>
>>
>>
>>
>> On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) <
>> chris.a.mattm...@jpl.nasa.gov> wrote:
>>
>>> Hey Guys,
>>>
>>> I posted:
>>>
>>> https://issues.apache.org/jira/browse/NUTCH-841
>>>
>>>
>>> As a potential GSOC 2013 summer project. I'm willing to mentor it, since
>>> I love Wicket, and I'm willing to maintain the result as a Nutch
>>> committer.
>>>
>>> If NUTCH-841 doesn't get selected, I'll start implementing it this summer
>>> if no one beats me to it.
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>
>>
>>  --
>> Kiran Chitturi
>>
>>   <http://www.linkedin.com/in/kiranchitturi>
>>
>>
>>
>
>
>  --
> Kiran Chitturi
>
>   <http://www.linkedin.com/in/kiranchitturi>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-23 Thread kiran chitturi
I am very much interested in the Apache Wicket project, but I won't be able
to take part as a student since I am finishing my degree and looking for
full-time jobs. I discussed this with Lewis previously, and it wouldn't be
ideal for me to be a GSoC 2013 student, as I can't devote full-time work to
it.

However, I will be very happy to work on this in my free time. This is
something I have been interested in for a long time, and I will try to
contribute in any way possible.

Thank you,
Kiran.






On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

>  Hi Kiran,
>
>  Great, yes the REST services need work for sure. They haven't been
> worked on in a while.
>
>  I'm privy to Apache CXF, but I haven't done anything with it, and
> Andrzej did an awesome job
> using Restlet, so we've got Restlet for now.
>
>  If you are interested in documenting the services, then awesome! Do you
> want to be a GSoC 2013 student,
> and are you interested in this project?
>
>  Cheers,
> Chris
>
>
>   From: kiran chitturi 
> Reply-To: "dev@nutch.apache.org" 
> Date: Friday, March 22, 2013 9:19 PM
> To: "dev@nutch.apache.org" 
> Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp
>
>   Hi Chris,
>
>  I was just thinking about that this evening. To start with, I want to
> write good documentation for the Nutch REST API.
>
>  What is the status of the REST API? Does it need any fixes or working
> examples?
>
>  Hopefully I can make a start soon, and it will be helpful.
>
>  Thanks for opening up the issue.
>
>  Regards,
> Kiran.
>
>
>
>
>
>
> On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Hey Guys,
>>
>> I posted:
>>
>> https://issues.apache.org/jira/browse/NUTCH-841
>>
>>
>> As a potential GSOC 2013 summer project. I'm willing to mentor it, since I
>> love Wicket, and I'm willing to maintain the result as a Nutch committer.
>>
>> If NUTCH-841 doesn't get selected, I'll start implementing it this summer
>> if no one beats me to it.
>>
>> Cheers,
>> Chris
>>
>>
>
>
>  --
> Kiran Chitturi
>
>   <http://www.linkedin.com/in/kiranchitturi>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-22 Thread kiran chitturi
Hi Chris,

I was just thinking about that this evening. To start with, I want to write
good documentation for the Nutch REST API.

What is the status of the REST API? Does it need any fixes or working
examples?

Hopefully I can make a start soon, and it will be helpful.

Thanks for opening up the issue.

Regards,
Kiran.






On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Guys,
>
> I posted:
>
> https://issues.apache.org/jira/browse/NUTCH-841
>
>
> As a potential GSOC 2013 summer project. I'm willing to mentor it, since I
> love Wicket, and I'm willing to maintain the result as a Nutch committer.
>
> If NUTCH-841 doesn't get selected, I'll start implementing it this summer
> if no one beats me to it.
>
> Cheers,
> Chris
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: wiki update on Nutch Tutorial with crawl script

2013-03-21 Thread kiran chitturi
I have kept the crawl command in the tutorial but noted that it is
deprecated, and I have added the crawl script in section 3.3 [0].

The wiki is now somewhat more up to date, and I hope basic questions from
Nutch users can be answered with pointers to the wiki.

*A few things still need to be updated:*
1. How to choose Nutch parameters for an optimal configuration
2. A full tutorial for Nutch 2 with HBase, which should also notify users of
current bugs with MySQL and other stores.

Please add to this list if you feel any other section needs updating.
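For item 1, a rough sense of scale may help readers. The following is a minimal Python sketch, assuming the crawl script sizes each round's fetch list as the number of slave nodes times 50,000 (the default topN discussed in this thread) and passes that value to the generate step as `-topN`. The names `DEFAULT_PER_SLAVE`, `fetchlist_size` and `max_urls_fetched` are hypothetical and not part of Nutch. It only estimates an upper bound, since each generate step selects at most topN URLs and some of those will fail to fetch.

```python
from typing import Optional

# Assumption: bin/crawl sizes the fetch list as numSlaves * 50,000 per round.
DEFAULT_PER_SLAVE = 50_000


def fetchlist_size(num_slaves: int = 1, top_n: Optional[int] = None) -> int:
    """topN for one generate step; an explicit top_n overrides the default."""
    if top_n is not None:
        return top_n
    return num_slaves * DEFAULT_PER_SLAVE


def max_urls_fetched(rounds: int, num_slaves: int = 1,
                     top_n: Optional[int] = None) -> int:
    """Upper bound on URLs fetched: each round generates at most topN URLs."""
    return rounds * fetchlist_size(num_slaves, top_n)


if __name__ == "__main__":
    # Local mode (one node), five rounds, default versus a small topN.
    print(max_urls_fetched(5))              # 250000
    print(max_urls_fetched(5, top_n=1000))  # 5000
```

With a single node and five rounds the default allows up to 250,000 URLs, which is why a local-mode crawl can run for a long time; lowering topN to 1,000 caps the same crawl at 5,000.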

[0] - http://wiki.apache.org/nutch/NutchTutorial


On Thu, Mar 21, 2013 at 3:43 AM, kiran chitturi
wrote:

> Hi Feng, with this in mind I have created a wiki page for bin/crawl [0].
> Please feel free to edit any of the wiki pages and update the documentation.
>
>
>
> [0] http://wiki.apache.org/nutch/bin/crawl
>
>
> On Thu, Mar 21, 2013 at 1:18 AM, feng lu  wrote:
>
>> <<
>> Second, for a user running Nutch on a single node or in local mode, the
>> default topN (50,000) makes the crawl run for a long time. Can we make the
>> topN parameter configurable through the script?
>> >>
>>
>> I tend to agree with Tejas that we should let users modify the parameters
>> below to suit their needs. But we can add some detailed information to the
>> bin/crawl wiki page telling users how to modify these parameters and what
>> they mean.
>>
>>
>> On Thu, Mar 21, 2013 at 3:01 AM, kiran chitturi <
>> chitturikira...@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I want to update the Nutch tutorials in the wiki to use the crawl script
>>> (./bin/crawl). Because the tutorials show the crawl command, users run it,
>>> run into issues, and we end up suggesting they use the crawl script
>>> instead.
>>>
>>> Can we state consistently across the wiki that the crawl command is
>>> deprecated and that the crawl script is recommended?
>>>
>>> Second, for a user running Nutch on a single node or in local mode, the
>>> default topN (50,000) makes the crawl run for a long time. Can we make the
>>> topN parameter configurable through the script?
>>>
>>> Thank you,
>>>
>>> --
>>> Kiran Chitturi
>>>
>>> <http://www.linkedin.com/in/kiranchitturi>
>>>
>>>
>>>
>>
>>
>> --
>> Don't Grow Old, Grow Up... :-)
>>
>
>
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: wiki update on Nutch Tutorial with crawl script

2013-03-21 Thread kiran chitturi
Hi Feng, with this in mind I have created a wiki page for bin/crawl [0].
Please feel free to edit any of the wiki pages and update the documentation.



[0] http://wiki.apache.org/nutch/bin/crawl


On Thu, Mar 21, 2013 at 1:18 AM, feng lu  wrote:

> <<
> Second, for a user running Nutch on a single node or in local mode, the
> default topN (50,000) makes the crawl run for a long time. Can we make the
> topN parameter configurable through the script?
> >>
>
> I tend to agree with Tejas that we should let users modify the parameters
> below to suit their needs. But we can add some detailed information to the
> bin/crawl wiki page telling users how to modify these parameters and what
> they mean.
>
>
> On Thu, Mar 21, 2013 at 3:01 AM, kiran chitturi wrote:
>
>> Hi!
>>
>> I want to update the Nutch tutorials in the wiki to use the crawl script
>> (./bin/crawl). Because the tutorials show the crawl command, users run it,
>> run into issues, and we end up suggesting they use the crawl script
>> instead.
>>
>> Can we state consistently across the wiki that the crawl command is
>> deprecated and that the crawl script is recommended?
>>
>> Second, for a user running Nutch on a single node or in local mode, the
>> default topN (50,000) makes the crawl run for a long time. Can we make the
>> topN parameter configurable through the script?
>>
>> Thank you,
>>
>> --
>> Kiran Chitturi
>>
>> <http://www.linkedin.com/in/kiranchitturi>
>>
>>
>>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>



-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: wiki update for nutch commands

2013-03-20 Thread kiran chitturi
This is one way of documenting the differences in command-line options
between 1.x and 2.x. Please check [0].

We can maintain the differences between the two versions like this, but my
question is whether the top paragraph describing the fetcher applies equally
to 1.x and 2.x.

If the fetcher works very differently in 1.x and 2.x, we would be better off
maintaining separate pages.

[0] - http://wiki.apache.org/nutch/bin/nutch%20fetch#preview


On Wed, Mar 20, 2013 at 3:09 PM, kiran chitturi
wrote:

> Hi Tejas,
>
> +1 for keeping the pages separate for 1.x and 2.x.
>
> Until now I have only been fixing a few version issues and renaming page
> links in the wiki. You brought up a good point, one I have been intending to
> raise with the Nutch devs.
>
> I feel 2.x should have its own page with its own set of links covering the
> architecture and everything else. The wiki home page is a mix of 1.x and
> 2.x, and it is easy to confuse the parameters and options of the two.
>
> There are significant differences between 1.x and 2.x in the other commands
> too, and I think we need to take on the task of redoing the whole
> command-line arguments page (the table).
>
> As you mentioned, the command-line arguments page is quite important for
> users, and I am in favor of keeping the pages separate for 1.x and 2.x.
>
>
>  On Wed, Mar 20, 2013 at 2:52 PM, Tejas Patil wrote:
>
>> Hi Kiran,
>>
>> The command line arguments to the fetch command shown on wiki page [2]
>> doesn't seem to be in sync with what is implemented in [0] and [1].
>>
>> For 1.x [0]
>> Usage: Fetcher  [-threads n]
>>
>> For 2.x [1]
>> Usage: FetcherJob ( | -all) [-crawlId ] [-threads N]
>> [-resume] [-numTasks N]
>>
>> On wiki page [2]:
>> Usage: bin/nutch fetch  [-threads n] [-noParsing]
>>
>> I strongly feel that these params must be mentioned in the wiki page.
>> Also, people have been pondering over @user for the differences wrt 1.x and
>> 2.x. As the options are different for both these versions, providing usage
>> for both these versions would make things easy for users. What say ?
>>
>> There were a lot of updates to other wiki pages too, which might also need
>> a similar change.
>>
>> [0]
>> http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java?view=markup
>> [1]
>> http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/fetcher/FetcherJob.java?view=markup
>> [2] http://wiki.apache.org/nutch/bin/nutch%20fetch
>>
>> Thanks,
>> Tejas
>>
>
>
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: wiki update for nutch commands

2013-03-20 Thread kiran chitturi
Hi Tejas,

+1 for keeping the pages separate for 1.x and 2.x.

Until now I have only been fixing a few version issues and renaming page
links in the wiki. You brought up a good point, one I have been intending to
raise with the Nutch devs.

I feel 2.x should have its own page with its own set of links covering the
architecture and everything else. The wiki home page is a mix of 1.x and
2.x, and it is easy to confuse the parameters and options of the two.

There are significant differences between 1.x and 2.x in the other commands
too, and I think we need to take on the task of redoing the whole
command-line arguments page (the table).

As you mentioned, the command-line arguments page is quite important for
users, and I am in favor of keeping the pages separate for 1.x and 2.x.
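The mismatch Tejas points out in the quoted message below can also be checked mechanically. The following is a minimal Python sketch that compares only the `-option` flags in the three usage lines he quotes, copied as they appear in this archive (the angle-bracket placeholders were lost, so positional arguments are ignored). The helper names are hypothetical, and the sketch reads usage text only; it does not inspect Fetcher.java or FetcherJob.java.

```python
import re

# Usage lines exactly as quoted in this thread; placeholders were lost in the
# archived text, so only the "-option" flags are compared.
USAGE = {
    "1.x Fetcher": "Usage: Fetcher  [-threads n]",
    "2.x FetcherJob": ("Usage: FetcherJob ( | -all) [-crawlId ] [-threads N] "
                       "[-resume] [-numTasks N]"),
    "wiki": "Usage: bin/nutch fetch  [-threads n] [-noParsing]",
}

# A flag is a "-" not preceded by a word character or another hyphen.
FLAG = re.compile(r"(?<![\w-])-[A-Za-z]+")


def flags(usage: str) -> set:
    """Set of -option flags mentioned in one usage string."""
    return set(FLAG.findall(usage))


def wiki_drift(version: str) -> dict:
    """Compare the wiki's documented flags against one implementation's usage."""
    implemented = flags(USAGE[version])
    documented = flags(USAGE["wiki"])
    return {
        "missing_from_wiki": sorted(implemented - documented),
        "not_in_usage": sorted(documented - implemented),
    }


if __name__ == "__main__":
    for version in ("1.x Fetcher", "2.x FetcherJob"):
        print(version, wiki_drift(version))
```

For the 2.x job this reports `-all`, `-crawlId`, `-numTasks` and `-resume` as missing from the wiki, and `-noParsing` as documented on the wiki but absent from both usage strings.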


On Wed, Mar 20, 2013 at 2:52 PM, Tejas Patil wrote:

> Hi Kiran,
>
> The command line arguments to the fetch command shown on wiki page [2]
> doesn't seem to be in sync with what is implemented in [0] and [1].
>
> For 1.x [0]
> Usage: Fetcher  [-threads n]
>
> For 2.x [1]
> Usage: FetcherJob ( | -all) [-crawlId ] [-threads N]
> [-resume] [-numTasks N]
>
> On wiki page [2]:
> Usage: bin/nutch fetch  [-threads n] [-noParsing]
>
> I strongly feel that these params must be mentioned in the wiki page.
> Also, people have been pondering over @user for the differences wrt 1.x and
> 2.x. As the options are different for both these versions, providing usage
> for both these versions would make things easy for users. What say ?
>
> There were a lot of updates to other wiki pages too, which might also need
> a similar change.
>
> [0]
> http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java?view=markup
> [1]
> http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/fetcher/FetcherJob.java?view=markup
> [2] http://wiki.apache.org/nutch/bin/nutch%20fetch
>
> Thanks,
> Tejas
>



-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Nutch : Wiki Section updates

2013-03-13 Thread kiran chitturi
Hi!

I have noticed that certain sections of the Nutch wiki are not up to date.

I am planning to update these pages with pointers to mailing list
discussions and JIRA issues that give valuable information.

First, I have created an account on the wiki but I am not able to see the
'Edit' button on any page. Can someone point me in the right direction?

Second, does anyone have suggestions for improving/updating particular
pages? Is anyone willing to update the 'Tasklist' and 'Features' sections of
the wiki?

Third, do we have any updates on the public servers running Nutch? There are
dead links here [0], and the page needs a major update.

I am willing to start on this and make updates in my free time. It would be
great if someone could proofread my changes to check that I did not write
anything incorrect.

I am new to this. Please let me know what I need to know before starting the
work.

I look forward to your suggestions.

[0] - http://wiki.apache.org/nutch/PublicServers

Thank you
-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


Re: [ANNOUNCEMENT] Welcome Kiran Chitturi as Apache Nutch PMC and Committer

2013-03-11 Thread kiran chitturi
Thank you, Julien.

I made my first commit and added myself to the list of committers. It all
went smoothly :)




On Mon, Mar 11, 2013 at 5:26 AM, Julien Nioche <
lists.digitalpeb...@gmail.com> wrote:

> Hi Kiran,
>
> Your account has been created and added to the Nutch group so you should
> be able to commit. Your first task is to add yourself to the list of
> committers on the Nutch website. The instructions on how to do this should
> be somewhere on the Wiki.
>
> Thanks
>
> Julien
>
> On 10 March 2013 01:27, kiran chitturi  wrote:
>
>> Thanks a lot, guys, for inviting me and for the good wishes.
>>
>> I am a graduate student at Virginia Tech University doing my Master's in
>> Computer Science. I have been using Apache Nutch for the past year as part
>> of my assistantship with our university library.
>>
>> The Digital Libraries and Archives division of our libraries was using the
>> Google Mini search engine for its website, which hosts 600k files, but
>> Google Mini was no longer supported and we wanted to try building a search
>> engine using open source technologies.
>>
>> That is when I started my journey with Nutch, and we were able to achieve
>> our goals using Nutch and Solr. The library was pleased with the project
>> and is now more interested in working with open source software whenever
>> possible.
>>
>> I have enjoyed working with the Nutch community and it has been a great
>> learning experience for me. I would like to keep learning and contributing
>> back even after my graduation.
>>
>> A few things I have in mind right now, other than committing patches, are
>> improving our documentation (wiki), helping users as best I can, and
>> starting the Apache Wicket UI work for Nutch 2.x soon.
>>
>> Regards,
>> Kiran.
>>
>>
>>
>>
>> On Sat, Mar 9, 2013 at 4:06 PM, Tejas Patil wrote:
>>
>>> Welcome aboard Kiran :)
>>>
>>>
>>> On Sat, Mar 9, 2013 at 12:56 PM, lewis john mcgibbney <
>>> lewi...@apache.org> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Over the last while we have been aware of Kiran's ongoing contribution
>>>> to the Nutch community.
>>>> It is with great pleasure that we invite Kiran to join the Nutch PMC
>>>> and also take up Committer role.
>>>> @Kiran, please feel free to say a bit about yourself and introduce what
>>>> brought you to Apache Nutch.
>>>> Have a great weekend.
>>>> Best
>>>> Lewis
>>>
>>>
>>>
>>
>>
>> --
>> Kiran Chitturi
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>



-- 
Kiran Chitturi


Re: [ANNOUNCEMENT] Welcome Kiran Chitturi as Apache Nutch PMC and Committer

2013-03-09 Thread kiran chitturi
Thanks a lot, guys, for inviting me and for the good wishes.

I am a graduate student at Virginia Tech University doing my Master's in
Computer Science. I have been using Apache Nutch for the past year as part
of my assistantship with our university library.

The Digital Libraries and Archives division of our libraries was using the
Google Mini search engine for its website, which hosts 600k files, but
Google Mini was no longer supported and we wanted to try building a search
engine using open source technologies.

That is when I started my journey with Nutch, and we were able to achieve
our goals using Nutch and Solr. The library was pleased with the project
and is now more interested in working with open source software whenever
possible.

I have enjoyed working with the Nutch community and it has been a great
learning experience for me. I would like to keep learning and contributing
back even after my graduation.

A few things I have in mind right now, other than committing patches, are
improving our documentation (wiki), helping users as best I can, and
starting the Apache Wicket UI work for Nutch 2.x soon.

Regards,
Kiran.




On Sat, Mar 9, 2013 at 4:06 PM, Tejas Patil wrote:

> Welcome aboard Kiran :)
>
>
> On Sat, Mar 9, 2013 at 12:56 PM, lewis john mcgibbney wrote:
>
>> Hi All,
>>
>> Over the last while we have been aware of Kiran's ongoing contribution to
>> the Nutch community.
>> It is with great pleasure that we invite Kiran to join the Nutch PMC and
>> also take up Committer role.
>> @Kiran, please feel free to say a bit about yourself and introduce what
>> brought you to Apache Nutch.
>> Have a great weekend.
>> Best
>> Lewis
>
>
>


-- 
Kiran Chitturi


Re: [DISCUSS] Google Summer of Code

2013-03-04 Thread kiran chitturi
Hi Lewis,

I am interested in the Wicket webapp for Nutch. Sadly, I never got around to
working on it over the last three months.

I am planning to graduate in May 2013. Can I still be a student for a GSoC
project? I do not know much about GSoC since I have never participated
before.

Thanks,
Kiran


On Mon, Mar 4, 2013 at 3:23 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi All,
>
> I thought I would ask the question as to who (if anyone) is intending on
> engaging as a mentor (or student if you are one) within this year's GSoC
> project.
> There are plenty of projects we could do within Nutch.
> Obvious ones that come to mind are
> - Wicket webapp for Nutch 2.x
> - Integration of Giraph with Nutch
> We already have one proposal which I would consider mentoring over on
> Apache Gora, but I will certainly not back down from any proposals in Nutch.
> Would the Giraph project be welcomed here? If so I can head over to 
> user@Giraph in an attempt to attract interest.
> Of course this is a discussion based on what folks want to do and the list
> above should be added to.
> Thanks for now
> Lewis
>
> --
> *Lewis*
>



-- 
Kiran Chitturi


Re: Eclipse Error

2013-02-26 Thread kiran chitturi
>> [*ivy:resolve*]   unknown resolver main
>>
>> [ivy:resolve] 
>>
>> [*ivy:resolve*] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>
>>   [*taskdef*] Could not load definitions from resource
>> org/sonar/ant/antlib.xml. It could not be found.
>>
>> *copy-libs*:
>>
>> *compile-core*:
>>
>> [*javac*] C:\Users\Danilo\workspace\Nutch\build.xml:96: warning:
>> 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set
>> to false for repeatable builds
>>
>>  
>>
>> BUILD FAILED
>>
>> *C:\Users\Danilo\workspace\Nutch\build.xml:96:
>> java.lang.UnsupportedClassVersionError: com/sun/tools/javac/Main :
>> Unsupported major.minor version 51.0*
>>
>>  
>>
>> Total time: 8 seconds
>>
>>  
>>
>>  
>>
>> *I’m new with Nutch dev and a little bit nuts with Eclipse. Can somebody
>> help me?
>>
>> Thanks a lot.
>> Danilo Fernandes*
>>
>>
>>
>>
>> --
>> Don't Grow Old, Grow Up... :-) 
>>
>
>
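The key line in the log above is `Unsupported major.minor version 51.0`: class-file major version 51 corresponds to Java 7, so the `com/sun/tools/javac/Main` class on the build classpath was compiled for Java 7 while Ant was running it on an older JVM. The following is a minimal Python sketch for checking which Java release a class file targets, by reading the standard class-file header (magic `0xCAFEBABE`, then the minor and major versions as big-endian unsigned shorts). The helper names are hypothetical, and the path to the JDK's `tools.jar` is left to the reader.

```python
import struct
import sys
import zipfile
from typing import Tuple


def class_file_version(header: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major: int) -> str:
    """Map a class-file major version to a Java release name."""
    if 45 <= major <= 48:
        return "1." + str(major - 44)  # 45 -> 1.1 (also JDK 1.0.2), 48 -> 1.4
    if major >= 49:
        return str(major - 44)  # 49 -> 5, 50 -> 6, 51 -> 7, 52 -> 8
    raise ValueError(f"unknown class-file major version: {major}")


def jar_entry_version(jar_path: str, entry: str) -> Tuple[int, int]:
    """Read the class-file version of one entry inside a jar, e.g. tools.jar."""
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(entry) as f:
            return class_file_version(f.read(8))


if __name__ == "__main__":
    # Usage: python classver.py /path/to/tools.jar com/sun/tools/javac/Main.class
    major, minor = jar_entry_version(sys.argv[1], sys.argv[2])
    print(f"{major}.{minor} -> Java {java_release(major)}")
```

If the javac class reports Java 7 but the JVM that launches Ant inside Eclipse is older, that mismatch is exactly what the error above describes.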


-- 
Kiran Chitturi


Re: [jira] [Commented] (NUTCH-1511) Metadata in MYSQL updated with 'garbage'

2013-01-01 Thread kiran chitturi
Hi Jaap,

It has worked previously for me with mysql. I am using Hbase now and
everything is going quite well too.

I am gonna try working with mysql to solve this issue,  i need little more
details.

Did you try to crawl nutch website or anything more ?

Did you define index.parse.md in the nutch-site.xml and also the fields in
the schema ?

Did you restart Solr once you created the schema ? Which nutch version are
you using ?

Did you check the Solr logs ?
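A quick way to double-check the index.parse.md question above is to read the property back out of the configuration file. The following is a minimal Python sketch, assuming nutch-site.xml uses the usual Hadoop-style `<configuration>`/`<property>`/`<name>`/`<value>` layout and that `index.parse.md` holds a comma-separated list of metadata keys. The helper names are hypothetical, and so are the `metatag.*` keys in any sample you test it with. It only checks the Nutch side; the matching fields still have to exist in the Solr schema.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional, Union


def nutch_property(xml_text: Union[str, bytes], name: str) -> Optional[str]:
    """Return the <value> of the named <property>, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def parse_md_keys(xml_text: Union[str, bytes]) -> List[str]:
    """Metadata keys listed in index.parse.md (empty list if unset or blank)."""
    value = nutch_property(xml_text, "index.parse.md")
    if not value:
        return []
    return [key.strip() for key in value.split(",") if key.strip()]


if __name__ == "__main__":
    # Usage: python check_parse_md.py conf/nutch-site.xml
    with open(sys.argv[1], "rb") as f:
        print(parse_md_keys(f.read()) or "index.parse.md is not set")
```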

Thank you,
Kiran.

On Tue, Jan 1, 2013 at 1:22 PM, J. Gobel (JIRA)  wrote:

>
> [https://issues.apache.org/jira/browse/NUTCH-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541896#comment-13541896]
>
> J. Gobel commented on NUTCH-1511:
> -
>
> Hi Kiran,
>
> I never got it to work in Solr 4. No matter what I tried, the metadata
> fields never show up in Solr 4. Do you index using HBase or MySQL? If time
> allows, please try it with MySQL.
>
> Just add the table below in MySQL or, for a more thorough explanation,
> check the guide at http://nlp.solutions.asia/?p=180
>
> CREATE TABLE `webpage` (
> `id` varchar(767) NOT NULL,
> `headers` blob,
> `text` mediumtext DEFAULT NULL,
> `status` int(11) DEFAULT NULL,
> `markers` blob,
> `parseStatus` blob,
> `modifiedTime` bigint(20) DEFAULT NULL,
> `score` float DEFAULT NULL,
> `typ` varchar(32) CHARACTER SET latin1 DEFAULT NULL,
> `baseUrl` varchar(767) DEFAULT NULL,
> `content` longblob,
> `title` varchar(2048) DEFAULT NULL,
> `reprUrl` varchar(767) DEFAULT NULL,
> `fetchInterval` int(11) DEFAULT NULL,
> `prevFetchTime` bigint(20) DEFAULT NULL,
> `inlinks` mediumblob,
> `prevSignature` blob,
> `outlinks` mediumblob,
> `fetchTime` bigint(20) DEFAULT NULL,
> `retriesSinceFetch` int(11) DEFAULT NULL,
> `protocolStatus` blob,
> `signature` blob,
> `metadata` blob,
> PRIMARY KEY (`id`)
> ) ENGINE=InnoDB
> ROW_FORMAT=COMPRESSED
> DEFAULT CHARSET=utf8mb4;
>
> rgds,
>
> Jaap
>
> > Metadata in MYSQL updated with 'garbage'
> > 
> >
> > Key: NUTCH-1511
> > URL: https://issues.apache.org/jira/browse/NUTCH-1511
> > Project: Nutch
> >  Issue Type: Bug
> >  Components: fetcher
> >Affects Versions: 2.1
> > Environment: Ubuntu 12.04
> >Reporter: J. Gobel
> >  Labels: metadata, mysql, nutch
> >
> > After applying patch for Metadata parser (NUTCH-1478) I notice that the
> metadata field just before the crawl ends is populated with the correct
> information. However when the crawl is completely finished the metadata
> field is populated with 'garbage' _csh_ �
> > last few lines of my logfile:
> > p.s. I use : bin/nutch crawl urls -depth 1 -topN 5 ..
> > 2013-01-01 11:55:53,177 INFO crawl.SignatureFactory - Using Signature
> impl: org.apache.nutch.crawl.MD5Signature
> > 2013-01-01 11:55:53,903 INFO parse.ParserJob - Parsing
> http://nutch.apache.com/
> > 2013-01-01 11:55:54,589 WARN parse.MetaTagsParser - Found meta tag :
> robots index, follow
> > 2013-01-01 11:55:54,589 WARN parse.MetaTagsParser - Found meta tag :
> keywords .com.nl .net.nl com.nl net.nl sld, tld, domain, registry, domain
> registry, nic, extention, icann
> > 2013-01-01 11:55:54,590 WARN parse.MetaTagsParser - Found meta tag :
> description Registreer nu uw .com.nl of .net.nl extentie.
> > 2013-01-01 11:55:54,619 INFO regex.RegexURLNormalizer - can't find rules
> for scope 'outlink', using default
> > 2013-01-01 11:55:55,240 WARN mapred.FileOutputCommitter - Output path is
> null in cleanup
> > 2013-01-01 11:55:56,652 INFO mapreduce.GoraRecordReader -
> gora.buffer.read.limit = 1
> > 2013-01-01 11:55:59,574 INFO mapreduce.GoraRecordWriter -
> gora.buffer.write.limit = 1
> > 2013-01-01 11:55:59,575 INFO crawl.FetchScheduleFactory - Using
> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> > 2013-01-01 11:55:59,575 INFO crawl.AbstractFetchSchedule -
> defaultInterval=2592000
> > 2013-01-01 11:55:59,575 INFO crawl.AbstractFetchSchedule -
> maxInterval=7776000
> > 2013-01-01 11:56:02,554 WARN mapred.FileOutputCommitter - Output path is
> null in cleanup
>
>



-- 
Kiran Chitturi


Re: patches to parse-metatag plugin to save multiValues

2012-10-10 Thread kiran chitturi
Thank you for the help. I am almost done patching the parse-metatags
plugin. I made another post about the plugin and multiple values in metadata.

I will also check the other plugins to see whether they need any fixes. The
patch you made might be enough; I will check it out again in Eclipse.

Regards,
Kiran.

On Wed, Oct 10, 2012 at 6:57 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Kiran,
>
> I made a patch to remove the classes you highlighted. The patch
> passes the tests, so I will commit it to 2.x head.
>
> Thank you for your contrib
>
> Lewis
>
> On Wed, Oct 10, 2012 at 3:01 PM, Lewis John Mcgibbney
>  wrote:
> > Hi Kiran,
> >
> > On Wed, Oct 10, 2012 at 12:53 PM, kiran chitturi
> >  wrote:
> >
> >> This is the problem I observed with a few of the plugins, as I explained
> >> in my last email. They use code that is compatible with 1.5 but not with
> >> 2.0. Right now, I am almost done porting parse-metatags and
> >> index-metadata to Nutch 2.x. After this, I can look into the other
> >> plugins to fix their code.
> >
> > Nice one, thank you for keeping us updated with this.
> >
> > Lewis
>
>
>
> --
> Lewis
>



-- 
Kiran Chitturi


Re: patches to parse-metatag plugin to save multiValues

2012-10-10 Thread kiran chitturi
Hi Lewis,

This is the problem I observed with a few of the plugins, as I explained
in my last email. They use code that is compatible with 1.5 but not with
2.0. Right now, I am almost done porting parse-metatags and
index-metadata to Nutch 2.x. After this, I can look into the other plugins
to fix their code.

Regards,
Kiran.

On Wed, Oct 10, 2012 at 6:42 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Kiran,
>
> There is an issue open in Jira for this [0]. It would be really
> appreciated if you could add your observations/discoveries to it so
> we can get it logged and hopefully fixed.
>
> Thanks again
>
> Lewis
>
> [0] https://issues.apache.org/jira/browse/NUTCH-874
>
> On Thu, Oct 4, 2012 at 7:20 PM, kiran chitturi
>  wrote:
> > Hi Lewis,
> >
> > I am checking out the 2.x branch in Eclipse and I came across some errors
> > in the plugins. The errors point to classes that are used in the plugins
> > but are not present in 2.x.
> >
> > SWFParser.java
> > org.apache.nutch.util.LogUtil
> >
> > ZipParser.java, ZipTextExtractor.java, TextExtParser.java,
> > FeedIndexingFilter.java, FeedParser.java, TestFeedParser.java,
> > TestZipParser.java, ExtParser.java
> > (http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/parse/)
> > import org.apache.nutch.parse.ParseData;
> > import org.apache.nutch.parse.ParseImpl;
> > import org.apache.nutch.parse.ParseResult;
> > import org.apache.nutch.parse.ParseStatus;
> > import org.apache.nutch.parse.ParseText;
> >
> > TestExtParser.java, FeedIndexingFilter.java, TestFeedParser.java,
> > TestZipParser.java, TestSWFParser.java
> > (http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/crawl/)
> > import org.apache.nutch.crawl.CrawlDatum;
> > import org.apache.nutch.crawl.Inlinks;
> >
> > TikaParser.java
> > import org.apache.tika.parser.html.HtmlMapper;
> >
> > Each file listed above uses some or all of the classes shown beneath it.
> >
> > Am I wrong, or are some plugins in 2.x still using the old 1.x-series
> > classes? If I am right, this looks like a compatibility issue between
> > these plugins and the 2.x series.
> >
> > Many Thanks,
> > Kiran.
> >
> > On Thu, Oct 4, 2012 at 12:09 PM, Lewis John Mcgibbney
> >  wrote:
> >>
> >> Hi Kiran,
> >>
> >> On Thu, Oct 4, 2012 at 3:25 PM, kiran chitturi
> >>  wrote:
> >> > Hi,
> >>
> >> > Thank you for your inputs. I am going to start working on the plugin
> >> > to make it work with the 2.x branch.
> >>
> >> Great
> >>
> >> >
> >> > I have noticed that the current released version is 2.1, and I am
> >> > wondering which version I should start working on: 2.0 or 2.1?
> >>
> >> If you could begin work on the source available at the following link
> >> that would be excellent.
> >>
> >> http://svn.apache.org/repos/asf/nutch/branches/2.x/
> >>
> >> thank you
> >>
> >> Lewis
> >
> >
> >
> >
> > --
> > Kiran Chitturi
> >
>
>
>
> --
> Lewis
>



-- 
Kiran Chitturi


Re: patches to parse-metatag plugin to save multiValues

2012-10-04 Thread kiran chitturi
Hi Lewis,

I am checking out the 2.x branch in Eclipse and I came across some errors
in the plugins. The errors point to classes that are used in the plugins
but are not present in 2.x.

SWFParser.java
org.apache.nutch.util.LogUtil

ZipParser.java, ZipTextExtractor.java, TextExtParser.java,
FeedIndexingFilter.java, FeedParser.java, TestFeedParser.java,
TestZipParser.java, ExtParser.java
(http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/parse/)
import org.apache.nutch.parse.ParseData;
import org.apache.nutch.parse.ParseImpl;
import org.apache.nutch.parse.ParseResult;
import org.apache.nutch.parse.ParseStatus;
import org.apache.nutch.parse.ParseText;

TestExtParser.java, FeedIndexingFilter.java, TestFeedParser.java,
TestZipParser.java, TestSWFParser.java
(http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/crawl/)
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;

TikaParser.java
import org.apache.tika.parser.html.HtmlMapper;

Each file listed above uses some or all of the classes shown beneath it.

Am I wrong, or are some plugins in 2.x still using the old 1.x-series
classes? If I am right, this looks like a compatibility issue between
these plugins and the 2.x series.
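A quick way to find every plugin source that still imports one of these classes is to scan the sources for the import lines listed above. The sketch below is a hypothetical helper, not part of Nutch: it hard-codes the Nutch imports reported in this email (the Tika `HtmlMapper` import is a separate dependency-version question and is left out), walks a source tree (assumed to be `src/plugin` in a 2.x checkout unless a path is given), and prints each `.java` file that imports one of them. It only matches single-type imports, so a wildcard such as `import org.apache.nutch.parse.*;` would be missed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class LegacyImportScan {

    // Imports reported in this thread as failing to resolve on the 2.x branch.
    static final List<String> LEGACY_IMPORTS = Arrays.asList(
            "org.apache.nutch.util.LogUtil",
            "org.apache.nutch.parse.ParseData",
            "org.apache.nutch.parse.ParseImpl",
            "org.apache.nutch.parse.ParseResult",
            "org.apache.nutch.parse.ParseStatus",
            "org.apache.nutch.parse.ParseText",
            "org.apache.nutch.crawl.CrawlDatum",
            "org.apache.nutch.crawl.Inlinks");

    /** Returns the legacy classes imported by one Java source text, in file order. */
    static List<String> legacyImports(String source) {
        List<String> found = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String t = line.trim();
            if (!t.startsWith("import ") || !t.endsWith(";")) {
                continue;
            }
            String imported = t.substring("import ".length(), t.length() - 1).trim();
            if (LEGACY_IMPORTS.contains(imported)) {
                found.add(imported);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location of plugin sources; pass another path as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "src/plugin");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try {
                    String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                    List<String> hits = legacyImports(text);
                    if (!hits.isEmpty()) {
                        System.out.println(p + " -> " + hits);
                    }
                } catch (IOException e) {
                    System.err.println("Could not read " + p + ": " + e.getMessage());
                }
            });
        }
    }
}
```

Checking import lines rather than compiling keeps the scan independent of the build; a clean result still needs confirming with an actual `ant` build of the plugins.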

Many Thanks,
Kiran.

On Thu, Oct 4, 2012 at 12:09 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Kiran,
>
> On Thu, Oct 4, 2012 at 3:25 PM, kiran chitturi
>  wrote:
> > Hi,
>
> > Thank you for your inputs. I am going to start working on the plugin
> > to make it work with the 2.x branch.
>
> Great
>
> >
> > I have noticed that the current released version is 2.1, and I am
> > wondering which version I should start working on: 2.0 or 2.1?
>
> If you could begin work on the source available at the following link
> that would be excellent.
>
> http://svn.apache.org/repos/asf/nutch/branches/2.x/
>
> thank you
>
> Lewis
>



-- 
Kiran Chitturi