date:20151001

Re: Request for inclusion in the Nutch email list

2015-10-01 Thread Sujen Shah

Hi Pramod,
To subscribe to the list you need to send a mail to
dev-subscr...@nutch.apache.org.

For more instructions have a look at -
http://nutch.apache.org/mailing_lists.html

Cheers,
Sujen Shah

On Tue, Sep 29, 2015 at 10:22 PM, Pramod Nagarajarao 
wrote:

> Hello Team,
>
> I'm Pramod, a graduate student studying Computer Science at USC. I
> would like to join the Nutch mailing lists and request that you add me.
> Thanks.
>
> Regards,
> Pramod Nagarajarao
>

Request for inclusion in the Nutch email list

2015-10-01 Thread Pramod Nagarajarao

Hello Team,

I'm Pramod, a graduate student studying Computer Science at USC. I
would like to join the Nutch mailing lists and request that you add me.
Thanks.

Regards,
Pramod Nagarajarao

Re: Team 18: Selenium handler question

2015-10-01 Thread Joyce, Michael J (398M)

Hi folks,

The handler interface requires you to implement two functions:
void processDriver(..)
boolean shouldProcessURL(..)

The processDriver function can do any manipulation of the web driver that
you’d like. The content will be pulled out of the body tag of the document
when this function returns. It is given to your handler preloaded with the
URL for the current page being fetched. You should be able to take that
and do the manipulations necessary.

shouldProcessURL is used to check whether the handler should be loaded for
a particular URL. If you want the handler to run over every URL then just
have it return true. If you want to have it run on only certain URLs then
you can implement that logic in there.

As for documentation, the Selenium docs [1] are pretty good. If you need
to handle authentication, that can be a pain. I don’t have many
recommendations there; you’ll have to search around and figure out the
best approach. Stack Overflow is always good =D [2]

[1] http://www.seleniumhq.org/docs/03_webdriver.jsp
[2] https://stackoverflow.com/questions/24304752/how-to-handle-authentication-popup-with-selenium-webdriver-using-java

Hope that helps
--
Michael J. Joyce
Scientific Applications Software Engineer
Instrument Software and Science Data Systems
NASA Jet Propulsion Laboratory
California Institute of Technology
4800 Oak Grove Drive
Pasadena, California 91109
Mail Stop: 158-242
Cel: (626) 788-7511
Tel: (818) 354-7550
Fax: (818) 393-1370





On 10/1/15, 6:50 AM, "Christian Alan Mattmann"  wrote:

>Hi Team 18,
>
>This is great and you are headed in the right direction.
>
>MikeJ - can you suggest a sample reference to take a look
>at for the team?
>
>Cheers,
>Chris
>
>+
>Chris Mattmann, Ph.D.
>Adjunct Associate Professor, Computer Science Department
>University of Southern California
>Los Angeles, CA 90089 USA
>Email: mattm...@usc.edu
>WWW: http://sunset.usc.edu/
>+
>
>
>
>
>-----Original Message-----
>From: Mithun Maragiri 
>Date: Thursday, October 1, 2015 at 12:21 AM
>To: jpluser 
>Cc: "ramac...@usc.edu" , Charan Shampur
>, Sharan Kadagad 
>Subject: Team 18: Selenium handler question
>
>>Hello Professor,
>>
>>
>>We would like some help writing Selenium handler code.
>>We crawled the URLs for 30 rounds and ended up with a few URLs that
>>were not fetched.
>>We wrote a Python script to filter the URLs whose status code is not
>>OK/SUCCESS.
>>We then manually checked one of these URLs to see why it was not
>>fetched.
>>We discovered that the website was behind a form and needed
>>authentication to access its web pages.
>>All the fetch requests made by the crawler are HTTP GET requests, but for
>>these unfetched URLs we need to make a POST request. We are thinking of
>>this approach:
>>
>>
>>Approach:
>>* Write a script which filters all the URLs whose status code is not
>>  success.
>>* Create a WebDriver for each of these URLs in the DefaultHandler().
>>* Manually sign up at each of these unfetched sites with the same login
>>  credentials. Example: login = Team18; password = team18Password
>>* Once the driver is created, create a POST request with the URL, append
>>  our login credentials, and then make an AJAX call.
>>
>>After studying materials online we realized that this is exactly the
>>purpose of Selenium, but we cannot find any examples online where
>>someone has written a handler. We are finding it hard to understand how
>>to write the handler.
>>
>>
>>Can you please provide some example code for writing the handler? We will
>>use that as a reference and try to adapt it to our needs.
>>
>>
>>Thanks,
>>
>>Team 18
>>
>>
>>
>>
>
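The filtering step from the quoted message (collecting the URLs whose fetch status is not a success) could be sketched in Python roughly as follows. The input format is an assumption made only for illustration: one `url<TAB>status` pair per line, with OK and SUCCESS treated as the success values named in the message. It is not the output format of any particular Nutch tool, and `unfetched_urls` is a hypothetical helper name.

```python
# Hypothetical sketch: the "url<TAB>status" line format and the helper name
# are assumptions, not the output format of any Nutch tool.
SUCCESS_STATUSES = {"OK", "SUCCESS"}


def unfetched_urls(lines):
    """Return the URLs whose status is not one of SUCCESS_STATUSES."""
    failed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        url, _, status = line.partition("\t")
        if status.strip().upper() not in SUCCESS_STATUSES:
            failed.append(url)
    return failed


sample = [
    "http://www.example.com/\tSUCCESS",
    "http://www.example.com/members\tACCESS_DENIED",
    "http://www.example.com/about\tOK",
]
print(unfetched_urls(sample))
```

The resulting list is what a handler's shouldProcessURL check would then need to recognize, so that only the pages behind a login form go through the Selenium path.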

Re: [VOTE] Release Apache Nutch 2.3.1

2015-10-01 Thread Drulea, Sherban

Hi Lewis,

-1 until I verify nutch actually crawls. Right now it finds 0 URLs with no
errors.

2.3.1 is an improvement over 2.3.0, which didn't work with Mongo at all.

Cheers,
Sherban



On 9/30/15, 5:35 PM, "Lewis John Mcgibbney" 
wrote:

>Hi Folks,
>Is anyone else able to test and run the release candidate for 2.3.1?
>It would be great to get a release if we can get the VOTE's and the RC is
>suitable.
>Thanks in advance.
>Best
>Lewis
>
>On Wed, Sep 23, 2015 at 9:46 PM, Lewis John Mcgibbney <
>lewis.mcgibb...@gmail.com> wrote:
>
>> Hi Folks,
>> It turns out the formatting for the original email below was terrible.
>> Sorry about that.
>> I've hopefully corrected formatting now. Please VOTE away!
>>
>> On Tue, Sep 22, 2015 at 6:45 PM, Lewis John Mcgibbney <
>> lewis.mcgibb...@gmail.com> wrote:
>>
>>> Hi user@ & dev@,
>>>
>>> This thread is a VOTE for releasing Apache Nutch 2.3.1 RC#1.
>>>
>>> We addressed 32 issues in all, which can be seen in the release report
>>> http://s.apache.org/nutch_2.3.1
>>>
>>> The release candidate comprises the following components.
>>>
>>> * A staging repository [0] containing various Maven artifacts
>>> * A branch-2.3.1 of the 2.x code [1]
>>> * The tagged source upon which we are VOTE'ing [2]
>>> * Finally, the release artifacts [3], which I would encourage you to
>>> verify for signatures and test.
>>>
>>> You should use the following KEYS [4] file to verify the signatures of
>>> all release artifacts.
>>>
>>> Please VOTE as follows
>>>
>>> [ ] +1 Push the release, I am happy :)
>>> [ ] +/-0 I am not bothered either way
>>> [ ] -1 I am not happy with this release candidate (please state why)
>>>
>>> Firstly, thank you to everyone that contributed to Nutch. Secondly,
>>> thank you to everyone that VOTE's. It is appreciated.
>>>
>>> Thanks
>>> Lewis
>>> (on behalf of Nutch PMC)
>>>
>>> p.s. Here's my +1
>>>
>>> [0]
>>> https://repository.apache.org/content/repositories/orgapachenutch-1005
>>> [1] https://svn.apache.org/repos/asf/nutch/branches/branch-2.3.1
>>> [2] https://svn.apache.org/repos/asf/nutch/tags/release-2.3.1
>>> [3] https://dist.apache.org/repos/dist/dev/nutch/2.3.1
>>> [4] http://www.apache.org/dist/nutch/KEYS
>>>
>>> --
>>> *Lewis*
>>>
>>
>>
>>
>> --
>> *Lewis*
>>
>
>
>
>-- 
>*Lewis*


__

This email message is for the sole use of the intended recipient(s) and
may contain confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all copies
of the original message.

[Nutch Wiki] Update of "Nutch_1.X_RESTAPI" by SujenShah

2015-10-01 Thread Apache Wiki

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Nutch_1.X_RESTAPI" page has been changed by SujenShah:
https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI?action=diff&rev1=7&rev2=8

  = Nutch 1.x REST API v1.0 =
  
- <>
+ <>
  
  == Introduction ==
  This page documents the Nutch 1.X REST API v1.0. 
@@ -222, +222 @@

  __Response__ is created job's id.
  
  job-id-43243
+ 
+ 
+ === Seed List creation ===
+ 
+ The /seed/create endpoint lets the user create a seed list and returns the temporary path of the file created. This path should be passed to the url_dir parameter of the INJECT job.
+ 
+ {{{
+ POST /seed/create
+ {
+ "name":"name-of-seedlist", 
+ "seedUrls":["http://www.example.com",]
+ }
+ }}}
+ 
+ __Response__ is the file directory path
+ 
+ /var/folders/m9/hsls1krx12x968plt2brlhr0gn/T/1443721976324-0
  
  
  === Database ===
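
A request to the /seed/create endpoint described in the diff above could be built along these lines. This is only a sketch: the base URL `http://localhost:8081` is an assumption about where a Nutch REST server might be listening, `build_seed_create_request` is a hypothetical helper, and the payload follows the wiki example (a name plus a list of URL strings), which may differ from what a given server version expects. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: address of a locally running Nutch REST server; adjust as needed.
BASE_URL = "http://localhost:8081"


def build_seed_create_request(name, seed_urls, base_url=BASE_URL):
    """Build (but do not send) a POST request for /seed/create."""
    payload = {"name": name, "seedUrls": list(seed_urls)}
    return urllib.request.Request(
        base_url + "/seed/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_seed_create_request("name-of-seedlist", ["http://www.example.com"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it: urllib.request.urlopen(req).read().decode("utf-8")
# Per the wiki text, the response is the path to pass as url_dir to INJECT.
```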

[jira] [Updated] (NUTCH-2123) Seed List REST API returns Text but headers indicate/require JSON

2015-10-01 Thread Sujen Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujen Shah updated NUTCH-2123:
--
Attachment: NUTCH-2123.patch

Patch for correcting the response headers.

> Seed List REST API returns Text but headers indicate/require JSON
> -
>
> Key: NUTCH-2123
> URL: https://issues.apache.org/jira/browse/NUTCH-2123
> Project: Nutch
>  Issue Type: Bug
>  Components: REST_api
>Affects Versions: 1.11
>Reporter: Aron Ahmadia
>Priority: Minor
>  Labels: memex
> Fix For: 1.11
>
> Attachments: NUTCH-2123.patch
>
>
> nutch.py: POST Endpoint: /seed/create
> nutch.py: POST Request data: {'seedUrls': [{'id': 0, 'url': 
> 'http://aron.ahmadia.net', 'seedList': None}], 'id': '12345', 'name': 'aron'}
> nutch.py: POST Request headers: {'Accept': 'application/json'}
> nutch.py: Response headers: {'content-type': 'application/json', 'server': 
> 'Jetty(8.1.15.v20140411)', 'content-length': '64', 'date': 'Fri, 25 Sep 2015 
> 05:49:09 GMT'}
> nutch.py: Response status: 200
> resp.headers
> {'content-type': 'application/json', 'server': 'Jetty(8.1.15.v20140411)', 
> 'content-length': '64', 'date': 'Fri, 25 Sep 2015 05:49:09 GMT'}
> resp.text
> '/var/folders/3s/pw2prx7n7vd22qqrlssmtn90gp/T/1443160149187-0'
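
The mismatch reported above (a Content-Type of application/json on a body that is a bare file path) can be reproduced with a small check like the one below. It is a sketch using only the header and body values copied from the log; `body_matches_json_header` is a hypothetical helper, not part of nutch.py.

```python
import json


def body_matches_json_header(headers, text):
    """Return True if the response is not declared as JSON, or if it is
    declared as JSON and the body actually parses as JSON."""
    content_type = headers.get("content-type", "")
    if "application/json" not in content_type.lower():
        return True  # nothing claimed, nothing to check
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


# Values copied from the log in the issue description.
headers = {"content-type": "application/json", "content-length": "64"}
text = "/var/folders/3s/pw2prx7n7vd22qqrlssmtn90gp/T/1443160149187-0"
print(body_matches_json_header(headers, text))
```

A client that trusts the header and calls a JSON parser on this body would fail, which is why either the header or the body needs to change.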



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (NUTCH-2128) Refactor configuration end point

2015-10-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940064#comment-14940064
 ] 

ASF GitHub Bot commented on NUTCH-2128:
---

GitHub user sujen1412 opened a pull request:

https://github.com/apache/nutch/pull/69

fix for NUTCH-2128 Refactor config endpoint by Sujen shah



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujen1412/nutch NUTCH-2128

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/69.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #69


commit f9c80a4bba43c0a117804d4997303a5a974f4cc2
Author: Sujen Shah 
Date:   2015-09-29T19:07:13Z

Refactor config endpoint




> Refactor configuration end point
> 
>
> Key: NUTCH-2128
> URL: https://issues.apache.org/jira/browse/NUTCH-2128
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api
>Reporter: Sujen Shah
>Assignee: Sujen Shah
>Priority: Minor
> Fix For: 1.11
>
>
> To better define the endpoint to create a new configuration and add a new 
> endpoint to update a particular property value of a configuration. 




[GitHub] nutch pull request: fix for NUTCH-2128 Refactor config endpoint by...

2015-10-01 Thread sujen1412

GitHub user sujen1412 opened a pull request:

https://github.com/apache/nutch/pull/69

fix for NUTCH-2128 Refactor config endpoint by Sujen shah



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujen1412/nutch NUTCH-2128

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/69.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #69


commit f9c80a4bba43c0a117804d4997303a5a974f4cc2
Author: Sujen Shah 
Date:   2015-09-29T19:07:13Z

Refactor config endpoint




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[jira] [Assigned] (NUTCH-2128) Refactor configuration end point

2015-10-01 Thread Sujen Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujen Shah reassigned NUTCH-2128:
-

Assignee: Sujen Shah

> Refactor configuration end point
> 
>
> Key: NUTCH-2128
> URL: https://issues.apache.org/jira/browse/NUTCH-2128
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api
>Reporter: Sujen Shah
>Assignee: Sujen Shah
>Priority: Minor
> Fix For: 1.11
>
>
> To better define the endpoint to create a new configuration and add a new 
> endpoint to update a particular property value of a configuration. 




[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-10-01 Thread Michael Joyce (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940036#comment-14940036
 ] 

Michael Joyce commented on NUTCH-2108:
--

Good stuff [~asitang], glad to see the workaround proved fruitful and great 
example handlers!

> Add a function to the selenium interactive plugin interface to do multiple 
> manipulation of driver and then return the data
> --
>
> Key: NUTCH-2108
> URL: https://issues.apache.org/jira/browse/NUTCH-2108
> Project: Nutch
>  Issue Type: Sub-task
>  Components: fetcher
>Affects Versions: 1.10
>Reporter: Asitang Mishra
>  Labels: memex
>
> In the interactive selenium plugin we have to create a handler class for each 
> manipulation of a page. Sometimes we need to manipulate a page in many ways 
> and keep track of those manipulations, for example clicking on each link in a 
> table and then refreshing to get the original page back, since even one click 
> can make all the other links go away. This can be done in a single loop, but 
> it would be too much work and far too complicated using multiple handlers. So 
> I am proposing a new function "String multiProcessDriver(WebDriver driver)" 
> that takes the driver and returns a concatenated String, alongside the 
> already present "void processDriver(WebDriver driver)".




[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-10-01 Thread Asitang Mishra (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940025#comment-14940025
 ] 

Asitang Mishra commented on NUTCH-2108:
---

[~chrismattmann]

> Add a function to the selenium interactive plugin interface to do multiple 
> manipulation of driver and then return the data
> --
>
> Key: NUTCH-2108
> URL: https://issues.apache.org/jira/browse/NUTCH-2108
> Project: Nutch
>  Issue Type: Sub-task
>  Components: fetcher
>Affects Versions: 1.10
>Reporter: Asitang Mishra
>  Labels: memex
>
> In the interactive selenium plugin we have to create a handler class for each 
> manipulation of a page. Sometimes we need to manipulate a page in many ways 
> and keep track of those manipulations, for example clicking on each link in a 
> table and then refreshing to get the original page back, since even one click 
> can make all the other links go away. This can be done in a single loop, but 
> it would be too much work and far too complicated using multiple handlers. So 
> I am proposing a new function "String multiProcessDriver(WebDriver driver)" 
> that takes the driver and returns a concatenated String, alongside the 
> already present "void processDriver(WebDriver driver)".




[jira] [Commented] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-10-01 Thread Michael Joyce (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939939#comment-14939939
 ] 

Michael Joyce commented on NUTCH-2129:
--

Thanks Julien. I figured there would probably be a few thoughts on this, so I 
appreciate the feedback. I'll check out the stuff you mentioned. Thanks for the 
ideas.

> Track Protocol Status in Crawl Datum
> 
>
> Key: NUTCH-2129
> URL: https://issues.apache.org/jira/browse/NUTCH-2129
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 2.3, 1.10
>Reporter: Michael Joyce
> Fix For: 2.4, 1.11
>
>
> It's become necessary on a few crawls that I run to get protocol status code 
> stats. After speaking with [~lewismc] it seemed that there might not be a 
> super convenient way of doing this as is, but it would be great to be able to 
> add the functionality necessary to pull this information out.




Atomic update and optimistic concurrency in Solr

2015-10-01 Thread Roannel Fernández Hernández

Hi all:

I'm trying to make an atomic update or an optimistic concurrency update in Solr. 
Can anyone help me?

Re: [MASSMAIL]Re: Fetch failed : java.lang.NullPointerException

2015-10-01 Thread Roannel Fernández Hernández

Hi Taichi: 

Which plugins do you have enabled in nutch-site.xml? 

- Original message -

From: "Taichi Ho"  
To: dev@nutch.apache.org 
Sent: Wednesday, September 30, 2015 16:57:39 
Subject: [MASSMAIL]Re: Fetch failed : java.lang.NullPointerException 

Hi, I have the same problem. The following is part of my log: 
http://pastebin.com/JjkJ1qe6 

It seems there is a read timeout, but when I paste the URL into the browser it 
works fine. 

Any ideas what could be causing this problem? 

Thanks. 

On Mon, Sep 28, 2015 at 7:46 AM Michael Joyce < jo...@apache.org > wrote: 



I don't see any null pointer exceptions coming up in your log. Do you have any 
more info or perhaps I'm missing something? 


-- Jimmy 

On Sun, Sep 27, 2015 at 3:04 PM, mithun < mithun626...@gmail.com > wrote: 



Hi All 

While crawling my seed list, I bumped into this NullPointerException for a few 
URLs. What could be the problem? 

Please find the pastebin link to my hadoop.log file 

http://pastebin.com/SyyybtEx 


Thanks 
Mithun

[jira] [Created] (NUTCH-2131) Problem running nutch(crawl) with selenium

2015-10-01 Thread Ashwini (JIRA)

Ashwini created NUTCH-2131:
--

 Summary: Problem running nutch(crawl) with selenium
 Key: NUTCH-2131
 URL: https://issues.apache.org/jira/browse/NUTCH-2131
 Project: Nutch
  Issue Type: Bug
  Components: nutch server
Affects Versions: 1.10
 Environment: Ubuntu 12.04 32-bit 
Reporter: Ashwini


Hello,

I had a few issues running Selenium on Ubuntu.
I am trying to follow the tutorial that describes how to install the nutch 
selenium plugin, 
https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-selenium
I was able to include the plugin and build nutch again.

But during the crawling process
I get the error "Unable to connect to host 127.0.0.1 on port 7055 after 45000 
ms".
I did some research on this and I think that the Firefox version I am using 
and the Selenium jars are incompatible (I'm not sure if this is the issue).

So I downgraded Firefox from version 41 to 33, but I am still 
getting the same error.
Is there a compatible version of Firefox that I need to install, or is there 
some other problem?

I am using the Selenium that is integrated in nutch-1.10.

I have used the 2.44.0 Selenium standalone software with Firefox version 33 and 
everything works fine. 


Please help me with this.




[jira] [Commented] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-10-01 Thread Julien Nioche (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939503#comment-14939503
 ] 

Julien Nioche commented on NUTCH-2129:
--

I'd rather keep it simple and not modify the CrawlDatum so much. Why don't you 
simply add a config element and optionally store the code in the metadata?
BTW we already have the option to store the response headers - see 
[https://github.com/apache/nutch/commit/23c7761aff830db82a1e44b84bf81265639c9a26].
 You could use that and simply reparse the first line to get the code.
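
The "reparse the first line" idea could look roughly like this in Python. It is a sketch under an assumption: that the stored response headers are available as raw text beginning with the HTTP status line. How Nutch actually stores them is not shown in this thread, and `status_code_from_headers` is a hypothetical helper.

```python
def status_code_from_headers(raw_headers):
    """Return the numeric status code from the first line of a raw HTTP
    response header block, or None if the line is not a valid status line."""
    lines = raw_headers.splitlines()
    first_line = lines[0] if lines else ""
    parts = first_line.split(None, 2)  # e.g. ["HTTP/1.1", "404", "Not Found"]
    if len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return None


raw = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n"
print(status_code_from_headers(raw))
```

Counting the returned codes across a crawl would give the protocol status stats asked for in the issue without changing the CrawlDatum itself.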


> Track Protocol Status in Crawl Datum
> 
>
> Key: NUTCH-2129
> URL: https://issues.apache.org/jira/browse/NUTCH-2129
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 2.3, 1.10
>Reporter: Michael Joyce
> Fix For: 2.4, 1.11
>
>
> It's become necessary on a few crawls that I run to get protocol status code 
> stats. After speaking with [~lewismc] it seemed that there might not be a 
> super convenient way of doing this as is, but it would be great to be able to 
> add the functionality necessary to pull this information out.




[jira] [Commented] (NUTCH-2086) Nutch 1.X Webui

2015-10-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939406#comment-14939406
 ] 

ASF GitHub Bot commented on NUTCH-2086:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/61


> Nutch 1.X Webui 
> 
>
> Key: NUTCH-2086
> URL: https://issues.apache.org/jira/browse/NUTCH-2086
> Project: Nutch
>  Issue Type: New Feature
>  Components: REST_api, web gui
>Reporter: Sujen Shah
>Assignee: Chris A. Mattmann
>  Labels: memex
> Fix For: 1.11
>
> Attachments: NUTCH-2086.patch
>
>
> To port the Apache Wicket based webui in Nutch 2.X to 1.X




[GitHub] nutch pull request: Fix for NUTCH-2086 Contributed by Sujen Shah

2015-10-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/61


